Toolkit research plays an important role in the field of HCI, as it can heavily influence both the design and implementation of interactive systems. For publication, the HCI community typically expects that research to include an evaluation component. The problem is that toolkit evaluation is challenging, as it is often unclear what ‘evaluating’ a toolkit means and what methods are appropriate. To address this problem, we analyzed 68 published toolkit papers. From that analysis, we provide an overview of, reflection on, and discussion of evaluation methods for toolkit contributions. We identify and discuss the value of four toolkit evaluation strategies, including the associated techniques each employs. We offer a categorization of evaluation strategies for toolkit researchers, along with a discussion of the value, potential biases, and trade-offs associated with each strategy.