Abstract
Tree ensembles such as random forests and boosted trees are accurate but difficult to understand. In this work, we provide the interpretable trees (inTrees) framework that extracts, measures, prunes, selects, and summarizes rules from a tree ensemble, and calculates frequent variable interactions. The inTrees framework can be applied to multiple types of tree ensembles, e.g., random forests, regularized random forests, and boosted trees. We implemented the inTrees algorithms in the “inTrees” R package.
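The extract and measure steps named above can be illustrated with a minimal Python sketch. It is not the inTrees R package API. The nested-dictionary tree format, the function names `extract_rules`, `satisfies`, and `measure`, and the toy data are all hypothetical. The sketch assumes a rule is the conjunction of split conditions along one root-to-leaf path, that frequency is the share of instances satisfying a rule's conditions, that error is the share of those instances whose label differs from the rule's prediction, and that length is the number of conditions.

```python
from typing import Dict, List, Tuple

Condition = Tuple[str, str, float]          # (feature, "<=" or ">", threshold)
Rule = Tuple[Tuple[Condition, ...], str]    # (conditions, predicted class)


def extract_rules(node: dict, path: Tuple[Condition, ...] = ()) -> List[Rule]:
    """Return one rule per root-to-leaf path of a single tree (hypothetical format)."""
    if "predict" in node:                    # leaf: the path so far is a complete rule
        return [(path, node["predict"])]
    f, t = node["feature"], node["threshold"]
    return (extract_rules(node["left"], path + ((f, "<=", t),))
            + extract_rules(node["right"], path + ((f, ">", t),)))


def satisfies(row: Dict[str, float], conditions: Tuple[Condition, ...]) -> bool:
    """True if the row meets every condition of the rule."""
    return all(row[f] <= t if op == "<=" else row[f] > t
               for f, op, t in conditions)


def measure(rule: Rule, rows: List[Dict[str, float]], labels: List[str]) -> Dict[str, float]:
    """Compute length, frequency, and error of a rule under the assumed definitions."""
    conditions, predicted = rule
    matched = [y for row, y in zip(rows, labels) if satisfies(row, conditions)]
    frequency = len(matched) / len(rows)
    error = sum(y != predicted for y in matched) / len(matched) if matched else 0.0
    return {"length": len(conditions), "frequency": frequency, "error": error}


# Toy two-tree "ensemble" (hypothetical); left branch means feature <= threshold.
tree1 = {"feature": "x1", "threshold": 0.5,
         "left": {"predict": "A"},
         "right": {"feature": "x2", "threshold": 1.0,
                   "left": {"predict": "B"},
                   "right": {"predict": "A"}}}
tree2 = {"feature": "x2", "threshold": 1.0,
         "left": {"predict": "B"},
         "right": {"predict": "A"}}
ensemble = [tree1, tree2]

# Toy data set (hypothetical).
rows = [{"x1": 0.2, "x2": 0.5}, {"x1": 0.8, "x2": 0.5},
        {"x1": 0.9, "x2": 2.0}, {"x1": 0.1, "x2": 1.5}]
labels = ["A", "B", "A", "A"]

# Pool the rules of all trees, then score each rule on the data.
rules = [rule for tree in ensemble for rule in extract_rules(tree)]
metrics = [measure(rule, rows, labels) for rule in rules]

for (conditions, predicted), m in zip(rules, metrics):
    text = " & ".join(f"{f} {op} {t}" for f, op, t in conditions)
    print(f"{text} => {predicted}  {m}")
```

Scoring every rule on the same data is what makes the later steps possible in a sketch like this: rules with high error or low frequency become candidates for pruning or removal, and short rules are easier to read than long ones.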
Cite this article
Deng, H. Interpreting tree ensembles with inTrees. Int J Data Sci Anal 7, 277–287 (2019). https://doi.org/10.1007/s41060-018-0144-8
DOI: https://doi.org/10.1007/s41060-018-0144-8