Interpreting tree ensembles with inTrees

Deng, Houtao

doi:10.1007/s41060-018-0144-8

Interpreting tree ensembles with inTrees

Regular Paper
Published: 11 July 2018

Volume 7, pages 277–287, (2019)
Cite this article

International Journal of Data Science and Analytics Aims and scope Submit manuscript

Houtao Deng ORCID: orcid.org/0000-0003-3194-4592¹

2304 Accesses
125 Citations
Explore all metrics

Abstract

Tree ensembles such as random forests and boosted trees are accurate but difficult to understand. In this work, we provide the interpretable trees (inTrees) framework that extracts, measures, prunes, selects, and summarizes rules from a tree ensemble, and calculates frequent variable interactions. The inTrees framework can be applied to multiple types of tree ensembles, e.g., random forests, regularized random forests, and boosted trees. We implemented the inTrees algorithms in the “inTrees” R package.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

References

Adnan, M.N., Islam, M.Z.: Forex++: a new framework for knowledge discovery from decision forests. Austral. J. Inf. Syst. https://doi.org/10.3127/ajis.v21i0.1539 (2017)
Agrawal, R., Srikant, R., et al.: Fast algorithms for mining association rules. In: Proceedings of 20th International Conference on Very Large Data Bases, VLDB, Vol. 1215, pp. 487–499 (1994)
Bastani, O., Kim, C., Bastani, H.: Interpretability via model extraction. arXiv preprint arXiv:1706.09773 (2017)
Bastani, O., Kim, C., Bastani, H.: Interpreting blackbox models via model extraction. arXiv preprint arXiv:1705.08504 (2017)
Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
Article MATH Google Scholar
Breiman, L., Friedman, J., Olshen, R., Stone, C.: Classification and Regression Trees. Wadsworth, Belmont (1984)
MATH Google Scholar
Breiman, L., Shang, N.: Born again trees. University of California, Berkeley, Berkeley, CA, Technical Report (1996)
Deng, H.: Guided random forest in the RRF package. arXiv preprint arXiv:1306.0237 (2013)
Deng, H.: Interpreting tree ensembles with in trees. arXiv preprint arXiv:1408.5456 (2014)
Deng, H., Runger, G.: Gene selection with guided regularized random forest. Pattern Recogn. 46(12), 3483–3489 (2013)
Article Google Scholar
Deng, H., Runger, G., Tuv, E., Bannister, W.: CBC: An associative classifier with a small number of rules. Decis. Support Syst. 59, 163–170 (2014)
Article Google Scholar
Domingos, P.: Knowledge acquisition from examples via multiple models. In: Proceedings of the Fourteenth International Conference on Machine Learning, pp. 98–106. Morgan Kaufmann (1997)
Eskandarian, S., Bahrami, P., Kazemi, P.: A comprehensive data mining approach to estimate the rate of penetration: application of neural network, rule based models and feature ranking. J. Pet. Sci. Eng. 156, 605–615 (2017)
Article Google Scholar
Fokkema, M.: PRE: an R package for fitting prediction rule ensembles. arXiv preprint arXiv:1707.07149 (2017)
Friedman, J.H.: Greedy function approximation: a gradient boosting machine. Ann. Stat. 29, 1189–1232 (2001)
Article MathSciNet MATH Google Scholar
Friedman, J.H., Popescu, B.E.: Predictive learning via rule ensembles. Ann. Appl. Stat. 2, 916–954 (2008)
Article MathSciNet MATH Google Scholar
Gallego-Ortiz, C., Martel, A.L.: Using quantitative features extracted from t2-weighted MRI to improve breast MRI computer-aided diagnosis (CAD). PLoS ONE 12(11), e0187501 (2017)
Article Google Scholar
Gargett, A., Barnden, J.: Modeling the interaction between sensory and affective meanings for detecting metaphor. In: Proceedings of the Third Workshop on Metaphor in NLP, pp. 21–30 (2015)
Guidotti, R., Monreale, A., Turini, F., Pedreschi, D., Giannotti, F.: A survey of methods for explaining black box models. arXiv preprint arXiv:1802.01933 (2018)
Gurrutxaga, I., Pérez, J.M., Arbelaitz, O., Muguerza, J., Martín, J.I., Ansuategi, A.: CTC: an alternative to extract explanation from bagging. In: Conference of the Spanish Association for Artificial Intelligence, pp. 90–99. Springer (2007)
Hahsler, M., Grün, B., Hornik, K.: Introduction to a rules—mining association rules and frequent item sets. SIGKDD Explorations (2007)
Hara, S., Hayashi, K.: Making tree ensembles interpretable. arXiv preprint arXiv:1606.05390 (2016)
Hara, S., Hayashi, K.: Making tree ensembles interpretable: a bayesian model selection approach. arXiv preprint arXiv:1606.09066 (2016)
Khalid, M.H., Tuszynski, P.K., Szlek, J., Jachowicz, R., Mendyk, A.: From black-box to transparent computational intelligence models: a pharmaceutical case study. In: 2015 13th International Conference on Frontiers of Information Technology (FIT), pp. 114–118. IEEE (2015)
Liaw, A., Wiener, M.: Classification and regression by random forest. R News 2(3), 18–22 (2002)
Google Scholar
Lichman, M.: UCI machine learning repository (2013). http://archive.ics.uci.edu/ml
Liu, B., Hsu, W., Ma, Y.: Integrating classification and association rule mining. In: Proceeding of the 1998 International Conference on Knowledge Discovery and Data Mining, pp. 80–86. ACM (1998)
Meinshausen, N.: Node harvest. Ann. Appl. Stat. 4, 2049–2072 (2010)
Article MathSciNet MATH Google Scholar
Miraboutalebi, S.M., Kazemi, P., Bahrami, P.: Fatty acid methyl ester (FAME) composition used for estimation of biodiesel cetane number employing random forest and artificial neural networks: a new approach. Fuel 166, 143–151 (2016)
Article Google Scholar
Narayanan, I., Wang, D., Jeon, M., Sharma, B., Caulfield, L., Sivasubramaniam, A., Cutler, B., Liu, J., Khessib, B., Vaid, K.: Ssd failures in datacenters: What? when? and why? In: Proceedings of the 9th ACM International on Systems and Storage Conference, p. 7. ACM (2016)
Ridgeway, G., et al.: GBM: Generalized boosted regression models. R Package Version 1(3), 55 (2006)
MathSciNet Google Scholar
Szlęk, J., Pacławski, A., Lau, R., Jachowicz, R., Kazemi, P., Mendyk, A.: Empirical search for factors affecting mean particle size of PLGA microspheres containing macromolecular drugs. Comput. Methods Programs Biomed. 134, 137–147 (2016)
Article Google Scholar
Therneau, T.M., Atkinson, B., Ripley, B.: RPART: Recursive partitioning. R Package Version 3(3.8) (2010)
Vandewiele, G., Lannoye, K., Janssens, O., Ongenae, F., De Turck, F., Van Hoecke, S.: A genetic algorithm for interpretable model extraction from decision tree ensembles. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 104–115. Springer (2017)
Wang, X., Lin, P., Ho, J.W.: Discovery of cell-type specific dna motif grammar in cis-regulatory elements using random forest. BMC Genom. 19(1), 929 (2018)
Article Google Scholar
Zhou, Y., Hooker, G.: Interpreting models via single tree approximation. arXiv preprint arXiv:1610.09036 (2016)

Download references

Author information

Authors and Affiliations

San Francisco, USA
Houtao Deng

Authors

Houtao Deng
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Houtao Deng.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Deng, H. Interpreting tree ensembles with inTrees. Int J Data Sci Anal 7, 277–287 (2019). https://doi.org/10.1007/s41060-018-0144-8

Download citation

Received: 09 December 2017
Accepted: 03 July 2018
Published: 11 July 2018
Issue Date: 01 June 2019
DOI: https://doi.org/10.1007/s41060-018-0144-8

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Interpreting tree ensembles with inTrees

Abstract

Access this article

Similar content being viewed by others

Imbalanced data preprocessing techniques for machine learning: a systematic mapping study

A random forest guided tour

Aleatoric and epistemic uncertainty in machine learning: an introduction to concepts and methods

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Interpreting tree ensembles with inTrees

Abstract

Access this article

Similar content being viewed by others

Imbalanced data preprocessing techniques for machine learning: a systematic mapping study

A random forest guided tour

Aleatoric and epistemic uncertainty in machine learning: an introduction to concepts and methods

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation