Skip to main content
Log in

Interpreting tree ensembles with inTrees

  • Regular Paper
  • Published:
International Journal of Data Science and Analytics Aims and scope Submit manuscript

Abstract

Tree ensembles such as random forests and boosted trees are accurate but difficult to understand. In this work, we provide the interpretable trees (inTrees) framework that extracts, measures, prunes, selects, and summarizes rules from a tree ensemble, and calculates frequent variable interactions. The inTrees framework can be applied to multiple types of tree ensembles, e.g., random forests, regularized random forests, and boosted trees. We implemented the inTrees algorithms in the “inTrees” R package.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fig. 1

Similar content being viewed by others

References

  1. Adnan, M.N., Islam, M.Z.: Forex++: a new framework for knowledge discovery from decision forests. Austral. J. Inf. Syst. https://doi.org/10.3127/ajis.v21i0.1539 (2017)

  2. Agrawal, R., Srikant, R., et al.: Fast algorithms for mining association rules. In: Proceedings of 20th International Conference on Very Large Data Bases, VLDB, Vol. 1215, pp. 487–499 (1994)

  3. Bastani, O., Kim, C., Bastani, H.: Interpretability via model extraction. arXiv preprint arXiv:1706.09773 (2017)

  4. Bastani, O., Kim, C., Bastani, H.: Interpreting blackbox models via model extraction. arXiv preprint arXiv:1705.08504 (2017)

  5. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)

    Article  MATH  Google Scholar 

  6. Breiman, L., Friedman, J., Olshen, R., Stone, C.: Classification and Regression Trees. Wadsworth, Belmont (1984)

    MATH  Google Scholar 

  7. Breiman, L., Shang, N.: Born again trees. University of California, Berkeley, Berkeley, CA, Technical Report (1996)

  8. Deng, H.: Guided random forest in the RRF package. arXiv preprint arXiv:1306.0237 (2013)

  9. Deng, H.: Interpreting tree ensembles with in trees. arXiv preprint arXiv:1408.5456 (2014)

  10. Deng, H., Runger, G.: Gene selection with guided regularized random forest. Pattern Recogn. 46(12), 3483–3489 (2013)

    Article  Google Scholar 

  11. Deng, H., Runger, G., Tuv, E., Bannister, W.: CBC: An associative classifier with a small number of rules. Decis. Support Syst. 59, 163–170 (2014)

    Article  Google Scholar 

  12. Domingos, P.: Knowledge acquisition from examples via multiple models. In: Proceedings of the Fourteenth International Conference on Machine Learning, pp. 98–106. Morgan Kaufmann (1997)

  13. Eskandarian, S., Bahrami, P., Kazemi, P.: A comprehensive data mining approach to estimate the rate of penetration: application of neural network, rule based models and feature ranking. J. Pet. Sci. Eng. 156, 605–615 (2017)

    Article  Google Scholar 

  14. Fokkema, M.: PRE: an R package for fitting prediction rule ensembles. arXiv preprint arXiv:1707.07149 (2017)

  15. Friedman, J.H.: Greedy function approximation: a gradient boosting machine. Ann. Stat. 29, 1189–1232 (2001)

    Article  MathSciNet  MATH  Google Scholar 

  16. Friedman, J.H., Popescu, B.E.: Predictive learning via rule ensembles. Ann. Appl. Stat. 2, 916–954 (2008)

    Article  MathSciNet  MATH  Google Scholar 

  17. Gallego-Ortiz, C., Martel, A.L.: Using quantitative features extracted from t2-weighted MRI to improve breast MRI computer-aided diagnosis (CAD). PLoS ONE 12(11), e0187501 (2017)

    Article  Google Scholar 

  18. Gargett, A., Barnden, J.: Modeling the interaction between sensory and affective meanings for detecting metaphor. In: Proceedings of the Third Workshop on Metaphor in NLP, pp. 21–30 (2015)

  19. Guidotti, R., Monreale, A., Turini, F., Pedreschi, D., Giannotti, F.: A survey of methods for explaining black box models. arXiv preprint arXiv:1802.01933 (2018)

  20. Gurrutxaga, I., Pérez, J.M., Arbelaitz, O., Muguerza, J., Martín, J.I., Ansuategi, A.: CTC: an alternative to extract explanation from bagging. In: Conference of the Spanish Association for Artificial Intelligence, pp. 90–99. Springer (2007)

  21. Hahsler, M., Grün, B., Hornik, K.: Introduction to a rules—mining association rules and frequent item sets. SIGKDD Explorations (2007)

  22. Hara, S., Hayashi, K.: Making tree ensembles interpretable. arXiv preprint arXiv:1606.05390 (2016)

  23. Hara, S., Hayashi, K.: Making tree ensembles interpretable: a bayesian model selection approach. arXiv preprint arXiv:1606.09066 (2016)

  24. Khalid, M.H., Tuszynski, P.K., Szlek, J., Jachowicz, R., Mendyk, A.: From black-box to transparent computational intelligence models: a pharmaceutical case study. In: 2015 13th International Conference on Frontiers of Information Technology (FIT), pp. 114–118. IEEE (2015)

  25. Liaw, A., Wiener, M.: Classification and regression by random forest. R News 2(3), 18–22 (2002)

    Google Scholar 

  26. Lichman, M.: UCI machine learning repository (2013). http://archive.ics.uci.edu/ml

  27. Liu, B., Hsu, W., Ma, Y.: Integrating classification and association rule mining. In: Proceeding of the 1998 International Conference on Knowledge Discovery and Data Mining, pp. 80–86. ACM (1998)

  28. Meinshausen, N.: Node harvest. Ann. Appl. Stat. 4, 2049–2072 (2010)

    Article  MathSciNet  MATH  Google Scholar 

  29. Miraboutalebi, S.M., Kazemi, P., Bahrami, P.: Fatty acid methyl ester (FAME) composition used for estimation of biodiesel cetane number employing random forest and artificial neural networks: a new approach. Fuel 166, 143–151 (2016)

    Article  Google Scholar 

  30. Narayanan, I., Wang, D., Jeon, M., Sharma, B., Caulfield, L., Sivasubramaniam, A., Cutler, B., Liu, J., Khessib, B., Vaid, K.: Ssd failures in datacenters: What? when? and why? In: Proceedings of the 9th ACM International on Systems and Storage Conference, p. 7. ACM (2016)

  31. Ridgeway, G., et al.: GBM: Generalized boosted regression models. R Package Version 1(3), 55 (2006)

    MathSciNet  Google Scholar 

  32. Szlęk, J., Pacławski, A., Lau, R., Jachowicz, R., Kazemi, P., Mendyk, A.: Empirical search for factors affecting mean particle size of PLGA microspheres containing macromolecular drugs. Comput. Methods Programs Biomed. 134, 137–147 (2016)

    Article  Google Scholar 

  33. Therneau, T.M., Atkinson, B., Ripley, B.: RPART: Recursive partitioning. R Package Version 3(3.8) (2010)

  34. Vandewiele, G., Lannoye, K., Janssens, O., Ongenae, F., De Turck, F., Van Hoecke, S.: A genetic algorithm for interpretable model extraction from decision tree ensembles. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 104–115. Springer (2017)

  35. Wang, X., Lin, P., Ho, J.W.: Discovery of cell-type specific dna motif grammar in cis-regulatory elements using random forest. BMC Genom. 19(1), 929 (2018)

    Article  Google Scholar 

  36. Zhou, Y., Hooker, G.: Interpreting models via single tree approximation. arXiv preprint arXiv:1610.09036 (2016)

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Houtao Deng.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Deng, H. Interpreting tree ensembles with inTrees. Int J Data Sci Anal 7, 277–287 (2019). https://doi.org/10.1007/s41060-018-0144-8

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s41060-018-0144-8

Keywords

Navigation