Background: Understanding the factors that predict progression of renal failure is of great interest to clinicians.
Objectives: We examined machine learning methods to predict the composite outcome of death, dialysis, or doubling of serum creatinine using the Modification of Diet in Renal Disease (MDRD) data set.
Methods: We evaluated a generalized linear model, a support vector machine, a decision tree, a feed-forward neural network, and a random forest, each assessed with 10-fold cross-validation using the caret package in the open-source R environment.
Results: Using clinical parameters available at study entry, these machine learning methods, trained on 70% of the MDRD population, achieved prediction accuracies ranging from 66% to 77% on the remaining 30%. Although the support vector machine appeared to have the highest accuracy, all models studied performed relatively well.
Conclusions: These results illustrate the utility of employing machine learning methods within R to predict long-term clinical outcomes from initial clinical measurements.
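The evaluation design summarized above (a 70%/30% split, with 10-fold cross-validation) can be sketched as follows. This is a minimal illustration in plain Python rather than the authors' caret code in R, and it assumes the cross-validation was run within the 70% training portion. The function names, the random seed, and the cohort size of 800 are hypothetical, and no model is fitted; the sketch only shows how the data would be partitioned and how accuracy would be scored.

```python
import random


def train_test_split_indices(n, train_fraction=0.7, seed=0):
    """Shuffle 0..n-1 and split into training and held-out test index lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    cut = int(round(n * train_fraction))
    return indices[:cut], indices[cut:]


def k_fold_indices(indices, k=10):
    """Yield (fit, validate) index lists; each index is validated exactly once."""
    folds = [indices[i::k] for i in range(k)]
    for i, validate in enumerate(folds):
        fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield fit, validate


def accuracy(predicted, observed):
    """Fraction of predictions that match the observed outcome labels."""
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Hypothetical cohort size, used only to show the partition sizes.
n_patients = 800
train_idx, test_idx = train_test_split_indices(n_patients)
folds = list(k_fold_indices(train_idx, k=10))
print(len(train_idx), len(test_idx), len(folds))
```

In a design like this, each candidate model would be tuned on the ten fit/validate pairs drawn from the training indices, and the reported accuracy would come from scoring the final model once on the held-out test indices.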
Khitan, Zeid; Shapiro, Anna P.; Shah, Preeya T.; Sanabria, Juan R.; Santhanam, Prasanna; Sodhi, Komal; Abraham, Nader G.; and Shapiro, Joseph I. "Predicting Adverse Outcomes in Chronic Kidney Disease Using Machine Learning Methods: Data from the Modification of Diet in Renal Disease," Marshall Journal of Medicine: Vol. 3: Iss. 4, Article 10.
Available at: https://mds.marshall.edu/mjm/vol3/iss4/10