Background Data editing with elimination of “outliers” is commonly performed in the biomedical sciences. This type of data editing can influence study results, and its effect could be magnified across the vast and expanding body of research in medicine.
Methods and Results We first performed an anonymous survey of medical school faculty at institutions across the United States and found that a large percentage of respondents performed some form of outlier exclusion. We next performed Monte Carlo simulations in which high and low values were excluded from two samples drawn from the same normal distribution. We found that removal of one pair of “outliers”, specifically the high value from one sample and the low value from the other, had measurable effects on the type I error rate as the sample size was increased into the thousands. We developed an adjustment to the t score that accounts for the anticipated alteration of the type I error rate (t_adj = t_obs - 2(log(n)^0.5 / n^0.5)), and we propose that it be used when outliers are eliminated prior to parametric analysis.
Conclusion Data editing with elimination of outliers that includes removal of the high value from one sample and the low value from the other can have significant effects on the occurrence of type I error. This type of data editing could have profound effects in high-volume research fields, particularly in medicine, and we recommend that an adjustment to the t score be used to reduce the potential for error.
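The simulation and adjustment summarized above can be illustrated with a minimal sketch. It is not the authors' code. It assumes that "log" in the adjustment is the natural logarithm, that n is the per-group sample size before trimming, that the t statistic is oriented as (sample with its low value removed) minus (sample with its high value removed), and that a normal approximation to the critical value is acceptable. The function names `t_stat`, `adjusted_t`, and `simulate` are hypothetical.

```python
import math
import random
from statistics import NormalDist


def t_stat(x, y):
    """Unequal-variance two-sample t statistic for mean(x) - mean(y)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    vx = sum((v - mx) ** 2 for v in x) / (len(x) - 1)
    vy = sum((v - my) ** 2 for v in y) / (len(y) - 1)
    return (mx - my) / math.sqrt(vx / len(x) + vy / len(y))


def adjusted_t(t_obs, n):
    """Abstract's adjustment, read as t_obs - 2*sqrt(ln(n)/n).

    Assumption: natural log, and n is the per-group size before trimming.
    """
    return t_obs - 2.0 * math.sqrt(math.log(n) / n)


def simulate(n=50, trials=2000, alpha=0.05, seed=0):
    """Estimate two-sided type I error rates when the null hypothesis is true."""
    rng = random.Random(seed)
    # Assumption: normal approximation to the t critical value.
    crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    hits = {"raw": 0, "trimmed": 0, "adjusted": 0}
    for _ in range(trials):
        # Both samples come from the same N(0, 1), so any rejection is a false positive.
        a = sorted(rng.gauss(0.0, 1.0) for _ in range(n))
        b = sorted(rng.gauss(0.0, 1.0) for _ in range(n))
        t_raw = t_stat(b, a)
        # Remove the highest value of a and the lowest value of b.
        t_trim = t_stat(b[1:], a[:-1])
        t_adj = adjusted_t(t_trim, n)
        hits["raw"] += abs(t_raw) > crit
        hits["trimmed"] += abs(t_trim) > crit
        hits["adjusted"] += abs(t_adj) > crit
    return {name: count / trials for name, count in hits.items()}


if __name__ == "__main__":
    for name, rate in simulate().items():
        print(f"{name}: {rate:.3f}")
```

Under these assumptions, the "trimmed" rate would be expected to exceed the "raw" rate, with the "adjusted" rate falling back toward the nominal alpha. The exact values depend on n, the number of trials, and the seed.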
Conflict(s) of Interest
Dr. Gress, Dr. Denvir, and Dr. Shapiro have nothing to disclose.
Gress, Todd W.; Denvir, James; and Shapiro, Joseph I. "Effect of Removing Outliers on Statistical Inference: Implications to Interpretation of Experimental Data in Medical Research," Marshall Journal of Medicine: Vol. 4: Iss. 2, Article 9.
Available at: https://mds.marshall.edu/mjm/vol4/iss2/9