Utilizing Statistical Tests for Comparing Machine Learning Algorithms

Hozan Khalid Hamarashid

doi:10.24017/science.2021.1.8

Authors

Hozan Khalid Hamarashid, Information Technology Department, Computer Science Institute, Sulaimani Polytechnic University, Sulaimani, Iraq

Abstract

The mean performance of a machine learning model is commonly estimated with k-fold cross-validation. The algorithm with the best average performance is expected to outperform those with lower averages. But what if the difference in average results is merely a statistical fluke? A statistical hypothesis test is used to determine whether the difference in mean results between two algorithms is genuine. This study demonstrates how to compare machine learning algorithms using statistical hypothesis testing. During model selection, the outputs of several machine learning algorithms or simulation pipelines are compared. The model that performs best on the chosen performance measure becomes the final model, which can then be used to make predictions on new data. This applies to both classification and regression predictive models, whether they are built with traditional machine learning or deep learning methods. The difficulty is to identify whether the difference between two models is real or only due to chance.
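As a minimal sketch of the comparison described above, the following Python snippet applies a paired t-test to the per-fold scores of two models evaluated on the same 10 folds. The abstract does not name a specific test, so the paired t-test is an assumption chosen for illustration. The function name `paired_t_statistic` and the accuracy values are hypothetical, and the critical value 2.262 is the standard two-sided t-table entry for a 0.05 significance level with 9 degrees of freedom.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for a paired t-test on per-fold scores."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both models must be scored on the same folds")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two folds are required")
    sd = stdev(diffs)  # sample standard deviation of the fold-wise differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical per-fold accuracies from 10-fold cross-validation (illustrative only).
model_a = [0.81, 0.79, 0.84, 0.80, 0.83, 0.82, 0.78, 0.85, 0.80, 0.82]
model_b = [0.79, 0.78, 0.80, 0.79, 0.80, 0.80, 0.77, 0.81, 0.79, 0.80]

# Two-sided critical value for alpha = 0.05 and 9 degrees of freedom (t table).
T_CRITICAL_DF9 = 2.262

t_stat, df = paired_t_statistic(model_a, model_b)
significant = abs(t_stat) > T_CRITICAL_DF9

print(f"t = {t_stat:.3f}, df = {df}")
print("reject the null hypothesis" if significant else "fail to reject the null hypothesis")
```

Rejecting the null hypothesis here would suggest that the gap in mean accuracy is unlikely to be a chance effect. Because cross-validation folds share training data, the fold scores are not fully independent, so a test of this kind should be read as approximate.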

Keywords:

machine learning, machine learning assessment, statistical tests, machine learning algorithm, machine learning comparison.
