Dealing with the Outlier Problem in Multivariate Linear Regression Analysis Using the Hampel Filter
https://doi.org/10.24017/science.2025.1.1
Abstract
Outliers in multivariate linear regression models can significantly distort parameter estimates, leading to biased results and reduced predictive accuracy. They may occur in the dependent variable alone or in both the independent and dependent variables, producing large residuals that compromise model reliability. Addressing outliers is therefore essential for improving the accuracy and robustness of regression models. This study proposes a modified Hampel filter algorithm that dynamically detects and mitigates extreme values to improve parameter estimation and predictive performance. The algorithm tunes the window size and threshold parameters to minimize the mean squared error (MSE), making it a robust approach for handling outliers in multivariate regression analysis. To assess its effectiveness, simulated and real datasets were analyzed using a MATLAB-based implementation, and the algorithm was compared with the classical Hampel approach in terms of outlier detection and suppression, with performance measured by the MSE. The results indicate that the proposed method effectively identifies and removes extreme values, leading to more accurate parameter estimates, greater model stability, and better predictive performance. The adaptive nature of the filter minimizes the impact of outliers, yielding a more reliable regression model that is more resilient to data variability. The approach offers a valuable tool for researchers and practitioners working with outlier-prone datasets, improving the reliability of multivariate regression analysis.
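The abstract describes the method only at a high level, so the following is a minimal Python sketch of the general idea rather than the authors' MATLAB implementation. It shows a classical Hampel filter (sliding-window median and scaled MAD) and a simple grid search over window size and threshold that keeps the pair giving the lowest regression MSE. The function names, the grid values, the use of a single predictor, and the choice of in-sample residual MSE as the criterion are all assumptions made for illustration.

```python
from statistics import median

MAD_SCALE = 1.4826  # makes the MAD comparable to the standard deviation under normality


def hampel_filter(values, half_window=3, threshold=3.0):
    """Classical Hampel filter (hypothetical helper, not the authors' code).

    A point is flagged when it deviates from its window median by more than
    threshold * 1.4826 * MAD; flagged points are replaced by that median.
    Windows are truncated at the ends of the series.
    """
    n = len(values)
    cleaned = list(values)
    flagged = []
    for i in range(n):
        window = values[max(0, i - half_window):min(n, i + half_window + 1)]
        med = median(window)
        mad = median(abs(v - med) for v in window)
        if abs(values[i] - med) > threshold * MAD_SCALE * mad:
            cleaned[i] = med
            flagged.append(i)
    return cleaned, flagged


def ols_mse(x, y):
    """Residual MSE of a least-squares line y = a + b*x (one predictor, for brevity)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y)) / n


def tune_hampel(x, y, half_windows=(1, 2, 3, 5), thresholds=(2.0, 2.5, 3.0)):
    """Grid search: return (mse, half_window, threshold) with the lowest MSE.

    The grid values are illustrative assumptions, not values from the study.
    """
    best = None
    for k in half_windows:
        for t in thresholds:
            y_clean, _ = hampel_filter(y, k, t)
            mse = ols_mse(x, y_clean)
            if best is None or mse < best[0]:
                best = (mse, k, t)
    return best


# Toy data: an exact line with one injected outlier in the response.
x = list(range(20))
y = [2 + 0.5 * xi for xi in x]
y[7] = 40.0

cleaned, flagged = hampel_filter(y)
print("flagged indices:", flagged)
print("MSE before filtering:", ols_mse(x, y))
print("best (mse, half_window, threshold):", tune_hampel(x, y))
```

Selecting parameters by in-sample MSE tends to favor aggressive filtering, so in practice a held-out set or a simulation with known parameters (as the study's simulation design suggests) would be a safer criterion.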
License
Copyright (c) 2025 Amira Wali Omer, Taha Hussein Ali (Author)

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.