A Novel Approach for Stock Price Prediction Using Gradient Boosting Machine with Feature Engineering (GBM-wFE)


Rebwar M. Nabi, Soran Ab. M. Saeed, Habibollah Harron

Abstract

The prediction of stock prices has become an exciting area for researchers and academicians alike due to its economic impact and potential business profits. This study proposes a novel multiclass classification ensemble learning approach for predicting stock prices from historical data using feature engineering. The proposed approach comprises four main steps: pre-processing, feature selection, feature engineering, and ensemble learning. Eleven datasets from Nasdaq and the S&P 500 are used to evaluate the accuracy of the proposed approach. Furthermore, eight feature selection algorithms are studied and implemented. More importantly, feature engineering is applied to construct two new features, which prove very promising for improving classification accuracy; to the best of our knowledge, this is the first study to apply feature engineering to multiclass classification with ensemble methods. Finally, seven ensemble machine learning (ML) algorithms are implemented and compared to identify the best-performing prediction model, and the best feature selection algorithm is also determined. The resulting approach, called Gradient Boosting Machine with Feature Engineering (GBM-wFE), uses Principal Component Analysis (PCA) for feature selection. We find that GBM-wFE outperforms previous studies: the overall prediction results are promising, with a MAPE of 0.0406%, the best result among the available studies in the literature.
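The pipeline described above (pre-processing, PCA-based feature selection, engineered features, and a gradient boosting classifier for multiclass price-movement prediction) can be sketched as follows. This is a minimal illustration using scikit-learn and synthetic data; the feature definitions and class buckets are assumptions for demonstration, not the authors' actual datasets or engineered features.

```python
# Minimal sketch of a GBM-wFE-style pipeline: construct two engineered
# features, select components with PCA, and classify price movement
# (down / flat / up) with a gradient boosting machine.
# All data and feature definitions below are illustrative assumptions.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)

# Synthetic OHLCV-style historical data: 1000 days, 5 base features.
X = rng.normal(size=(1000, 5))

# Feature engineering: construct two new features from the raw inputs
# (e.g. a close/open ratio and a high-low spread; purely illustrative).
ratio = X[:, 3] / (np.abs(X[:, 0]) + 1e-6)
spread = X[:, 1] - X[:, 2]
X_fe = np.column_stack([X, ratio, spread])

# Multiclass target: price movement bucketed into down / flat / up.
y = np.digitize(X_fe @ rng.normal(size=7), bins=[-1.0, 1.0])

X_tr, X_te, y_tr, y_te = train_test_split(X_fe, y, random_state=0)

model = Pipeline([
    ("pca", PCA(n_components=5)),                 # feature selection step
    ("gbm", GradientBoostingClassifier(random_state=0)),
])
model.fit(X_tr, y_tr)
acc = model.score(X_te, y_te)
print(f"hold-out accuracy: {acc:.3f}")
```

In a real study each step would be tuned per dataset (number of PCA components, boosting depth and learning rate) and evaluated with a regression-style metric such as MAPE where price levels rather than movement classes are predicted.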

Keywords

Stock Market Forecasting, Feature Engineering, Feature Selection, Machine Learning, Predictive Analysis, Predictable Movement, Multiclass Classification

