Real Time Intrusion Detection System Based on Web Log File Analysis

https://doi.org/10.24017/

Abstract

Web log files hold a wealth of information about a website: they record the history of every user’s activity while accessing it. Some log files contain records of various intrusion types, that is, unauthorized or malicious activities recorded during website access. In log file analysis for intrusion detection systems (IDSs), system and network logs are examined to identify suspicious activities and possible security risks. Many existing IDSs suffer from false positives and false negatives: they either overwhelm administrators with unnecessary alarms or fail to identify real dangers. Real-time cyberattacks are common, and any delay in detection can lead to serious consequences such as data breaches and system outages. In this paper, we develop a real-time IDS based on web log analysis that predicts whether a user’s request is an attack, normal, or suspicious. It does so using the contents of the Apache access log, considering a set of Hypertext Transfer Protocol (HTTP) request features obtained by analyzing users’ requests. Various data preprocessing techniques are applied and key features are extracted, which enhances the system’s ability to detect intrusions effectively. The model was constructed using four machine learning algorithms: gradient-boosted trees, decision tree, random forest, and support vector machine. According to the results, the random forest model is the most accurate of the four, attaining 99.66% precision, 99.66% recall, and 99.83% accuracy.
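As a rough illustration of the kind of processing the abstract describes, the sketch below parses one Apache access log line and derives a few HTTP request features. It is not the authors' pipeline: the abstract does not list the features used, so the feature names, the `SUSPICIOUS_CHARS` set, the helper functions `parse_line` and `extract_features`, and the sample log line are all assumptions made for this example. The regular expression assumes the common/combined Apache log layout.

```python
import re

# Assumed layout: Apache common/combined log format.
# The group names are illustrative labels, not taken from the paper.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)

# Hypothetical set of characters often seen in injection attempts.
SUSPICIOUS_CHARS = set("'\"<>;()")


def parse_line(line):
    """Return a dict of raw log fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def extract_features(fields):
    """Derive simple, assumed request features from one parsed log entry."""
    url = fields["url"]
    path, _, query = url.partition("?")
    return {
        "method": fields["method"],
        "status": int(fields["status"]),
        "url_length": len(url),
        "query_length": len(query),
        "param_count": len([p for p in query.split("&") if p]),
        "suspicious_char_count": sum(ch in SUSPICIOUS_CHARS for ch in url),
        "path_depth": path.count("/"),
    }


# Made-up log line for demonstration only; it is not from the paper's dataset.
sample = (
    '192.168.1.10 - - [26/Feb/2025:10:15:32 +0300] '
    '"GET /index.php?id=1\'%20OR%201=1 HTTP/1.1" 200 512 "-" "Mozilla/5.0"'
)
features = extract_features(parse_line(sample))
print(features)
```

Feature dictionaries of this kind could then be encoded numerically and passed to a classifier such as a random forest. How the paper encodes its features and labels requests as attack, normal, or suspicious is not stated in the abstract.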

Keywords:

IDS, Real Time System, Web Log File, Feature Engineering, Web Usage Analysis

How to Cite

[1]
R. R. Abdalla, A. K. Jumaa, and A. F. Fadhil, “Real Time Intrusion Detection System Based on Web Log File Analysis”, KJAR, vol. 10, no. 1, pp. 35–49, Feb. 2025, doi: 10.24017/.

Published

26-02-2025

Section

Pure and Applied Science