Fine-tuning SBERT for Semantic Research Title Classification in Trilingual University Repository

Havan Wahid Rashid; Sarkar Hasan Ahmed

doi:10.24017/science.2025.2.9

Authors

Havan Wahid Rashid, Information Technology Department, Technical College of Informatics, Sulaimani Polytechnic University, Sulaymaniyah, Iraq. https://orcid.org/0009-0007-9861-456X
Sarkar Hasan Ahmed, Computer Network Department, Technical College of Informatics, Sulaimani Polytechnic University, Sulaymaniyah, Iraq. https://orcid.org/0000-0001-5729-073X

Abstract

Recommendation systems automatically surface relevant content from large datasets, reducing search time and facilitating discovery. In academia, content-based recommendation systems are especially useful when only brief titles are available and multilingual text is the norm. Universities in the Kurdistan Regional Government currently lack a centralized research repository; records are scattered across institutions and often maintained manually. This makes it difficult for students and faculty to find related topics, potential supervisors, or cross-disciplinary connections. This paper presents a trilingual (English, Arabic, and Kurdish) recommendation system for academic research titles. It makes three key contributions: (1) the creation of the first publicly available integrated dataset of 4,257 research titles from Sulaimani Polytechnic University; (2) the development of a web-based platform for semantic search and title-level recommendations to support research discovery and student–supervisor matching; and (3) a comparison of two Sentence-BERT models (all-MiniLM-L6-v2 and paraphrase-multilingual-MiniLM-L12-v2) before and after fine-tuning with a domain-specific taxonomy and cosine embedding loss. Performance is assessed using Precision@5, Mean Reciprocal Rank (MRR), and NDCG@5 against expert-annotated relevance judgments for 20 query titles. Fine-tuning improved performance, with paraphrase-multilingual-MiniLM-L12-v2 achieving a Precision@5 of 0.94 and an NDCG@5 of 0.991. The English-only model, all-MiniLM-L6-v2, also improved (Precision@5: 0.79 to 0.82; NDCG@5: 0.885 to 0.922).
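The three ranking metrics named in the abstract can be illustrated with a minimal Python sketch. It assumes the standard textbook definitions, which the abstract does not spell out: binary relevance for Precision@k and MRR, and an ideal ordering for NDCG@k computed only from the judged results that were retrieved. The function names and the toy judgment lists are hypothetical and are not taken from the paper's data or code.

```python
import math

def precision_at_k(relevances, k=5):
    """Fraction of the top-k results judged relevant (relevance > 0)."""
    return sum(1 for r in relevances[:k] if r > 0) / k

def reciprocal_rank(relevances):
    """1 / rank of the first relevant result, or 0.0 if none is relevant."""
    for rank, r in enumerate(relevances, start=1):
        if r > 0:
            return 1.0 / rank
    return 0.0

def dcg_at_k(relevances, k=5):
    """Discounted cumulative gain: each gain is divided by log2(rank + 1)."""
    return sum(r / math.log2(rank + 1)
               for rank, r in enumerate(relevances[:k], start=1))

def ndcg_at_k(relevances, k=5):
    """DCG normalised by the DCG of the same judgments sorted best-first.

    Assumption: the ideal ranking is built from the retrieved list only.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

def evaluate(judged_queries, k=5):
    """Average each metric over all queries (one judgment list per query)."""
    n = len(judged_queries)
    return {
        "precision_at_k": sum(precision_at_k(q, k) for q in judged_queries) / n,
        "mrr": sum(reciprocal_rank(q) for q in judged_queries) / n,
        "ndcg_at_k": sum(ndcg_at_k(q, k) for q in judged_queries) / n,
    }

# Hypothetical expert judgments for two query titles (1 = relevant, 0 = not),
# listed in the order the system ranked the recommended titles.
toy_judgments = [
    [1, 1, 0, 1, 1],
    [0, 1, 1, 1, 1],
]
scores = evaluate(toy_judgments, k=5)
print(scores)
```

In an evaluation like the one described, each of the 20 query titles would contribute one such judgment list, and the reported figures would be the averages over all queries.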

Keywords:

Recommendation system, SBERT, Kurdistan higher education, Paper classification, Kurdish higher education, Content-based filtering
