Evaluation of Data Mining Features, Features Taxonomies and their Applications

https://doi.org/10.24017/science.2017.3.3

Authors

  • Shirin Noekhah, Faculty of Computing, Universiti Teknologi Malaysia (UTM), 81300, Johor, Malaysia
  • Naomie binti Salim, Faculty of Computing, Universiti Teknologi Malaysia (UTM), 81300, Johor, Malaysia
  • Nor Hawaniah Zakaria, Faculty of Computing, Universiti Teknologi Malaysia (UTM), 81300, Johor, Malaysia

Abstract

Over the last couple of decades, the World Wide Web has greatly improved people's lives. E-commerce emerged during this period and has changed the traditional approaches to selling products and services. Businesses use a range of techniques to discover market trends and analyze competitors' activities by exploiting the information in reviews, while potential customers use online opinions to make their purchase decisions. Opinion mining and sentiment analysis are among the most critical and fundamental domains of data mining, and they support a variety of sub-domains such as opinion summarization, recommendation systems and opinion spam detection. Opinion mining and its sub-branches can be performed efficiently only when there is a comprehensive understanding of the most effective features applied in those domains. To achieve the best results, the most appropriate set of features must be chosen for each case study, whether the task is classification or clustering. To the best of our knowledge, there is no extensive study or taxonomy of the wide range of features and their applications in opinion mining. In this paper, we comprehensively investigate the various types of features exploited in the sub-branches of the opinion mining domain. We present the most frequently used feature sets, including structural, linguistic and relation-based features, as a complete reference for further opinion mining research. The results show that using multiple types of features improves the accuracy of opinion mining applications.
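
As an illustration of what combining feature types can look like in practice, the following minimal Python sketch builds one feature vector per review from three groups: structural (meta-data), linguistic and relation-based. The concrete features, the review fields (`reviewer`, `product`, `rating`, `text`) and all function names are assumptions made for this example; they are not taken from the paper's taxonomy.

```python
from collections import Counter

# Hypothetical word list for a simple linguistic cue (assumption for this sketch).
FIRST_PERSON = {"i", "me", "my", "mine", "we", "our", "us"}


def structural_features(review):
    """Meta-data style features: review length in tokens and the star rating."""
    tokens = review["text"].split()
    return {"num_tokens": len(tokens), "rating": review["rating"]}


def linguistic_features(review):
    """Content-based features: first-person pronoun ratio and exclamation marks."""
    tokens = [t.strip(".,!?").lower() for t in review["text"].split()]
    tokens = [t for t in tokens if t]
    n = len(tokens) or 1  # avoid division by zero on empty text
    return {
        "first_person_ratio": sum(t in FIRST_PERSON for t in tokens) / n,
        "exclamation_count": review["text"].count("!"),
    }


def relation_features(review, all_reviews):
    """Relation-based features: how the review links to its reviewer and product."""
    per_reviewer = Counter(r["reviewer"] for r in all_reviews)
    per_product = Counter(r["product"] for r in all_reviews)
    return {
        "reviews_by_reviewer": per_reviewer[review["reviewer"]],
        "reviews_of_product": per_product[review["product"]],
    }


def combined_features(review, all_reviews):
    """Merge the three feature groups into a single feature dictionary."""
    features = {}
    features.update(structural_features(review))
    features.update(linguistic_features(review))
    features.update(relation_features(review, all_reviews))
    return features


# Toy data, invented for the example.
reviews = [
    {"reviewer": "u1", "product": "p1", "rating": 5, "text": "I love it! My best buy!"},
    {"reviewer": "u1", "product": "p2", "rating": 5, "text": "Great product."},
    {"reviewer": "u2", "product": "p1", "rating": 2, "text": "Stopped working after a week."},
]
vector = combined_features(reviews[0], reviews)
```

The resulting dictionary could then be passed to any classifier or clustering method; keeping each feature group in its own function makes it easy to compare a single group against the combination.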

Keywords:

Opinion mining, Feature selection, Opinion spam, Recommendation system, Meta-data and content-based Features

How to Cite

S. Noekhah, N. binti Salim, and N. H. Zakaria, “Evaluation of Data Mining Features, Features Taxonomies and their Applications”, KJAR, vol. 2, no. 3, pp. 131–141, Aug. 2017, doi: 10.24017/science.2017.3.3.

Published

27-08-2017

Section

Pure and Applied Science