Artificial and Natural Topic Detection in Online Social Networks

Sylvio Barbon Jr, Gabriel Marques Tavares, Guilherme Sakaji Kido


Online Social Networks (OSNs), such as Twitter, offer attractive means of social interaction and communication, but they also raise privacy and security issues. OSNs provide valuable information for marketing and competitiveness, drawn from users' posts and opinions stored in a huge volume of data spanning many themes, topics, and subjects. To mine the topics discussed on an OSN, we present a novel application of the Louvain method to topic modeling, based on detecting communities in graphs by modularity. The proposed approach succeeded in finding topics in five different datasets composed of textual content from Twitter and YouTube. Another important contribution concerns texts posted by spammers. In this case, a particular behavior observed in the graph community architecture (density and degree) indicates the strength of a topic and allows it to be classified as natural or artificial, the latter being created by spammers on OSNs.
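
As a rough illustration of the quantities named above, the sketch below computes the modularity score that the Louvain method optimizes, together with the density and mean degree of each community. It is not the authors' implementation and it does not run Louvain itself: the partition is supplied by hand. The word co-occurrence graph, the toy texts, and all function names are assumptions made for this example; the abstract does not say how the graph is built.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(texts):
    """Words are nodes; an edge weight counts the texts containing both words.
    This construction is an assumption, not taken from the paper."""
    weights = defaultdict(int)
    for text in texts:
        words = sorted(set(text.lower().split()))
        for u, v in combinations(words, 2):
            weights[(u, v)] += 1
    return dict(weights)


def modularity(weights, community_of):
    """Newman-Girvan modularity Q of a partition of a weighted undirected graph."""
    m = sum(weights.values())        # total edge weight
    strength = defaultdict(float)    # weighted degree of each node
    internal = defaultdict(float)    # edge weight inside each community
    for (u, v), w in weights.items():
        strength[u] += w
        strength[v] += w
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += w
    total = defaultdict(float)       # summed strength per community
    for node, s in strength.items():
        total[community_of[node]] += s
    return sum(internal[c] / m - (total[c] / (2 * m)) ** 2 for c in total)


def community_stats(weights, community_of):
    """Density and mean (unweighted) degree of each community's induced subgraph."""
    members = defaultdict(set)
    edges = defaultdict(int)
    for node, c in community_of.items():
        members[c].add(node)
    for u, v in weights:
        if community_of[u] == community_of[v]:
            edges[community_of[u]] += 1
    stats = {}
    for c, nodes in members.items():
        n, e = len(nodes), edges[c]
        density = 2 * e / (n * (n - 1)) if n > 1 else 0.0
        stats[c] = {"nodes": n, "density": density, "mean_degree": 2 * e / n}
    return stats


# Hypothetical toy corpus: three identical posts and two varied ones.
texts = [
    "cheap pills online",
    "cheap pills online",
    "cheap pills online",
    "rain in london",
    "london weather rain",
]
graph = cooccurrence_graph(texts)

# Hand-written partition standing in for the output of a Louvain run.
partition = {w: 0 for w in ("cheap", "pills", "online")}
partition.update({w: 1 for w in ("rain", "in", "london", "weather")})

q = modularity(graph, partition)
stats = community_stats(graph, partition)
print(round(q, 2))  # 0.48
print(stats)
```

In practice the partition would come from a community detection routine rather than being written by hand; the per-community statistics are the kind of structural signal the abstract refers to, but any threshold separating natural from artificial topics would have to come from the paper itself.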

iSys - Revista Brasileira de Sistemas de Informação - CESI/SBC
Electronic ISSN: 1984-2902