A Method Based on Naming Similarity to Identify Reuse Opportunities

Johnatan Oliveira, Eduardo Fernandes, Maurício Souza, Eduardo Figueiredo


Software reuse is a development strategy in which existing software components are used to implement new software systems. Its advantages include reduced development effort and improved software quality. However, few methods have been proposed in the literature for recommending reuse opportunities. In this paper, we propose a method for identifying and recommending reuse opportunities based on the similarity of class names. Our method, called JReuse, computes a similarity function to identify similarly named classes across a set of software systems from a specific domain. The identified classes compose a repository of reuse opportunities. We also present a prototype tool that supports the proposed method. Using the tool, we applied our method to 72 software systems mined from GitHub in 4 domains: accounting, restaurant, hospital, and e-commerce. In total, these systems have 1,567,337 lines of code, 57,017 methods, and 12,598 classes. We observe that JReuse is able to identify the main classes that are frequent in each selected domain.
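The idea of grouping similarly named classes across systems of one domain can be illustrated with a small sketch. The abstract does not specify the similarity function, so this example assumes a normalized Levenshtein distance over lowercased class names. The function names, the threshold of 0.8, the greedy first-match grouping, and the sample systems are all hypothetical choices made for illustration; they are not taken from JReuse.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; 1.0 means identical names, ignoring case."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def reuse_opportunities(systems, threshold=0.8, min_systems=2):
    """Group similarly named classes and keep groups found in several systems.

    systems: dict mapping a system name to a list of its class names.
    Returns (representative_name, set_of_systems) pairs, most frequent first.
    """
    groups = []  # each entry: [representative name, set of systems]
    for system, class_names in systems.items():
        for name in class_names:
            # Greedy assumption: join the first group whose representative
            # is similar enough; otherwise start a new group.
            for group in groups:
                if name_similarity(name, group[0]) >= threshold:
                    group[1].add(system)
                    break
            else:
                groups.append([name, {system}])
    frequent = [(rep, members) for rep, members in groups
                if len(members) >= min_systems]
    return sorted(frequent, key=lambda g: (-len(g[1]), g[0]))


# Hypothetical class names from three made-up restaurant systems.
systems = {
    "pos_a": ["Order", "MenuItem", "Waiter"],
    "pos_b": ["Orders", "MenuItems", "Table"],
    "pos_c": ["Order", "Invoice"],
}
for name, found_in in reuse_opportunities(systems):
    print(name, sorted(found_in))
```

In this sketch, a class name counts as a reuse opportunity when a similar name occurs in at least `min_systems` systems. A real tool would likely need a more careful grouping strategy than first-match and some filtering of generic names.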





iSys - Revista Brasileira de Sistemas de Informação - CESI/SBC
Electronic ISSN: 1984-2902