Automatic Patent Clustering using SOM and Bibliographic Coupling
Abstract
Patents are usually organized into classes defined by the offices responsible for patent protection, in order to give the information a format that supports retrieval. The complexity of patent taxonomies is a challenge for the automation of patent classification. In addition, the large number of subgroups makes classification at deeper levels more difficult. This work proposes a method to cluster patents using Self-Organizing Map (SOM) networks and bibliographic coupling. To validate the proposed method, an empirical experiment was run on a patent database from a specific classification system. The results show that patent clusters were successfully identified by the SOM through their cited references, and that the SOM results were similar to those of the k-Means algorithm on this task. This study can contribute to the development of knowledge organization systems by evaluating the use of citation analysis in the automatic clustering of patents in a constrained knowledge domain, at the subgroup level of current patent classification systems.
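As a rough illustration of the idea, and not the authors' implementation, the sketch below builds a bibliographic coupling matrix (two patents are coupled when they cite the same reference) and trains a very small one-dimensional SOM on the citation vectors. The patent and reference identifiers, the two-unit map, the deterministic initialisation, and every hyperparameter are assumptions made for this example only.

```python
import numpy as np

def coupling_matrix(citations):
    """citations: dict mapping patent id -> set of cited reference ids."""
    patents = sorted(citations)
    refs = sorted(set().union(*citations.values()))
    col = {r: j for j, r in enumerate(refs)}
    A = np.zeros((len(patents), len(refs)))   # binary patent x reference matrix
    for i, p in enumerate(patents):
        for r in citations[p]:
            A[i, col[r]] = 1.0
    # Entry (i, j) of A @ A.T counts the references shared by patents i and j.
    return patents, A, A @ A.T

def train_som(X, n_units=2, epochs=200, lr0=0.5, sigma0=1.0, seed=0):
    """Minimal 1-D SOM; units start on evenly spaced data rows (an assumption)."""
    rng = np.random.default_rng(seed)
    W = X[np.linspace(0, len(X) - 1, n_units).astype(int)].copy()
    grid = np.arange(n_units)
    for t in range(epochs):
        frac = t / epochs
        lr = lr0 * (1 - frac)                  # linearly decaying learning rate
        sigma = sigma0 * (1 - frac) + 1e-3     # shrinking neighbourhood width
        for x in X[rng.permutation(len(X))]:
            bmu = np.argmin(((W - x) ** 2).sum(axis=1))         # best matching unit
            h = np.exp(-((grid - bmu) ** 2) / (2 * sigma ** 2))  # neighbourhood weights
            W += lr * h[:, None] * (x - W)
    return W

def assign(X, W):
    """Label each row of X with the index of its nearest SOM unit."""
    return np.array([np.argmin(((W - x) ** 2).sum(axis=1)) for x in X])

# Hypothetical toy data: two groups of patents citing disjoint reference sets.
citations = {
    "P1": {"R1", "R2", "R3"}, "P2": {"R1", "R2"}, "P3": {"R1", "R3"},
    "P4": {"R4", "R5", "R6"}, "P5": {"R4", "R5"}, "P6": {"R4", "R6"},
}
patents, A, C = coupling_matrix(citations)
W = train_som(A)
labels = assign(A, W)
print(dict(zip(patents, labels.tolist())))
```

In this toy setting the coupling strength between P1 and P2 is 2 (they share R1 and R2) and is 0 between patents from different groups. The SOM here is trained on the raw citation vectors; training on rows of the coupling matrix instead would be an equally plausible design choice.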