Keywords

Cross-language retrieval, Cross-Language Information Retrieval, Google Translate API, tourism news, search engine

Document Type

Article

Abstract

Cross-Language Information Retrieval (CLIR) retrieves information stored in a language different from that of the user's query. Translation methods commonly used in CLIR include dictionary-based, parallel-corpora, comparable-corpora, machine-translation, ontology-based, and transitive approaches. The query must be translated into the target language and preprocessed, after which its similarity to every document in the corpus is calculated. The main problems are the time and accuracy of query translation, made harder because queries are rarely written as complete sentences that follow the rules of a language. Preprocessing is also language-dependent. Stemming, for example, requires a different method for each language: Indonesian combines base words with prefixes, suffixes, infixes, and confixes, whereas English stemming mainly involves stripping suffixes. Stemming is also time-consuming in text processing. In the Indonesian search engine (SEBI), cross-language tourism news retrieval is implemented with the Google Translate API, which translates the query and all documents into English; Porter's stemming algorithm, which reduces each term to its base form; and cosine similarity, which scores each document against the query. This approach delivers cross-language tourism news instantly while increasing the precision and efficiency of the SEBI search engine, although further improvements are needed to make the similarity computation more precise and efficient.
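
The pipeline summarized in the abstract (translate, stem, score by cosine similarity) can be illustrated with a minimal Python sketch. It is not the SEBI implementation. Several pieces are assumptions made for illustration: `translate_to_english` is a hypothetical placeholder that returns its input unchanged instead of calling the Google Translate API, `simple_stem` is a toy suffix stripper standing in for Porter's algorithm, and raw term counts are used as vector weights because the abstract does not state the weighting scheme.

```python
import math
import re
from collections import Counter


def translate_to_english(text):
    """Hypothetical placeholder for the translation step.

    A real system would call a translation service here; this stub
    returns the text unchanged so the example runs offline.
    """
    return text


def simple_stem(token):
    """Toy suffix stripper (an assumption, NOT Porter's algorithm)."""
    for suffix in ("ing", "es", "ed", "s"):
        # Only strip when at least three characters remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Translate, lowercase, tokenize, and stem; return term counts."""
    tokens = re.findall(r"[a-z]+", translate_to_english(text).lower())
    return Counter(simple_stem(t) for t in tokens)


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, documents):
    """Return (score, document index) pairs, best match first."""
    q = preprocess(query)
    scored = [(cosine_similarity(q, preprocess(d)), i)
              for i, d in enumerate(documents)]
    return sorted(scored, key=lambda s: (-s[0], s[1]))


docs = [
    "Beaches in Bali attract tourists",
    "Stock markets closed higher today",
]
ranking = rank("bali beach tourist", docs)
```

Here the first document ranks highest because all three stemmed query terms occur in it, while the second shares no terms with the query and scores zero. Swapping the stub for a real translation call and the toy stemmer for a full Porter implementation would leave the ranking logic unchanged.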

First Page

113

Last Page

120

Page Range

113-120

Issue

1

Volume

8

Digital Object Identifier (DOI)

10.21831/elinvo.v8i1.55851

Source

https://journal.uny.ac.id/index.php/elinvo/article/view/55851
