Cross-Language Tourism News Retrieval System Using Google Translate API on SEBI Search Engine

H. Husni, University of Trunojoyo Madura, Indonesia
Arif Muntasa, University of Trunojoyo Madura
Sigit Susanto Putro, University of Trunojoyo Madura, Indonesia
Zulfi Osman, University of Trunojoyo Madura, Indonesia

Abstract


Cross-Language Information Retrieval (CLIR) retrieves information stored in a language different from that of the user's query. Translation methods commonly used in CLIR are dictionary-based, parallel corpora, comparable corpora, machine translation, ontology-based, and transitive approaches. The query must first be translated into the target language, then preprocessed, and finally compared for similarity against every document in the corpus. The main problems are the time and accuracy of query translation, compounded by the fact that queries are rarely written as complete, grammatical sentences. Preprocessing is also language-dependent. Stemming, for example, requires a different method for each language: Indonesian words combine a base word with prefixes, suffixes, infixes, and confixes, whereas English stemming deals mainly with suffixes. Stemming is therefore a time-consuming step in text processing. In the Search Engine Bahasa Indonesia (SEBI), cross-language tourism news retrieval is implemented using the Google Translate API, which translates the query and all documents into English; Porter's stemming algorithm, which reduces each term to its stem; and cosine similarity, which scores each document against the query. This approach can deliver cross-language tourism news instantly while increasing the precision and efficiency of the SEBI search engine, although further improvements are needed to make the similarity computation more precise and efficient.
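The ranking step described above can be illustrated with a minimal Python sketch. It is not the SEBI implementation: it assumes the query and documents have already been translated into English, it uses raw term frequencies as weights (the abstract does not state the weighting scheme), and the `normalize` parameter is a hypothetical hook where a Porter stemmer could be plugged in. All function names are illustrative.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def to_vector(text, normalize=lambda term: term):
    """Build a term-frequency vector.

    `normalize` is a placeholder (assumption) for a stemmer such as
    Porter's; the default leaves each term unchanged.
    """
    return Counter(normalize(term) for term in tokenize(text))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty query or document matches nothing
    return dot / (norm_a * norm_b)


def rank(query, documents):
    """Return (score, document index) pairs, best match first."""
    q = to_vector(query)
    scored = [(cosine_similarity(q, to_vector(d)), i)
              for i, d in enumerate(documents)]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


if __name__ == "__main__":
    # Already-translated example texts (made up for illustration).
    docs = ["beach tourism in madura", "stock market news"]
    for score, index in rank("madura beach", docs):
        print(index, round(score, 3))
```

Because every document is compared against the query, the cost grows linearly with corpus size, which is one reason the similarity computation is a natural target for the efficiency improvements mentioned in the abstract.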


Keywords


Cross-language retrieval; Cross-Language Information Retrieval; Google Translate API; tourism news; search engine


DOI: https://doi.org/10.21831/elinvo.v8i1.55851


Copyright (c) 2023 Elinvo (Electronics, Informatics, and Vocational Education)

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

ISSN 2477-2399 (online) || ISSN 2580-6424 (print)
