NAKNN: An Efficient Classification of Indonesian News Texts with Nazief-Adriani and KNN

Basirudin Ansor(1*), Aditya Putra Ramdani(2), Nova Christina Sari(3), Muhammad Zainudin Al Amin(4), Achmad Solichan(5), Kilala Mahadewi(6)


(1) Universitas Muhammadiyah Semarang, Semarang, Indonesia
(2) Universitas Muhammadiyah Semarang, Semarang, Indonesia
(3) Universitas Muhammadiyah Semarang, Semarang, Indonesia
(4) Universitas Muhammadiyah Semarang, Semarang, Indonesia
(5) Universitas Muhammadiyah Semarang, Semarang, Indonesia
(6) Universitas Muhammadiyah Semarang, Semarang, Indonesia
(*) Corresponding Author

Abstract


Internet usage in Indonesia has increased significantly, reaching 215.63 million users in 2022-2023, or 78.19% of the population. With easy internet access, digital news portals such as Narasi TV have become a primary source of information for many people. However, the large volume of news articles makes manual categorization challenging. This study classifies Indonesian-language news documents from Narasi TV using the Nazief-Adriani algorithm for stemming and the K-Nearest Neighbor (KNN) method for classification. The text mining process begins with preprocessing, which comprises case folding, tokenizing, stop-word filtering, and stemming. On a dataset of 500 news documents, a 90:10 data split yielded an average accuracy of 93%, with a maximum of 100%; an 80:20 split yielded an average of 89% (maximum 93%); and a 70:30 split yielded an average of 87% (maximum 89%). In conclusion, combining the Nazief-Adriani algorithm with the KNN method, together with careful selection of k and the random state, achieved high accuracy in classifying Indonesian-language news documents, with an average accuracy of 93% under the 90:10 split. These results demonstrate the potential of text mining and classification techniques for managing digital news.
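The abstract describes a pipeline of case folding, tokenizing, stop-word filtering, Nazief-Adriani stemming, and KNN classification evaluated over random data splits. The Python sketch below illustrates that flow under stated assumptions; it is not the authors' implementation. The stemmer `toy_stem` only loosely follows Nazief-Adriani's suffix order and dictionary lookup (no prefix recoding, no restoring of stripped suffixes, and a hypothetical eight-word root dictionary). The document representation (raw term counts compared by cosine similarity) is also an assumption, since the abstract does not name one, and the stop-word list, corpus, and labels are all hypothetical.

```python
import math
import random
import re
from collections import Counter

# Hypothetical mini stop-word list and root dictionary, for illustration only.
STOPWORDS = {"yang", "dan", "di", "ke", "dari", "ini", "itu", "untuk", "dengan", "pada"}
ROOT_WORDS = {"main", "bola", "menang", "partai", "suara", "naik", "harga", "pasar"}

# Suffix groups in Nazief-Adriani order: particles, possessives, derivational suffixes.
SUFFIX_GROUPS = (("lah", "kah", "tah", "pun"), ("ku", "mu", "nya"), ("kan", "an", "i"))
PREFIXES = ("meng", "mem", "men", "me", "di", "ber", "ter", "ke", "se", "pe")


def toy_stem(word, root_words):
    """Simplified dictionary-guided affix stripping; NOT the full Nazief-Adriani rules."""
    if word in root_words:
        return word
    candidate = word
    for group in SUFFIX_GROUPS:
        for suffix in group:
            if candidate.endswith(suffix) and len(candidate) - len(suffix) >= 3:
                candidate = candidate[: -len(suffix)]
                break  # strip at most one suffix per group
        if candidate in root_words:
            return candidate
    for prefix in PREFIXES:  # no prefix recoding (e.g. men- + t) is attempted
        if candidate.startswith(prefix) and candidate[len(prefix):] in root_words:
            return candidate[len(prefix):]
    return word  # root not found: keep the original word


def preprocess(text, root_words=ROOT_WORDS):
    tokens = re.findall(r"[a-z]+", text.lower())  # case folding + tokenizing
    # stop-word filtering + stemming
    return [toy_stem(t, root_words) for t in tokens if t not in STOPWORDS]


def vectorize(text):
    return Counter(preprocess(text))  # raw term counts (assumed representation)


def cosine(a, b):
    dot = sum(count * b[term] for term, count in a.items())  # Counter gives 0 if missing
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def knn_predict(query, train, k):
    """Majority vote among the k training vectors most cosine-similar to the query."""
    ranked = sorted(train, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def split(samples, test_ratio, seed):
    """Shuffle with a fixed random state, then hold out test_ratio of the samples."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(train, test, k):
    hits = sum(knn_predict(vec, train, k) == label for vec, label in test)
    return hits / len(test)


# Hypothetical toy corpus (not the study's Narasi TV data).
DEMO = [
    ("Timnas bermain bola dan menang", "olahraga"),
    ("Kemenangan tim bola", "olahraga"),
    ("Partai meraih suara pemilih", "politik"),
    ("Suara partai naik", "politik"),
    ("Harga pasar naik", "ekonomi"),
    ("Harga beras di pasar", "ekonomi"),
]
TRAIN = [(vectorize(text), label) for text, label in DEMO]

if __name__ == "__main__":
    print(knn_predict(vectorize("Pertandingan bola timnas"), TRAIN, k=3))
```

With these helpers, a 90:10 split of 500 documents corresponds to `split(samples, 0.1, seed)`, which holds out 50 documents; varying `seed` mimics the different random states mentioned in the abstract. A real experiment would replace `toy_stem` with a complete Nazief-Adriani stemmer backed by a full Indonesian root-word dictionary.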

Keywords


Text Mining; News Classification; Nazief-Adriani; K-Nearest Neighbor

DOI: https://doi.org/10.26714/jichi.v5i2.15420

Journal of Intelligent Computing and Health Informatics (JICHI)
ISSN 2715-6923 (print) | 2721-9186 (online)
Organized by
Department of Informatics
Faculty of Engineering
Universitas Muhammadiyah Semarang

W : https://jurnal.unimus.ac.id/index.php/ICHI
E : jichi.informatika@unimus.ac.id, ahmadilham@unimus.ac.id
