REID (Research and Evaluation in Education)


clusterization, K-means, Euclidean distance, national examination, high school

This study aims to classify senior high schools in Papua Province, Indonesia, based on their 2019 National Examination scores, so that the results can inform efforts to sustain school quality in Papua. All senior high schools in Papua Province were grouped into three clusters, Cluster 1 (high), Cluster 2 (medium), and Cluster 3 (low), by applying the K-Means algorithm to the 2019 National Examination data. The data were obtained from the official website of the Center for Educational Assessment of the Ministry of Education, Culture, Research, and Technology of the Republic of Indonesia. Clustering was performed by grouping each school's national examination scores according to their similarity, measured by Euclidean distance, to the scores of the other schools. The K-Means clustering places 18 schools in Cluster 1, 58 schools in Cluster 2, and 68 schools in Cluster 3. The K-Means analysis yields an R² value of 0.723 and a Silhouette score of 0.42.
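The abstract does not include the authors' code, so the following is only a minimal Python sketch of the kind of analysis it describes: Lloyd's K-means with Euclidean distance, followed by the two reported quality measures. Several details are assumptions rather than the authors' method: each school is represented as a vector of hypothetical subject scores, centroids are seeded with a deterministic farthest-first rule (the abstract does not state the initialization), and R² is taken as 1 − WSS/TSS, the share of total variance explained by the clusters, which is a common definition. The function names (`kmeans`, `r_squared`, `silhouette`) and the `schools` data are hypothetical illustrations, not the study's data.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two score vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def farthest_first_init(points, k):
    """Assumed seeding: start at the first point, then repeatedly add the
    point farthest from its nearest already-chosen centroid."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(euclidean(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's K-means: alternate nearest-centroid assignment and centroid update."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each school joins the cluster with the nearest centroid.
        new_labels = [min(range(k), key=lambda j: euclidean(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments unchanged, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = centroid(members)
    return labels, centroids


def r_squared(points, labels, centroids):
    """Assumed R² definition: 1 - (within-cluster SS / total SS)."""
    grand = centroid(points)
    tss = sum(euclidean(p, grand) ** 2 for p in points)
    wss = sum(euclidean(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return 1 - wss / tss


def silhouette(points, labels):
    """Mean silhouette coefficient (b - a) / max(a, b); needs at least 2 clusters."""
    n = len(points)
    total = 0.0
    for i in range(n):
        same = [euclidean(points[i], points[j]) for j in range(n)
                if j != i and labels[j] == labels[i]]
        if not same:
            continue  # singleton cluster: its silhouette is taken as 0
        a = sum(same) / len(same)  # mean distance within own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(euclidean(points[i], points[j]) for j in range(n) if labels[j] == c)
            / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n


# Hypothetical toy data: [mean score in subject A, mean score in subject B] per school.
schools = [
    [75, 78], [80, 76], [78, 81],
    [60, 58], [62, 61], [58, 60],
    [42, 45], [40, 41], [44, 43],
]
labels, centroids = kmeans(schools, k=3)
print(labels)
print(round(r_squared(schools, labels, centroids), 3), round(silhouette(schools, labels), 3))
```

Farthest-first seeding is used here only because it makes the toy run deterministic; it is sensitive to outliers, and library implementations such as scikit-learn's `KMeans` instead default to k-means++ seeding with several restarts. The cluster numbers returned by the sketch are arbitrary, so mapping them to "high", "medium", and "low" requires ranking the centroids by their mean score.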
