Topic Mining-Based Knowledge Discovery of User Health Information Needs
Understanding users’ health information needs has become increasingly important as the use of digital health services continues to grow. However, the unstructured nature of user-generated questions makes these needs difficult to capture and analyze accurately. This study contributes to SDG 3 (Good Health and Well-being) by applying topic mining-based knowledge discovery to identify the primary topics in user questions submitted through the “Tanya Dokter” feature on the Alodokter platform. A total of 8,550 questions were obtained through web scraping between July 2024 and June 2025. The collected data were preprocessed and then analyzed using seven topic modeling approaches: Latent Dirichlet Allocation (LDA), Correlated Topic Model (CTM), Latent Semantic Analysis (LSA), Non-negative Matrix Factorization (NMF), BERTopic, Top2Vec, and ProdLDA. The models were compared using the c_v coherence metric to identify the most effective method. NMF achieved the best result, with the highest coherence score of 0.67 and six well-defined topics. The six primary areas of concern are: pregnancy; menstruation and contraceptive management; general health and minor ailments; infant care; dermatological conditions; and musculoskeletal and other physical complaints. General health-related issues occurred most frequently, particularly during seasonal transitions, while menstruation and contraceptive management received the least attention, even though menstruation contributes to women’s health risks and contraceptive use helps reduce maternal mortality in Indonesia. These findings offer digital health platforms such as Alodokter insights for improving information delivery and health literacy, ultimately improving online health services and supporting the achievement of SDG 3.
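To make the reported approach more concrete, the following is a minimal sketch of how NMF can factor a document-term matrix into topics. It is an illustration only, not the authors' implementation: the six toy questions are invented for this example (they are not drawn from the Alodokter data), raw term counts stand in for whatever weighting the study used, and the pure-Python multiplicative-update routine is an assumed stand-in for a library implementation. The preprocessing steps and the c_v coherence evaluation are not reproduced here.

```python
import random

# Hypothetical toy questions, invented for illustration (not study data).
docs = [
    "pregnancy nausea first trimester",
    "pregnancy test positive nausea",
    "skin rash itchy red",
    "itchy skin rash cream",
    "baby fever feeding",
    "baby feeding sleep fever",
]

def build_matrix(docs):
    """Return (vocab, V) where V[i][j] is the count of term j in document i."""
    vocab = sorted({tok for d in docs for tok in d.lower().split()})
    index = {t: j for j, t in enumerate(vocab)}
    V = [[0.0] * len(vocab) for _ in docs]
    for i, d in enumerate(docs):
        for tok in d.lower().split():
            V[i][index[tok]] += 1.0
    return vocab, V

def transpose(A):
    return [list(r) for r in zip(*A)]

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def frobenius_error(V, W, H):
    """Frobenius norm of V - W H, the quantity the updates try to reduce."""
    WH = matmul(W, H)
    return sum((v - a) ** 2 for rv, ra in zip(V, WH) for v, a in zip(rv, ra)) ** 0.5

def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Factor V (docs x terms) into non-negative W (docs x k) and H (k x terms)
    using multiplicative updates for the Frobenius objective."""
    rng = random.Random(seed)
    n_docs, n_terms = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(n_docs)]
    H = [[rng.random() + eps for _ in range(n_terms)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
    return W, H

def top_terms(H, vocab, n=3):
    """For each topic (row of H), list the n highest-weighted terms."""
    return [[vocab[j] for j in sorted(range(len(vocab)), key=lambda j: -row[j])[:n]]
            for row in H]

vocab, V = build_matrix(docs)
W, H = nmf(V, k=3)  # k=3 suits the toy corpus; the study reports six topics
topics = top_terms(H, vocab)
for t, terms in enumerate(topics):
    print(f"topic {t}: {', '.join(terms)}")
```

Each row of H weights the vocabulary for one topic, and each row of W indicates how strongly a question loads on each topic. In a model comparison like the one described, the number of topics would typically be chosen by fitting several values of k and keeping the one with the best coherence score.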
Copyright (c) 2025 Dayana Khoiriyah Harahap, Ken Ditha Tania, Putri Eka Sevtiyuni (Author)

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0) that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).





