JKM > Volume 45(3); 2024 > Article
Kim, Park, Jeong, Lee, Kim, Sung, and Yu: Application of text-mining technique and machine-learning model with clinical text data obtained from case reports for Sasang constitution diagnosis: a feasibility study

Abstract

Objectives

We analyzed Sasang constitution case reports using text mining to derive network analysis results and designed a classification algorithm using machine learning to select a model suitable for classifying Sasang constitution based on text data.

Methods

Case reports on Sasang constitution published from January 1, 2000, to December 31, 2022, were searched. In total, 343 papers were selected, yielding 454 cases. The extracted texts were preprocessed and tokenized with the Python-based KoNLPy package, and each morpheme was vectorized using TF-IDF values. Word cloud visualization and centrality analysis were used to identify the keywords mainly used for classifying Sasang constitution in clinical practice. To select the most suitable classification model for diagnosing Sasang constitution, the performance of five models (XGBoost, LightGBM, SVC, Logistic Regression, and Random Forest Classifier) was evaluated using accuracy and F1-Score.
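
The evaluation loop described in the Methods can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the toy corpus, the labels, the 50/50 split, and the macro averaging of the F1-Score are all assumptions, the tokens are plain English words rather than KoNLPy morphemes, and XGBoost and LightGBM are left out so that the sketch depends only on scikit-learn.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical toy corpus: each string stands for one already-tokenized case text.
texts = [
    "vomiting nausea headache", "nausea vomiting pain",
    "headache vomiting nausea pain", "nausea pain vomiting",
    "dizziness gait bilateral", "gait dizziness thorax",
    "bilateral dizziness gait thorax", "thorax gait dizziness",
]
labels = ["Soyangin"] * 4 + ["Taeeumin"] * 4

# Stratified split so both classes appear in the training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.5, stratify=labels, random_state=0
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVC": SVC(kernel="linear"),
    "Random Forest Classifier": RandomForestClassifier(n_estimators=50, random_state=0),
}

results = {}
for name, model in models.items():
    # TF-IDF vectorization followed by the classifier, fitted on the training split only.
    pipe = make_pipeline(TfidfVectorizer(), model)
    pipe.fit(X_train, y_train)
    pred = pipe.predict(X_test)
    results[name] = (
        accuracy_score(y_test, pred),
        f1_score(y_test, pred, average="macro"),  # macro averaging is an assumption
    )

for name, (acc, f1) in results.items():
    print(f"{name}: accuracy={acc:.3f}, macro F1={f1:.3f}")
```

Wrapping the vectorizer and the classifier in one pipeline keeps the TF-IDF vocabulary and weights from being fitted on test documents.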

Results

Through word cloud visualization and centrality analysis, specific keywords for each constitution were identified. Logistic regression showed the highest accuracy (0.839416), while random forest classifier showed the lowest (0.773723). Based on F1-Score, XGBoost scored the highest (0.739811), and random forest classifier scored the lowest (0.643421).

Conclusions

This is the first study to analyze constitution classification by applying text mining and machine learning to case reports, providing a concrete research model for follow-up research. The keywords selected through text mining were confirmed to effectively reflect the characteristics of each Sasang constitution type. Based on text data from case reports, the most suitable machine learning models for diagnosing Sasang constitution are logistic regression and XGBoost.

$$
\text{tf-idf}(t,d) = \text{tf}(t,d) \times \text{idf}(t), \qquad \text{idf}(t) = \log\frac{1+n_d}{1+\text{df}(d,t)} + 1
$$
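
The TF-IDF weighting defined above can be worked through with a few lines of standard-library Python. This is a minimal sketch under stated assumptions: tf is taken as the raw term count, the logarithm is natural, n_d is the number of documents, df is the number of documents containing the term, and the toy documents are hypothetical. Library implementations may additionally normalize each document vector, which is not done here.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_d = len(docs)
    # df[t]: number of documents that contain term t at least once
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed inverse document frequency: log((1 + n_d) / (1 + df)) + 1
    idf = {t: math.log((1 + n_d) / (1 + df_t)) + 1 for t, df_t in df.items()}
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts (an assumption about tf)
        weights.append({t: count * idf[t] for t, count in tf.items()})
    return weights


# Hypothetical, already-tokenized toy documents (not from the study data)
docs = [
    ["headache", "dizziness", "headache"],
    ["headache", "nausea"],
    ["cold", "sweat", "nausea"],
]
weights = tf_idf(docs)
for i, w in enumerate(weights):
    print(i, {t: round(v, 3) for t, v in w.items()})
```

In the first toy document, "headache" occurs twice but also appears in two of the three documents, so its idf is lower than that of "dizziness", which appears in only one.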

Fig. 1. Study flow of text-mining and machine learning.
Fig. 2. Flow chart of literature searches and screening results.
Fig. 3. Wordcloud visualization analysis result of Sasang constitution.
Table 1
English Word Translation Criteria

| Translation exclusion criteria | Examples |
| --- | --- |
| Words written in English in most of the research papers | VAS, QSCCII |
| Words that represent a unit | kg, cm |
| Name of the medicine | trolac, NSAID |
Table 2
Data Refining Criteria

| Category | Criteria | Example (before) | Example (after) |
| --- | --- | --- | --- |
| Exclusions | Not a key variable, and used conventionally | Above-mentioned, Opinion, Not, usual, And, When, time | Delete |
| Exclusions | Terms related to Korean medicine, but used conventionally | Common Questions in the Constitutional Questionnaire (Address, Symptoms usually present, Medical history, Body type, Temperament, Abilities), Oriental Medicine, Diagnosis, Defecation, Urine | |
| Synonyms | Cases with the same or similar meanings but different spellings | ‘ears, eyes, mouth, and nose’, ‘ears, eyes, nose, and mouth’, ‘eyes, nose, and mouth’ | ‘ears, eyes, nose, and mouth’ |
| Synonyms | Cases where a single word represents or encompasses other words | Sleep disorder, Difficulty falling asleep, Nocturnal sleep disorder, Difficulty maintaining sleep, Insomnia, Sleep difficulties, Difficulty falling asleep | Sleep disorder |
| Native words | Cases where a compound word is perceived as separate components | cold, sweat | Cold sweat |
| Native words | Cases where a compound word is perceived as separate components | Hyung, Geumji, Pose | Hyunggeumjipose |
| Native words | Cases where multiple words should be considered as a single phrase | Abdominal, bloating | Abdominal bloating |
| Native words | Cases where multiple words should be considered as a single phrase | Nocturnal, sleep, disorder | Nocturnal sleep disorder |
Table 3
Number of Data by Sasang Constitution

| Sasang Constitution | Number of data |
| --- | --- |
| Soeumin | 92 |
| Soyangin | 198 |
| Taeeumin | 148 |
| Taeyangin | 16 |
| Total | 454 |
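
The class counts in Table 3 are uneven, which matters when reading accuracy figures. The short calculation below uses only the counts from the table to show each constitution's share and the accuracy that always predicting the largest class would reach on the full dataset. That baseline is an illustration added here, not a figure reported by the study, and it refers to the whole dataset rather than to any particular test split.

```python
# Counts copied from Table 3
counts = {"Soeumin": 92, "Soyangin": 198, "Taeeumin": 148, "Taeyangin": 16}

total = sum(counts.values())
shares = {name: n / total for name, n in counts.items()}

# Accuracy of a trivial classifier that always predicts the most frequent class
majority_class = max(counts, key=counts.get)
majority_baseline = counts[majority_class] / total

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
print(f"Majority-class baseline ({majority_class}): {majority_baseline:.3f}")
```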
Table 4
Combined Centrality (Top 10)

| Rank | Soeumin: Word | TI | CC | Soyangin: Word | TI | CC | Taeeumin: Word | TI | CC | Taeyangin: Word | TI | CC |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Thin | 0.136 | 0.217 | Vomiting | 0.353 | 0.313 | Dizziness | 0.245 | 0.211 | Nothing particular | 0.122 | 0.182 |
| 2 | Chest | 0.082 | 0.201 | Headache | 0.163 | 0.282 | Headache | 0.163 | 0.210 | Evening | 0.082 | 0.182 |
| 3 | Severe | 0.027 | 0.196 | Nausea | 0.381 | 0.280 | Gait | 0.109 | 0.201 | Weakness | 0.218 | 0.155 |
| 4 | Bilateral | 0.082 | 0.193 | Thin | 0.136 | 0.276 | Bilateral | 0.082 | 0.197 | Gait | 0.109 | 0.155 |
| 5 | Abdomen | 0.109 | 0.186 | Pain | 0.082 | 0.260 | Head | 0.082 | 0.193 | Exercise | 0.109 | 0.155 |
| 6 | Abdominal pain | 0.082 | 0.181 | Dizziness | 0.245 | 0.258 | Thorax | 0.109 | 0.190 | -ed | | 0.155 |
| 7 | Physique | 0.272 | 0.176 | Entire body | 0.163 | 0.249 | Abdomen | 0.109 | 0.186 | Duration | 0.163 | 0.152 |
| 8 | Lower extremities | | 0.176 | Above | | 0.236 | Drug | | 0.181 | Bilateral | 0.082 | 0.152 |
| 9 | Shoulder | 0.054 | 0.174 | Physique | 0.272 | 0.225 | Stress | 0.054 | 0.175 | Defecation | 0.109 | 0.125 |
| 10 | Drug | | 0.172 | Administration | 0.190 | 0.220 | Nocturnal | 0.136 | 0.172 | Limbs | 0.109 | 0.125 |

* TI: TF-IDF; CC: Combined Centrality. In the original table, a marker image (jkm-45-3-193f4.gif) flags TF-IDF < 0.1. Empty TI cells are missing in the source.

Table 5
Best Parameter of Algorithm

| Algorithm | Best Params. | F1-score |
| --- | --- | --- |
| XGBoost | {‘learning_rate’: 0.5, ‘max_depth’: 20, ···} | 0.696374 |
| LightGBM | {‘learning_rate’: 1, ‘max_depth’: 10, ···} | 0.695833 |
| SVC | {‘C’: 10, ‘kernel’: ‘linear’} | 0.651028 |
| Logistic Regression | {‘C’: 20} | 0.668290 |
| Random Forest Classifier | {‘n_estimators’: 50} | 0.603950 |
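
A parameter search of the kind summarized in Table 5 could look like the sketch below, shown for SVC. Everything beyond the parameter names `C` and `kernel` is an assumption: the source does not state the search procedure, the candidate values, the number of folds, or the scoring rule, so scikit-learn's `GridSearchCV` with 3-fold cross-validation and macro F1 is used purely for illustration, on a hypothetical toy corpus.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Hypothetical toy corpus of already-tokenized case texts (not study data)
texts = [
    "headache nausea vomiting",
    "vomiting nausea headache pain",
    "nausea vomiting",
    "dizziness gait headache",
    "gait dizziness bilateral",
    "dizziness bilateral gait",
]
labels = ["Soyangin"] * 3 + ["Taeeumin"] * 3

pipe = Pipeline([("tfidf", TfidfVectorizer()), ("clf", SVC())])

# Candidate values are illustrative; only the parameter names come from Table 5.
param_grid = {"clf__C": [1, 10], "clf__kernel": ["linear", "rbf"]}

# Each parameter combination is scored by cross-validated macro F1 (an assumption).
search = GridSearchCV(pipe, param_grid, scoring="f1_macro", cv=3)
search.fit(texts, labels)

print(search.best_params_)
print(round(search.best_score_, 3))
```

Putting the vectorizer inside the pipeline means TF-IDF is refitted within each fold, so held-out documents do not influence the weights used for training.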
Table 6
Accuracy and F1-Score of Algorithms

| Algorithm | Accuracy | F1-Score | Precision | Recall |
| --- | --- | --- | --- | --- |
| XGBoost | 0.810219 | 0.739811 | 0.859072 | 0.696374 |
| LightGBM | 0.795620 | 0.730692 | 0.835910 | 0.695833 |
| SVC | 0.817518 | 0.688447 | 0.872854 | 0.651028 |
| Logistic Regression | 0.839416 | 0.705982 | 0.889106 | 0.668290 |
| Random Forest Classifier | 0.773723 | 0.643421 | 0.853030 | 0.603950 |
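
Accuracy and F1-Score rank the models differently in Table 6. One way this can happen with unbalanced classes is sketched below, assuming the F1-Score, precision, and recall are macro averages over the four constitutions (the source does not state the averaging rule). The confusion matrix is invented for illustration and is not taken from the study.

```python
def macro_scores(cm):
    """cm[i][j] = number of cases with true class i predicted as class j.

    Returns (accuracy, macro precision, macro recall, macro F1).
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(k)) / total
    precisions, recalls, f1s = [], [], []
    for i in range(k):
        tp = cm[i][i]
        predicted = sum(cm[r][i] for r in range(k))  # column sum
        actual = sum(cm[i])                          # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)
    return accuracy, sum(precisions) / k, sum(recalls) / k, sum(f1s) / k


# Hypothetical confusion matrix; rows and columns ordered as
# Soeumin, Soyangin, Taeeumin, Taeyangin. The last class is rare and mostly missed.
cm = [
    [8, 2, 0, 0],
    [1, 18, 1, 0],
    [0, 2, 13, 0],
    [1, 2, 1, 1],
]
acc, macro_p, macro_r, macro_f1 = macro_scores(cm)
print(f"accuracy={acc:.3f} precision={macro_p:.3f} recall={macro_r:.3f} F1={macro_f1:.3f}")
```

In this toy case the rare class contributes little to accuracy but counts as one quarter of each macro average, so poor recall on it pulls the macro recall and macro F1 well below the accuracy.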
