Donguibogam-Based Pattern Diagnosis Using Natural Language Processing and Machine Learning

Article information

J Korean Med. 2020;41(3):1-8
Publication date (electronic) : 2020 September 01
doi : https://doi.org/10.13048/jkm.20021
1Department of Information System, Hanyang University, Seoul, Korea
2Department of Biomedical Engineering, Hanyang University, Seoul, Korea
3Department of Internal Medicine, College of Oriental Medicine, Wonkwang University, Iksan, Korea
Correspondence to: Kang Kyung Sung, Department of Internal Medicine, College of Oriental Medicine, Wonkwang University, 460 Iksan-daero, Sin-dong, Iksan 54538, Korea, Tel: +82-62-670-6412, Fax: +82-62-671-6414, E-mail: sungkk@wonkwang.ac.kr
Received 2020 June 30; Revised 2020 July 25; Accepted 2020 July 27.

Abstract

Objectives

This study investigates Donguibogam-based pattern diagnosis using natural language processing and machine learning.

Methods

A database was constructed by collecting symptoms and their corresponding pattern diagnoses from Donguibogam. The symptom sentences were tokenized with a natural language processing tool, and only nouns, verbs, and adjectives were retained. To make the symptom sentences usable for machine learning, a Word2Vec model was built to convert words into numeric vectors. A pattern prediction model was then trained with logistic regression on pairs of symptom vectors and pattern diagnoses.
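The Methods paragraph describes a pipeline of tokenization, word embedding, and classification. The sketch below illustrates that pipeline with the standard library only, under several assumptions that are not taken from the paper: the tokenizer emits (word, POS) pairs with Sejong-style tags (NNG/NNP nouns, VV verbs, VA adjectives, the tagset used by analyzers such as KKMA); a symptom sentence is represented by averaging its word vectors (the paper does not say how word vectors are combined); the 2-D `EMBEDDINGS`, the tagged sentences, and their labels are hypothetical stand-ins for a trained Word2Vec model and the real database; and "logistic regression" is implemented as multinomial (softmax) logistic regression fitted by batch gradient descent.

```python
import math

# Sejong-style POS tags (as emitted by analyzers such as KKMA) for the word
# classes kept in the Methods: common/proper nouns, verbs, adjectives.
KEEP_TAGS = {"NNG", "NNP", "VV", "VA"}

# Hypothetical 2-D word vectors standing in for a trained Word2Vec model.
EMBEDDINGS = {
    "오한": [1.0, 0.0],  # chills
    "추위": [0.9, 0.1],  # aversion to cold
    "번열": [0.0, 1.0],  # vexing heat
    "갈증": [0.1, 0.9],  # thirst
}
SIX_QI = ["wind", "cold", "summer-heat", "dampness", "dryness", "fire"]


def filter_tokens(tagged):
    """Keep only nouns, verbs and adjectives from (word, tag) pairs."""
    return [word for word, tag in tagged if tag in KEEP_TAGS]


def sentence_vector(words, embeddings, dim):
    """Average in-vocabulary word vectors; unknown words are skipped."""
    vecs = [embeddings[w] for w in words if w in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def scores(W, b, x):
    """Linear class scores W x + b."""
    return [sum(w_i * x_i for w_i, x_i in zip(w_k, x)) + b_k for w_k, b_k in zip(W, b)]


def train_softmax_regression(X, y, n_classes, lr=0.5, epochs=500):
    """Multinomial logistic regression fitted by batch gradient descent."""
    dim, n = len(X[0]), len(X)
    W = [[0.0] * dim for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        gW = [[0.0] * dim for _ in range(n_classes)]
        gb = [0.0] * n_classes
        for x, label in zip(X, y):
            p = softmax(scores(W, b, x))
            for k in range(n_classes):
                err = p[k] - (1.0 if k == label else 0.0)  # dLoss/dscore_k
                gb[k] += err
                for i in range(dim):
                    gW[k][i] += err * x[i]
        for k in range(n_classes):
            b[k] -= lr * gb[k] / n
            for i in range(dim):
                W[k][i] -= lr * gW[k][i] / n
    return W, b


def predict(W, b, x):
    """Index of the highest-scoring class."""
    s = scores(W, b, x)
    return max(range(len(s)), key=s.__getitem__)


# Hypothetical tagged symptom sentences paired with Six-Qi labels.
corpus = [
    ([("오한", "NNG"), ("이", "JKS"), ("있", "VV")], SIX_QI.index("cold")),
    ([("추위", "NNG"), ("를", "JKO"), ("타", "VV")], SIX_QI.index("cold")),
    ([("번열", "NNG"), ("이", "JKS"), ("나", "VV")], SIX_QI.index("fire")),
    ([("갈증", "NNG"), ("이", "JKS"), ("심하", "VA")], SIX_QI.index("fire")),
]
X = [sentence_vector(filter_tokens(s), EMBEDDINGS, 2) for s, _ in corpus]
y = [label for _, label in corpus]
W, b = train_softmax_regression(X, y, n_classes=len(SIX_QI))
```

Swapping `EMBEDDINGS` for vectors from a real Word2Vec model and `corpus` for the full symptom database would follow the same path. Averaging is only one way to pool word vectors into a sentence representation; it is used here because it is simple and order-independent.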

Results

Word2Vec performance was maximized by optimizing its primary parameters: the number of iterations, the vector dimension, and the window size. The resulting regression model predicted the Six-Qi pattern diagnosis with 75% accuracy (chance level: 16.7%).
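The Results rest on two simple computations: a chance level of 1/6 for six equally likely Six-Qi classes (about 16.7%), and a search over Word2Vec's iteration count, vector dimension, and window size. A minimal sketch follows. The `evaluate` callback, the candidate values passed to `grid_search`, and the exhaustive grid strategy are assumptions, since the paper does not state its search procedure. In gensim 4.x, these three settings correspond to the `epochs`, `vector_size`, and `window` arguments of `Word2Vec`.

```python
from itertools import product

SIX_QI = ["wind", "cold", "summer-heat", "dampness", "dryness", "fire"]


def chance_level(n_classes):
    """Expected accuracy of uniform random guessing over n_classes labels."""
    return 1.0 / n_classes


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold pattern labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def grid_search(evaluate, iterations, dimensions, windows):
    """Try every (iterations, dimension, window) combination exhaustively.

    `evaluate` is a hypothetical caller-supplied function that trains
    Word2Vec plus the classifier with the given settings and returns
    validation accuracy; the highest-scoring combination is returned.
    """
    best_params, best_score = None, float("-inf")
    for params in product(iterations, dimensions, windows):
        score = evaluate(*params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

For example, `chance_level(len(SIX_QI))` returns 1/6, the 16.7% baseline against which the reported 75% accuracy should be read. An exhaustive grid is the simplest choice for three parameters; a random or staged search would scale better if more parameters were tuned.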

Conclusions

In this study, we developed a pattern diagnosis prediction model based on the symptoms and pattern diagnoses recorded in Donguibogam. Prediction accuracy could be improved in future work by collecting data from additional classics of oriental medicine.

Fig. 1

Natural Language Processing and Pattern Diagnosis Prediction Model Based on Donguibogam.

Fig. 2

Optimization of the Word2Vec Model.

Fig. 3

Word2Vec Model-Based Diagnostic Prediction Program.
