Multifaceted validity analysis of clinical skills test in the educational field setting

Article information

J Korean Med. 2024;45(1):1-16
Publication date (electronic) : 2024 March 1
doi: https://doi.org/10.13048/jkm.24001
1School of Korean Medicine, Pusan National University
2Department of Medical Education, Seoul National University College of Medicine
3Department of Human Systems Medicine, Seoul National University
4Department of Internal Korean Medicine, Woosuk University Medical Center
5Department of Ophthalmology, Otorhinolaryngology, and Dermatology of Korean Medicine, College of Korean Medicine, Kyung Hee University
6KM Science Division, Korea Institute of Oriental Medicine
Correspondence to: Eunbyul Cho, KM Science Division, Korea Institute of Oriental Medicine, 1672, Yuseong-daero, Yuseong-gu, Daejeon, 34054, Republic of Korea, Tel: +82-42-869-2779, Fax: +82-42-861-5800, E-mail: chostar427@gmail.com
Received 2023 September 1; Revised 2023 December 6; Accepted 2024 February 16.

Abstract

Introduction

The importance of clinical skills training in traditional Korean medicine education is increasingly emphasized. Because clinical skills tests are high-stakes assessments that determine success on the national licensing examination, reliable multifaceted methods for analyzing such tests in actual educational settings are essential. In this study, we applied multifaceted validity evaluation methods to the results of a cardiopulmonary resuscitation (CPR) assessment module to confirm their applicability and effectiveness.

Methods

We analyzed the multifaceted validity of a cardiopulmonary resuscitation test administered in a clinical education setting over the past three years, using internal consistency analysis, factor analysis, the G-study and D-study of generalizability theory, ANOVA, Kendall's tau, and descriptive statistics.
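
For illustration, here is a minimal sketch of two of these computations in Python; the students × items score matrix and all variable names are assumptions for illustration, not the study's actual pipeline:

```python
# Minimal sketch: Cronbach's alpha and Kendall's tau on hypothetical data.
import numpy as np
from scipy.stats import kendalltau

def cronbach_alpha(scores: np.ndarray) -> float:
    """Internal consistency of a students x items score matrix."""
    k = scores.shape[1]                          # number of rubric items
    item_vars = scores.var(axis=0, ddof=1)       # per-item variances
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

rng = np.random.default_rng(0)
# 47 students x 16 items scored 0-2, mirroring the 2019 design (random placeholder data).
scores = rng.integers(0, 3, size=(47, 16)).astype(float)
print(f"Cronbach's alpha = {cronbach_alpha(scores):.3f}")

# Rank agreement between two raters scoring the same students (hypothetical totals).
rater_a = np.array([90, 85, 100, 95, 80, 70])
rater_b = np.array([95, 80, 100, 90, 85, 75])
tau, p_value = kendalltau(rater_a, rater_b)
print(f"Kendall's tau = {tau:.3f} (p = {p_value:.3f})")
```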

Results

Factor analysis and internal consistency analysis showed that the evaluation rubric had an unstable factor structure and low internal consistency. The G-study showed that a large share of the measurement error in the clinical skills assessment was attributable to the raters and to residual error. The D-study showed that the rater-related error variance would have to be reduced substantially for the evaluation to reach acceptable dependability. ANOVA and Kendall's tau confirmed that rater heterogeneity was a problem.
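
For reference, these statements rest on the standard generalizability-theory decomposition for the crossed persons × raters (p × r) design used here, with notation as in Tables 5 and 6 and n'_r denoting the number of raters assumed in the D-study:

```latex
X_{pr} = \mu + \nu_p + \nu_r + \nu_{pr}, \qquad
\sigma^2(X_{pr}) = \sigma^2(p) + \sigma^2(r) + \sigma^2(pr)

\sigma^2(\delta) = \frac{\sigma^2(pr)}{n'_r}, \qquad
\sigma^2(\Delta) = \frac{\sigma^2(r) + \sigma^2(pr)}{n'_r}

E\rho^2 = \frac{\sigma^2(p)}{\sigma^2(p) + \sigma^2(\delta)}, \qquad
\Phi = \frac{\sigma^2(p)}{\sigma^2(p) + \sigma^2(\Delta)}
```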

Discussion and Conclusion

The validity of clinical skills tests should be evaluated and managed continuously in two stages: before administration and during actual implementation. This study presents concrete methods for analyzing the validity of clinical skills training and testing in actual educational settings, and it lays a foundation for competency-based, evidence-based education in practical clinical training.


Acknowledgement

This work was supported by a 2-Year Research Grant of Pusan National University.

Notes

Conflict of interest

The authors declare no potential conflict of interest relevant to this article.

Ethical statement

This study was approved by the Institutional Review Board (Approval No.: PNU IRB/2022_75_HR).

Data availability

The data of this study are available upon reasonable request.



Fig. 1

CPR total score according to the year and rater.

Table 1

Description and scale reliability statistics of CPR examination in year 2019, 2020 and 2021

Year Group n Items Cronbach's α Mean SD Median Variance Skewness Kurtosis Min. Max.
2019 47 16 0.646 17.362 2.566 90.000 164.593 −0.615 −0.607 55 100
1 6 16 81.667 11.690 82.500 136.667 0.245 0.959 65 100
2 6 16 75.833 16.558 77.500 274.167 0.128 −0.665 55 100
3 6 16 87.500 13.693 90.000 187.500 −0.876 −0.048 65 100
4 6 16 97.500 4.183 100.000 17.500 −1.537 1.429 90 100
5 6 16 90.833 12.007 95.000 144.167 −1.201 0.847 70 100
6 6 16 96.667 6.055 100.000 36.667 −1.952 3.657 85 100
7 6 16 81.667 12.910 77.500 166.667 0.705 −1.623 70 100
8 5 16 82.000 7.583 80.000 57.500 0.315 −3.081 75 90

2020 49 16 0.457 15.667 1.589 95.000 63.223 −1.104 0.188 70 100
1 6 16 95.000 6.325 97.500 40.000 −0.889 −0.781 85 100
2 6 16 90.833 8.612 90.000 74.167 −0.026 −2.367 80 100
3 6 16 88.333 10.328 90.000 106.667 −1.172 1.970 70 100
4 6 16 90.000 7.746 90.000 60.000 0.000 −1.875 80 100
5 6 16 96.667 6.055 100.000 36.667 −1.952 3.657 85 100
6 6 16 97.500 6.124 100.000 37.500 −2.449 6.000 85 100
7 6 16 92.500 8.216 95.000 67.500 −0.811 −1.029 80 100
8 7 16 94.286 8.864 95.000 78.571 −2.215 5.299 75 100

2021 41 14 0.545 13.500 1.569 94.100 83.106 −0.973 0.542 65 100
1 6 14 90.200 7.101 91.150 50.428 0.086 −1.541 82 100
2 6 14 99.017 2.409 100.000 5.802 −2.449 6.000 94 100
3 6 14 99.017 2.409 100.000 5.802 −2.449 6.000 94 100
4 6 14 91.183 10.343 94.100 106.970 −0.492 −1.928 77 100
5 6 14 88.217 6.427 88.200 41.302 −1.358 2.467 77 94
6 6 14 87.267 10.777 88.250 116.135 −0.517 −0.596 71 100
7 5 14 82.340 10.176 88.200 103.548 −1.932 3.701 65 88

Table 2

Item reliability statistics of CPR examination rubric in year 2019, 2020 and 2021

Item 2019 (Mean, SD, Item-rest correlation, α if deleted) 2020 (Mean, SD, Item-rest correlation, α if deleted) 2021 (Mean, SD, Item-rest correlation, α if deleted)
1 0.979 0.146 0.140 0.644 1.000 0.000 - - 0.976 0.156 0.056 0.549
2 0.979 0.146 0.140 0.644 0.980 0.143 0.063 0.456 0.951 0.218 0.322 0.509
3 0.979 0.146 0.140 0.644 0.980 0.143 0.063 0.456 0.927 0.264 0.381 0.490
4 0.745 0.441 0.190 0.641 0.857 0.354 0.142 0.443 0.976 0.156 0.056 0.549
5 0.979 0.146 0.319 0.635 1.000 0.000 - - 1.000 0.000 - -
6 0.979 0.146 −0.094 0.655 0.959 0.200 −0.106 0.484 0.927 0.264 0.243 0.519
7 0.809 0.449 0.368 0.613 0.837 0.373 0.133 0.447 0.780 0.419 0.382 0.471
8 1.681 0.515 0.507 0.585 1.898 0.306 0.172 0.435 1.707 0.461 0.499 0.422
9 0.830 0.433 0.298 0.625 0.918 0.277 0.359 0.390 0.902 0.300 0.317 0.500
10 1.213 0.907 0.414 0.621 1.673 0.658 0.356 0.351 0.976 0.156 0.056 0.549
11 0.957 0.204 0.161 0.642 0.959 0.200 0.233 0.429 1.683 0.471 0.308 0.498
12 1.000 0.209 0.000 0.653 1.000 0.000 - - 0.927 0.264 0.047 0.557
13 1.809 0.449 0.260 0.630 1.898 0.421 −0.007 0.498 1.000 0.000 - -
14 0.957 0.204 0.118 0.645 0.939 0.242 0.359 0.398 1.780 0.419 −0.056 0.608
15 0.936 0.247 0.257 0.634 0.939 0.242 0.064 0.459 · · · ·
16 1.532 0.654 0.656 0.539 1.816 0.391 0.323 0.382 · · · ·
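
The two rubric diagnostics in Table 2 can be computed as follows; a minimal sketch on hypothetical data (the score matrix and names are assumptions):

```python
# Item-rest correlation and "alpha if deleted" for one rubric item.
import numpy as np

def cronbach_alpha(s: np.ndarray) -> float:
    k = s.shape[1]
    return (k / (k - 1)) * (1 - s.var(axis=0, ddof=1).sum() / s.sum(axis=1).var(ddof=1))

def item_rest_stats(scores: np.ndarray, j: int):
    rest = np.delete(scores, j, axis=1).sum(axis=1)       # total score excluding item j
    item_rest_r = np.corrcoef(scores[:, j], rest)[0, 1]   # "Item-rest correlation"
    alpha_if_deleted = cronbach_alpha(np.delete(scores, j, axis=1))  # "α if deleted"
    return item_rest_r, alpha_if_deleted

rng = np.random.default_rng(1)
scores = rng.integers(0, 3, size=(47, 16)).astype(float)  # random placeholder data
print(item_rest_stats(scores, j=0))
```

Items with zero variance (SD = 0.000 in Table 2) have no defined item-rest correlation, which corresponds to the "-" cells above.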

Table 3

Model fit measures of factor analysis using CPR rubric items in year 2019, 2020 and 2021

Year RMSEA TLI BIC χ2 df p-value
2019 0.583 −0.047 987.208 1275.969 75 <.001
2020 0.487 −0.055 559.066 808.143 64 <.001
2021 0.136 0.358 −83.077 76.606 43 0.001

RMSEA, root mean square error of approximation; TLI, Tucker-Lewis index; BIC, Bayesian information criterion.
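
For reference, under common conventions these indices are computed from the model chi-square (χ², df) and the baseline-model chi-square (χ²₀, df₀); exact constants vary slightly across software, so this is a sketch of the usual definitions rather than the specific implementation used here:

```latex
\mathrm{RMSEA} = \sqrt{\frac{\max(\chi^2 - df,\ 0)}{df\,(N - 1)}}, \qquad
\mathrm{TLI} = \frac{\chi^2_0/df_0 - \chi^2/df}{\chi^2_0/df_0 - 1}
```

where N is the number of examinees. Values such as RMSEA = 0.583 with a negative TLI (2019) indicate a severely misfitting factor structure.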

Table 4

Factor loading of CPR rubric items in year 2019, 2020 and 2021

Item 2019 (Factor 1, Factor 2, Factor 3, Uniqueness) 2020 (Factor 1, Factor 2, Uniqueness) 2021 (Factor 1, Factor 2, Uniqueness)
1 0.994 0.002 0.990 0.997
2 0.994 0.002 0.997 0.005 0.935 0.121
3 0.994 0.002 0.997 0.005 0.821 0.309
4 0.451 0.725 0.985 0.946
5 0.426 0.815 - - - - - -
6 0.927 0.131 0.995 0.308 0.892
7 0.546 0.649 0.997 0.926 0.134
8 0.714 0.419 0.990 0.542 0.669
9 0.348 −0.464 0.663 0.967 0.065 0.364 0.839
10 0.503 0.740 0.725 0.467 0.952
11 0.951 0.997 0.460 0.779
12 −0.599 0.627 - - - 0.386 0.839
13 0.633 0.342 0.458 0.992 - - -
14 0.688 0.507 0.602 0.637 0.995
15 0.950 0.918
16 0.917 0.159 0.304 0.893

Table 5

Analysis results of G-study in year 2019, 2020 and 2021

Year Effect df T Sum of Squares Mean Square Variance Component
2019 p 40 2843703.645 2214042.579 55351.064 −3489.952
r 3 651972.027 22310.961 7436.987 −1509.119
pr 120 11183319.360 8317304.754 69310.873 69310.873

Total 163 10553658.294

2020 p 48 60633.713 45420.887 946.268 71.335
r 3 15293.153 80.327 26.776 −12.942
pr 144 155887.719 95173.679 660.928 660.928

Total 195 140674.893

2021 p 46 189166927.415 152602872.108 3317453.741 −179236.126
r 3 37879212.951 1315157.643 438385.881 −76510.901
pr 138 747229042.693 556746957.635 4034398.244 4034398.244

Total 187 710664987.385
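
The variance components in Table 5 follow from the expected mean squares of the two-way persons × raters ANOVA. A minimal sketch that reproduces the 2020 row (49 students and 4 raters, i.e., df + 1):

```python
# Variance components from ANOVA mean squares for a crossed p x r design.
# Mean squares below are the 2020 values from Table 5.
ms_p, ms_r, ms_pr = 946.268, 26.776, 660.928
n_p, n_r = 49, 4   # persons and raters (df + 1)

var_pr = ms_pr                 # residual/interaction component -> 660.928
var_p = (ms_p - ms_pr) / n_r   # person (universe-score) component -> 71.335
var_r = (ms_r - ms_pr) / n_p   # rater component -> -12.942
print(var_p, var_r, var_pr)
```

Negative estimates such as σ²(r) = −12.942 arise from sampling error and are conventionally truncated to zero before the D-study.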

Table 6

Analysis results of D-study in year 2020

Students Raters σ²(τ) σ²(δ) σ²(Δ) Eρ² Φ
1 4 71.335 165.232 165.232 0.30 0.30
1 8 71.335 82.616 82.616 0.46 0.46
1 12 71.335 55.077 55.077 0.56 0.56
1 21 71.335 31.473 31.473 0.69 0.69
1 22 71.335 30.042 30.042 0.70 0.70
1 23 71.335 28.736 28.736 0.71 0.71
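
As a numerical check, the coefficients in Table 6 follow directly from the 2020 G-study components with the negative rater component set to zero (which is why Eρ² and Φ coincide in every row); a minimal sketch:

```python
# D-study projection: reproduce Table 6 from the 2020 variance components.
var_p, var_r, var_pr = 71.335, 0.0, 660.928  # sigma^2(r) truncated to 0

for n_r in (4, 8, 12, 21, 22, 23):
    rel_err = var_pr / n_r              # sigma^2(delta), relative error variance
    abs_err = (var_r + var_pr) / n_r    # sigma^2(Delta), absolute error variance
    e_rho2 = var_p / (var_p + rel_err)  # generalizability coefficient E(rho^2)
    phi = var_p / (var_p + abs_err)     # dependability coefficient Phi
    print(f"raters={n_r:2d}  err={rel_err:7.3f}  Erho2={e_rho2:.2f}  Phi={phi:.2f}")
```

With a single rater the projected coefficient is only about 0.10, and even 23 raters yield Eρ² ≈ 0.71, underscoring how much rater-related error would have to be reduced.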