October 22, 2024
Medical coding deserves a smarter approach. We're building a solution that reimagines ICD-10-CM processes by combining clinical expertise with advanced AI capabilities. Our approach will enhance coding accuracy, streamline documentation, and adapt to each healthcare team's unique workflow. Learn how transformative approaches to medical coding with LLM-based applications are shaping the future of the field, where efficiency meets precision and technology empowers human expertise.
by Sheijer Silva
Medical coding is a critical process in healthcare, as it directly impacts areas such as billing, resource management, auditing, and epidemiological studies. There are various coding systems, such as the Current Procedural Terminology (CPT) or the Healthcare Common Procedure Coding System (HCPCS), used to code medical procedures and services. However, when it comes to coding diagnoses and diseases, one of the most widely used systems in the United States is the International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM). The ICD-10-CM is an adaptation of the ICD-10 developed by the World Health Organization, clinically modified for use in the United States. It provides detailed codes for diseases, signs and symptoms, abnormal findings, complaints, social circumstances, and external causes of injuries or diseases.
Traditionally, this work has been performed by human coders, which makes it costly, slow, and error-prone. Automating the process has been a research goal for decades, and recent advancements in artificial intelligence (AI) have opened up new possibilities for improving the accuracy and efficiency of medical coding.
According to recent benchmarking studies (for example, Soroush et al., 2024), large language models (LLMs), despite their impressive capabilities in other areas, perform poorly at extracting ICD-10-CM codes when evaluated with basic prompts and without the support of additional techniques such as Retrieval-Augmented Generation (RAG). As general-purpose models without a deep understanding of specialized medical terminology, their effectiveness in medical coding may be limited. The lack of validated, task-specific datasets for rigorous testing also contributes to these shortcomings. Advanced techniques, specialized approaches, and a larger volume of high-quality data could significantly improve these results.
To develop an effective coding tool, it is crucial to leverage the power of current LLMs as an integral part of the solution, but not as the sole tool. They should be combined with advanced prompting techniques, information retrieval, effective orchestration of specialized services, and traditional machine learning methods, among others. Below, we explain some of these building blocks and techniques:
At this stage, a thorough analysis of the medical text is required, as this step is crucial for accurately identifying and isolating relevant medical terms, underlying conditions, and diagnoses present in the input text. This process:
Challenges: Capturing entities that span lengthy explanations; handling cases where the specialist who wrote the text omits information they consider implicit, or where the diagnosis is simply not entirely clear; and accurately capturing the relationships between recognized medical entities, underlying conditions, negated terms, and so on.
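To make the negation challenge concrete, here is a minimal rule-based sketch in Python. It is not the tooling this article refers to: the lexicon, the negation cues, and the `extract_entities` function are hypothetical stand-ins for a clinical NER model or service, and the fixed look-back window is a simplifying assumption.

```python
import re
from dataclasses import dataclass

# Hypothetical mini-lexicon; a real system would rely on a clinical NER model or service.
LEXICON = ["type 2 diabetes mellitus", "hypertension", "chest pain", "pneumonia"]
# Hypothetical negation cues, loosely in the spirit of rule-based negation detection.
NEGATION_CUES = ["no", "denies", "without", "negative for"]


@dataclass
class Entity:
    text: str
    start: int
    negated: bool


def extract_entities(note: str, window: int = 30) -> list[Entity]:
    """Find lexicon terms and flag those preceded by a negation cue in the same sentence."""
    lowered = note.lower()
    entities = []
    for term in LEXICON:
        for match in re.finditer(r"\b" + re.escape(term) + r"\b", lowered):
            # Only look a few characters back, and never across a sentence boundary.
            preceding = lowered[max(0, match.start() - window):match.start()]
            preceding = re.split(r"[.;]", preceding)[-1]
            negated = any(
                re.search(r"\b" + re.escape(cue) + r"\b", preceding)
                for cue in NEGATION_CUES
            )
            entities.append(Entity(term, match.start(), negated))
    return sorted(entities, key=lambda e: e.start)


note = (
    "Patient with hypertension and type 2 diabetes mellitus. "
    "Denies chest pain. No evidence of pneumonia."
)
entities = extract_entities(note)
```

In this sketch, negated entities are kept and flagged rather than dropped, so that later steps can decide how to treat them.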
Tools:
This step ensures that the system has access to the most up-to-date and comprehensive ICD-10-CM code database. Additionally, it reduces the size of the data subset under consideration by adjusting relevant hyperparameters, such as top K, based on experimental results. This guarantees improvements in the following areas:
Challenges: There are numerous synonyms for each medical term, and capturing their meaning through embeddings alone requires additional effort; semantic similarity by itself is not sufficient. Moreover, although synonym databases are accessible, they have limitations in terms of agreement and size.
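The retrieval step and the synonym problem can be sketched as follows. This is an illustration under stated assumptions, not the system described here: a bag-of-words cosine similarity stands in for real embeddings, and the five-code index, the `SYNONYMS` map, and the function names are all hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative index of ICD-10-CM codes; a real system would load the full, current code set.
CODES = {
    "I10": "essential (primary) hypertension",
    "E11.9": "type 2 diabetes mellitus without complications",
    "I21.9": "acute myocardial infarction, unspecified",
    "J18.9": "pneumonia, unspecified organism",
    "R07.9": "chest pain, unspecified",
}
# Hypothetical synonym map used to expand lay terms into coding vocabulary.
SYNONYMS = {
    "heart attack": "myocardial infarction",
    "high blood pressure": "hypertension",
}


def vectorize(text: str) -> Counter:
    """Bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def expand(query: str) -> str:
    """Append canonical terms for any known synonym found in the query."""
    expanded = query.lower()
    for phrase, canonical in SYNONYMS.items():
        if phrase in expanded:
            expanded += " " + canonical
    return expanded


def retrieve(query: str, top_k: int = 3) -> list[tuple[str, float]]:
    """Return the top_k (code, score) pairs, best first."""
    query_vec = vectorize(expand(query))
    scored = [(code, cosine(query_vec, vectorize(desc))) for code, desc in CODES.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


candidates = retrieve("heart attack", top_k=3)
```

Without the expansion step, "heart attack" shares no tokens with any description in this toy index, which is the kind of gap that similarity-only retrieval can leave. The `top_k` parameter corresponds to the top K hyperparameter mentioned above and would be tuned experimentally.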
Tools:
Once the subset of medical codes has been retrieved in the previous step, using a top K value tuned through experimentation, the results need to be reorganized and filtered. The goal is to give the LLM that performs the inference as much relevant information as possible, while minimizing noise and reducing the number of tokens used in the context window.
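A rough sketch of this re-ranking and filtering step is shown below. The scorer is a simple Jaccard overlap standing in for a dedicated re-ranking model, the whitespace word count is only a crude proxy for real token counts, and `rerank_and_pack`, `min_score`, and `max_tokens` are hypothetical names chosen for the example.

```python
from typing import Callable


def overlap_scorer(entity: str, description: str) -> float:
    """Jaccard overlap between word sets; a placeholder for a real re-ranking model."""
    a, b = set(entity.lower().split()), set(description.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


def rerank_and_pack(
    entity: str,
    candidates: list[tuple[str, str]],
    scorer: Callable[[str, str], float] = overlap_scorer,
    max_tokens: int = 50,
    min_score: float = 0.1,
) -> list[str]:
    """Re-score candidates, drop weak ones, and keep the best that fit a token budget."""
    rescored = [(scorer(entity, desc), code, desc) for code, desc in candidates]
    kept = sorted((r for r in rescored if r[0] >= min_score), reverse=True)
    lines, used = [], 0
    for _score, code, desc in kept:
        line = f"{code}: {desc}"
        cost = len(line.split())  # crude estimate; a real system would use the model's tokenizer
        if used + cost > max_tokens:
            break
        lines.append(line)
        used += cost
    return lines


candidates = [
    ("E11.9", "type 2 diabetes mellitus without complications"),
    ("E11.65", "type 2 diabetes mellitus with hyperglycemia"),
    ("I10", "essential (primary) hypertension"),
]
context_lines = rerank_and_pack("type 2 diabetes mellitus with hyperglycemia", candidates)
```

The returned lines are ordered best first, so if the budget runs out, the least relevant candidates are the ones left out of the prompt.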
Leverage the power of LLMs to interpret medical text without specific training in ICD-10-CM coding. This enables:
Challenges: Since the model only has access to the information it was trained on, this can lead to hallucinations in its responses. The prompt might not cover corner cases or scenarios that weren't considered. Additionally, it is crucial to avoid using LLMs in this way if up-to-date information is required.
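One inexpensive guard against malformed output in a zero-shot setup is to check the shape of whatever the model returns. The sketch below assumes the LLM is exposed as a plain callable that takes a prompt and returns text; `build_zero_shot_prompt`, `code_zero_shot`, and the stub model are hypothetical. The regular expression checks only the general format of an ICD-10-CM code (a letter, a digit, an alphanumeric character, then an optional dot and up to four more characters), not whether the code actually exists.

```python
import re
from typing import Callable

# Format check only: it does not confirm that a code exists in the current code set.
ICD10CM_PATTERN = re.compile(r"[A-Z][0-9][0-9A-Z](\.[0-9A-Z]{1,4})?")


def build_zero_shot_prompt(entity: str) -> str:
    return (
        "You are a medical coding assistant.\n"
        "Return the ICD-10-CM code(s) for the diagnosis below, one per line, "
        "with no extra text. If you are unsure, return UNKNOWN.\n"
        f"Diagnosis: {entity}"
    )


def code_zero_shot(entity: str, llm: Callable[[str], str]) -> list[str]:
    """Query the model and keep only lines that look like ICD-10-CM codes."""
    raw = llm(build_zero_shot_prompt(entity))
    lines = [line.strip().upper() for line in raw.splitlines() if line.strip()]
    return [line for line in lines if ICD10CM_PATTERN.fullmatch(line)]


def stub_llm(prompt: str) -> str:
    """Stand-in for a real model call, returning a mix of valid-looking and invalid lines."""
    return "I10\nnot a code\nE11.9\nUNKNOWN"


codes = code_zero_shot("hypertension with type 2 diabetes", stub_llm)
```

A format check like this catches only the crudest failures; a well-formed but wrong or outdated code passes it, which is why the retrieval and decision steps are still needed.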
Including relevant and dynamic information in the prompt of a large language model (LLM) has been shown to improve performance, in this case by providing it with examples and information from official ICD-10-CM coding guidelines related to the medical entity being coded.
Challenges: The choice of information and examples used is particularly important for In-Context Learning (ICL). Additional retrieval techniques may need to be considered if bias reduction is desired.
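As an illustration of dynamic example selection, the sketch below picks the few-shot examples that share the most words with the entity being coded and assembles them into a prompt together with guideline excerpts supplied by the caller. The example pool, the overlap-based selection, and the function names are assumptions made for this sketch; the guideline text is a placeholder, not a quotation from the official guidelines.

```python
def select_examples(
    entity: str, pool: list[tuple[str, str]], k: int = 2
) -> list[tuple[str, str]]:
    """Pick the k pool examples sharing the most words with the entity."""
    target = set(entity.lower().split())

    def overlap(example: tuple[str, str]) -> int:
        return len(target & set(example[0].lower().split()))

    return sorted(pool, key=overlap, reverse=True)[:k]


def build_few_shot_prompt(
    entity: str,
    pool: list[tuple[str, str]],
    guideline_excerpts: list[str],
    k: int = 2,
) -> str:
    """Assemble guidelines, selected examples, and the query into one prompt."""
    parts = ["Assign ICD-10-CM codes using the guidelines and examples below.", "", "Guidelines:"]
    parts += [f"- {excerpt}" for excerpt in guideline_excerpts]
    parts += ["", "Examples:"]
    for text, code in select_examples(entity, pool, k):
        parts.append(f"Diagnosis: {text}\nCode: {code}")
    parts += ["", f"Diagnosis: {entity}", "Code:"]
    return "\n".join(parts)


# Hypothetical example pool; a real one would come from validated, coded records.
POOL = [
    ("essential hypertension", "I10"),
    ("type 2 diabetes mellitus without complications", "E11.9"),
    ("pneumonia, unspecified organism", "J18.9"),
]
prompt = build_few_shot_prompt(
    "type 2 diabetes mellitus with hyperglycemia",
    POOL,
    guideline_excerpts=["<guideline excerpt retrieved for this entity>"],
    k=1,
)
```

Selecting examples by similarity keeps the prompt short, but it can also bias the model toward the codes shown in the examples, which is the selection risk noted in the challenges above.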
Tools:
This final step involves making logical decisions to select the most appropriate ICD-10-CM code(s) based on the extracted information and the outputs from the LLM, while also avoiding the "black box" effect that AI-based systems often present.
Challenges: The LLM may hallucinate and provide short or incorrect responses, or omit codes it deems unimportant due to biases in its training. This may require prompt engineering, derived through the analysis of results from a large number of experiment iterations.
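One way to keep this final step transparent is to record a reason for every code that is rejected. The sketch below is a simplified illustration, not the actual decision logic: it accepts an LLM-proposed code only if it was among the retrieved candidates and is not tied to a negated entity. The `Decision` structure, the `select_codes` function, and the two rules are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    accepted: list[str] = field(default_factory=list)
    rejected: dict[str, str] = field(default_factory=dict)  # code -> reason


def select_codes(
    llm_codes: list[str],
    retrieved_codes: set[str],
    negated_codes: frozenset[str] = frozenset(),
) -> Decision:
    """Apply simple, auditable rules to the codes proposed by the LLM."""
    decision = Decision()
    for code in llm_codes:
        if code in decision.accepted:
            continue  # ignore duplicates
        if code not in retrieved_codes:
            decision.rejected[code] = "not in retrieved candidate set (possible hallucination)"
        elif code in negated_codes:
            decision.rejected[code] = "linked to a negated entity"
        else:
            decision.accepted.append(code)
    return decision


decision = select_codes(
    llm_codes=["I10", "E11.9", "Z99.99", "R07.9", "I10"],  # "Z99.99" is treated here as not retrieved
    retrieved_codes={"I10", "E11.9", "R07.9"},
    negated_codes=frozenset({"R07.9"}),
)
```

Because each rejection carries its reason, a human coder reviewing the result can see why a code was dropped instead of having to trust an opaque output.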
Automation in medical coding is becoming increasingly important, and although AI-based tools like LLMs offer tremendous potential to enhance the efficiency of the process, they also present significant limitations in terms of accuracy and reliability. While LLMs have the potential to accelerate and optimize coding, their use alone is not sufficient to meet the high standards of accuracy required in the medical field.
To overcome these challenges, it is essential to combine LLMs with more specialized approaches, such as advanced prompting techniques, medical entity extraction, and up-to-date information retrieval. Employing strategies like Few-Shot Prompting, Retrieval, Re-Ranking, and the application of traditional coding rules helps tailor these models to the specific demands of the ICD-10-CM system, enabling greater accuracy and consistency in the results. Although challenges remain, such as the possibility of hallucinations and insufficient generation of correct codes, the hybrid approach that integrates AI with traditional methods holds promise for significantly improving both the efficiency and accuracy of medical code assignment.
Boyle, J. S., Kascenas, A., Lok, P., Liakata, M., & O’Neil, A. Q. (2023). Automated clinical coding using off-the-shelf large language models. Canon Medical Research Europe, Queen Mary University of London, University of Edinburgh, Anglia Ruskin University, The Alan Turing Institute, University of Warwick. Retrieved from arXiv.
Simmons, A., Takkavatakarn, K., McDougal, M., Dilcher, B., Pincavitch, J., Meadows, L., Kauffman, J., Klang, E., Wig, R., Smith, G., Soroush, G. N., Freeman, R., Apakama, D. J., Charney, A. W., Kohli-Seth, R., & Sakhuja, A. (2024). Benchmarking Large Language Models for Extraction of International Classification of Diseases Codes from Clinical Documentation. MedRxiv. Retrieved from MedRxiv.
Nori, H., Lee, Y. T., Zhang, S., Carignan, D., Edgar, R., Fusi, N., King, N., Larson, J., Li, Y., Liu, W., Luo, R., McKinney, S. M., Ness, R. O., Poon, H., Qin, T., Usuyama, N., White, C., & Horvitz, E. (2023). Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine. Microsoft. Retrieved from arXiv.
Li, R., Wang, X., & Yu, H. (2023). Exploring LLM Multi-Agents for ICD Coding. UMass Amherst, Microsoft, VA Bedford Healthcare System. Retrieved from arXiv.
Centers for Medicare & Medicaid Services (CMS), & National Center for Health Statistics (NCHS). (2024). ICD-10-CM Official Guidelines for Coding and Reporting FY 2024 (April 1, 2024 - September 30, 2024). U.S. Department of Health and Human Services. Retrieved from CMS.
Soroush, A., Glicksberg, B. S., Zimlichman, E., Barash, Y., Freeman, R., Charney, A. W., Nadkarni, G. N., & Klang, E. (2024). Large Language Models Are Poor Medical Coders — Benchmarking of Medical Code Querying. NEJM AI, 1(5). Massachusetts Medical Society. Retrieved from NEJM AI.
Pinecone. (n.d.). Rerankers and Two-Stage Retrieval. Pinecone. Retrieved from Pinecone.