---
library_name: transformers
base_model: google/medgemma-4b-it
tags:
- medical
- medical-coding
- icd10
- cpt
- hcpcs
- healthcare
- clinical
- fine-tuned
- peft
- lora
license: apache-2.0
language:
- en
pipeline_tag: text-generation
---

# medgemma-4b-medical-coding

## Model Description

This is a **LoRA adapter** fine-tuned from **MedGemma 4B** for medical coding tasks. The model is designed to:

- Extract diseases and medical conditions from discharge summaries
- Identify medical procedures and interventions
- Assign medical codes (ICD-10, CPT, HCPCS)
- Return structured JSON output for clinical documentation

**Base Model:** `google/medgemma-4b-it`  
**Fine-tuning Method:** LoRA (Low-Rank Adaptation)

## Training Details

### LoRA Configuration
- **Rank (r):** 16
- **Alpha:** 32
- **Dropout:** 0.1
- **Target Modules:** v_proj, up_proj, o_proj, gate_proj, k_proj, down_proj, q_proj
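
The settings above correspond to a PEFT `LoraConfig` along these lines (a sketch for reference only; the full training arguments are not published with this card, and `task_type` is assumed for causal-LM fine-tuning):

```python
from peft import LoraConfig

# LoRA hyperparameters as listed above.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",  # assumed; standard for decoder-only fine-tuning
)
```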

## Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("TachyHealthResearch/medgemma-4b-medical-coding")

# Load base model (MedGemma 4B; gated on the Hugging Face Hub)
base_model = AutoModelForCausalLM.from_pretrained(
    "google/medgemma-4b-it",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "TachyHealthResearch/medgemma-4b-medical-coding")
```
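
If you prefer standalone weights without the PEFT wrapper at inference time, the adapter can be merged into the base model. A sketch, continuing from the loading snippet above (the output directory name is illustrative):

```python
# Merge the LoRA weights into the base model and drop the PEFT wrapper.
# After this call, `model` is a plain transformers model that can be
# saved and reloaded with AutoModelForCausalLM alone.
model = model.merge_and_unload()
model.save_pretrained("medgemma-4b-medical-coding-merged")
tokenizer.save_pretrained("medgemma-4b-medical-coding-merged")
```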

### Example Usage

```python
# Define the system prompt for medical coding
system_prompt = """You are an expert medical coding specialist. 
Analyze the discharge summary to extract diseases, procedures, and assign appropriate medical codes. 
Return the response in JSON format with this structure:
{"diseases": ["disease1", "disease2"], "icd10_codes": ["code1", "code2"], 
"procedures": ["procedure1", "procedure2"], "cpt_codes": ["code1", "code2"], 
"hcpcs_codes": ["code1", "code2"]}"""

# Example discharge summary
discharge_summary = """
Patient admitted with chest pain and shortness of breath. 
Diagnosed with acute myocardial infarction and congestive heart failure.
Underwent percutaneous coronary intervention with stent placement.
"""

# Prepare the conversation
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": f"Please analyze this discharge summary:\n\n{discharge_summary}"}
]

# Apply chat template and generate
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.1,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens (slicing the decoded string by
# len(text) is unreliable because skip_special_tokens alters the prompt text)
generated_ids = outputs[0][inputs["input_ids"].shape[1]:]
generated_response = tokenizer.decode(generated_ids, skip_special_tokens=True).strip()
print("Generated Medical Codes:")
print(generated_response)
```
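
Since the model is prompted to return JSON, the generated text can be parsed into a Python dict. A minimal sketch (the helper name and example string are illustrative; the regex fallback handles outputs where the JSON object is wrapped in extra prose or a code fence):

```python
import json
import re

def parse_coding_output(generated: str) -> dict:
    """Extract and parse the first JSON object in the model's response."""
    match = re.search(r"\{.*\}", generated, re.DOTALL)
    if match is None:
        raise ValueError("No JSON object found in model output")
    return json.loads(match.group(0))

# Illustrative response text (codes shown for example purposes only)
example = 'Here are the codes: {"diseases": ["acute myocardial infarction"], "icd10_codes": ["I21.9"]}'
codes = parse_coding_output(example)
print(codes["icd10_codes"])  # ['I21.9']
```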

## Model Performance

This model has been fine-tuned specifically for medical coding tasks and is designed to handle:

- Disease extraction from clinical text
- Medical procedure identification
- Medical code assignment (ICD-10, CPT, HCPCS)
- Structured JSON response generation

## Intended Use

### Primary Use Cases
- Medical coding automation
- Clinical documentation analysis  
- Healthcare data processing

### Limitations
- Generated codes must be verified by qualified medical coding professionals before use
- Performance may degrade on clinical documents that differ significantly from the training data
- Not intended for clinical decision-making without human oversight

## License

This model is released under the Apache 2.0 License.

## Citation

If you use this model in your research or applications, please cite:

```bibtex
@misc{TachyHealthResearch_medgemma_4b_medical_coding_2024,
  title = {medgemma-4b-medical-coding: Medical Coding Model},
  author = {TachyHealthResearch},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/TachyHealthResearch/medgemma-4b-medical-coding}
}
```

---

**Important**: This model is intended for research and healthcare applications. Always ensure proper validation and human oversight when using AI models in medical contexts.