---
license: mit
base_model: microsoft/LLM2CLIP-Llama-3.2-1B-Instruct-CC-Finetuned
tags:
- text-embeddings
- sentence-transformers
- llm2vec
- medical
- chest-xray
- radiology
- clinical-nlp
language:
- en
pipeline_tag: feature-extraction
library_name: transformers
---

# LLM2Vec4CXR - Fine-tuned Model for Chest X-ray Report Analysis

This model is a fine-tuned version of [microsoft/LLM2CLIP-Llama-3.2-1B-Instruct-CC-Finetuned](https://huggingface.co/microsoft/LLM2CLIP-Llama-3.2-1B-Instruct-CC-Finetuned), optimized specifically for chest X-ray report analysis and medical text understanding.

## Model Description

LLM2Vec4CXR is a bidirectional language model that converts the base decoder-only LLM into a text encoder optimized for medical text embeddings. The model has been fully fine-tuned with a modified pooling strategy (`latent_attention`) to better capture semantic relationships in chest X-ray reports.

### Key Features

- **Base Architecture**: LLM2CLIP-Llama-3.2-1B-Instruct
- **Pooling Mode**: Latent attention (fine-tuned weights loaded automatically)
- **Bidirectional Processing**: Enabled for better context understanding
- **Medical Domain**: Specialized for chest X-ray report analysis
- **Max Length**: 512 tokens
- **Precision**: bfloat16
- **Automatic Loading**: Latent attention weights are loaded automatically from safetensors
- **Simple API**: Built-in methods for similarity computation and instruction-based encoding

## Training Details

### Training Data

- Fully fine-tuned on chest X-ray reports and medical text data
- Training focused on understanding pleural effusion status and other chest X-ray findings

### Training Configuration

- **Pooling Mode**: `latent_attention` (modified from the base model)
- **Enable Bidirectional**: True
- **Max Length**: 512
- **Torch Dtype**: bfloat16
- **Full Fine-tuning**: All model weights were updated during training

## Usage

### Installation

```bash
# Install the LLM2Vec4CXR package directly from GitHub
pip install git+https://github.com/lukeingawesome/llm2vec4cxr.git

# Or clone and install in development mode
git clone https://github.com/lukeingawesome/llm2vec4cxr.git
cd llm2vec4cxr
pip install -e .
```

### Basic Usage

```python
import torch
from llm2vec_wrapper import LLM2VecWrapper as LLM2Vec

# Load the model - latent attention weights are loaded automatically
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = LLM2Vec.from_pretrained(
    base_model_name_or_path='lukeingawesome/llm2vec4cxr',
    pooling_mode="latent_attention",
    max_length=512,
    enable_bidirectional=True,
    torch_dtype=torch.bfloat16,
    use_safetensors=True,
).to(device).eval()

# Configure the tokenizer for left padding
model.tokenizer.padding_side = 'left'

# Simple text encoding
report = "There is a small increase in the left-sided effusion. There continues to be volume loss at both bases."
embedding = model.encode_text([report])

# Multiple texts at once
reports = [
    "No acute cardiopulmonary abnormality.",
    "Small bilateral pleural effusions.",
    "Large left pleural effusion with compressive atelectasis."
]
embeddings = model.encode_text(reports)
```
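The embeddings returned by `encode_text` can be compared with standard PyTorch operations. As a minimal sketch (assuming `encode_text` returns a 2-D tensor of shape `[batch, hidden]`, consistent with the example above), the following ranks the three example reports by cosine similarity to the first one:

```python
import torch.nn.functional as F

# Minimal sketch, not part of the package API: compare the embeddings
# produced above. Assumes `embeddings` is the [batch, hidden] tensor
# returned by model.encode_text(reports).
normalized = F.normalize(embeddings.float(), p=2, dim=1)  # unit-norm rows
scores = normalized @ normalized[0]                       # cosine similarity vs. report 0
for report, score in zip(reports, scores.tolist()):
    print(f"{score:.4f}  {report}")
```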
### Advanced Usage with Instructions and Similarity

```python
# For instruction-following tasks, join the instruction and the report
# with the separator string
instruction = 'Determine the change or the status of the pleural effusion.'
report = 'There is a small increase in the left-sided effusion.'
query_text = instruction + '!@#$%^&*()' + report

# Compare against multiple options
candidates = [
    'No pleural effusion',
    'Pleural effusion present',
    'Pleural effusion is worsening',
    'Pleural effusion is improving'
]

# Get similarity scores using the built-in method
similarities = model.compute_similarities(query_text, candidates)
print(f"Similarities: {similarities}")

# For custom separator-based encoding
embeddings = model.encode_with_separator([query_text], separator='!@#$%^&*()')
```

**Note**: The model includes convenience methods such as `compute_similarities()` and `encode_with_separator()` that handle the underlying tokenization automatically.

### Quick Start Example

Here is a complete example showing the model's capabilities:

```python
import torch
from llm2vec_wrapper import LLM2VecWrapper as LLM2Vec

# Load model
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = LLM2Vec.from_pretrained(
    base_model_name_or_path='lukeingawesome/llm2vec4cxr',
    pooling_mode="latent_attention",
    max_length=512,
    enable_bidirectional=True,
    torch_dtype=torch.bfloat16,
    use_safetensors=True,
).to(device).eval()

# Configure the tokenizer for left padding
model.tokenizer.padding_side = 'left'

# Medical text analysis
instruction = 'Determine the change or the status of the pleural effusion.'
report = 'There is a small increase in the left-sided effusion.'
query = instruction + '!@#$%^&*()' + report

# Compare with different diagnoses
options = [
    'No pleural effusion',
    'Pleural effusion is worsening',
    'Pleural effusion is stable',
    'Pleural effusion is improving'
]

# Get similarity scores and pick the best match
scores = model.compute_similarities(query, options)
best_match = options[torch.argmax(scores)]
print(f"Best match: {best_match} (score: {torch.max(scores):.4f})")
```

## API Reference

The model provides several convenience methods:

### Core Methods

- **`encode_text(texts)`**: Simple text encoding with automatic `embed_mask` handling
- **`encode_with_separator(texts, separator='!@#$%^&*()')`**: Encoding with instruction/content separation
- **`compute_similarities(query_text, candidate_texts)`**: One-line similarity computation
- **`from_pretrained(..., pooling_mode="latent_attention")`**: Automatic latent attention weight loading

### Migration from Manual Usage

If you were previously using manual tokenization, you can now simply write:

```python
# Old way (still works)
tokenized = model.tokenizer(text, return_tensors="pt", ...)
tokenized["embed_mask"] = tokenized["attention_mask"].clone()
embeddings = model(tokenized)

# New way (recommended)
embeddings = model.encode_text([text])
```

## Evaluation

The model has been evaluated on chest X-ray report analysis tasks, particularly:

- Text retrieval and encoding
- Medical text similarity comparison
- Clinical finding extraction
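For the retrieval use case, a small end-to-end loop can be built from `encode_text` alone. The sketch below is illustrative only: the corpus and query strings are invented, and it assumes the `model` instance loaded in the usage examples above is in scope and that `encode_text` returns a `[batch, hidden]` tensor:

```python
import torch
import torch.nn.functional as F

# Illustrative retrieval sketch with an invented in-memory corpus
corpus = [
    "No acute cardiopulmonary abnormality.",
    "Interval increase in right pleural effusion.",
    "Stable small left pleural effusion.",
    "New left lower lobe consolidation concerning for pneumonia."
]
query = "Worsening pleural effusion."

with torch.no_grad():
    corpus_emb = F.normalize(model.encode_text(corpus).float(), dim=1)
    query_emb = F.normalize(model.encode_text([query]).float(), dim=1)

# Rank corpus reports by cosine similarity to the query
scores = (query_emb @ corpus_emb.T).squeeze(0)
for idx in torch.argsort(scores, descending=True).tolist():
    print(f"{scores[idx].item():.4f}  {corpus[idx]}")
```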
### Sample Performance

Compared to the base model, the fine-tuned model shows improved performance on medical text understanding tasks, particularly in distinguishing between different pleural effusion states and in interpreting medical abbreviations.

## Intended Use

### Primary Use Cases

- **Medical Text Embeddings**: Generate embeddings for chest X-ray reports
- **Clinical Text Similarity**: Compare medical texts for semantic similarity
- **Medical Information Retrieval**: Find relevant medical reports or findings
- **Clinical NLP Research**: Foundation model for medical text analysis

### Limitations

- Specialized for chest X-ray reports; may not generalize to other medical domains
- Requires careful preprocessing for optimal performance
- Should be used as part of a larger clinical decision support system, not for standalone diagnosis

## Technical Specifications

- **Model Type**: Bidirectional language model (LLM2Vec)
- **Architecture**: LlamaBiModel (modified Llama 3.2)
- **Parameters**: ~1B
- **Input Length**: Up to 512 tokens
- **Output**: Dense embeddings
- **Precision**: bfloat16

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{llm2vec4cxr,
  title={LLM2Vec4CXR: Fine-tuned LLM for Chest X-ray Report Analysis},
  author={Hanbin Ko},
  year={2025},
  howpublished={\url{https://huggingface.co/lukeingawesome/llm2vec4cxr}},
}
```

A preprint describing this model will be released soon.

## Acknowledgments

This model builds upon:

- [LLM2Vec](https://github.com/McGill-NLP/llm2vec) - Framework for converting decoder-only LLMs into text encoders
- [LLM2CLIP](https://github.com/microsoft/LLM2CLIP) - Microsoft's implementation for connecting LLMs with CLIP models

## License

This model is licensed under the MIT License.