EC-RAFT: Automated Generation of Clinical Trial Eligibility Criteria

Model Description

EC-RAFT is a Retrieval-Augmented Fine-Tuning (RAFT) model fine-tuned from LLaMA-3.1-8B-Instruct.
It is designed to automatically generate structured, high-quality clinical trial eligibility criteria (EC) directly from trial titles and descriptions.

EC-RAFT integrates domain-specific retrieval with synthesized intermediate reasoning steps, enabling it to produce clinically relevant and contextually appropriate EC sets.
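
A minimal generation sketch with the Hugging Face transformers library is shown below. The model ID biodatlab/ec-raft comes from this card, but the chat-style prompt and generation settings are assumptions; the exact template used during RAFT fine-tuning may differ.

```python
# Sketch only: prompt wording, system message, and generation settings are
# illustrative assumptions, not the official EC-RAFT inference recipe.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "biodatlab/ec-raft"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

title = "A Phase II Study of Drug X in Adults With Type 2 Diabetes"
description = "A randomized, double-blind, placebo-controlled trial evaluating ..."

messages = [
    {"role": "system", "content": "You draft clinical trial eligibility criteria."},
    {
        "role": "user",
        "content": f"Title: {title}\n\nDescription: {description}\n\n"
                   "Generate the inclusion and exclusion criteria for this trial.",
    },
]

# Build the prompt with the model's chat template and generate deterministically.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=1024, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```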

Fine-tuning Details

  • Base model: LLaMA-3.1-8B-Instruct
  • Datasets used for fine-tuning:
    • ClinicalTrials.gov (267,347 trials, 2000–2024): biodatlab/ec-raft-dataset
    • Retrieval corpus constructed with the SciNCL model
    • Intermediate reasoning steps (R) generated with Gemini-1.5-flash-002
  • Fine-tuning method (see the LoRA sketch after this list):
    • Retrieval-Augmented Fine-Tuning (RAFT)
    • Low-Rank Adaptation (LoRA)
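
Since the adapter hyperparameters are not listed on this card, the peft configuration below is only an illustrative sketch of the LoRA setup; the rank, scaling factor, and target modules are placeholders rather than the values used to train EC-RAFT.

```python
# Illustrative LoRA configuration (placeholder hyperparameters, not EC-RAFT's).
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

lora_config = LoraConfig(
    r=16,                 # low-rank update rank (placeholder)
    lora_alpha=32,        # scaling factor (placeholder)
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
```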

Model Performance

Evaluated on a held-out ClinicalTrials.gov test split:

| Metric | Score |
|---|---|
| BERTScore (semantic similarity) | 86.23 |
| Precision (LLM-guided evaluation) | 78.84% |
| Recall (LLM-guided evaluation) | 75.89% |
| Mean LLM-as-a-Judge Score (0–3) | 1.7150 |
| Mean Pair-BERTScore | 67.76 |
  • Outperforms zero-shot LLaMA-3.1 and Gemini-1.5-flash baselines
  • Outperforms fine-tuned LLaMA and Meditron baselines
  • Clinically validated: LLM-as-a-Judge scores highly correlated with human physician evaluation
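
As a rough illustration of the semantic-similarity metric above, the snippet below computes BERTScore with the bert-score package; the default English scoring model and the example strings are assumptions and need not match the exact evaluation setup behind the reported numbers.

```python
# Sketch of corpus-level BERTScore between generated and reference criteria.
from bert_score import score

generated = ["Inclusion Criteria: adults aged 18-75 with type 2 diabetes ..."]
reference = ["Inclusion Criteria: age 18 to 75 years; diagnosis of T2DM ..."]

precision, recall, f1 = score(generated, reference, lang="en")
print(f"BERTScore F1: {f1.mean().item():.4f}")
```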

Intended Use

  • Assist researchers, trial designers, and sponsors in drafting clinical trial eligibility criteria.
  • Automate EC generation to reduce manual effort and improve consistency.
  • Support clinical trial design transparency and quality.
  • Enable integration with trial registry platforms, clinical trial matching systems, and EC recommendation tools.
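
For integration scenarios such as trial matching, a retrieval step in the spirit of the SciNCL corpus described above could supply similar trials as context before generation. This is a hypothetical sketch: it assumes SciNCL can be loaded through sentence-transformers as malteos/scincl, and the corpus and query strings are placeholders.

```python
# Hypothetical retrieval step: embed trials with SciNCL and find the nearest
# neighbours of a new trial title to use as retrieval context for EC-RAFT.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("malteos/scincl")  # assumed checkpoint name

corpus = [
    "Phase III trial of metformin in adolescents with type 2 diabetes ...",
    "Observational study of heart failure with reduced ejection fraction ...",
]
query = "A Phase II Study of Drug X in Adults With Type 2 Diabetes"

corpus_emb = encoder.encode(corpus, convert_to_tensor=True)
query_emb = encoder.encode(query, convert_to_tensor=True)

# Top-k most similar trials; their eligibility criteria could be prepended
# to the EC-RAFT prompt as retrieved context.
hits = util.semantic_search(query_emb, corpus_emb, top_k=1)[0]
print(corpus[hits[0]["corpus_id"]])
```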

Limitations

  • Requires human validation of generated EC before clinical use.
  • Trained on public ClinicalTrials.gov data; may not generalize well to:
    • Rare or novel diseases
    • Specialized or non-standard trial designs
    • Non-public trial data
  • Optimized for English-language clinical trials.
  • As with any LLM-based system, risks include hallucination, subtle errors, and domain shifts.
  • Evaluation metrics (BERTScore, LLM-as-a-Judge) are proxies, not full substitutes for domain expert review.

Acknowledgments

This model was developed with support from:

  • RAVIS Technology (feedback and collaboration)
  • Faculty of Medicine Ramathibodi Hospital
  • NSTDA Supercomputer Center (ThaiSC), Project #pv814001

We also acknowledge the contributions of the broader open-source community, whose tools and prior work on RAFT, SciNCL, LoRA, LLaMA-3, and biomedical NLP made this project possible.
