OncoReasoning-3B

This model is a Llama 3.2-3B-Instruct checkpoint distilled from gpt-oss-120b to assist with cancer clinical trial matching tasks.

Tasks

Extract trial “spaces” (target populations) from eligibility criteria
Concepts used:
- Cancer type & histology
- Burden of disease (curative-intent vs palliative/metastatic)
- Prior treatments
- Biomarkers
  Also identifies common cross-trial boilerplate exclusions: pneumonitis, heart failure, renal dysfunction, liver dysfunction, uncontrolled brain metastases, HIV/hepatitis, poor performance status.
Tag individual patient clinical documents (pathology, imaging, oncologist notes) and emit JSON with excerpt + tags, e.g.:
- stage_at_diagnosis, treatment, cancer_burden, cancer_status, adverse_event, biomarker, comorbidity, uncontrolled_brain_met,
- measurable_disease (≥ 1.0 cm lesion or LN ≥ 1.5 cm short axis),
- progressive_disease, pneumonitis, colitis, hepatitis_or_hiv,
- anemia (Hgb < 10), renal_dysfunction (eGFR < 60), liver_dysfunction (↑ bilirubin/AST/ALT), heart_failure, poor_ps (ECOG ≥ 2).
Example JSON (per excerpt):
```
{"excerpt": "CT chest shows new 1.8 cm RLL nodule; prior 0.9 cm.", "tags": ["measurable_disease", "progressive_disease"]}
```
Summarize a patient’s history from concatenated relevant text.
Assess trial-space fit given a patient summary and a trial space.
Screen for common exclusion risks given a patient summary and boilerplate exclusions.

Notes

“Trial space” = intended target patient population. The concept does not incorporate common boilerplate exclusion criteria, such as uncontrolled brain metastases.
This is a research tool in development; it is not a standalone diagnostic tool, intended for clinical practice, or an approved medical device.

Inference (task-by-task) with vLLM

Setup

from vllm import LLM, SamplingParams

# Long context length is mostly needed for patient summarization; for other tasks ~10k is typically fine.
llm = LLM(model="ksg-dfci/OncoReasoning-3B", max_model_len=120000)

# Compatibility alias: some functions reference `llama` internally
llama = llm

tok = llm.get_tokenizer()

1) Extract trial “spaces” from eligibility criteria

def summarize_trials_multi_cohort(eligibility_texts, llama_model):

    tokenizer = llama.get_tokenizer()
    prompts = []
    for trial in eligibility_texts:
        messages = [
            {'role':'system', 'content': """
        Reasoning: high.
        """},      
              
            {'role':'user', 'content': """
            You are an expert clinical oncologist with an encyclopedic knowledge of cancer and its treatments.
        Your job is to review a clinical trial document and extract a list of structured clinical spaces that are eligible for that trial.
        A clinical space is defined as a unique combination of cancer primary site, histology, which treatments a patient must have received, which treatments a patient must not have received, cancer burden (eg presence of metastatic disease), and tumor biomarkers (such as germline or somatic gene mutations or alterations, or protein expression on tumor) that a patient must have or must not have; that renders a patient eligible for the trial.
        Trials often specify that a particular treatment is excluded only if it was given within a short period of time, for example 14 days, one month, etc , prior to trial start. Do not include this type of time-specific treatment eligibility criteria in your output at all.
        Some trials have only one space, while others have several. Do not output a space that contains multiple cancer types and/or histologies. Instead, generate separate spaces for each cancer type/histology combination.
        For biomarkers, if the trial specifies whether the biomarker will be assessed during screening, note that.
        Spell out cancer types; do not abbreviate them. For example, write "non-small cell lung cancer" rather than "NSCLC".
        Structure your output like this, as a list of spaces, with spaces separated by newlines, as below:
        1. Cancer type allowed: <cancer_type_allowed>. Histology allowed: <histology_allowed>. Cancer burden allowed: <cancer_burden_allowed>. Prior treatment required: <prior_treatments_requred>. Prior treatment excluded: <prior_treatments_excluded>. Biomarkers required: <biomarkers_required>. Biomarkers excluded: <biomarkers_excluded>.
        2. Cancer type allowed: <cancer_type_allowed>, etc.
        If a particular concept is not mentioned in the trial text, do not include it in your definition of trial space(s).
        After you output the trial spaces, output a newline, then the text "Boilerplate exclusions:", then another newline.
        Then, list exclusion criteria described in the trial text that are unrelated to the trial space definitions. Such exclusions tend to be common to clinical trials in general.
        Common boilerplate exclusion criteria include a history of pneumonitis, heart failure, renal dysfunction, liver dysfunction, uncontrolled brain metastases, HIV or hepatitis, and poor performance status. """ +  "Here is a clinical trial document: \n" + trial + "\n" + """Now, generate your list of the trial space(s), followed by any boilerplate exclusions, formatted as above.
            Do not provide any introductory, explanatory, concluding, or disclaimer text.
            Reminder: Treatment history is an important component of trial space definitions, but treatment history requirements that are described as applying only in a given period of time prior to trial treatment MUST BE IGNORED."""
            }
        ]
    
        prompts.append(tokenizer.apply_chat_template(conversation=messages, add_generation_prompt=True, tokenize=False))
    

    
    responses = llama_model.generate(
        prompts,   
        SamplingParams(
        temperature=0.0,
        top_p=0.5,
        max_tokens=10000,
        repetition_penalty=1.3
        #stop_token_ids=[tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("<|eot_id|>")],  # KEYPOINT HERE
    ))

    response_texts = [x.outputs[0].text for x in responses]


    return responses, response_texts

2) Tag patient clinical documents → JSON excerpts + tags

import re, json

def tag_chunks(patient_texts, llama_model):
    
    tokenizer = llama_model.get_tokenizer()

    prompts = []
    for the_patient in patient_texts:
        temp_patient = re.sub("\n|\r", " ", the_patient.strip())
        temp_patient = re.sub(r'\s+', " ", temp_patient)
        sentences = "<excerpt break>" + re.sub("\\. ", "<excerpt break>", temp_patient) + "<excerpt break>"
    
        messages = [{'role':'system', 'content': """You are an oncology clinical note data extraction bot.
        Your job is to review a list of excerpts from a clinical document and extract the excerpts relevant to a list of questions.
        Reasoning: high
        """        
        
        },

        {'role':'user', 'content': "The list of excerpts, separated by <excerpt break>, is: " + sentences + 
        """Now, list the excerpts relevant to any of the following questions.
        Format your answer as JSON, tagging each excerpt that is relevant to at least one question with each tag to which it is relevant.
        Here is the list of questions:
        What type of cancer (primary site and histology) does the patient have? (Tag: cancer_type )
        What was the stage at diagnosis? (Tag: stage_at_diagnosis)
        What treatments (including surgery, radiation, or systemic therapy) has the patient received? (Tag: treatment)
        How widespread is the cancer currently? (Tag: cancer_burden)
        Is there response to therapy or progressive disease? (Tag: cancer_status)
        Is the patient experiencing an adverse event of treatment? (Tag: adverse_event)
        What biomarkers, such as protein expression and genetic mutations/alterations, does the patient's tumor have? (Tag: biomarker)
        What comorbidities, or diseases other than cancer, does the patient have? (Tag: comorbidity)
        Are there uncontrolled brain metastases? (Tag: uncontrolled_brain_met)
        Is there measurable disease, meaning a tumor at least 1 cm across or lymph node at least 1.5 cm in short axis dimension? (Tag: measurable_disease)
        Is there progressive (worsening) disease? (Tag: progressive_disease)
        Is there a history of pneumonitis? (Tag: pneumonitis)
        Is there a history of colitis? (Tag: colitis)
        Is there a history of hepatitis or HIV? (Tag: hepatitis_or_hiv)
        Is the patient anemic, with hemoglobin under 10? (Tag: anemia)
        Is there a reduced renal function/creatinine clearance, with estimated GFR < 60? (Tag: renal_dysfunction)
        Is there liver dysfunction, with elevated bilirubin, AST, or ALT? (Tag: liver_dysfunction)
        Is there a history of heart failure? (Tag: heart_failure)
        Does the patient have a poor performance status and/or ECOG performance status of 2 or more? (Tag: poor_ps)
        What adverse side effects of treatment has the patient had? (Tag: adverse_event)
        Here is an example of the output format:
        [{"excerpt": "80M with metastatic lung adenocarcinoma.", "tags": ["cancer_type", "cancer_burden"]},
         {"excerpt": "The tumor was HER2 positive.", "tags": ["biomarker"]},
         {"excerpt": "Imaging demonstrated new bilateral lung infiltrates.", "tags": ["pneumonitis", "adverse_event"]},
         {"excerpt": "LV ejection fraction was 35%.", "tags": ["heart_failure"]}
        ]
        Do not include excerpts that are not relevant to the questions. 
        Do not abbreviate or alter excerpts that you do include; copy them verbatim from the prompt.
        Do not add disclaimers or introductory text.
        If there are no excerpts relevant to the above questions, just output blank JSON {} .
        """}
        ]

        prompts.append(messages)

    long_messages = [x[1]['content'] for x in prompts]
    trunc_messages = tokenizer.batch_decode([x[-20000:] for x in tokenizer(long_messages, add_special_tokens=False).input_ids])

    newprompts = []
    for i, messages in enumerate(prompts):
        messages[1]['content'] = trunc_messages[i]
        template_prompt = tokenizer.apply_chat_template(conversation=messages, add_generation_prompt=True, tokenize=False)
        newprompts.append(template_prompt)
        

    
    responses = llama_model.generate(
        newprompts,     
        SamplingParams(
        temperature=0.1,
        top_p=0.2,
        max_tokens=10000,
        repetition_penalty=1.2,
    ))

    response_texts = [x.outputs[0].text for x in responses]


    return responses, response_texts

3) Summarize patient history

# Expects a list of long documents, one per patient.
# Each patient level document is a concatenation of useful excerpts pulled from all documents using the tagging function in #2 or ksg-dfci/TinyBertTagger.

def summarize_patients(patient_texts, llama_model):
    prompts = []

    tokenizer = llama_model.get_tokenizer()

    prompts = []
    for patient_text in patient_texts:

        patient_text_tokens = tokenizer(patient_text, add_special_tokens=False).input_ids
        if len(patient_text_tokens) > 115000:
            first_part = patient_text_tokens[:57500]
            # Slice the last `slice_size` elements
            last_part = patient_text_tokens[-57500:]
            # Concatenate the two slices
            patient_text = tokenizer.decode(first_part) + " ... " + tokenizer.decode(last_part)


    
        messages = [{'role':'system', 'content': 'Reasoning: high'}, 
                    {'role':'user', 'content': """
                    You are an experienced clinical oncology history summarization bot.
        Your job is to construct a summary of the cancer history for a patient based on an excerpt of the patient's electronic health record. The text in the excerpt is provided in chronological order.     
        Document the cancer type/primary site (eg breast cancer, lung cancer, etc); histology (eg adenocarcinoma, squamous carcinoma, etc); current extent (localized, advanced, metastatic, etc); biomarkers (genomic results, protein expression, etc); and treatment history (surgery, radiation, chemotherapy/targeted therapy/immunotherapy, etc, including start and stop dates and best response if known).
        Do not consider localized basal cell or squamous carcinomas of the skin, or colon polyps, to be cancers for your purposes.
        Do not include the patient's name, but do include relevant dates whenever documented, including dates of diagnosis and start/stop dates of each treatment.
        If a patient has a history of more than one cancer, document the cancers one at a time.
        Format your response as free text, not as a table.
        Also document any history of conditions that might meet "boilerplate" exclusion criteria, including uncontrolled brain metastases, lack of measurable disease, congestive heart failure, pneumonitis, renal dysfunction, liver dysfunction, and HIV or hepatitis infection. For each of these, present the evidence from the history that the patient has a history of such a condition, including dates.
        Clearly separate the "boilerplate" section by labeling it "Boilerplate: " before describing any such conditions.
        Here is an example of the desired output format:
        
        Cancer type: Lung cancer
        Histology: Adenocarcinoma
        Current extent: Metastatic
        Biomarkers: PD-L1 75%, KRAS G12C mutant
        Treatment history: 
        # 1/5/2020-2/5/2021: carboplatin/pemetrexed/pembrolizumab
        # 1/2021: Palliative radiation to progressive spinal metastases
        # 3/2021-present: docetaxel
        Boilerplate:
        No evidence of common boilerplate exclusion criteria

        """ + "The excerpt for you to summarize is is:\n" + patient_text + """\nNow, write your summary. Do not add preceding text before the abstraction, and do not add notes or commentary afterwards. This will not be used for clinical care, so do not write any disclaimers or cautionary notes."""}

                     ]
    


        prompts.append(messages)

    trunc_messages = [x[1]['content'] for x in prompts]
    #trunc_messages = tokenizer.batch_decode([x[-115000:] for x in tokenizer(long_messages, add_special_tokens=False).input_ids])

    newprompts = []
    for i, messages in enumerate(prompts):
        messages[1]['content'] = trunc_messages[i]
        template_prompt = tokenizer.apply_chat_template(conversation=messages, add_generation_prompt=True, tokenize=False)
        newprompts.append(template_prompt)
        

    
    responses = llama_model.generate(
        newprompts,     
        SamplingParams(
        temperature=0.0,
        top_p=0.2,
        max_tokens=7500,
        repetition_penalty=1.2
    ))

    response_texts = [x.outputs[0].text for x in responses]


    return responses, response_texts

4) Assess candidate patient-trial space match

# This is batched; takes in a list of patient summaries and corresponding list of trial space definitions for checking.
# Trial space definitions should not include boilerplate criteria.
def ask_about_trials_loosely(patient_summaries, trial_summaries, llama_model):
    tokenizer = llama_model.get_tokenizer()

    prompts = []

    for patient_summary, trial_summary in zip(patient_summaries, trial_summaries):
        messages = [{'role':'system', 'content': "Reasoning: high"},
                    {'role':'user', 'content': """You are a brilliant oncologist with encyclopedic knowledge about cancer and its treatment. 
    Your job is to evaluate whether a given clinical trial is a reasonable consideration for a patient, given a clinical trial summary and a patient summary. 
Here is a summary of the clinical trial:\n""" + trial_summary + "\nHere is a summary of the patient:\n" + patient_summary + """
Base your judgment on whether the patient generally fits the cancer type(s), cancer burden, prior treatment(s), and biomarker criteria specified for the trial.
You do not have to determine if the patient is actually eligible; instead please just evaluate whether it is reasonable for the trial to be considered further by the patient's oncologist.
Biomarker criteria have to be considered carefully. Some trials have biomarker requirements that are not assessed until formal trial screening. A trial may therefore sometimes be a reasonable consideration for a patient even if a required biomarker is not known to be present in the patient.
However, if a required biomarker is known to be absent, or can be assumed to be absent based on other information, the trial is not a reasonable consideration. For example, if a trial for lung cancer requires an EGFR mutation, documentation that there is no EGFR mutation indicates the trial is not a reasonable consideration. Similarly, documentation of a KRAS mutation in the patient indicates the trial is not a reasonable consideration, since, as you know, KRAS and EGFR driver mutations in lung cancer are mutually exclusive.
Don't provide ethical judgments or comment on resource constraints with respect whether the trial is a reasonable clinical consideration; just evaluate whether it is, given the available information.
Reason step by step, then answer the question "Is this trial a reasonable consideration for this patient?" with a one-word "Yes!" or "No!" answer.
Make sure to include the exclamation point in your final one-word answer."""}]

    
        prompt = tokenizer.apply_chat_template(conversation=messages, add_generation_prompt=True, tokenize=False)
        prompts.append(prompt)
        
    responses = llama_model.generate(
        prompts,     
        SamplingParams(
        temperature=0.0,
        top_p=0.2,
        max_tokens=25000,
        repetition_penalty=1.2,
        #stop_token_ids=[tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("<|eot_id|>")],  # KEYPOINT HERE
    ))

    response_texts = [x.outputs[0].text for x in responses]

    eligibility_results = []

    for response_text in response_texts:
        if ("Yes!" in response_text[-10:]) or ("YES!" in response_text[-10:]):
            eligibility_results.append(1.0)
        else:
            eligibility_results.append(0.0)
    
    return responses, response_texts, eligibility_results

5) Screen for “boilerplate” exclusion criteria

# Batched; expects a list of patient 'boilerplate' texts (one per patient) extracted from patient summaries, and a list of the same length containing trial boilerplate criteria to check each patient boilerplate text against
def ask_about_boilerplate(patient_boilerplates, trial_boilerplates, llama_model):

    tokenizer = llama_model.get_tokenizer()

    prompts = []

    for patient_boilerplate, trial_boilerplate in zip(patient_boilerplates, trial_boilerplates):
        messages = [{'role':'system', 'content': "Reasoning: high"},
                     {'role':'user', 'content': """You are a brilliant oncologist with encyclopedic knowledge about cancer and its treatment. 
    Your job is to evaluate whether a patient has any underlying medical conditions that would exclude him or her from a specific clinical trial.\n 
Here is an extract of the patient's history:\n""" + patient_boilerplate + "\nHere are the exclusion criteria for the trial:\n" + trial_boilerplate + """
Note that the extract was generated by prompting an LLM to determine whether the patient meets specific common exclusion criteria, such as uncontrolled brain metastases, lack of measurable disease, congestive heart failure, pneumonitis, renal dysfunction, liver dysfunction, and HIV or hepatitis infection, and to present evidence for whether the patient met the criterion.
You should therefore not assume that mention of such condition means the patient has the condition; it may represent the LLM reasoning about whether the patient has the condition.
Based on the extract, you should determine whether the patient clearly meets one of the exclusion criteria for this specific trial.
Do not evaluate exclusion criteria other than those listed for this trial.
Reason through one exclusion criterion at a time. Generate a numbered list of the criteria as you go. For each one, decide whether the patient clearly meets the exclusion criteron. If it is not completely clear that the patient meets the exclusion criterion, give the patient the benefit of the doubt, and err on the side of deciding the patient is not excluded. A description in the patient extract that a condition is mild, low-grade, or resolved is even more of a reason not to exclude the patient based on that condition.
Once you have evaluated all exclusion criteria, answer the question "Is this patient clearly excluded from this trial?" with a one-word "Yes!" or "No!" answer, based on whether the patient clearly met any of the individual exclusion criteria. It is critical that your final word be either "Yes!" or "No!", verbatim, and case-sensitive.
Make sure to include the exclamation point in your final one-word answer.
No introductory text or concluding text after that final answer."""}]

    
        prompt = tokenizer.apply_chat_template(conversation=messages, add_generation_prompt=True, tokenize=False)
        prompts.append(prompt)
        
    responses = llama_model.generate(
        prompts,     
        SamplingParams(
        temperature=0.0,
        top_p=0.2,
        max_tokens=25000,
        repetition_penalty=1.2,
    ))

    response_texts = [x.outputs[0].text for x in responses]

    exclusion_results = []

    for response_text in response_texts:
        if ("Yes!" in response_text[-10:]) or ("YES!" in response_text[-10:]):
            exclusion_results.append(1.0)
        else:
            exclusion_results.append(0.0)
    
    return responses, response_texts, exclusion_results

Usage & post-processing examples

If your code uses llm but a function references llama internally, the Setup section already defined llama = llm.

A) Trial “space” extraction — calling `summarize_trials_multi_cohort` and parsing spaces + boilerplate

eligibility_texts = [
    """Title: A Study of <…>
    Eligibility:
    - Histologically confirmed non-small cell lung cancer (adenocarcinoma).
    - Requires ALK fusion (screening assay permitted).
    - Prior platinum-based chemo-immunotherapy allowed.
    Exclusions: NYHA class III–IV heart failure, active hepatitis B or C, uncontrolled brain metastases…"""
]

# 1) Run the extractor
responses, response_texts = summarize_trials_multi_cohort(eligibility_texts, llm)

# 2) Split model output into trial-space text and boilerplate text.
import re

def split_spaces_and_boilerplate(raw: str):
    # Some chat templates may include a marker like 'assistantfinal'; handle robustly.
    if "assistantfinal" in raw:
        raw = raw.split("assistantfinal", 1)[-1]
    parts = raw.split("Boilerplate exclusions:", 1)
    space_text = parts[0].strip()
    boilerplate_text = parts[1].strip() if len(parts) > 1 else ""
    return space_text, boilerplate_text

# 3) Turn the numbered list into individual spaces
def explode_numbered_spaces(space_text: str):
    # Expect lines like: "1. Cancer type allowed: … Histology allowed: … Biomarkers required: …"
    lines = [ln.strip() for ln in space_text.splitlines() if ln.strip()]
    numbered = [ln for ln in lines if re.match(r"^\s*\d+\.", ln)]
    return numbered

# 4) Apply to batch
trial_spaces = []
trial_boilerplates = []
for txt in response_texts:
    spaces_str, boilerplate_str = split_spaces_and_boilerplate(txt)
    spaces_list = explode_numbered_spaces(spaces_str)
    trial_spaces.append(spaces_list)
    trial_boilerplates.append(boilerplate_str)

print("Spaces:", trial_spaces[0])
print("Boilerplate:", trial_boilerplates[0])

# Optional: row-per-space dataframe
import pandas as pd
rows = []
for trial_idx, (spaces_list, boilerplate_str) in enumerate(zip(trial_spaces, trial_boilerplates)):
    for k, space in enumerate(spaces_list, start=1):
        rows.append({"trial_idx": trial_idx, "space_number": k, "this_space": space, "boilerplate_text": boilerplate_str})
cohort_level_trials = pd.DataFrame(rows)
cohort_level_trials.head()

B) Patient summarization output — extracting main summary and patient boilerplate

# You likely called:
# responses, summary_texts = summarize_patients([long_note], llm)
patient_summary_full = summary_texts[0]

def split_patient_summary_and_boilerplate(summary_text: str):
    parts = summary_text.split("Boilerplate:", 1)
    main_summary = parts[0].strip()
    patient_boilerplate = parts[1].strip() if len(parts) > 1 else ""
    return main_summary, patient_boilerplate

patient_summaries = []
patient_boilerplates = []
main, boiler = split_patient_summary_and_boilerplate(patient_summary_full)
patient_summaries.append(main)
patient_boilerplates.append(boiler)

print("Patient summary:\n", patient_summaries[0][:400], "…")
print("\nPatient boilerplate:\n", patient_boilerplates[0][:400], "…")

C) Trial space reasonable consideration check

responses, texts, yhat = ask_about_trials_loosely(
    [patient_summary],                                 # from step 3
    ["ALK+ metastatic NSCLC, prior chemo-immunotherapy allowed; requires ALK fusion"],  # trial space
    llm
)

D) Boilerplate exclusion checks — calling `ask_about_boilerplate` and reading the “Yes!/No!” result

# Inputs must be same length:
# - patient_boilerplates: list[str] from (B)
# - trial_boilerplates:   list[str] from (A)

responses, response_texts, exclusion_results = ask_about_boilerplate(
    patient_boilerplates, trial_boilerplates, llm
)

for pb, tb, text, res in zip(patient_boilerplates, trial_boilerplates, response_texts, exclusion_results):
    print("Excluded?", bool(res))
    print("LLM reasoning (tail):", text[-400:], "\n")

Downloads last month: 325

Safetensors

Model size

3.21B params

Tensor type

BF16

Model tree for ksg-dfci/OncoReasoning-3B-0825

Quantizations

2 models