---
library_name: transformers
tags:
- unsloth
- trl
- sft
- med
- mistral
- quaero
---

# Mistral 7B fine-tuned on Quaero for Named Entity Recognition (Generative)

This model is a 16-bit merged version of [unsloth/mistral-7b-instruct-v0.3](https://huggingface.co/unsloth/mistral-7b-instruct-v0.3), fine-tuned on the [Quaero French medical dataset](https://quaerofrenchmed.limsi.fr/) using a generative approach to Named Entity Recognition (NER): instead of labeling individual tokens, the model writes out the entities it finds as text.

## Task

The model was trained to extract entities from French biomedical sentences from MEDLINE using a structured, prompt-based format. It recognizes the following entity types:

| Tag    | Description                                                 |
| ------ | ----------------------------------------------------------- |
| `DISO` | **Diseases** or health-related conditions                   |
| `ANAT` | **Anatomical parts** (organs, tissues, body regions, etc.)  |
| `PROC` | **Medical or surgical procedures**                          |
| `DEVI` | **Medical devices or instruments**                          |
| `CHEM` | **Chemical substances or medications**                      |
| `LIVB` | **Living beings** (e.g. humans, animals, bacteria, viruses) |
| `GEOG` | **Geographical locations** (e.g. countries, regions)        |
| `OBJC` | **Physical objects** not covered by other categories        |
| `PHEN` | **Biological processes** (e.g. inflammation, mutation)      |
| `PHYS` | **Physiological functions** (e.g. respiration, vision)      |

I use `<>` as the separator between entities, so the output format is:

```
TAG_1 entity_1 <> TAG_2 entity_2 <> ... <> TAG_n entity_n
```

## Dataset

The source data is the QUAERO French Medical Corpus, which I converted to JSON for generative, instruction-style training.
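Because the target strings follow a simple convention, converting between (tag, entity) pairs and the `<>`-separated format takes only a few lines. The sketch below is hypothetical and is not the author's conversion script: the helper names `serialize` and `parse` and the `TAGS` set are illustrative, and it assumes entities are joined with `" <> "` (a space on each side of the separator), as in the output format shown above.

```python
from typing import List, Tuple

# Hypothetical helpers illustrating the "TAG entity <> TAG entity" format.
Entity = Tuple[str, str]  # (tag, entity text)

# The ten entity types listed in the Task section.
TAGS = {"DISO", "ANAT", "PROC", "DEVI", "CHEM", "LIVB", "GEOG", "OBJC", "PHEN", "PHYS"}

# Assumption: spaces around "<>", as in the examples on this card.
SEPARATOR = " <> "


def serialize(entities: List[Entity]) -> str:
    """Build a target string such as 'DISO Cancer <> ANAT pancréas'."""
    return SEPARATOR.join(f"{tag} {text}" for tag, text in entities)


def parse(output: str) -> List[Entity]:
    """Split a generated string back into (tag, entity) pairs.

    Chunks that do not start with a known tag, or that have no entity text,
    are skipped, since a generative model can emit malformed fragments.
    """
    entities: List[Entity] = []
    for chunk in output.split("<>"):
        tag, _, text = chunk.strip().partition(" ")
        if tag in TAGS and text.strip():
            entities.append((tag, text.strip()))
    return entities


if __name__ == "__main__":
    target = "DISO Cancer <> DISO Cancer du pancréas <> ANAT pancréas"
    pairs = parse(target)
    print(pairs)
    # Round trip: serializing the parsed pairs gives back the original string.
    assert serialize(pairs) == target
```

Nested spans such as `Cancer` inside `Cancer du pancréas` need no special handling in this scheme, since each span is simply its own `<>`-separated chunk.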
For example:

```json
{
  "input": "Etude de l'efficacité et de la tolérance de la prazosine à libération prolongée chez des patients hypertendus et diabétiques non insulinodépendants.",
  "output": "DISO tolérance <> CHEM prazosine <> LIVB patients <> DISO hypertendus <> DISO diabétiques non insulinodépendants"
}
```

The QUAERO French Medical Corpus contains **overlapping entity spans**, including nested structures. For instance:

```json
{
  "input": "Cancer du pancréas",
  "output": "DISO Cancer <> DISO Cancer du pancréas <> ANAT pancréas"
}
```

## Evaluation

Evaluation was performed on the test split by comparing the predicted entity set against the ground-truth annotations using exact (type, entity) matching.

| Metric    | Score  |
| --------- | ------ |
| Precision | 0.6883 |
| Recall    | 0.7143 |
| F1 Score  | 0.7011 |

## Other formats

This model is also available in the following formats:

- **LoRA Adapter** → [yqnis/mistral-7b-quaero-lora](https://huggingface.co/yqnis/mistral-7b-quaero-lora)
- **GGUF Q5_K_M** → [yqnis/mistral-7b-quaero-gguf](https://huggingface.co/yqnis/mistral-7b-quaero-gguf)

This Mistral model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
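The exact-match scoring described under Evaluation can be sketched as follows. This is an illustrative reconstruction, not the evaluation script behind the reported numbers: the card does not say how scores are aggregated, so the `score` function below *assumes* micro-averaging (true positives, false positives and false negatives summed over all test sentences before computing the ratios), and it assumes each model output has already been split into a set of (tag, entity) pairs.

```python
from typing import Iterable, Set, Tuple

Entity = Tuple[str, str]  # (tag, entity text)


def score(
    predictions: Iterable[Set[Entity]],
    references: Iterable[Set[Entity]],
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over exact (tag, entity) matches.

    Each element of `predictions` / `references` is the entity set for one
    sentence. Assumption: counts are summed across sentences (micro-average).
    """
    predictions, references = list(predictions), list(references)
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")

    tp = fp = fn = 0
    for pred, gold in zip(predictions, references):
        tp += len(pred & gold)  # exact (tag, entity) matches
        fp += len(pred - gold)  # predicted but not annotated
        fn += len(gold - pred)  # annotated but not predicted

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [{("DISO", "Cancer"), ("DISO", "Cancer du pancréas"), ("ANAT", "pancréas")}]
    pred = [{("DISO", "Cancer du pancréas"), ("ANAT", "pancréas"), ("CHEM", "prazosine")}]
    print(score(pred, gold))  # 2 matches, 1 spurious, 1 missed
```

Using sets means a duplicated prediction is counted once, which matches the "predicted entity set" wording above; a partially correct span (for example `Cancer` when the annotation is `Cancer du pancréas`) counts as both a false positive and a false negative under exact matching.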