yqnis
/

llama3-8b-quaero

Text Generation

text-generation-inference

Model card Files Files and versions Community

llama3-8b-quaero / README.md

yqnis's picture

Update README.md

477a99f verified about 1 month ago

|

history blame contribute delete

3 kB

	---
	library_name: transformers
	tags:
	- unsloth
	- trl
	- sft
	- llama
	- ner
	- quaero
	- med
	---

	# LLaMA 3 8B fine-tuned on Quaero for Named Entity Recognition (Generative)

	This model is a 16-bit merged version of [unsloth/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/unsloth/Meta-Llama-3.1-8B-Instruct), fine-tuned on the [Quaero French medical dataset](https://quaerofrenchmed.limsi.fr/) using a generative approach to Named Entity Recognition (NER).

	## Task

	The model was trained to extract entities from French biomedical sentences (medlines) using a structured, prompt-based format.

	\| Tag \| Description \|
	\| ------ \| ----------------------------------------------------------- \|
	\| `DISO` \| Diseases or health-related conditions \|
	\| `ANAT` \| Anatomical parts (organs, tissues, body regions, etc.) \|
	\| `PROC` \| Medical or surgical procedures \|
	\| `DEVI` \| Medical devices or instruments \|
	\| `CHEM` \| Chemical substances or medications \|
	\| `LIVB` \| Living beings (e.g. humans, animals, bacteria, viruses) \|
	\| `GEOG` \| Geographical locations (e.g. countries, regions) \|
	\| `OBJC` \| Physical objects not covered by other categories \|
	\| `PHEN` \| Biological processes (e.g. inflammation, mutation) \|
	\| `PHYS` \| Physiological functions (e.g. respiration, vision) \|

	I use `<>` as a separator and the output format is :

	```
	TAG_1 entity_1 <> TAG_2 entity_2 <> ... <> TAG_n entity_n'
	```

	## Dataset

	The original dataset is Quaero French Medical Corpus and I converted it to a JSON format for generative instruction-style training.


	```json
	{
	"input": "Etude de l'efficacité et de la tolérance de la prazosine à libération prolongée chez des patients hypertendus et diabétiques non insulinodépendants.",
	"output": "DISO tolérance <> CHEM prazosine <> LIVB patients <> DISO hypertendus <> DISO diabétiques non insulinodépendants"
	}
	```

	The QUAERO French Medical corpus features overlapping entity spans, including nested structures, for instance :
	```json
	{
	"input": "Cancer du pancréas",
	"output": "DISO Cancer <> DISO Cancer du pancréas <> ANAT pancréas"
	}
	```

	## Evaluation

	Evaluation was performed on the test split by comparing the predicted entity set against the ground truth annotations using exact (type, entity) matching.

	\| Metric \| Score \|
	\| --------- \| ------ \|
	\| Precision \| 0.6827 \|
	\| Recall \| 0.7121 \|
	\| F1 Score \| 0.6971 \|


	## Other formats

	This model is also available in the following formats:

	- LoRA Adapter
	→ [yqnis/llama3-8b-quaero-lora](https://huggingface.co/yqnis/llama3-8b-quaero-lora)

	- GGUF Q8_0
	→ [yqnis/llama3-8b-quaero-gguf](https://huggingface.co/yqnis/llama3-8b-quaero-gguf)


	This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.