---
base_model: unsloth/phi-4-unsloth-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
license: apache-2.0
language:
- en
---

# Uploaded model

- **Developed by:** aksw
- **License:** apache-2.0
- **Finetuned from model:** unsloth/phi-4-unsloth-bnb-4bit

This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

## 📄 Model Card: `aksw/Bike-name`

### 🧠 Model Overview

`Bike-name` is a medium-sized fine-tuned language model designed to **extract biochemical names from scientific text articles**. It is suited to information retrieval systems built on biochemical knowledge extraction.

---

### 🚨 Disclaimer

Do not use this model for comparisons with other methods in the Bike challenge or in scientific articles based on the NatUKE Benchmark. It was fine-tuned on all of the benchmark data, so its training set includes NatUKE test data. It is intended for exploration on other benchmarks, or for future Bike challenges whose test sets are not drawn from the NatUKE test sets.

---

### 🔍 Intended Use

* **Input**: Text extracted from the PDF of a biochemical article
* **Output**: A **single list** containing the corresponding biochemical names from the text.

---

### 🧩 Applications

* Question Answering systems over Biochemical Datasets
* Biochemical Knowledge graph exploration tools
* Extraction of biochemical names from scientific text articles

---

### ⚙️ Model Details

* **Base model**: Phi 4 14B (via Unsloth)
* **Training data**: Scientific text articles
  * 418 unique names
  * 143 articles
* **Target Ontology**: NatUKE Benchmark (https://github.com/AKSW/natuke)
* **Frameworks**: Unsloth, Hugging Face Transformers, TRL

---

### 📦 Installation

Install `unsloth` and `torch`, and make sure a CUDA-capable environment is available (the example below moves its inputs to `cuda`):

```bash
pip install unsloth torch
```

---

### 🧪 Example: Inference Code

```python
import ast
from typing import Union

from unsloth import FastLanguageModel
import torch

class BiKECompoundNameExtractor:
    def __init__(self, model_name: str, max_seq_length: int = 32768, load_in_4bit: bool = True):
        self.model, self.tokenizer = FastLanguageModel.from_pretrained(
            model_name=model_name,
            max_seq_length=max_seq_length,
            load_in_4bit=load_in_4bit
        )
        # Switch the model to Unsloth's inference mode.
        _ = FastLanguageModel.for_inference(self.model)

    def build_prompt(self, article_text: str) -> list:
        # System prompt with the extraction instructions, followed by the article as the user turn.
        return [
            {"role": "system", "content": (
                "You are a scientist trained in chemistry.\n" 
                "You must extract information from scientific papers identifying relevant properties associated with each natural product discussed in the academic publication.\n"
                "For each paper, you have to analyze the content (text) to identify the *Compound name*. It can be more than one compound name.\n" 
                "Your output should be a list with the names. Return only the list, without any additional information.\n"
            )},
            {"role": "user", "content": article_text}
        ]

    def extract_compound_name(self, article_text: str, temperature: float = 0.01, max_new_tokens: int = 1024) -> Union[list, str]:
        # Markers that delimit the assistant turn in the Phi-4 chat template.
        si = "<|im_start|>assistant<|im_sep|>"
        sf = "<|im_end|>"
        messages = self.build_prompt(article_text)
        inputs = self.tokenizer.apply_chat_template(
            messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
        ).to("cuda")
        outputs = self.model.generate(inputs, max_new_tokens=max_new_tokens, use_cache=True, temperature=temperature, min_p=0.1)
        decoded = self.tokenizer.batch_decode(outputs)[0]
        # Keep only the text after the assistant marker and drop the end-of-turn marker.
        parsed = decoded.rpartition(si)[2].replace(sf, "").strip()
        try:
            # literal_eval parses a Python list literal without executing arbitrary code.
            names = ast.literal_eval(parsed)
        except (ValueError, SyntaxError):
            # Fall back to the raw string when the output is not a valid literal.
            names = parsed
            print('Your output is not a list, you will need one more preprocessing step.')

        return names

# --- Using the model ---
if __name__ == "__main__":
    extractor = BiKECompoundNameExtractor(model_name="aksw/Bike-name")
    text = "Title, Abstract, Introduction, Background, Method, Results, Conclusion, References."
    list_names = extractor.extract_compound_name(text)
    print(list_names)
```
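
When the output cannot be parsed as a list, the example above returns the raw string and asks for one more preprocessing step. The sketch below shows one possible version of that step. The helper name `parse_name_list` and its splitting rules are assumptions made for illustration, not part of the released model or its training pipeline. It deliberately does not split on commas, since commas occur inside many chemical names.

```python
import ast
import re


def parse_name_list(raw: str) -> list[str]:
    """Best-effort conversion of raw model output into a list of names.

    Hypothetical helper: the splitting rules are illustrative assumptions.
    """
    text = raw.strip()

    # First attempt: find a Python-style list literal anywhere in the text.
    start, end = text.find("["), text.rfind("]")
    if start != -1 and end > start:
        try:
            value = ast.literal_eval(text[start:end + 1])
            if isinstance(value, (list, tuple)):
                return [str(v).strip() for v in value if str(v).strip()]
        except (ValueError, SyntaxError):
            pass  # Not a valid literal, fall through to line-based parsing.

    # Fallback: one name per line or per semicolon-separated item.
    names = []
    for part in re.split(r"[\n;]+", text):
        # Drop a leading bullet ("- ", "* ", "• ") or numbering ("1. ").
        item = re.sub(r"^\s*(?:[-*•]|\d+\.)\s+", "", part).strip().strip("'\"")
        if item:
            names.append(item)
    return names


print(parse_name_list("- quercetin\n- kaempferol"))
```

A caller could apply this helper whenever `extract_compound_name` returns a `str` instead of a `list`.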

---

### 🧪 Evaluation

The model was evaluated using Hits@k on the test sets of the NatUKE Benchmark (do Carmo et al., 2023).
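
For readers unfamiliar with the metric, here is a minimal sketch of one common form of Hits@k: a prediction counts as a hit when at least one of the top-k predicted names appears in the gold set. The function names, the case-insensitive exact matching, and the sample data are assumptions for illustration only. The NatUKE evaluation scripts may rank and match names differently.

```python
def hits_at_k(ranked_predictions, gold_names, k):
    """Return 1.0 if any of the top-k predictions is in the gold set, else 0.0.

    Matching is case-insensitive exact match (an illustrative assumption).
    """
    gold = {g.strip().lower() for g in gold_names}
    top_k = [p.strip().lower() for p in ranked_predictions[:k]]
    return 1.0 if any(p in gold for p in top_k) else 0.0


def mean_hits_at_k(examples, k):
    """Average Hits@k over (ranked_predictions, gold_names) pairs."""
    if not examples:
        return 0.0
    return sum(hits_at_k(pred, gold, k) for pred, gold in examples) / len(examples)


# Made-up data, not benchmark results.
examples = [
    (["quercetin", "rutin"], ["Rutin"]),
    (["kaempferol"], ["apigenin"]),
]
print(mean_hits_at_k(examples, 1))
print(mean_hits_at_k(examples, 2))
```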

---

Do Carmo, Paulo Viviurka, et al. "NatUKE: A Benchmark for Natural Product Knowledge Extraction from Academic Literature." 2023 IEEE 17th International Conference on Semantic Computing (ICSC). IEEE, 2023.


### 📚 Citation

If you use this model in your work, please cite it as:

```bibtex
@inproceedings{ref:doCarmo2025,
  title={Improving Natural Product Knowledge Extraction from Academic Literature with Enhanced PDF Text Extraction and Large Language Models},
  author={Viviurka do Carmo, Paulo and Silva G{\^o}lo, Marcos Paulo and Gwozdz, Jonas and Marx, Edgard and Marcondes Marcacini, Ricardo},
  booktitle={Proceedings of the 40th ACM/SIGAPP Symposium on Applied Computing},
  pages={980--987},
  year={2025}
}
```