Uploaded model

  • Developed by: aksw
  • License: apache-2.0
  • Finetuned from model: unsloth/phi-4-unsloth-bnb-4bit

This llama model was trained 2x faster with Unsloth and Hugging Face's TRL library.

📄 Model Card: aksw/Bike-name

🧠 Model Overview

Bike-name is a medium-sized fine-tuned language model designed to extract biochemical names from the text of scientific articles. It is suited to Information Retrieval systems based on Biochemical Knowledge Extraction.


🚨 Disclaimer

This model must not be used for comparison with other methods in the Bike challenge or in scientific articles that use the NatUKE Benchmark, because it was trained with all the benchmark data. This means that NatUKE test data was used in its fine-tuning. It is intended for exploration on other benchmarks, or for future Bike challenges whose test sets do not come from the NatUKE test sets.


🔍 Intended Use

  • Input: Text extracted from a biochemical PDF file
  • Output: A single list containing the corresponding biochemical names from the text.
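
Text extracted from a PDF often carries words hyphenated across line breaks and irregular whitespace. The following is a minimal sketch of a cleanup step that could run before the text is passed to the model. It assumes the raw text has already been extracted by some PDF tool; the helper name `clean_pdf_text` and its rules are illustrative assumptions, not the preprocessing used to train this model.

```python
import re


def clean_pdf_text(raw: str) -> str:
    """Normalize text extracted from a PDF (hypothetical helper)."""
    # Rejoin ordinary words split across a line break: "alka-\nloid" -> "alkaloid".
    text = re.sub(r"([a-z])-\n([a-z])", r"\1\2", raw)
    # For any other hyphen at a line end, keep the hyphen and drop the break:
    # "3-\nhydroxy" -> "3-hydroxy".
    text = text.replace("-\n", "-")
    # Turn single line breaks into spaces; blank lines stay as paragraph breaks.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces and tabs.
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()
```

The first rule is a heuristic: a compound name whose real hyphen falls between two lowercase letters at a line end would lose that hyphen, so the rule may need to be relaxed for hyphen-heavy chemical nomenclature.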

🧩 Applications

  • Question Answering systems over Biochemical Datasets
  • Biochemical Knowledge graph exploration tools
  • Extraction of biochemical names from scientific text articles

⚙️ Model Details

  • Base model: Phi 4 14B (via Unsloth)
  • Training: Scientific text articles
    • 418 unique names
    • 143 articles
  • Target Ontology: NatUKE Benchmark (https://github.com/AKSW/natuke)
  • Frameworks: Unsloth, Hugging Face, Transformers

📦 Installation

Make sure to install unsloth, torch and CUDA dependencies:

pip install unsloth torch
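
After installing, a quick check that the packages can be found may save a failed model load later. This is a minimal sketch using only the Python standard library; the helper name `missing_packages` is hypothetical.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the names from `names` that cannot be located for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(["unsloth", "torch"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

This only confirms that the packages are installed. Whether a GPU is visible can be checked separately with `torch.cuda.is_available()`.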

🧪 Example: Inference Code

import ast
from typing import List, Union

from unsloth import FastLanguageModel
import torch


class BiochemicalNameExtractor:
    """Wraps the fine-tuned model to extract biochemical names from article text."""

    def __init__(self, model_name: str, max_seq_length: int = 32768, load_in_4bit: bool = True):
        self.model, self.tokenizer = FastLanguageModel.from_pretrained(
            model_name=model_name,
            max_seq_length=max_seq_length,
            load_in_4bit=load_in_4bit,
        )
        # Switch the model to Unsloth's inference mode.
        FastLanguageModel.for_inference(self.model)

    def build_prompt(self, article_text: str) -> List[dict]:
        # The system prompt is passed verbatim; the article text is the user message.
        return [
            {"role": "system", "content": (
                "You are a scientist trained in chemistry.\n"
                "You must extract information from scientific papers identifying relevant properties associated with each natural product discussed in the academic publication.\n"
                "For each paper, you have to analyze the content (text) to identify the *Compound name*. It can be more than one compound name.\n"
                "Your output should be a list with the names. Return only the list, without any additional information.\n"
            )},
            {"role": "user", "content": article_text},
        ]

    def extract_names(self, article_text: str, temperature: float = 0.01, max_new_tokens: int = 1024) -> Union[list, str]:
        # Chat markers that delimit the assistant's reply in the decoded output.
        si = "<|im_start|>assistant<|im_sep|>"
        sf = "<|im_end|>"
        messages = self.build_prompt(article_text)
        inputs = self.tokenizer.apply_chat_template(
            messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
        ).to("cuda")
        outputs = self.model.generate(
            inputs, max_new_tokens=max_new_tokens, use_cache=True, temperature=temperature, min_p=0.1
        )
        decoded = self.tokenizer.batch_decode(outputs)[0]
        # Keep only the text after the assistant marker and drop the end marker.
        parsed = decoded.rpartition(si)[2].replace(sf, "").strip()
        try:
            # literal_eval accepts only Python literals, so unlike eval() it
            # cannot execute arbitrary code contained in the model output.
            names = ast.literal_eval(parsed)
        except (ValueError, SyntaxError):
            names = parsed
            print('Your output is not a list, you will need one more preprocessing step.')

        return names

# --- Using the model ---
if __name__ == "__main__":
    extractor = BiochemicalNameExtractor(model_name="aksw/Bike-name")
    text = "Title, Abstract, Introduction, Background, Method, Results, Conclusion, References."
    list_names = extractor.extract_names(text)
    print(list_names)
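
When the reply is not a valid Python list, the example above returns the raw string and prints a notice that one more preprocessing step is needed. One possible version of that step is sketched below. It assumes the reply is either a list literal (possibly truncated) or a run of names separated by semicolons, line breaks, or a comma followed by whitespace; `parse_name_list` is a hypothetical helper, not part of the model or of Unsloth.

```python
import ast
import re


def parse_name_list(reply: str) -> list:
    """Best-effort conversion of a model reply into a list of names."""
    reply = reply.strip()
    try:
        value = ast.literal_eval(reply)
        if isinstance(value, (list, tuple)):
            return [str(item).strip() for item in value]
    except (ValueError, SyntaxError, TypeError):
        pass  # Not a clean literal; fall back to splitting the text.

    # Drop surrounding brackets, then split on ";", line breaks, or ", ".
    # A comma with no following space is kept, so "1,2-..." stays intact.
    body = reply.strip("[]")
    names = []
    for part in re.split(r"[;\n]|,\s+", body):
        part = re.sub(r"^[-•*]\s+", "", part.strip())  # leading bullet, if any
        part = part.strip("'\"").strip()               # surrounding quotes
        if part:
            names.append(part)
    return names
```

Splitting only on a comma followed by whitespace is a deliberate choice here, because commas inside locants are common in chemical names; a name that itself contains ", " would still be split incorrectly.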

🧪 Evaluation

The model was evaluated using Hits@k on the test sets of the NatUKE Benchmark (do Carmo et al., 2023).


Do Carmo, Paulo Viviurka, et al. "NatUKE: A Benchmark for Natural Product Knowledge Extraction from Academic Literature." 2023 IEEE 17th International Conference on Semantic Computing (ICSC). IEEE, 2023.
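
For orientation, Hits@k is commonly computed as the fraction of test cases whose correct answer appears among the top k ranked predictions. The sketch below follows that common definition; the exact evaluation protocol of the NatUKE Benchmark may differ, and the function name and placeholder data are illustrative only.

```python
def hits_at_k(ranked_predictions, gold_answers, k):
    """Fraction of cases whose gold answer is among the top-k predictions.

    ranked_predictions: one list per test case, best prediction first.
    gold_answers: one correct answer per test case, in the same order.
    """
    if len(ranked_predictions) != len(gold_answers):
        raise ValueError("predictions and gold answers must have equal length")
    if not gold_answers:
        return 0.0
    hits = sum(
        1
        for preds, gold in zip(ranked_predictions, gold_answers)
        if gold in preds[:k]
    )
    return hits / len(gold_answers)


# Placeholder data, not taken from any benchmark.
example_predictions = [
    ["compound_a", "compound_b", "compound_c"],
    ["compound_x", "compound_y", "compound_z"],
]
example_gold = ["compound_b", "compound_z"]
```

With this placeholder data, the first case is a hit from k = 2 onward and the second from k = 3 onward.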

📚 Citation

If you use this model in your work, please cite it as:

@inproceedings{ref:doCarmo2025,
  title={Improving Natural Product Knowledge Extraction from Academic Literature with Enhanced PDF Text Extraction and Large Language Models},
  author={Viviurka do Carmo, Paulo and Silva G{\^o}lo, Marcos Paulo and Gwozdz, Jonas and Marx, Edgard and Marcondes Marcacini, Ricardo},
  booktitle={Proceedings of the 40th ACM/SIGAPP Symposium on Applied Computing},
  pages={980--987},
  year={2025}
}