🧠 DeBERTa-NCERT-Biology-QA

This model is a fine-tuned version of microsoft/deberta-v3-small, trained on a subset (one range of chunks) of the NCERT Class 11 Biology dataset. It performs extractive question answering (QA): given a question and a passage, it returns the span of the passage that answers the question. It is intended for questions on the biology chapters taught in the Indian school curriculum.

📚 Dataset

The dataset was created from the official NCERT Class 11 Biology book, specifically:

Chunk Range: chunk_3000 to chunk_3143
Data Format: CSV with context-question-answer triplets
Task: Extractive QA (start & end position of answer in context)
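
As a rough illustration of the triplet format and the start/end positions mentioned above, the sketch below reads one CSV row and derives character offsets for the answer. The column names (`context`, `question`, `answer`) and the sample row are assumptions made for this example, not taken from the actual dataset files.

```python
import csv
import io

# Hypothetical sample row; the column names are assumed, not taken from the dataset.
SAMPLE_CSV = '''context,question,answer
"Mitochondria are the sites of aerobic respiration.","Where does aerobic respiration occur?","Mitochondria"
'''

def add_answer_span(row):
    """Return the row plus character-level start/end offsets of the answer."""
    start = row["context"].find(row["answer"])
    if start == -1:
        # Extractive QA needs the answer to appear verbatim in the context.
        raise ValueError("answer text not found verbatim in context")
    return {**row, "answer_start": start, "answer_end": start + len(row["answer"])}

rows = [add_answer_span(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
print(rows[0]["answer_start"], rows[0]["answer_end"])
```

During tokenization these character offsets would normally be mapped to token indices. That step depends on the tokenizer and is left out here.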

⚙️ Model Details

Base Model: microsoft/deberta-v3-small
Task: question-answering
Tokenizer: SentencePiece (spm.model) with custom vocabulary
Framework: 🤗 Transformers + PyTorch
Optimized For: Low-resource devices (OpenVINO conversion available)
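
A minimal usage sketch with the 🤗 Transformers `question-answering` pipeline might look like the following. The helper names `load_qa` and `answer_question` are hypothetical, and the model ID in the commented example is a placeholder, since this card does not state the repository name.

```python
def load_qa(model_id):
    """Build a question-answering pipeline (requires `transformers` and `torch`)."""
    # Imported lazily so the helper below has no hard dependency on transformers.
    from transformers import pipeline
    return pipeline("question-answering", model=model_id)

def answer_question(qa, question, context):
    """Run one query and return (answer_text, confidence_score)."""
    result = qa(question=question, context=context)
    return result["answer"], result["score"]

# Example (the model ID is a placeholder; replace it with the real repository name):
# qa = load_qa("your-username/DeBERTa-NCERT-Biology-QA")
# print(answer_question(
#     qa,
#     "Where does aerobic respiration occur?",
#     "Mitochondria are the sites of aerobic respiration.",
# ))
```

The pipeline result also carries `start` and `end` character offsets, which can be used to highlight the answer in the source passage.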

📈 Performance

Metric	Value
Exact Match (EM)	87.5%
F1 Score	91.2%
Avg Confidence	~0.99 after fine-tuning
Loss Trend	Decreasing steadily from 1.6 to 0.3
Epochs	2

🟢 Avg confidence before fine-tuning: ~0.006
🟢 Avg confidence after fine-tuning: ~0.99
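
This card does not say how Exact Match and F1 were computed. The sketch below assumes the common SQuAD-style definitions (lowercasing, stripping punctuation and English articles, then comparing tokens) and may differ from the script actually used for the numbers above.

```python
import re
import string
from collections import Counter

def normalize(text):
    """Lowercase, drop punctuation and articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(pred, gold):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(pred) == normalize(gold))

def f1_score(pred, gold):
    """Token-level F1 between the normalized prediction and gold answer."""
    p, g = normalize(pred).split(), normalize(gold).split()
    common = sum((Counter(p) & Counter(g)).values())  # overlapping token count
    if common == 0:
        return 0.0
    precision, recall = common / len(p), common / len(g)
    return 2 * precision * recall / (precision + recall)

print(exact_match("The Mitochondria", "mitochondria"))
print(f1_score("inner mitochondrial membrane", "mitochondrial membrane"))
```

Averaging these per-example scores over an evaluation set and multiplying by 100 gives percentages in the same form as the EM and F1 figures reported here.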