---
license: apache-2.0
tags:
- question-answering
- information-retrieval
- tf-idf
- cosine-similarity
- mahabharata
- indian-epic
- text-classification
- scikit-learn
- joblib
- huggingface-hub
- datasets
- transformers
- natural-language-processing
- nlp
- text
---
# zlt-llm

[![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model%20Card-blue)](https://huggingface.co/vprasenjeet099/zlt-llm)

This is a simple retrieval-based question-answering model built on the Mahabharata dataset. It represents text with TF-IDF vectors and uses cosine similarity to retrieve the passage most similar to the question. It then extracts the answer from that passage by word overlap with the question.
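## Model Description

This model answers questions about the text of the Mahabharata in two steps:

1. It converts the question into a TF-IDF vector and retrieves the single paragraph with the highest cosine similarity to it.
2. It splits that paragraph into sentences and returns the sentence that shares the most words with the question.

No neural network is involved. The downloadable `qa_model.joblib` file holds three things: the fitted TF-IDF vectorizer, the TF-IDF matrix of the paragraphs, and the paragraphs themselves.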
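The retrieval step can be illustrated with a dependency-free sketch of TF-IDF and cosine similarity. It is an illustration, not the code behind `qa_model.joblib`, and it rests on two assumptions:

- The released vectorizer uses scikit-learn's `TfidfVectorizer` defaults. These are lowercasing, tokens of two or more word characters, smoothed IDF `ln((1 + n) / (1 + df)) + 1`, and L2-normalised vectors.
- The helper names (`fit_idf`, `tfidf_vector`, `retrieve`) and the two toy paragraphs are hypothetical. The paragraphs were written for this example and are not taken from the dataset.

```python
import math
import re
from collections import Counter

# Approximates scikit-learn's default token_pattern: lowercase, keep tokens of 2+ word characters.
TOKEN_RE = re.compile(r"\b\w\w+\b")


def tokenize(text: str) -> list[str]:
    return TOKEN_RE.findall(text.lower())


def fit_idf(docs: list[str]) -> dict[str, float]:
    """Smoothed IDF as in scikit-learn's default: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf_vector(text: str, idf: dict[str, float]) -> dict[str, float]:
    """Term counts weighted by IDF, L2-normalised; words unseen when fitting are ignored."""
    counts = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else vec


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Vectors are unit length (or empty), so the dot product equals the cosine similarity.
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def retrieve(question: str, paragraphs: list[str]) -> int:
    """Index of the paragraph most cosine-similar to the question (first one on ties)."""
    idf = fit_idf(paragraphs)
    q_vec = tfidf_vector(question, idf)
    scores = [cosine(q_vec, tfidf_vector(p, idf)) for p in paragraphs]
    return max(range(len(paragraphs)), key=scores.__getitem__)


# Toy paragraphs written for this example, not taken from the dataset.
paragraphs = [
    "Arjuna was the third of the Pandava brothers. He was a peerless archer.",
    "Bhishma was the son of King Shantanu and the river goddess Ganga.",
]
print(retrieve("Who was Arjuna?", paragraphs))  # 0
```

This sketch refits the IDF weights on every call for brevity. A stored artifact like `qa_model.joblib` presumably does that fitting once, ahead of time, and only transforms the question at query time.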
## Intended Uses & Limitations

### Intended Uses

-   Answering simple factual questions about the Mahabharata.
-   Demonstrating basic question-answering techniques.
-   Serving as a starting point for more advanced QA models.
### Limitations

-   This is a basic model. It returns one sentence from one retrieved paragraph, so it handles complex questions poorly, such as those whose answers span several sentences or paragraphs.
-   Answer selection relies on exact word overlap and does not capture meaning, so synonyms and paraphrases of the source wording are not matched.
-   For more advanced QA, consider transformer-based models such as BERT, RoBERTa, or DistilBERT.
-   The small set of manually created QA pairs is not sufficient for a comprehensive evaluation.
-   The model does not handle ambiguous questions well.
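The word-overlap limitation is easy to reproduce on a toy paragraph written for this example (not taken from the dataset). The sketch below mirrors the answer-selection step described above. The helpers `words` and `best_sentence` are hypothetical names. Punctuation is stripped before words are compared, which is an assumption of this sketch, so that "bow?" in a question matches "bow" in a sentence.

```python
import re


def words(text: str) -> set[str]:
    # Lowercase and keep runs of word characters, so "bow?" and "bow" compare equal.
    return set(re.findall(r"\w+", text.lower()))


def best_sentence(question: str, paragraph: str) -> tuple[str, int]:
    """Sentence (split on ".") sharing the most words with the question, plus that count."""
    question_words = words(question)
    best, best_overlap = "", 0
    for sentence in paragraph.split("."):
        sentence = sentence.strip()
        overlap = len(question_words & words(sentence))
        if sentence and overlap > best_overlap:
            best, best_overlap = sentence, overlap
    return best, best_overlap


# Toy paragraph written for this example, not taken from the dataset.
paragraph = "Arjuna was a peerless archer. His bow was called Gandiva."
print(best_sentence("What was the name of his bow?", paragraph))
# ('His bow was called Gandiva', 3)
print(best_sentence("Which weapon did Arjuna wield?", paragraph))
# ('Arjuna was a peerless archer', 1)
```

The second question asks about Arjuna's weapon, but the sentence naming it shares no words with the question. The sentence that merely mentions Arjuna wins instead. Function words such as "was" also count toward the overlap, which makes ties and near-ties common.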
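## How to Use

### Installation

```bash
# torch (or another backend) is only needed for the optional transformers example
pip install datasets scikit-learn joblib huggingface_hub transformers torch
```

### Usage

```python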
import re

import joblib
import numpy as np
from huggingface_hub import hf_hub_download
from sklearn.metrics.pairwise import cosine_similarity
from transformers import pipeline

# Download the serialized model from the Hugging Face Hub and load it.
# The joblib file holds a dict with the fitted vectorizer, the TF-IDF matrix
# of all paragraphs, and the paragraphs themselves.
model_path = hf_hub_download(repo_id="vprasenjeet099/zlt-llm", filename="qa_model.joblib")
loaded_model = joblib.load(model_path)
vectorizer = loaded_model["vectorizer"]
tfidf_matrix = loaded_model["tfidf_matrix"]
paragraphs = loaded_model["paragraphs"]


def word_set(text):
    """Lowercase and split into words, dropping punctuation ("Arjuna?" -> "arjuna")."""
    return set(re.findall(r"\w+", text.lower()))


def answer_question(question, tfidf_matrix, vectorizer, paragraphs):
    # 1. Retrieve the paragraph whose TF-IDF vector is most cosine-similar to the question.
    question_vector = vectorizer.transform([question])
    similarities = cosine_similarity(question_vector, tfidf_matrix).ravel()
    most_similar_paragraph = paragraphs[int(np.argmax(similarities))]

    # 2. Return the sentence in that paragraph that shares the most words with the question.
    question_words = word_set(question)
    best_sentence = ""
    max_overlap = 0
    for sentence in most_similar_paragraph.split("."):
        sentence = sentence.strip()
        if not sentence:
            continue
        overlap = len(question_words & word_set(sentence))
        if overlap > max_overlap:
            max_overlap = overlap
            best_sentence = sentence
    # An empty string is returned if no sentence shares a word with the question.
    return best_sentence


# Example usage
question = "Who was Arjuna?"
answer = answer_question(question, tfidf_matrix, vectorizer, paragraphs)
print(f"Question: {question}")
print(f"Answer: {answer}")
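
# Optional: a transformer reader (DistilBERT fine-tuned on SQuAD) shows how answer
# extraction could be improved. It reads the paragraph retrieved by TF-IDF,
# not an arbitrary one.
qa_pipeline = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
best_index = int(cosine_similarity(vectorizer.transform([question]), tfidf_matrix).argmax())
result = qa_pipeline(question=question, context=paragraphs[best_index])
print(result)  # dict with "score", "start", "end" and "answer"
```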
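This card does not say how `qa_model.joblib` was produced. The loading code reads three keys (`vectorizer`, `tfidf_matrix`, `paragraphs`), so an artifact of the same shape could plausibly be built as below. This is a hypothetical sketch. The `build_qa_artifact` helper, the splitting of paragraphs on blank lines, the default `TfidfVectorizer` settings, and the toy text are all assumptions, not the author's actual preprocessing.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer


def build_qa_artifact(text: str, path: str = "qa_model.joblib") -> dict:
    """Hypothetical builder for a dict with the same keys the loading code expects."""
    # Assumption: paragraphs are separated by blank lines in the source text.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    # Assumption: default settings; the released vectorizer's settings are not documented.
    vectorizer = TfidfVectorizer()
    tfidf_matrix = vectorizer.fit_transform(paragraphs)  # sparse, one row per paragraph
    artifact = {"vectorizer": vectorizer, "tfidf_matrix": tfidf_matrix, "paragraphs": paragraphs}
    joblib.dump(artifact, path)
    return artifact


# Toy text written for this example, not taken from the dataset.
sample = "Arjuna was a peerless archer.\n\nBhishma was the son of Ganga."
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "qa_model.joblib")
    build_qa_artifact(sample, path)
    loaded = joblib.load(path)
print(loaded["tfidf_matrix"].shape)  # (2, 9)
```

Storing the fitted vectorizer alongside the matrix lets the same vocabulary and IDF weights be reused at query time. The paragraph count and vocabulary of a real Mahabharata artifact would of course differ from this toy one.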