prasenjeet099 committed · verified · Commit 2215566 · 1 Parent(s): ad5393f

Update README.md

Files changed (1): README.md (+100 −3)
README.md CHANGED

@@ -1,3 +1,100 @@
- ---
- license: apache-2.0
- ---
---
license: apache-2.0
tags:
- question-answering
- information-retrieval
- tf-idf
- cosine-similarity
- mahabharata
- indian-epic
- text-classification
- scikit-learn
- joblib
- huggingface-hub
- datasets
- transformers
- natural-language-processing
- nlp
- text
---
# zlt-llm

[![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model%20Card-blue)](https://huggingface.co/vprasenjeet099/zlt-llm)

This is a simple question-answering model trained on the Mahabharata dataset. It uses TF-IDF for text representation and cosine similarity for retrieval; the answer sentence is then extracted by word overlap.

## Model Description

This model answers questions about the text of the Mahabharata. It retrieves the most relevant paragraph using TF-IDF vectors and cosine similarity, then selects the sentence within that paragraph that has the greatest word overlap with the question.

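To make the retrieval mechanics concrete, here is a minimal, self-contained sketch of TF-IDF scoring plus cosine similarity over a toy two-paragraph corpus. The corpus and question are invented for illustration, and the vocabulary here is fit over paragraphs and question together for simplicity; the real model uses scikit-learn's `TfidfVectorizer` fit on the paragraphs alone.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Compute simple TF-IDF vectors (term frequency * smoothed IDF) for a list of documents."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    n = len(docs)
    df = {w: sum(1 for toks in tokenized if w in toks) for w in vocab}  # document frequency
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1 for w in vocab}       # smoothed IDF
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append([tf[w] * idf[w] for w in vocab])
    return vocab, vectors

def cosine(u, v):
    """Cosine similarity between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

paragraphs = [
    "Arjuna was a great archer and the third of the Pandava brothers.",
    "Bhishma took a vow of lifelong celibacy.",
]
question = "Who was the archer Arjuna?"

vocab, doc_vecs = tfidf_vectors(paragraphs + [question])
q_vec = doc_vecs[-1]
scores = [cosine(q_vec, v) for v in doc_vecs[:-1]]
best = paragraphs[scores.index(max(scores))]
print(best)  # the Arjuna paragraph scores highest
```

The second paragraph shares no tokens with the question, so its score is exactly zero; any shared surface word gives the first paragraph a positive score.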
## Intended Uses & Limitations

### Intended Uses

- Answering simple factual questions about the Mahabharata.
- Demonstrating basic question-answering techniques.
- Serving as a starting point for more advanced QA models.

### Limitations

- This is a basic model and may not handle complex questions effectively.
- It relies on simple word overlap and does not understand semantic meaning.
- For more advanced QA, consider transformer-based models such as BERT, RoBERTa, or DistilBERT.
- The small set of manually created QA pairs is not sufficient for a comprehensive evaluation.
- The model does not handle ambiguous questions well.

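The word-overlap limitation is easy to see directly: a paraphrased question that shares no surface words with the answer sentence scores zero, even when it is semantically identical. (The sentences below are toy examples invented for illustration.)

```python
def word_overlap(question, sentence):
    """Count shared lowercase tokens between a question and a candidate sentence."""
    return len(set(question.lower().split()) & set(sentence.lower().split()))

sentence = "Arjuna wielded the bow Gandiva"
print(word_overlap("Who wielded the bow Gandiva?", sentence))           # 3 -- matches on surface words
print(word_overlap("Which warrior carried that famous weapon?", sentence))  # 0 -- synonyms are invisible
```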
## How to Use

### Installation

```bash
pip install datasets scikit-learn joblib huggingface_hub transformers
```

### Usage

```python
import joblib
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from huggingface_hub import hf_hub_download

# Load the model from the Hugging Face Hub
model_path = hf_hub_download(repo_id="vprasenjeet099/zlt-llm", filename="qa_model.joblib")
loaded_model = joblib.load(model_path)
vectorizer = loaded_model["vectorizer"]
tfidf_matrix = loaded_model["tfidf_matrix"]
paragraphs = loaded_model["paragraphs"]

def answer_question(question, tfidf_matrix, vectorizer, paragraphs):
    # Retrieve the paragraph most similar to the question
    question_vector = vectorizer.transform([question])
    similarities = cosine_similarity(question_vector, tfidf_matrix)
    most_similar_paragraph = paragraphs[np.argmax(similarities)]

    # Within that paragraph, pick the sentence with the greatest word overlap
    question_words = set(question.lower().split())
    best_sentence = ""
    max_overlap = 0
    for sentence in most_similar_paragraph.split("."):
        sentence = sentence.strip()
        if not sentence:
            continue
        sentence_words = set(sentence.lower().split())
        overlap = len(question_words.intersection(sentence_words))
        if overlap > max_overlap:
            max_overlap = overlap
            best_sentence = sentence
    return best_sentence

# Example usage
question = "Who was Arjuna?"
answer = answer_question(question, tfidf_matrix, vectorizer, paragraphs)
print(f"Question: {question}")
print(f"Answer: {answer}")
```

For comparison, a transformer-based extractive QA pipeline shows how this *could* be improved:

```python
from transformers import pipeline

qa_pipeline = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
context = paragraphs[0]  # first paragraph, for example
result = qa_pipeline(question="Who was Arjuna?", context=context)
print(result)
```
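The usage code above only shows loading `qa_model.joblib`. The actual training script is not part of this README, but a matching artifact could plausibly be produced along these lines; the toy corpus here is a stand-in assumption for the real Mahabharata paragraphs.

```python
import joblib
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy stand-in corpus; the real model is fit on Mahabharata paragraphs.
paragraphs = [
    "Arjuna was the third of the Pandava brothers and a peerless archer.",
    "Draupadi was the wife of the five Pandavas.",
]

# Fit the vectorizer on the corpus and keep the resulting TF-IDF matrix.
vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(paragraphs)

# Bundle everything the inference code expects, under the same dict keys.
joblib.dump(
    {"vectorizer": vectorizer, "tfidf_matrix": tfidf_matrix, "paragraphs": paragraphs},
    "qa_model.joblib",
)
```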