zltd
/

Text Generation
Transformers
English
lstm
Commit faad643 (verified) by prasenjeet099 · 1 parent: 2215566

Update README.md

Files changed (1)
  1. README.md +6 -57
README.md CHANGED
@@ -2,20 +2,18 @@
 license: apache-2.0
 tags:
 - question-answering
-- information-retrieval
-- tf-idf
-- cosine-similarity
-- mahabharata
-- indian-epic
 - text-classification
 - scikit-learn
 - joblib
-- huggingface-hub
 - datasets
 - transformers
 - natural-language-processing
 - nlp
-- text
+datasets:
+- prasenjeet099/mahabharata_great_india_epic
+metrics:
+- accuracy
+pipeline_tag: question-answering
 ---
 # zlt-llm
 
@@ -48,53 +46,4 @@ This model is designed to answer questions based on the text of the Mahabharata.
 ### Installation
 
 ```bash
-pip install datasets scikit-learn joblib huggingface_hub transformers
-import joblib
-from sklearn.metrics.pairwise import cosine_similarity
-from sklearn.feature_extraction.text import TfidfVectorizer
-from transformers import pipeline
-from huggingface_hub import hf_hub_download
-
-# Load the model from Hugging Face Hub
-model_path = hf_hub_download(repo_id="vprasenjeet099/zlt-llm", filename="qa_model.joblib")
-loaded_model = joblib.load(model_path)
-vectorizer = loaded_model["vectorizer"]
-tfidf_matrix = loaded_model["tfidf_matrix"]
-paragraphs = loaded_model["paragraphs"]
-
-def answer_question(question, tfidf_matrix, vectorizer, paragraphs):
-    question_vector = vectorizer.transform([question])
-    similarities = cosine_similarity(question_vector, tfidf_matrix)
-    most_similar_paragraph_index = np.argmax(similarities)
-    most_similar_paragraph = paragraphs[most_similar_paragraph_index]
-
-    paragraph_sentences = most_similar_paragraph.split(".")
-    best_sentence = ""
-    max_overlap = 0
-
-    question_words = set(question.lower().split())
-
-    for sentence in paragraph_sentences:
-        sentence = sentence.strip()
-        if not sentence:
-            continue
-        sentence_words = set(sentence.lower().split())
-        overlap = len(question_words.intersection(sentence_words))
-        if overlap > max_overlap:
-            max_overlap = overlap
-            best_sentence = sentence
-
-    return best_sentence.strip()
-
-# Example usage
-question = "Who was Arjuna?"
-answer = answer_question(question, tfidf_matrix, vectorizer, paragraphs)
-print(f"Question: {question}")
-print(f"Answer: {answer}")
-
-# Example using Transformers pipeline to show how it *could* be improved.
-
-qa_pipeline = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
-context = paragraphs[0] #first paragraph for example.
-result = qa_pipeline(question=question, context=context)
-print(result) ```
+pip install datasets scikit-learn joblib huggingface_hub transformers
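
For reference, the retrieval approach in the usage example that this commit removes can be sketched as a self-contained script. This is a minimal sketch under stated assumptions, not the author's code: the two-paragraph corpus is a hypothetical stand-in for the `paragraphs` stored in `qa_model.joblib`, the vectorizer is fitted locally instead of being loaded from the Hub, and `numpy` is imported explicitly because the removed snippet calls `np.argmax` without importing it.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical stand-in corpus; the removed snippet loaded its paragraphs
# (and a pre-fitted vectorizer and matrix) from qa_model.joblib instead.
paragraphs = [
    "Arjuna was the third of the Pandava brothers. He was a master archer.",
    "Bhishma was the son of King Shantanu. He took a vow of lifelong celibacy.",
]

# Fit TF-IDF on the corpus so questions can be projected into the same space.
vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(paragraphs)


def answer_question(question, tfidf_matrix, vectorizer, paragraphs):
    """Pick the closest paragraph, then its sentence sharing the most words with the question."""
    question_vector = vectorizer.transform([question])
    similarities = cosine_similarity(question_vector, tfidf_matrix)
    best_paragraph = paragraphs[int(np.argmax(similarities))]

    question_words = set(question.lower().split())
    best_sentence, max_overlap = "", 0
    for sentence in best_paragraph.split("."):
        sentence = sentence.strip()
        if not sentence:
            continue
        overlap = len(question_words & set(sentence.lower().split()))
        # Strict comparison keeps the earliest sentence when overlaps tie.
        if overlap > max_overlap:
            best_sentence, max_overlap = sentence, overlap
    return best_sentence


question = "Who was Arjuna?"
answer = answer_question(question, tfidf_matrix, vectorizer, paragraphs)
print(f"Question: {question}")
print(f"Answer: {answer}")
```

As in the removed snippet, the sentence-level step splits on whitespace only, so a token such as `Arjuna?` keeps its punctuation and does not match `arjuna` in the paragraph; stripping punctuation before comparing would be a natural refinement.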