dangkhoa99 committed
Commit 8f5c4b4
1 Parent(s): a14d91d

Update README.md

Files changed (1)
  1. README.md +57 -23
README.md CHANGED
@@ -11,41 +11,71 @@ tags:
  - falcon-7b
  - custom_code
  - text-generation-inference
  metrics:
  - exact_match
  - f1
  pipeline_tag: text-generation
  inference: false
  ---
  ## Usage

  ### Prompt

  The model was trained on the following kind of prompt:

  ```python
- def format_prompt(question, context):
-     return f"""Answer the question based on the context below. If the question cannot be answered using the information provided answer with 'No answer'. Stop response if end.
  >>TITLE<<: Flawless answer.
  >>CONTEXT<<: {context}
  >>QUESTION<<: {question}
  >>ANSWER<<:
- """.strip()
  ```
- ### Example
- [Notebook](https://colab.research.google.com/drive/1d2WP-MimF34NN72wGU0gX0uSUTHirN8A?usp=sharing)

  ```python
  context = '''The Amazon rainforest (Portuguese: Floresta Amazônica or Amazônia; Spanish: Selva Amazónica, Amazonía or usually Amazonia; French: Forêt amazonienne; Dutch: Amazoneregenwoud), also known in English as Amazonia or the Amazon Jungle, is a moist broadleaf forest that covers most of the Amazon basin of South America. This basin encompasses 7,000,000 square kilometres (2,700,000 sq mi), of which 5,500,000 square kilometres (2,100,000 sq mi) are covered by the rainforest. This region includes territory belonging to nine nations. The majority of the forest is contained within Brazil, with 60% of the rainforest, followed by Peru with 13%, Colombia with 10%, and with minor amounts in Venezuela, Ecuador, Bolivia, Guyana, Suriname and French Guiana. States or departments in four nations contain "Amazonas" in their names. The Amazon represents over half of the planet's remaining rainforests, and comprises the largest and most biodiverse tract of tropical rainforest in the world, with an estimated 390 billion individual trees divided into 16,000 species.'''
  question = '''Which name is also used to describe the Amazon rainforest in English?'''

- >>> Amazonia or the Amazon Jungle
  ```

  ```python
- context1 = '''The Amazon rainforest (Portuguese: Floresta Amazônica or Amazônia; Spanish: Selva Amazónica, Amazonía or usually Amazonia; French: Forêt amazonienne; Dutch: Amazoneregenwoud), also known in English as Amazonia or the Amazon Jungle, is a moist broadleaf forest that covers most of the Amazon basin of South America. This basin encompasses 7,000,000 square kilometres (2,700,000 sq mi), of which 5,500,000 square kilometres (2,100,000 sq mi) are covered by the rainforest. This region includes territory belonging to nine nations. The majority of the forest is contained within Brazil, with 60% of the rainforest, followed by Peru with 13%, Colombia with 10%, and with minor amounts in Venezuela, Ecuador, Bolivia, Guyana, Suriname and French Guiana. States or departments in four nations contain "Amazonas" in their names. The Amazon represents over half of the planet's remaining rainforests, and comprises the largest and most biodiverse tract of tropical rainforest in the world, with an estimated 390 billion individual trees divided into 16,000 species.'''
- question1 = '''What is 2 + 2?'''

- >>> No answer
  ```
- ## Training procedure

  The following `bitsandbytes` quantization config was used during training:
  - load_in_8bit: False
@@ -57,26 +87,30 @@ The following `bitsandbytes` quantization config was used during training:
  - bnb_4bit_quant_type: nf4
  - bnb_4bit_use_double_quant: True
  - bnb_4bit_compute_dtype: float16
 
  ### Performance

  Evaluated on the SQuAD 2.0 dev set with the [Metrics](https://huggingface.co/docs/datasets/v2.14.4/en/loading#metrics)

- ```
- 'exact': 69.83684838838042
- 'f1': 74.4130429770687
- 'total': 2513
-
- 'HasAns_exact': 66.5625
- 'HasAns_f1': 75.546857032323
- 'HasAns_total': 1280
- 'NoAns_exact': 73.2360097323601
- 'NoAns_f1': 73.2360097323601
- 'NoAns_total': 1233
- 'best_exact': 69.8766414643852
  'best_exact_thresh': 0.0
- 'best_f1': 74.4528360530736
  'best_f1_thresh': 0.0
  ```
 
  ### Framework versions

  - PEFT 0.5.0.dev0
  - Transformers 4.31.0
  - Datasets 2.14.4
 
  - falcon-7b
  - custom_code
  - text-generation-inference
+ - endpoints-template
  metrics:
  - exact_match
  - f1
  pipeline_tag: text-generation
  inference: false
  ---
+ # 🚀 falcon-7b-finetuned-QA-MRC-4-bit
+
+ Falcon-7b-finetuned-QA-MRC-4-bit is a model for Machine Reading Comprehension (MRC) with Question Answering (QA). It was built by fine-tuning [Falcon-7B](https://huggingface.co/tiiuae/falcon-7b) on the [SQuAD2.0](https://huggingface.co/datasets/squad_v2) dataset. This repo only includes the LoRA adapters from fine-tuning with 🤗's [peft](https://github.com/huggingface/peft) package.
+
+ ## Model Summary
+
+ - **Model Type:** Causal decoder-only
+ - **Language(s):** English
+ - **Base Model:** [Falcon-7B](https://huggingface.co/tiiuae/falcon-7b) (License: [Apache 2.0](https://huggingface.co/tiiuae/falcon-7b#license))
+ - **Dataset:** [SQuAD2.0](https://huggingface.co/datasets/squad_v2) (License: cc-by-sa-4.0)
+ - **License(s):** Apache 2.0 inherited from "Base Model" and cc-by-sa-4.0 inherited from "Dataset"
+
+ ## Model Details
+
+ The model was fine-tuned in 4-bit precision using 🤗 `peft` adapters, `transformers`, and `bitsandbytes`. Training relied on a method called "Low Rank Adapters" ([LoRA](https://arxiv.org/pdf/2106.09685.pdf)), specifically the [QLoRA](https://arxiv.org/abs/2305.14314) variant. The run took **approximately 5.08 hours** and was executed on a workstation with **a single A100-SXM NVIDIA GPU** with 37 GB of available memory.
+
+ ### Model Date
+
+ August 08, 2023
+
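For orientation, a QLoRA run along these lines attaches low-rank adapters to the quantized base model. The sketch below is illustrative only: the rank, alpha, dropout, and target modules used for this run are not stated in the card and are placeholder values.

```python
# Illustrative LoRA adapter config for a QLoRA-style run.
# r, lora_alpha, lora_dropout and target_modules are NOT taken from this
# card -- they are placeholder values showing the pattern only.
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                                # placeholder rank
    lora_alpha=32,                       # placeholder scaling factor
    lora_dropout=0.05,                   # placeholder dropout
    target_modules=["query_key_value"],  # typical choice for Falcon attention blocks
    task_type="CAUSAL_LM",
)
```

Passing such a config to `peft.get_peft_model` wraps the base model so that only the adapter weights are trained.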
  ## Usage
+
  ### Prompt
+
  The model was trained on the following kind of prompt:
+
  ```python
+ """Answer the question based on the context below. If the question cannot be answered using the information provided answer with 'No answer'. Stop response if end.

  >>TITLE<<: Flawless answer.
  >>CONTEXT<<: {context}
  >>QUESTION<<: {question}
  >>ANSWER<<:
+ """
  ```
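An earlier revision of this card wrapped the template in a small `format_prompt` helper; a reconstructed sketch (the helper itself is not shipped in this repo, and the sample inputs are made up):

```python
def format_prompt(question, context):
    # Fill the training-time template with a context passage and a question.
    return f"""Answer the question based on the context below. If the question cannot be answered using the information provided answer with 'No answer'. Stop response if end.
>>TITLE<<: Flawless answer.
>>CONTEXT<<: {context}
>>QUESTION<<: {question}
>>ANSWER<<:""".strip()

prompt = format_prompt("Who wrote it?", "Some passage.")
```

The resulting string ends right after `>>ANSWER<<:`, so the model's generation is the answer continuation.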
+
+ ### Inference
+
+ You will need **at least 6 GB of memory** to run inference quickly.
+
+ [Colab Notebook](https://colab.research.google.com/drive/1d2WP-MimF34NN72wGU0gX0uSUTHirN8A?usp=sharing)
+
+ #### Example 1:
  ```python
  context = '''The Amazon rainforest (Portuguese: Floresta Amazônica or Amazônia; Spanish: Selva Amazónica, Amazonía or usually Amazonia; French: Forêt amazonienne; Dutch: Amazoneregenwoud), also known in English as Amazonia or the Amazon Jungle, is a moist broadleaf forest that covers most of the Amazon basin of South America. This basin encompasses 7,000,000 square kilometres (2,700,000 sq mi), of which 5,500,000 square kilometres (2,100,000 sq mi) are covered by the rainforest. This region includes territory belonging to nine nations. The majority of the forest is contained within Brazil, with 60% of the rainforest, followed by Peru with 13%, Colombia with 10%, and with minor amounts in Venezuela, Ecuador, Bolivia, Guyana, Suriname and French Guiana. States or departments in four nations contain "Amazonas" in their names. The Amazon represents over half of the planet's remaining rainforests, and comprises the largest and most biodiverse tract of tropical rainforest in the world, with an estimated 390 billion individual trees divided into 16,000 species.'''
  question = '''Which name is also used to describe the Amazon rainforest in English?'''

+ >>> 'Amazonia or the Amazon Jungle'
  ```

+ #### Example 2 (No answer):
  ```python
+ context = '''The Amazon rainforest (Portuguese: Floresta Amazônica or Amazônia; Spanish: Selva Amazónica, Amazonía or usually Amazonia; French: Forêt amazonienne; Dutch: Amazoneregenwoud), also known in English as Amazonia or the Amazon Jungle, is a moist broadleaf forest that covers most of the Amazon basin of South America. This basin encompasses 7,000,000 square kilometres (2,700,000 sq mi), of which 5,500,000 square kilometres (2,100,000 sq mi) are covered by the rainforest. This region includes territory belonging to nine nations. The majority of the forest is contained within Brazil, with 60% of the rainforest, followed by Peru with 13%, Colombia with 10%, and with minor amounts in Venezuela, Ecuador, Bolivia, Guyana, Suriname and French Guiana. States or departments in four nations contain "Amazonas" in their names. The Amazon represents over half of the planet's remaining rainforests, and comprises the largest and most biodiverse tract of tropical rainforest in the world, with an estimated 390 billion individual trees divided into 16,000 species.'''
+ question = '''What is 2 + 2?'''

+ >>> 'No answer'
  ```
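The examples above show only the expected answers. Because a causal LM simply continues the prompt, its raw generation usually needs to be trimmed down to the answer span; a minimal post-processing sketch (the helper name and stop handling are assumptions, not part of this repo):

```python
def extract_answer(generated: str) -> str:
    # Keep only the text after the final ">>ANSWER<<:" marker, then stop at
    # the first newline -- mirroring the "Stop response if end" instruction
    # in the prompt. Illustrative helper, not shipped with the model.
    answer = generated.rsplit(">>ANSWER<<:", 1)[-1]
    return answer.strip().split("\n")[0].strip()

out = extract_answer("...>>ANSWER<<: Amazonia or the Amazon Jungle\nextra tokens")
```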
 
+ ## Training procedure

  The following `bitsandbytes` quantization config was used during training:
  - load_in_8bit: False

  - bnb_4bit_quant_type: nf4
  - bnb_4bit_use_double_quant: True
  - bnb_4bit_compute_dtype: float16
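The listed flags map directly onto a `transformers` `BitsAndBytesConfig`. A sketch, assuming the flags elided from the diff keep their defaults and that `load_in_4bit=True` (implied by the 4-bit precision stated above but not shown here):

```python
import torch
from transformers import BitsAndBytesConfig

# Mirrors the quantization flags listed in the card; the elided entries
# are left at their library defaults. load_in_4bit=True is an assumption.
bnb_config = BitsAndBytesConfig(
    load_in_8bit=False,
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)
```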
+
  ### Performance
+
  Evaluated on the SQuAD 2.0 dev set with the [Metrics](https://huggingface.co/docs/datasets/v2.14.4/en/loading#metrics)

+ ```python
+ 'exact': 70.1779158429822
+ 'f1': 75.06205246831128
+ 'total': 3541
+
+ 'HasAns_exact': 65.49104720564297
+ 'HasAns_f1': 74.87505577335338
+ 'HasAns_total': 1843
+ 'NoAns_exact': 75.26501766784452
+ 'NoAns_f1': 75.26501766784452
+ 'NoAns_total': 1698
+ 'best_exact': 70.20615645297939
  'best_exact_thresh': 0.0
+ 'best_f1': 75.09029307830835
  'best_f1_thresh': 0.0
  ```
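As a quick sanity check on these numbers, the overall `exact` score is the example-count-weighted average of the HasAns and NoAns splits (1843 + 1698 = 3541 dev examples):

```python
# Recompute the overall 'exact' score from the per-split values above.
has_ans_exact, has_ans_total = 65.49104720564297, 1843
no_ans_exact, no_ans_total = 75.26501766784452, 1698

total = has_ans_total + no_ans_total
exact = (has_ans_exact * has_ans_total + no_ans_exact * no_ans_total) / total
print(total, round(exact, 4))  # 3541 70.1779
```

The same weighted-average relation holds for the `f1` scores.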
+
  ### Framework versions
+
  - PEFT 0.5.0.dev0
  - Transformers 4.31.0
  - Datasets 2.14.4