🚀 Falcon-QAMaster
Falcon-7b-QueAns is a chatbot-like model for question answering. It was built by fine-tuning Falcon-7B on the SQuAD, Adversarial_qa, and Trimpixel (self-made) datasets. This repo only includes the QLoRA adapters produced by fine-tuning with 🤗's peft package.
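A minimal inference sketch is shown below. The adapter repo id is a placeholder (substitute this repo's actual path), and the context/question prompt format is only an assumption, since the card does not specify one:

```python
# Minimal inference sketch: load Falcon-7B in 4-bit and apply the QLoRA adapter.
# "user/Falcon-7b-QueAns" is a placeholder; substitute this repo's actual id.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_id = "tiiuae/falcon-7b"
adapter_id = "user/Falcon-7b-QueAns"  # placeholder adapter repo id

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    load_in_4bit=True,      # matches the 4-bit setup described under "Training procedure"
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(base_model, adapter_id)

# Hypothetical prompt format; the actual template used in training is not documented here.
prompt = (
    "Context: The Eiffel Tower is located in Paris.\n"
    "Question: Where is the Eiffel Tower located?\n"
    "Answer:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```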
Model Summary
- Model Type: Causal decoder-only
- Language(s): English
- Base Model: Falcon-7B (License: Apache 2.0)
- Datasets: SQuAD (License: cc-by-4.0), Adversarial_qa (License: cc-by-sa-4.0), Falcon-RefinedWeb (License: odc-by), Trimpixel (self-made)
- License(s): Apache 2.0 inherited from "Base Model" and "Dataset"
Why use Falcon-7B?
- It outperforms comparable open-source models (e.g., MPT-7B, StableLM, RedPajama etc.), thanks to being trained on 1,500B tokens of RefinedWeb enhanced with curated corpora. See the OpenLLM Leaderboard.
- It features an architecture optimized for inference, with FlashAttention (Dao et al., 2022) and multiquery (Shazeer et al., 2019).
- It is made available under a permissive Apache 2.0 license allowing for commercial use, without any royalties or restrictions.
⚠️ This version is fine-tuned specifically for question answering. If you are looking for a model better suited to taking generic instructions in a chat format, we recommend taking a look at Falcon-7B-Instruct.
🔥 Looking for an even more powerful model? Falcon-40B is Falcon-7B's big brother!
Model Details
The model was fine-tuned in 4-bit precision using 🤗 peft adapters, transformers, and bitsandbytes. Training relied on a method called "Low Rank Adapters" (LoRA), specifically the QLoRA variant. The run took approximately 12 hours and was executed on a workstation with a single NVIDIA T4 GPU with 25 GB of available memory. See the attached [Colab Notebook] used to train the model.
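A minimal sketch of the adapter setup with 🤗 peft is shown below. The LoRA rank, alpha, dropout, and target modules are assumptions, as the card does not state them:

```python
# Illustrative QLoRA adapter setup with 🤗 peft. The LoRA hyperparameters and
# target modules below are assumptions; they are not specified in this card.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Base model loaded in 4-bit (see the quantization config under "Training procedure").
base_model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b",
    load_in_4bit=True,
    device_map="auto",
    trust_remote_code=True,
)

lora_config = LoraConfig(
    r=16,                                # assumed rank
    lora_alpha=32,                       # assumed scaling factor
    lora_dropout=0.05,                   # assumed dropout
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["query_key_value"],  # Falcon's fused attention projection
)

base_model = prepare_model_for_kbit_training(base_model)
peft_model = get_peft_model(base_model, lora_config)
peft_model.print_trainable_parameters()  # only the adapter weights are trainable
```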
Model Date
July 13, 2023
An open-source Falcon-7B large language model fine-tuned on the SQuAD, Adversarial_qa, and Trimpixel datasets for question answering. The QLoRA technique was used to fine-tune the model on a consumer-grade GPU; SFTTrainer was also used.
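A rough sketch of the SFTTrainer setup, continuing from the adapter setup above (`peft_model`, `lora_config`), is given below. The training hyperparameters and the prompt template are assumptions, not the exact values used for this model:

```python
# Rough sketch of fine-tuning with trl's SFTTrainer. Hyperparameters and the
# prompt template are assumptions; only the 4-bit QLoRA setup and SFTTrainer
# usage are stated in the card.
from datasets import load_dataset
from transformers import AutoTokenizer, TrainingArguments
from trl import SFTTrainer

tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-7b")
tokenizer.pad_token = tokenizer.eos_token

def format_example(example):
    # Hypothetical prompt template; the actual template is not documented.
    example["text"] = (
        f"Context: {example['context']}\n"
        f"Question: {example['question']}\n"
        f"Answer: {example['answers']['text'][0]}"
    )
    return example

train_dataset = load_dataset("squad", split="train").map(format_example)

training_args = TrainingArguments(
    output_dir="./falcon-7b-queans",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    max_steps=350,            # matches the SQuAD step count listed below
    fp16=True,
    logging_steps=10,
)

trainer = SFTTrainer(
    model=peft_model,         # 4-bit base model with the LoRA adapter (see above)
    train_dataset=train_dataset,
    peft_config=lora_config,
    dataset_text_field="text",
    max_seq_length=512,
    tokenizer=tokenizer,
    args=training_args,
)
trainer.train()
```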
Datasets
| Dataset        | Size   | Training Steps |
|----------------|--------|----------------|
| SQuAD          | 87,599 | 350            |
| Adversarial_qa | 30,000 | 400            |
| Trimpixel      | 1,757  | 400            |
Training procedure
The following bitsandbytes quantization config was used during training:
- load_in_8bit: False
- load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: False
- bnb_4bit_compute_dtype: float16
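For reference, the settings above map onto a transformers BitsAndBytesConfig roughly as follows (the float16 compute dtype corresponds to torch.float16):

```python
# The quantization settings listed above, expressed as a BitsAndBytesConfig.
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_8bit=False,
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=False,
    bnb_4bit_compute_dtype=torch.float16,
)
```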
Framework versions
PEFT 0.4.0.dev0