Upload 11 files

Files changed (11)

README.md +6 -72
adapter_config.json +22 -0
adapter_model.bin +3 -0
optimizer.pt +3 -0
rng_state.pth +3 -0
scheduler.pt +3 -0
special_tokens_map.json +17 -0
tokenizer.json +0 -0
tokenizer_config.json +7 -0
trainer_state.json +256 -0
training_args.bin +3 -0

README.md CHANGED

@@ -1,78 +1,12 @@
 ---
-license: mit
-datasets:
-- squad
-- tiiuae/falcon-refinedweb
-- avnishkr/trimpixel
-language:
-- en
-library_name: adapter-transformers
-pipeline_tag: question-answering
-tags:
-- code
-- falcon-7b
-- llms
-- transformers
-- opensource-llms
-- fine-tuning llms
-- PEFT
-- QLoRA
-- LoRA
-- SFTTrainer
 ---
-# 🚀 Falcon-7b-QueAns
-Falcon-7b-QueAns is a chatbot-like model for Question and Answering. It was built by fine-tuning [Falcon-7B](https://huggingface.co/tiiuae/falcon-7b) on the [SQuAD](https://huggingface.co/datasets/squad) dataset. This repo only includes the QLoRA adapters from fine-tuning with 🤗's [peft](https://github.com/huggingface/peft) package.
-## Model Summary
-- **Model Type:** Causal decoder-only
-- **Language(s):** English
-- **Base Model:** Falcon-7B (License: Apache 2.0)
-- **Dataset:** [SQuAD](https://huggingface.co/datasets/squad) (License: cc-by-4.0)
-- **License(s):** Apache 2.0 inherited from "Base Model" and "Dataset"
-## Why use Falcon-7B?
-* **It outperforms comparable open-source models** (e.g., [MPT-7B](https://huggingface.co/mosaicml/mpt-7b), [StableLM](https://github.com/Stability-AI/StableLM), [RedPajama](https://huggingface.co/togethercomputer/RedPajama-INCITE-Base-7B-v0.1) etc.), thanks to being trained on 1,500B tokens of [RefinedWeb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb) enhanced with curated corpora. See the [OpenLLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
-* **It features an architecture optimized for inference**, with FlashAttention ([Dao et al., 2022](https://arxiv.org/abs/2205.14135)) and multiquery ([Shazeer et al., 2019](https://arxiv.org/abs/1911.02150)).
-* **It is made available under a permissive Apache 2.0 license allowing for commercial use**, without any royalties or restrictions.
-⚠️ **This version is fine-tuned specifically for question answering.** If you are looking for a version better suited to taking generic instructions in a chat format, we recommend taking a look at [Falcon-7B-Instruct](https://huggingface.co/tiiuae/falcon-7b-instruct).
-🔥 **Looking for an even more powerful model?** [Falcon-40B](https://huggingface.co/tiiuae/falcon-40b) is Falcon-7B's big brother!
-## Model Details
-The model was fine-tuned in 4-bit precision using 🤗 `peft` adapters, `transformers`, and `bitsandbytes`. Training relied on a method called "Low Rank Adapters" ([LoRA](https://arxiv.org/pdf/2106.09685.pdf)), specifically the [QLoRA](https://arxiv.org/abs/2305.14314) variant. The run took approximately 4 hours and was executed on a workstation with a single T4 NVIDIA GPU with 15 GB of available memory. See attached [Colab Notebook] used to train the model.
-### Model Date
-July 06, 2023
-Open-source Falcon-7B large language model fine-tuned on the SQuAD dataset for question answering.
-The QLoRA technique was used to fine-tune the model on a consumer-grade GPU.
-SFTTrainer was also used.
-Dataset used: SQuAD
-Dataset Size: 87278
-Training Steps: 500
 ## Training procedure
 The following `bitsandbytes` quantization config was used during training:
-- load_in_8bit: True
-- load_in_4bit: False
 - llm_int8_threshold: 6.0
 - llm_int8_skip_modules: None
 - llm_int8_enable_fp32_cpu_offload: False
@@ -82,8 +16,8 @@ The following `bitsandbytes` quantization config was used during training:
 - bnb_4bit_compute_dtype: float16
 The following `bitsandbytes` quantization config was used during training:
-- load_in_8bit: True
-- load_in_4bit: False
 - llm_int8_threshold: 6.0
 - llm_int8_skip_modules: None
 - llm_int8_enable_fp32_cpu_offload: False
@@ -95,4 +29,4 @@ The following `bitsandbytes` quantization config was used during training:
 - PEFT 0.4.0.dev0
-- PEFT 0.4.0.dev0

 ---
+library_name: peft
 ---
 ## Training procedure
 The following `bitsandbytes` quantization config was used during training:
+- load_in_8bit: False
+- load_in_4bit: True
 - llm_int8_threshold: 6.0
 - llm_int8_skip_modules: None
 - llm_int8_enable_fp32_cpu_offload: False
 - bnb_4bit_compute_dtype: float16
 The following `bitsandbytes` quantization config was used during training:
+- load_in_8bit: False
+- load_in_4bit: True
 - llm_int8_threshold: 6.0
 - llm_int8_skip_modules: None
 - llm_int8_enable_fp32_cpu_offload: False
 - PEFT 0.4.0.dev0
+- PEFT 0.4.0.dev0

adapter_config.json ADDED

	@@ -0,0 +1,22 @@

+{
+  "base_model_name_or_path": "ybelkada/falcon-7b-sharded-bf16",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "lora_alpha": 16,
+  "lora_dropout": 0.1,
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 64,
+  "revision": null,
+  "target_modules": [
+    "query_key_value",
+    "dense",
+    "dense_h_to_4h",
+    "dense_4h_to_h"
+  ],
+  "task_type": "CAUSAL_LM"
+}
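
The LoRA hyperparameters in this config can be read off with a short sketch. It assumes the standard LoRA convention in which the low-rank update is multiplied by `lora_alpha / r`; the helper names `lora_scaling` and `lora_param_count` are hypothetical and not part of any library, and the layer dimensions in the last line are made-up illustration values, not Falcon-7B's.

```python
import json

# Values copied from the adapter_config.json added in this commit.
ADAPTER_CONFIG = json.loads("""
{
  "lora_alpha": 16,
  "lora_dropout": 0.1,
  "r": 64,
  "target_modules": ["query_key_value", "dense", "dense_h_to_4h", "dense_4h_to_h"]
}
""")


def lora_scaling(config):
    """Factor applied to the low-rank update under the usual convention: lora_alpha / r."""
    return config["lora_alpha"] / config["r"]


def lora_param_count(d_in, d_out, r):
    """Parameters added by one LoRA pair: A has shape (r, d_in), B has shape (d_out, r)."""
    return r * (d_in + d_out)


scaling = lora_scaling(ADAPTER_CONFIG)
print("scaling:", scaling)
print("target modules:", len(ADAPTER_CONFIG["target_modules"]))
# Hypothetical layer size, only to show how the count grows with r.
print("params for a 1024x1024 layer:", lora_param_count(1024, 1024, ADAPTER_CONFIG["r"]))
```

With `lora_alpha` of 16 and `r` of 64, the update is scaled by 0.25, so the relatively high rank is paired with a damped contribution to each targeted layer.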

adapter_model.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c1d39e37b466710757b068a8829408c98e13d50c7aa2d9b80aafa99f82f268c1
+size 522284877
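
The binary files in this commit are stored as Git LFS pointers: three `key value` lines giving the spec version, the content hash, and the size in bytes. A minimal sketch of reading such a pointer follows; `parse_lfs_pointer` is a hypothetical helper written for illustration, not a function from the Git LFS tooling.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into version, hash algorithm, digest, and size."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is "<key> <value>", separated by a single space.
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


# Pointer content copied from the adapter_model.bin entry above.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:c1d39e37b466710757b068a8829408c98e13d50c7aa2d9b80aafa99f82f268c1
size 522284877
"""

info = parse_lfs_pointer(POINTER)
print(info["hash_algo"], info["size_bytes"])
```

The `+3 -0` counts in the file list refer to these three pointer lines, not to the size of the underlying binary.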

optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:65e898b7afc60bf764f1a27f93aeca7043428e519b722bfc4df5acede927ecc5
+size 1044539909

rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:86eca6a3cd10c108f35c9ae0264019357607e0f621c5f872dd10abbb2b9ed943
+size 14575

scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:13276f15dd2b6acc19b970176aa2db4ac9b58241843e72c89b50e3094e903b19
+size 627

special_tokens_map.json ADDED

	@@ -0,0 +1,17 @@

+{
+  "additional_special_tokens": [
+    ">>TITLE<<",
+    ">>ABSTRACT<<",
+    ">>INTRODUCTION<<",
+    ">>SUMMARY<<",
+    ">>COMMENT<<",
+    ">>ANSWER<<",
+    ">>QUESTION<<",
+    ">>DOMAIN<<",
+    ">>PREFIX<<",
+    ">>SUFFIX<<",
+    ">>MIDDLE<<"
+  ],
+  "eos_token": "<|endoftext|>",
+  "pad_token": "<|endoftext|>"
+}

tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

tokenizer_config.json ADDED

	@@ -0,0 +1,7 @@

+{
+  "add_prefix_space": false,
+  "clean_up_tokenization_spaces": true,
+  "eos_token": "<|endoftext|>",
+  "model_max_length": 2048,
+  "tokenizer_class": "PreTrainedTokenizerFast"
+}

trainer_state.json ADDED

	@@ -0,0 +1,256 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 10.666666666666666,
+  "global_step": 400,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.27,
+      "learning_rate": 0.0002,
+      "loss": 2.82,
+      "step": 10
+    },
+    {
+      "epoch": 0.53,
+      "learning_rate": 0.0002,
+      "loss": 2.2563,
+      "step": 20
+    },
+    {
+      "epoch": 0.8,
+      "learning_rate": 0.0002,
+      "loss": 2.1476,
+      "step": 30
+    },
+    {
+      "epoch": 1.07,
+      "learning_rate": 0.0002,
+      "loss": 2.1418,
+      "step": 40
+    },
+    {
+      "epoch": 1.33,
+      "learning_rate": 0.0002,
+      "loss": 2.0863,
+      "step": 50
+    },
+    {
+      "epoch": 1.6,
+      "learning_rate": 0.0002,
+      "loss": 1.9899,
+      "step": 60
+    },
+    {
+      "epoch": 1.87,
+      "learning_rate": 0.0002,
+      "loss": 2.0048,
+      "step": 70
+    },
+    {
+      "epoch": 2.13,
+      "learning_rate": 0.0002,
+      "loss": 1.9172,
+      "step": 80
+    },
+    {
+      "epoch": 2.4,
+      "learning_rate": 0.0002,
+      "loss": 1.8451,
+      "step": 90
+    },
+    {
+      "epoch": 2.67,
+      "learning_rate": 0.0002,
+      "loss": 1.9007,
+      "step": 100
+    },
+    {
+      "epoch": 2.93,
+      "learning_rate": 0.0002,
+      "loss": 1.8438,
+      "step": 110
+    },
+    {
+      "epoch": 3.2,
+      "learning_rate": 0.0002,
+      "loss": 1.7509,
+      "step": 120
+    },
+    {
+      "epoch": 3.47,
+      "learning_rate": 0.0002,
+      "loss": 1.6939,
+      "step": 130
+    },
+    {
+      "epoch": 3.73,
+      "learning_rate": 0.0002,
+      "loss": 1.6918,
+      "step": 140
+    },
+    {
+      "epoch": 4.0,
+      "learning_rate": 0.0002,
+      "loss": 1.7208,
+      "step": 150
+    },
+    {
+      "epoch": 4.27,
+      "learning_rate": 0.0002,
+      "loss": 1.5775,
+      "step": 160
+    },
+    {
+      "epoch": 4.53,
+      "learning_rate": 0.0002,
+      "loss": 1.5246,
+      "step": 170
+    },
+    {
+      "epoch": 4.8,
+      "learning_rate": 0.0002,
+      "loss": 1.5304,
+      "step": 180
+    },
+    {
+      "epoch": 5.07,
+      "learning_rate": 0.0002,
+      "loss": 1.5009,
+      "step": 190
+    },
+    {
+      "epoch": 5.33,
+      "learning_rate": 0.0002,
+      "loss": 1.3492,
+      "step": 200
+    },
+    {
+      "epoch": 5.6,
+      "learning_rate": 0.0002,
+      "loss": 1.39,
+      "step": 210
+    },
+    {
+      "epoch": 5.87,
+      "learning_rate": 0.0002,
+      "loss": 1.39,
+      "step": 220
+    },
+    {
+      "epoch": 6.13,
+      "learning_rate": 0.0002,
+      "loss": 1.2891,
+      "step": 230
+    },
+    {
+      "epoch": 6.4,
+      "learning_rate": 0.0002,
+      "loss": 1.2195,
+      "step": 240
+    },
+    {
+      "epoch": 6.67,
+      "learning_rate": 0.0002,
+      "loss": 1.2381,
+      "step": 250
+    },
+    {
+      "epoch": 6.93,
+      "learning_rate": 0.0002,
+      "loss": 1.2431,
+      "step": 260
+    },
+    {
+      "epoch": 7.2,
+      "learning_rate": 0.0002,
+      "loss": 1.07,
+      "step": 270
+    },
+    {
+      "epoch": 7.47,
+      "learning_rate": 0.0002,
+      "loss": 1.0858,
+      "step": 280
+    },
+    {
+      "epoch": 7.73,
+      "learning_rate": 0.0002,
+      "loss": 1.0796,
+      "step": 290
+    },
+    {
+      "epoch": 8.0,
+      "learning_rate": 0.0002,
+      "loss": 1.107,
+      "step": 300
+    },
+    {
+      "epoch": 8.27,
+      "learning_rate": 0.0002,
+      "loss": 0.882,
+      "step": 310
+    },
+    {
+      "epoch": 8.53,
+      "learning_rate": 0.0002,
+      "loss": 0.9132,
+      "step": 320
+    },
+    {
+      "epoch": 8.8,
+      "learning_rate": 0.0002,
+      "loss": 0.9592,
+      "step": 330
+    },
+    {
+      "epoch": 9.07,
+      "learning_rate": 0.0002,
+      "loss": 0.9249,
+      "step": 340
+    },
+    {
+      "epoch": 9.33,
+      "learning_rate": 0.0002,
+      "loss": 0.7599,
+      "step": 350
+    },
+    {
+      "epoch": 9.6,
+      "learning_rate": 0.0002,
+      "loss": 0.7568,
+      "step": 360
+    },
+    {
+      "epoch": 9.87,
+      "learning_rate": 0.0002,
+      "loss": 0.7966,
+      "step": 370
+    },
+    {
+      "epoch": 10.13,
+      "learning_rate": 0.0002,
+      "loss": 0.7164,
+      "step": 380
+    },
+    {
+      "epoch": 10.4,
+      "learning_rate": 0.0002,
+      "loss": 0.6433,
+      "step": 390
+    },
+    {
+      "epoch": 10.67,
+      "learning_rate": 0.0002,
+      "loss": 0.6312,
+      "step": 400
+    }
+  ],
+  "max_steps": 401,
+  "num_train_epochs": 11,
+  "total_flos": 2.0280759172595712e+17,
+  "trial_name": null,
+  "trial_params": null
+}
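
A few quantities can be derived from this trainer state. The sketch below assumes that `epoch` is reported as the number of optimizer steps taken divided by the number of steps per epoch, so the ratio `global_step / epoch` recovers steps per epoch. Only three of the logged entries are copied in, and the variable names are illustrative.

```python
# A few entries copied from log_history above (first, middle, last).
LOG = [
    {"step": 10, "epoch": 0.27, "learning_rate": 0.0002, "loss": 2.82},
    {"step": 200, "epoch": 5.33, "learning_rate": 0.0002, "loss": 1.3492},
    {"step": 400, "epoch": 10.67, "learning_rate": 0.0002, "loss": 0.6312},
]
GLOBAL_STEP = 400
EPOCH = 10.666666666666666

# Optimizer steps per pass over the training data (assumption stated above).
steps_per_epoch = GLOBAL_STEP / EPOCH

# Total decrease in logged training loss between step 10 and step 400.
loss_drop = LOG[0]["loss"] - LOG[-1]["loss"]

# The logged learning rate never changes, consistent with a constant schedule.
constant_lr = len({entry["learning_rate"] for entry in LOG}) == 1

print(round(steps_per_epoch, 2), round(loss_drop, 4), constant_lr)
```

Under that assumption the run covers roughly 37.5 optimizer steps per epoch, and the logged training loss falls from 2.82 to 0.6312 over 400 steps. No evaluation metric is recorded (`best_metric` is null), so these numbers describe training loss only.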

training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f96fb321d28a0d32b39ddd539b1d3aba2c8654e5eee796aad68e98adf34b8602
+size 4027