Model save
Files changed:
- .ipynb_checkpoints/aqlm_2bit_training-checkpoint.ipynb (+622 -0)
- .ipynb_checkpoints/test_ft-checkpoint.py (+165 -0)
- =0.27.0 (+35 -0)
- =1.1.0 (+0 -0)
- README.md (+52 -0)
- adapter_config.json (+34 -0)
- adapter_model.safetensors (+3 -0)
- aqlm_2bit_training.ipynb (+622 -0)
- test_ft.py (+165 -0)
- training_args.bin (+3 -0)
.ipynb_checkpoints/aqlm_2bit_training-checkpoint.ipynb
ADDED
@@ -0,0 +1,622 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "XIyP_0r6zuVc"
+   },
+   "source": [
+    "# Training Large Language Models in 2bit with `aqlm`, `transformers` and `PEFT`\n",
+    "\n",
+    "<a target=\"_blank\" href=\"https://colab.research.google.com/github/Vahe1994/AQLM/blob/main/notebooks/aqlm_2bit_training.ipynb\">\n",
+    " <img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/>\n",
+    "</a>\n",
+    "\n",
+    "Welcome to this notebook, which walks through the recent `aqlm` integration that brings 2-bit quantization with minimal performance degradation.\n",
+    "\n",
+    "In this notebook, we will learn how to load a large model in 2-bit (`Meta-Llama-3-8B-Instruct`) and train it using Google Colab and the PEFT library from Hugging Face 🤗.\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "A_VgSpl4Dsr3"
+   },
+   "source": [
+    "**Install the `aqlm` library**\n",
+    "- It's the only extra dependency needed to run AQLM models.\n",
+    "- Add `[gpu]` to install the required CUDA-specific dependencies.\n",
+    "- Install the latest `accelerate` and `transformers` releases to properly support it."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "FuXIFTFapAMI"
+   },
+   "outputs": [],
+   "source": [
+    "%%capture\n",
+    "!pip install \"aqlm[gpu]>=1.1.0\"  # quote the spec so the shell doesn't treat '>' as a redirect\n",
+    "!pip install git+https://github.com/huggingface/peft.git@main\n",
+    "!pip install \"accelerate>=0.27.0\"\n",
+    "!pip install git+https://github.com/huggingface/transformers.git@main\n",
+    "!pip install datasets\n",
+    "!pip install bitsandbytes\n",
+    "# bitsandbytes is only needed for the 8-bit optimizer"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "MJ-5idQwzvg-"
+   },
+   "source": [
+    "First, let's load the model we are going to use - the 2-bit AQLM quantization of `Meta-Llama-3-8B-Instruct`. The original model is around 16GB in half precision; the quantized checkpoint is a fraction of that."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 23,
+   "metadata": {
+    "id": "E0Nl5mWL0k2T"
+   },
+   "outputs": [],
+   "source": [
+    "import torch\n",
+    "from transformers import AutoTokenizer, AutoModelForCausalLM\n",
+    "\n",
+    "model_id = \"ISTA-DASLab/Meta-Llama-3-8B-Instruct-AQLM-2Bit-1x16\"\n",
+    "\n",
+    "tokenizer = AutoTokenizer.from_pretrained(model_id)\n",
+    "model = AutoModelForCausalLM.from_pretrained(model_id, device_map=\"auto\", torch_dtype=\"bfloat16\", low_cpu_mem_usage=True)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "Mp2gMi1ZzGET"
+   },
+   "source": [
+    "**Add LoRA**\n",
+    "\n",
+    "To alter the model's behavior, we have to make it trainable. We can do that by adding a small set of trainable parameters on top of the frozen quantized ones."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 24,
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "id": "Ybeyl20n3dYH",
+    "outputId": "0efda156-4886-4718-9877-e93a17dc02d2"
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "trainable params: 41,943,040 || all params: 2,084,114,432 || trainable%: 2.0125\n"
+     ]
+    }
+   ],
+   "source": [
+    "from peft import LoraConfig, get_peft_model\n",
+    "\n",
+    "config = LoraConfig(\n",
+    "    r=16,\n",
+    "    lora_alpha=32,\n",
+    "    target_modules=['q_proj', 'k_proj', 'v_proj', 'o_proj', 'gate_proj', 'down_proj', 'up_proj'],\n",
+    "    lora_dropout=0.05,\n",
+    "    bias=\"none\",\n",
+    "    task_type=\"CAUSAL_LM\"\n",
+    ")\n",
+    "\n",
+    "model = get_peft_model(model, config)\n",
+    "model.print_trainable_parameters()\n",
+    "model.enable_input_require_grads()  # needed for gradient checkpointing"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "4xSPH1D_Wv9x"
+   },
+   "source": [
+    "Here we add a trainable adapter on top of every attention and MLP linear layer (`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj` and `down_proj`)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "FCc64bfnmd3j"
+   },
+   "source": [
+    "**Loading a dataset**\n",
+    "\n",
+    "Let's load the `open-web-math` dataset to fine-tune our model on mathematical web documents."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 13,
+   "metadata": {
+    "id": "s6f4z8EYmcJ6"
+   },
+   "outputs": [
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "9ef07f1bc62e4887817a81d4a3e15da1",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "Resolving data files: 0%| | 0/114 [00:00<?, ?it/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Loaded dataset with 100000 examples\n"
+     ]
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "560c7be6397c4e3aac2318d97f1f8f86",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "tokenizer_config.json: 0%| | 0.00/26.0 [00:00<?, ?B/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "b667958a3b3d4529b77baf5e5bc9c259",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "config.json: 0%| | 0.00/665 [00:00<?, ?B/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "10359f3b8d974be49da2d3fd87f89576",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "vocab.json: 0%| | 0.00/1.04M [00:00<?, ?B/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "97835946d4a44460bc1bd48276b8d3d0",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "merges.txt: 0%| | 0.00/456k [00:00<?, ?B/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "ed37faeff8914b369649cb514981991d",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "tokenizer.json: 0%| | 0.00/1.36M [00:00<?, ?B/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be deprecated in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884\n",
+      "  warnings.warn(\n"
+     ]
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "7a56a1781f3347f8a056a18dc24ea7a9",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "Map: 0%| | 0/100000 [00:00<?, ? examples/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Processed dataset has 100000 examples\n",
+      "Features: {'input_ids': Sequence(feature=Value(dtype='int32', id=None), length=-1, id=None), 'attention_mask': Sequence(feature=Value(dtype='int8', id=None), length=-1, id=None)}\n"
+     ]
+    }
+   ],
+   "source": [
+    "from datasets import load_dataset, Dataset\n",
+    "import itertools\n",
+    "from transformers import AutoTokenizer\n",
+    "\n",
+    "# Load the dataset in streaming mode\n",
+    "ds = load_dataset(\"open-web-math/open-web-math\", split=\"train\", streaming=True)\n",
+    "\n",
+    "# Define the number of examples to load\n",
+    "num_examples = 100000  # adjust as needed\n",
+    "\n",
+    "# Create a subset by taking the first num_examples\n",
+    "subset = list(itertools.islice(ds, num_examples))\n",
+    "\n",
+    "# Convert the subset to a Dataset object\n",
+    "data = Dataset.from_list(subset)\n",
+    "print(f\"Loaded dataset with {len(data)} examples\")\n",
+    "\n",
+    "# Use the tokenizer that matches the quantized model\n",
+    "tokenizer = AutoTokenizer.from_pretrained(model_id)\n",
+    "\n",
+    "max_seq_length = 2048\n",
+    "tokenizer.pad_token = tokenizer.eos_token\n",
+    "tokenizer.model_max_length = max_seq_length\n",
+    "\n",
+    "def preprocess_function(examples):\n",
+    "    # examples[\"text\"] is already a list of strings in batched mode\n",
+    "    texts = examples[\"text\"]\n",
+    "    return tokenizer(texts, truncation=True, max_length=max_seq_length, padding=\"max_length\")\n",
+    "\n",
+    "# Process the dataset\n",
+    "processed_dataset = data.map(preprocess_function, batched=True, remove_columns=data.column_names)\n",
+    "\n",
+    "print(f\"Processed dataset has {len(processed_dataset)} examples\")\n",
+    "print(f\"Features: {processed_dataset.features}\")\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 15,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import argparse\n",
+    "import torch\n",
+    "from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments\n",
+    "import transformers\n",
+    "from peft import LoraConfig, get_peft_model\n",
+    "from datasets import load_dataset\n",
+    "from transformers.trainer_callback import TrainerCallback\n",
+    "import os\n",
+    "import random\n",
+    "import subprocess\n",
+    "from huggingface_hub import HfApi, hf_hub_download\n",
+    "\n",
+    "\n",
+    "# Custom callback to push the adapter to the Hub every `push_frequency` steps\n",
+    "class PushToHubCallback(TrainerCallback):\n",
+    "    def __init__(self, trainer, push_frequency):\n",
+    "        self.trainer = trainer\n",
+    "        self.push_frequency = push_frequency\n",
+    "\n",
+    "    def on_step_end(self, args, state, control, **kwargs):\n",
+    "        if state.global_step % self.push_frequency == 0:\n",
+    "            self.trainer.save_model()\n",
+    "            self.trainer.push_to_hub(\n",
+    "                commit_message=f\"Training in progress - Step {state.global_step}\"\n",
+    "            )\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 21,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "74a0f8d448004c048d8b0608fa3a61fd",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "VBox(children=(HTML(value='<center> <img\\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "from huggingface_hub import notebook_login\n",
+    "\n",
+    "notebook_login()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "max_steps is given, it will override any value given in num_train_epochs\n",
+      "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/eval_frame.py:600: UserWarning: torch.utils.checkpoint: the use_reentrant parameter should be passed explicitly. In version 2.4 we will raise an exception if use_reentrant is not passed. use_reentrant=False is recommended, but if you need to preserve the current default behavior, you can pass use_reentrant=True. Refer to docs for more details on the differences between the two variants.\n",
+      "  return fn(*args, **kwargs)\n",
+      "/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py:295: FutureWarning: `torch.cpu.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cpu', args...)` instead.\n",
+      "  with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs):  # type: ignore[attr-defined]\n"
+     ]
+    },
+    {
+     "data": {
+      "text/html": [
+       "\n",
+       " <div>\n",
+       " \n",
+       " <progress value='22' max='10000' style='width:300px; height:20px; vertical-align: middle;'></progress>\n",
+       " [ 22/10000 09:16 < 77:05:02, 0.04 it/s, Epoch 0.01/4]\n",
+       " </div>\n",
+       " <table border=\"1\" class=\"dataframe\">\n",
+       " <thead>\n",
+       " <tr style=\"text-align: left;\">\n",
+       " <th>Step</th>\n",
+       " <th>Training Loss</th>\n",
+       " </tr>\n",
+       " </thead>\n",
+       " <tbody>\n",
+       " <tr>\n",
+       " <td>1</td>\n",
+       " <td>5.558500</td>\n",
+       " </tr>\n",
+       " </tbody>\n",
+       "</table><p>"
+      ],
+      "text/plain": [
+       "<IPython.core.display.HTML object>"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "hub_model_id = \"davisrbr/math-lora\"\n",
+    "tokenizer.pad_token = tokenizer.eos_token\n",
+    "torch.cuda.empty_cache()\n",
+    "trainer = transformers.Trainer(\n",
+    "    model=model,\n",
+    "    train_dataset=processed_dataset,\n",
+    "    args=TrainingArguments(\n",
+    "        per_device_train_batch_size=4,\n",
+    "        gradient_accumulation_steps=8,\n",
+    "        gradient_checkpointing=True,\n",
+    "        warmup_steps=200,\n",
+    "        max_steps=10000,\n",
+    "        learning_rate=2e-4,\n",
+    "        bf16=True,\n",
+    "        logging_steps=25,\n",
+    "        output_dir=\".\",\n",
+    "        optim=\"adamw_bnb_8bit\",\n",
+    "        logging_first_step=True,\n",
+    "        push_to_hub=True,\n",
+    "        hub_model_id=hub_model_id,\n",
+    "    ),\n",
+    "    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),\n",
+    ")\n",
+    "model.config.use_cache = False\n",
+    "\n",
+    "push_frequency = 100\n",
+    "trainer.add_callback(PushToHubCallback(trainer, push_frequency))\n",
+    "\n",
+    "trainer.train()\n",
+    "\n",
+    "final_commit_hash = trainer.push_to_hub(\"Training complete\")\n",
+    "print(f\"Training complete. Final commit hash: {final_commit_hash}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "_0MOtwf3zdZp"
+   },
+   "source": [
+    "Run the cell below to launch training! For the sake of the demo, we run it for only a few steps, just to showcase how to use this integration with existing tools in the HF ecosystem."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/",
+     "height": 481
+    },
+    "id": "jq0nX33BmfaC",
+    "outputId": "7f470980-c49e-4230-b947-ad43510f1bee"
+   },
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py:460: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.\n",
+      "  warnings.warn(\n"
+     ]
+    },
+    {
+     "data": {
+      "text/html": [
+       "\n",
+       " <div>\n",
+       " \n",
+       " <progress value='10' max='10' style='width:300px; height:20px; vertical-align: middle;'></progress>\n",
+       " [10/10 13:02, Epoch 0/1]\n",
+       " </div>\n",
+       " <table border=\"1\" class=\"dataframe\">\n",
+       " <thead>\n",
+       " <tr style=\"text-align: left;\">\n",
+       " <th>Step</th>\n",
+       " <th>Training Loss</th>\n",
+       " </tr>\n",
+       " </thead>\n",
+       " <tbody>\n",
+       " <tr>\n",
+       " <td>1</td>\n",
+       " <td>2.042200</td>\n",
+       " </tr>\n",
+       " <tr>\n",
+       " <td>2</td>\n",
+       " <td>1.293400</td>\n",
+       " </tr>\n",
+       " <tr>\n",
+       " <td>3</td>\n",
+       " <td>1.447500</td>\n",
+       " </tr>\n",
+       " <tr>\n",
+       " <td>4</td>\n",
+       " <td>1.433600</td>\n",
+       " </tr>\n",
+       " <tr>\n",
+       " <td>5</td>\n",
+       " <td>1.725900</td>\n",
+       " </tr>\n",
+       " <tr>\n",
+       " <td>6</td>\n",
+       " <td>1.506400</td>\n",
+       " </tr>\n",
+       " <tr>\n",
+       " <td>7</td>\n",
+       " <td>1.549600</td>\n",
+       " </tr>\n",
+       " <tr>\n",
+       " <td>8</td>\n",
+       " <td>1.038300</td>\n",
+       " </tr>\n",
+       " <tr>\n",
+       " <td>9</td>\n",
+       " <td>1.603300</td>\n",
+       " </tr>\n",
+       " <tr>\n",
+       " <td>10</td>\n",
+       " <td>1.676400</td>\n",
+       " </tr>\n",
+       " </tbody>\n",
+       "</table><p>"
+      ],
+      "text/plain": [
+       "<IPython.core.display.HTML object>"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "text/plain": [
+       "TrainOutput(global_step=10, training_loss=1.531658697128296, metrics={'train_runtime': 861.2678, 'train_samples_per_second': 0.046, 'train_steps_per_second': 0.012, 'total_flos': 56809829376000.0, 'train_loss': 1.531658697128296, 'epoch': 0.02})"
+      ]
+     },
+     "execution_count": 6,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "import transformers\n",
+    "\n",
+    "tokenizer.pad_token = tokenizer.eos_token\n",
+    "\n",
+    "trainer = transformers.Trainer(\n",
+    "    model=model,\n",
+    "    train_dataset=processed_dataset,\n",
+    "    args=transformers.TrainingArguments(\n",
+    "        per_device_train_batch_size=1,\n",
+    "        gradient_accumulation_steps=8,\n",
+    "        gradient_checkpointing=True,\n",
+    "        warmup_steps=2,\n",
+    "        max_steps=10,\n",
+    "        learning_rate=2e-4,\n",
+    "        fp16=True,\n",
+    "        logging_steps=1,\n",
+    "        output_dir=\"outputs\",\n",
+    "        optim=\"adamw_bnb_8bit\",\n",
+    "        logging_first_step=True,\n",
+    "    ),\n",
+    "    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),\n",
+    ")\n",
+    "model.config.use_cache = False  # silence the warnings. Please re-enable for inference!\n",
+    "trainer.train()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "id": "05iBmtP6X3Mq"
+   },
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "accelerator": "GPU",
+  "colab": {
+   "gpuType": "T4",
+   "provenance": []
+  },
+  "gpuClass": "standard",
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.12"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
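
Once training has pushed the adapter, it can be reloaded on top of the 2-bit base model for inference. A minimal sketch, assuming the `davisrbr/math-lora` repo id used in the notebook above (the prompt string is purely illustrative):

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# Load the frozen 2-bit AQLM base model, then attach the trained LoRA adapter
base_id = "ISTA-DASLab/Meta-Llama-3-8B-Instruct-AQLM-2Bit-1x16"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, device_map="auto", torch_dtype="bfloat16", low_cpu_mem_usage=True
)
model = PeftModel.from_pretrained(model, "davisrbr/math-lora")
model.config.use_cache = True  # the notebook disabled the KV cache for training

inputs = tokenizer("Prove that sqrt(2) is irrational.", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0]))
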
.ipynb_checkpoints/test_ft-checkpoint.py
ADDED
@@ -0,0 +1,165 @@
+import argparse
+import torch
+from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments
+import transformers
+from peft import LoraConfig, get_peft_model
+from datasets import load_dataset
+from transformers.trainer_callback import TrainerCallback
+import os
+import random
+import subprocess
+from huggingface_hub import HfApi, hf_hub_download
+
+def generate_mmlu_slurm(model_path, hub_model_id, output_dir, num_gpus=1):
+    model_short_name = model_path.split('/')[-1]
+    filename = f"run_mmlu_{model_short_name}.sbatch"
+
+    port = random.randint(10000, 65535)
+
+    content = f"""#!/bin/bash
+#SBATCH --nodes=1
+#SBATCH --gpus-per-node={num_gpus}
+#SBATCH --time=24:00:00
+#SBATCH --job-name={port}_mmlu_{model_short_name}
+#SBATCH --mail-user=[email protected]
+#SBATCH --mail-type=ALL
+
+source /opt/rh/devtoolset-10/enable
+source /data/davis_brown/miniconda3/bin/activate
+conda init
+conda activate quip
+
+CUDA_VISIBLE_DEVICES=0 lm_eval \\
+    --model hf \\
+    --model_args pretrained={model_path},parallelize=True,peft={hub_model_id} \\
+    --tasks mmlu \\
+    --device cuda:0 \\
+    --batch_size 8 \\
+    --output_path={output_dir}/{hub_model_id} \\
+    --num_fewshot 5
+"""
+
+    with open(filename, 'w') as f:
+        f.write(content)
+
+    print(f"Generated MMLU evaluation SLURM script: {filename}")
+    return filename
+
+def launch_mmlu_evaluation(model_path, hub_model_id, output_dir):
+    slurm_script = generate_mmlu_slurm(model_path, hub_model_id, output_dir)
+    try:
+        subprocess.run(["sbatch", slurm_script], check=True)
+        print(f"Submitted MMLU evaluation job: {slurm_script}")
+    except subprocess.CalledProcessError as e:
+        print(f"Failed to submit MMLU evaluation job: {e}")
+
+# Custom callback to push the adapter to the Hub every `push_frequency` steps
+class PushToHubCallback(TrainerCallback):
+    def __init__(self, trainer, push_frequency):
+        self.trainer = trainer
+        self.push_frequency = push_frequency
+
+    def on_step_end(self, args, state, control, **kwargs):
+        if state.global_step % self.push_frequency == 0:
+            self.trainer.save_model()
+            self.trainer.push_to_hub(
+                commit_message=f"Training in progress - Step {state.global_step}"
+            )
+
+def main(args):
+    if args.only_mmlu:
+        launch_mmlu_evaluation(args.model_id, args.hub_model_id, args.output_dir)
+        return
+
+    model_id = args.model_id
+    output_dir = args.output_dir
+    hub_model_id = args.hub_model_id
+
+    tokenizer = AutoTokenizer.from_pretrained(model_id)
+    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto", low_cpu_mem_usage=True)
+
+    target_modules = ['q_proj', 'k_proj', 'v_proj', 'o_proj', 'gate_proj', 'down_proj', 'up_proj']  # optionally add 'lm_head'
+
+    config = LoraConfig(
+        r=args.lora_rank,
+        lora_alpha=args.lora_rank,
+        target_modules=target_modules,
+        lora_dropout=0.05,
+        bias="none",
+        task_type="CAUSAL_LM",
+        use_rslora=True
+    )
+
+    model = get_peft_model(model, config)
+    model.print_trainable_parameters()
+    model.enable_input_require_grads()
+
+    # data = load_dataset("togethercomputer/RedPajama-Data-1T-Sample")
+    data = load_dataset("open-web-math/open-web-math")
+
+    max_seq_length = args.max_seq_length
+    tokenizer.pad_token = tokenizer.eos_token
+    tokenizer.model_max_length = max_seq_length
+
+    def preprocess_function(examples):
+        return tokenizer(examples["text"], truncation=True, max_length=max_seq_length, padding="max_length")
+
+    processed_dataset = data["train"].map(preprocess_function, batched=True)
+
+    torch.cuda.empty_cache()
+    trainer = transformers.Trainer(
+        model=model,
+        train_dataset=processed_dataset,
+        args=TrainingArguments(
+            per_device_train_batch_size=args.batch_size,
+            gradient_accumulation_steps=args.gradient_accumulation_steps,
+            gradient_checkpointing=True,
+            warmup_steps=200,
+            max_steps=args.max_steps,
+            learning_rate=2e-4,
+            bf16=True,
+            logging_steps=25,
+            output_dir=output_dir,
+            optim="adamw_bnb_8bit",
+            logging_first_step=True,
+            push_to_hub=True,
+            hub_model_id=hub_model_id,
+        ),
+        data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
+    )
+    model.config.use_cache = False
+
+    push_frequency = 100
+    trainer.add_callback(PushToHubCallback(trainer, push_frequency))
+
+    trainer.train()
+
+    final_commit_hash = trainer.push_to_hub("Training complete")
+    print(f"Training complete. Final commit hash: {final_commit_hash}")
+
+    # MMLU Evaluation
+    if args.run_mmlu:
+        launch_mmlu_evaluation(model_id, hub_model_id, output_dir)
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(description="Fine-tune a language model and/or run MMLU evaluation")
+    parser.add_argument("--model_id", type=str, default="ISTA-DASLab/Meta-Llama-3-8B-Instruct-AQLM-2Bit-1x16",
+                        help="Model ID to fine-tune or evaluate")
+    parser.add_argument("--max_seq_length", type=int, default=2048, help="Maximum sequence length")
+    parser.add_argument("--output_dir", type=str, required=True, help="Output directory for checkpoints and results")
+    parser.add_argument("--hub_model_id", type=str,
+                        default="davisrbr/ISTA-DASLab-Meta-Llama-3-8B-Instruct-AQLM-2Bit-1x16-hf-100000_r8_cont",
+                        help="Hub model ID for pushing or LoRA weights")
+    parser.add_argument("--batch_size", type=int, default=1, help="Per-device batch size")
+    parser.add_argument("--gradient_accumulation_steps", type=int, default=8, help="Gradient accumulation steps")
+    parser.add_argument("--max_steps", type=int, default=50000, help="Maximum number of training steps")
+    parser.add_argument("--run_mmlu", action="store_true", help="Run MMLU evaluation after training")
+    parser.add_argument("--lora_rank", type=int, default=8, help="Rank of the LoRA adaptation")
+    parser.add_argument("--only_mmlu", action="store_true", help="Only run MMLU evaluation without training")
+    parser.add_argument("--launch_slurm", action="store_true", help="Launch the entire script as a SLURM job")
+    parser.add_argument("--num_gpus", type=int, default=4, help="Number of GPUs to use for training")
+    parser.add_argument("--commit_hash", type=str, help="Specific commit hash to evaluate (for MMLU only)")
+
+    args = parser.parse_args()
+    main(args)
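
For reference, a typical invocation of the script above, written as a notebook cell since the rest of this commit runs in one (all values are illustrative; only --output_dir is required):

# Fine-tune the 2-bit model, pushing the adapter to the Hub, then queue MMLU eval
!python test_ft.py \
    --model_id ISTA-DASLab/Meta-Llama-3-8B-Instruct-AQLM-2Bit-1x16 \
    --output_dir ./outputs \
    --hub_model_id davisrbr/math-lora \
    --batch_size 1 --gradient_accumulation_steps 8 \
    --max_steps 10000 --lora_rank 16 \
    --run_mmlu
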
=0.27.0
ADDED
@@ -0,0 +1,35 @@
+Requirement already satisfied: accelerate in /usr/local/lib/python3.10/dist-packages (0.33.0)
+Requirement already satisfied: numpy<2.0.0,>=1.17 in /usr/local/lib/python3.10/dist-packages (from accelerate) (1.24.1)
+Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from accelerate) (23.2)
+Requirement already satisfied: psutil in /usr/local/lib/python3.10/dist-packages (from accelerate) (5.9.6)
+Requirement already satisfied: pyyaml in /usr/local/lib/python3.10/dist-packages (from accelerate) (6.0.1)
+Requirement already satisfied: torch>=1.10.0 in /usr/local/lib/python3.10/dist-packages (from accelerate) (2.4.0)
+Requirement already satisfied: huggingface-hub>=0.21.0 in /usr/local/lib/python3.10/dist-packages (from accelerate) (0.24.5)
+Requirement already satisfied: safetensors>=0.3.1 in /usr/local/lib/python3.10/dist-packages (from accelerate) (0.4.4)
+Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from huggingface-hub>=0.21.0->accelerate) (3.9.0)
+Requirement already satisfied: fsspec>=2023.5.0 in /usr/local/lib/python3.10/dist-packages (from huggingface-hub>=0.21.0->accelerate) (2024.6.1)
+Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from huggingface-hub>=0.21.0->accelerate) (2.31.0)
+Requirement already satisfied: tqdm>=4.42.1 in /usr/local/lib/python3.10/dist-packages (from huggingface-hub>=0.21.0->accelerate) (4.66.5)
+Requirement already satisfied: typing-extensions>=3.7.4.3 in /usr/local/lib/python3.10/dist-packages (from huggingface-hub>=0.21.0->accelerate) (4.12.2)
+Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch>=1.10.0->accelerate) (1.12)
+Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch>=1.10.0->accelerate) (3.0)
+Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch>=1.10.0->accelerate) (3.1.2)
+Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.1.105 in /usr/local/lib/python3.10/dist-packages (from torch>=1.10.0->accelerate) (12.1.105)
+Requirement already satisfied: nvidia-cuda-runtime-cu12==12.1.105 in /usr/local/lib/python3.10/dist-packages (from torch>=1.10.0->accelerate) (12.1.105)
+Requirement already satisfied: nvidia-cuda-cupti-cu12==12.1.105 in /usr/local/lib/python3.10/dist-packages (from torch>=1.10.0->accelerate) (12.1.105)
+Requirement already satisfied: nvidia-cudnn-cu12==9.1.0.70 in /usr/local/lib/python3.10/dist-packages (from torch>=1.10.0->accelerate) (9.1.0.70)
+Requirement already satisfied: nvidia-cublas-cu12==12.1.3.1 in /usr/local/lib/python3.10/dist-packages (from torch>=1.10.0->accelerate) (12.1.3.1)
+Requirement already satisfied: nvidia-cufft-cu12==11.0.2.54 in /usr/local/lib/python3.10/dist-packages (from torch>=1.10.0->accelerate) (11.0.2.54)
+Requirement already satisfied: nvidia-curand-cu12==10.3.2.106 in /usr/local/lib/python3.10/dist-packages (from torch>=1.10.0->accelerate) (10.3.2.106)
+Requirement already satisfied: nvidia-cusolver-cu12==11.4.5.107 in /usr/local/lib/python3.10/dist-packages (from torch>=1.10.0->accelerate) (11.4.5.107)
+Requirement already satisfied: nvidia-cusparse-cu12==12.1.0.106 in /usr/local/lib/python3.10/dist-packages (from torch>=1.10.0->accelerate) (12.1.0.106)
+Requirement already satisfied: nvidia-nccl-cu12==2.20.5 in /usr/local/lib/python3.10/dist-packages (from torch>=1.10.0->accelerate) (2.20.5)
+Requirement already satisfied: nvidia-nvtx-cu12==12.1.105 in /usr/local/lib/python3.10/dist-packages (from torch>=1.10.0->accelerate) (12.1.105)
+Requirement already satisfied: triton==3.0.0 in /usr/local/lib/python3.10/dist-packages (from torch>=1.10.0->accelerate) (3.0.0)
+Requirement already satisfied: nvidia-nvjitlink-cu12 in /usr/local/lib/python3.10/dist-packages (from nvidia-cusolver-cu12==11.4.5.107->torch>=1.10.0->accelerate) (12.6.20)
+Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->torch>=1.10.0->accelerate) (2.1.2)
+Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests->huggingface-hub>=0.21.0->accelerate) (2.1.1)
+Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests->huggingface-hub>=0.21.0->accelerate) (3.4)
+Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests->huggingface-hub>=0.21.0->accelerate) (1.26.13)
+Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests->huggingface-hub>=0.21.0->accelerate) (2022.12.7)
+Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from sympy->torch>=1.10.0->accelerate) (1.3.0)
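
The two oddly named files in this commit, =0.27.0 (above) and =1.1.0, are pip logs rather than source. In the notebook's install cell, the unquoted specifiers accelerate>=0.27.0 and aqlm[gpu]>=1.1.0 let the shell interpret '>' as output redirection, so pip installed the bare package name and its output was written to files named =0.27.0 and =1.1.0, which were then committed. Quoting the specifier avoids this:

# In a notebook cell, quote version specifiers so the shell does not
# treat ">" as a redirect (which is what produced these two files):
!pip install "accelerate>=0.27.0"
!pip install "aqlm[gpu]>=1.1.0"
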
=1.1.0
ADDED
The diff for this file is too large to render.
See raw diff
README.md
ADDED
@@ -0,0 +1,52 @@
+---
+base_model: ISTA-DASLab/Meta-Llama-3-8B-Instruct-AQLM-2Bit-1x16
+library_name: peft
+tags:
+- generated_from_trainer
+model-index:
+- name: math-lora
+  results: []
+---
+
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+
+# math-lora
+
+This model is a fine-tuned version of [ISTA-DASLab/Meta-Llama-3-8B-Instruct-AQLM-2Bit-1x16](https://huggingface.co/ISTA-DASLab/Meta-Llama-3-8B-Instruct-AQLM-2Bit-1x16) on the [open-web-math](https://huggingface.co/datasets/open-web-math/open-web-math) dataset.
+
+## Model description
+
+More information needed
+
+## Intended uses & limitations
+
+More information needed
+
+## Training and evaluation data
+
+More information needed
+
+## Training procedure
+
+### Training hyperparameters
+
+The following hyperparameters were used during training:
+- learning_rate: 0.0002
+- train_batch_size: 4
+- eval_batch_size: 8
+- seed: 42
+- gradient_accumulation_steps: 8
+- total_train_batch_size: 32
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: linear
+- lr_scheduler_warmup_steps: 200
+- training_steps: 10000
+
+### Framework versions
+
+- PEFT 0.12.1.dev0
+- Transformers 4.45.0.dev0
+- Pytorch 2.4.0+cu121
+- Datasets 2.21.0
+- Tokenizers 0.19.1
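
A quick sanity check on these hyperparameters, using the sequence length from the notebook's preprocessing:

# Effective batch size implied by the README values above
per_device_train_batch_size = 4
gradient_accumulation_steps = 8
max_seq_length = 2048  # from the notebook's tokenization step

sequences_per_step = per_device_train_batch_size * gradient_accumulation_steps
print(sequences_per_step)                   # 32, matching total_train_batch_size
print(sequences_per_step * max_seq_length)  # 65536 padded tokens per optimizer step
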
adapter_config.json
ADDED
@@ -0,0 +1,34 @@
+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "ISTA-DASLab/Meta-Llama-3-8B-Instruct-AQLM-2Bit-1x16",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "gate_proj",
+    "down_proj",
+    "k_proj",
+    "o_proj",
+    "up_proj",
+    "v_proj",
+    "q_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}
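
Note that r=16, lora_alpha=32, and use_rslora=false match the notebook's LoraConfig rather than test_ft.py's (which sets lora_alpha equal to the rank and use_rslora=True), so this adapter came from the notebook run. This JSON is exactly what peft reads back when the adapter is loaded; a minimal sketch, assuming the davisrbr/math-lora repo id from this commit:

from peft import PeftConfig

# from_pretrained dispatches on "peft_type" and returns a LoraConfig here
cfg = PeftConfig.from_pretrained("davisrbr/math-lora")
print(cfg.base_model_name_or_path)  # ISTA-DASLab/Meta-Llama-3-8B-Instruct-AQLM-2Bit-1x16
print(cfg.r, cfg.lora_alpha)        # 16 32
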
adapter_model.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:82ba54afcc89ac3e9bbbae5baec7816b8511b7f56bc84988f1bdf274f598e5bd
+size 167832240
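
The pointer's size field is consistent with the trainable-parameter count the notebook reported:

# Sanity check: adapter file size vs. the notebook's trainable parameters
trainable_params = 41_943_040   # from model.print_trainable_parameters()
print(trainable_params * 4)     # 167772160 bytes if the LoRA weights are stored as fp32
# The pointer reports 167832240 bytes; the ~60 KB difference is the
# safetensors header and metadata.
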
aqlm_2bit_training.ipynb
ADDED
@@ -0,0 +1,622 @@
(Content identical to the .ipynb_checkpoints/aqlm_2bit_training-checkpoint.ipynb diff shown above.)
|
439 |
+
")\n",
|
440 |
+
"model.config.use_cache = False\n",
|
441 |
+
"\n",
|
442 |
+
"push_frequency = 100\n",
|
443 |
+
"trainer.add_callback(PushToHubCallback(trainer, push_frequency,))\n",
|
444 |
+
"\n",
|
445 |
+
"trainer.train()\n",
|
446 |
+
"\n",
|
447 |
+
"final_commit_hash = trainer.push_to_hub(\"Training complete\")\n",
|
448 |
+
"print(f\"Training complete. Final commit hash: {final_commit_hash}\")"
|
449 |
+
]
|
450 |
+
},
|
451 |
+
{
|
452 |
+
"cell_type": "markdown",
|
453 |
+
"metadata": {
|
454 |
+
"id": "_0MOtwf3zdZp"
|
455 |
+
},
|
456 |
+
"source": [
|
457 |
+
"Run the cell below to run the training! For the sake of the demo, we just ran it for few steps just to showcase how to use this integration with existing tools on the HF ecosystem."
|
458 |
+
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 481
},
"id": "jq0nX33BmfaC",
"outputId": "7f470980-c49e-4230-b947-ad43510f1bee"
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py:460: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.\n",
"  warnings.warn(\n"
]
},
{
"data": {
"text/html": [
"\n",
"    <div>\n",
"      \n",
"      <progress value='10' max='10' style='width:300px; height:20px; vertical-align: middle;'></progress>\n",
"      [10/10 13:02, Epoch 0/1]\n",
"    </div>\n",
"    <table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: left;\">\n",
"      <th>Step</th>\n",
"      <th>Training Loss</th>\n",
"    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr>\n",
"      <td>1</td>\n",
"      <td>2.042200</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <td>2</td>\n",
"      <td>1.293400</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <td>3</td>\n",
"      <td>1.447500</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <td>4</td>\n",
"      <td>1.433600</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <td>5</td>\n",
"      <td>1.725900</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <td>6</td>\n",
"      <td>1.506400</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <td>7</td>\n",
"      <td>1.549600</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <td>8</td>\n",
"      <td>1.038300</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <td>9</td>\n",
"      <td>1.603300</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <td>10</td>\n",
"      <td>1.676400</td>\n",
"    </tr>\n",
"  </tbody>\n",
"</table><p>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"TrainOutput(global_step=10, training_loss=1.531658697128296, metrics={'train_runtime': 861.2678, 'train_samples_per_second': 0.046, 'train_steps_per_second': 0.012, 'total_flos': 56809829376000.0, 'train_loss': 1.531658697128296, 'epoch': 0.02})"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import transformers\n",
"\n",
"tokenizer.pad_token = tokenizer.eos_token\n",
"\n",
"trainer = transformers.Trainer(\n",
"    model=model,\n",
" train_dataset=data[\"train\"],\n",
|
566 |
+
" args=transformers.TrainingArguments(\n",
|
567 |
+
" per_device_train_batch_size=1,\n",
|
568 |
+
" gradient_accumulation_steps=8,\n",
|
569 |
+
" gradient_checkpointing=True,\n",
|
570 |
+
" warmup_steps=2,\n",
|
571 |
+
" max_steps=10,\n",
|
572 |
+
" learning_rate=2e-4,\n",
|
573 |
+
" fp16=True,\n",
|
574 |
+
" logging_steps=1,\n",
|
575 |
+
" output_dir=\"outputs\",\n",
|
576 |
+
" optim=\"adamw_bnb_8bit\",\n",
|
577 |
+
" logging_first_step=True,\n",
|
578 |
+
" ),\n",
|
579 |
+
" data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),\n",
|
580 |
+
")\n",
|
581 |
+
"model.config.use_cache = False # silence the warnings. Please re-enable for inference!\n",
|
582 |
+
"trainer.train()"
|
583 |
+
]
|
584 |
+
},
|
585 |
+
{
|
586 |
+
"cell_type": "code",
|
587 |
+
"execution_count": null,
|
588 |
+
"metadata": {
|
589 |
+
"id": "05iBmtP6X3Mq"
|
590 |
+
},
|
591 |
+
"outputs": [],
|
592 |
+
"source": []
|
593 |
+
}
|
594 |
+
],
|
595 |
+
"metadata": {
|
596 |
+
"accelerator": "GPU",
|
597 |
+
"colab": {
|
598 |
+
"gpuType": "T4",
|
599 |
+
"provenance": []
|
600 |
+
},
|
601 |
+
"gpuClass": "standard",
|
602 |
+
"kernelspec": {
|
603 |
+
"display_name": "Python 3 (ipykernel)",
|
604 |
+
"language": "python",
|
605 |
+
"name": "python3"
|
606 |
+
},
|
607 |
+
"language_info": {
|
608 |
+
"codemirror_mode": {
|
609 |
+
"name": "ipython",
|
610 |
+
"version": 3
|
611 |
+
},
|
612 |
+
"file_extension": ".py",
|
613 |
+
"mimetype": "text/x-python",
|
614 |
+
"name": "python",
|
615 |
+
"nbconvert_exporter": "python",
|
616 |
+
"pygments_lexer": "ipython3",
|
617 |
+
"version": "3.10.12"
|
618 |
+
}
|
619 |
+
},
|
620 |
+
"nbformat": 4,
|
621 |
+
"nbformat_minor": 4
|
622 |
+
}
|
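The notebook above pushes only the LoRA adapter weights to the Hub. As a minimal sketch (not part of the committed files), here is one way to attach the pushed adapter back onto the 2-bit base model for inference; the ids come from the training cell and the script defaults below, and the prompt is illustrative:

# Sketch: load the AQLM base model and attach the pushed LoRA adapter.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "ISTA-DASLab/Meta-Llama-3-8B-Instruct-AQLM-2Bit-1x16"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto", torch_dtype="auto")
model = PeftModel.from_pretrained(model, "davisrbr/math-lora")
model.config.use_cache = True  # re-enable the KV cache that training disabled

inputs = tokenizer("Prove that sqrt(2) is irrational.", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0]))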
test_ft.py
ADDED
@@ -0,0 +1,165 @@
import argparse
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments
import transformers
from peft import LoraConfig, get_peft_model
from datasets import load_dataset
from transformers.trainer_callback import TrainerCallback
import os
import random
import subprocess
from huggingface_hub import HfApi, hf_hub_download

def generate_mmlu_slurm(model_path, hub_model_id, output_dir, num_gpus=1):
    model_short_name = model_path.split('/')[-1]
    filename = f"run_mmlu_{model_short_name}.sbatch"

    port = random.randint(10000, 65535)

    content = f"""#!/bin/bash
#SBATCH --nodes=1
#SBATCH --gpus-per-node={num_gpus}
#SBATCH --time=24:00:00
#SBATCH --job-name={port}_mmlu_{model_short_name}
#SBATCH --mail-user=[email protected]
#SBATCH --mail-type=ALL

source /opt/rh/devtoolset-10/enable
source /data/davis_brown/miniconda3/bin/activate
conda init
conda activate quip

CUDA_VISIBLE_DEVICES=0 lm_eval \
    --model hf \
    --model_args pretrained={model_path},parallelize=True,peft={hub_model_id} \
    --tasks mmlu \
    --device cuda:0 \
    --batch_size 8 \
    --output_path={output_dir}/{hub_model_id} \
    --num_fewshot 5
"""

    with open(filename, 'w') as f:
        f.write(content)

    print(f"Generated MMLU evaluation SLURM script: {filename}")
    return filename

def launch_mmlu_evaluation(model_path, hub_model_id, output_dir):
    slurm_script = generate_mmlu_slurm(model_path, hub_model_id, output_dir)
    try:
        subprocess.run(["sbatch", slurm_script], check=True)
        print(f"Submitted MMLU evaluation job: {slurm_script}")
    except subprocess.CalledProcessError as e:
        print(f"Failed to submit MMLU evaluation job: {e}")

# Custom callback to push to Hub
class PushToHubCallback(TrainerCallback):
    def __init__(self, trainer, push_frequency):
        self.trainer = trainer
        self.push_frequency = push_frequency

    def on_step_end(self, args, state, control, **kwargs):
        if state.global_step % self.push_frequency == 0:
            self.trainer.save_model()
            self.trainer.push_to_hub(
                commit_message=f"Training in progress - Step {state.global_step}"
            )

def main(args):
    if args.only_mmlu:
        launch_mmlu_evaluation(args.model_id, args.hub_model_id, args.output_dir)
        return

    model_id = args.model_id
    output_dir = args.output_dir
    hub_model_id = args.hub_model_id

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto", low_cpu_mem_usage=True)

    target_modules = ['q_proj', 'k_proj', 'v_proj', 'o_proj', 'gate_proj', 'down_proj', 'up_proj']  # 'lm_head' excluded

    config = LoraConfig(
        r=args.lora_rank,
        lora_alpha=args.lora_rank,
        target_modules=target_modules,
        lora_dropout=0.05,
        bias="none",
        task_type="CAUSAL_LM",
        use_rslora=True
    )

    model = get_peft_model(model, config)
    model.print_trainable_parameters()
    model.enable_input_require_grads()

    # data = load_dataset("togethercomputer/RedPajama-Data-1T-Sample")
    data = load_dataset("open-web-math/open-web-math")

    max_seq_length = args.max_seq_length
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.model_max_length = max_seq_length

    def preprocess_function(examples):
        return tokenizer(examples["text"], truncation=True, max_length=max_seq_length, padding="max_length")

    processed_dataset = data["train"].map(preprocess_function, batched=True)

    torch.cuda.empty_cache()
    trainer = transformers.Trainer(
        model=model,
        train_dataset=processed_dataset,
        args=TrainingArguments(
            per_device_train_batch_size=args.batch_size,
            gradient_accumulation_steps=args.gradient_accumulation_steps,
            gradient_checkpointing=True,
            warmup_steps=200,
            max_steps=args.max_steps,
            learning_rate=2e-4,
            bf16=True,
            logging_steps=25,
            output_dir=output_dir,
            optim="adamw_bnb_8bit",
            logging_first_step=True,
            push_to_hub=True,
            hub_model_id=hub_model_id,
        ),
        data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    model.config.use_cache = False

    push_frequency = 100
    # PushToHubCallback takes (trainer, push_frequency); the extra hub_model_id argument was a TypeError
    trainer.add_callback(PushToHubCallback(trainer, push_frequency))

    trainer.train()

    final_commit_hash = trainer.push_to_hub("Training complete")
    print(f"Training complete. Final commit hash: {final_commit_hash}")

    # MMLU Evaluation
    if args.run_mmlu:
        launch_mmlu_evaluation(model_id, hub_model_id, output_dir)

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Fine-tune a language model and/or run MMLU evaluation")
    parser.add_argument("--model_id", type=str, default="ISTA-DASLab/Meta-Llama-3-8B-Instruct-AQLM-2Bit-1x16",
                        help="Model ID to fine-tune or evaluate")
    parser.add_argument("--max_seq_length", type=int, default=2048, help="Maximum sequence length")
    parser.add_argument("--output_dir", type=str, required=True, help="Output directory for checkpoints and results")
    parser.add_argument("--hub_model_id", type=str,
                        default="davisrbr/ISTA-DASLab-Meta-Llama-3-8B-Instruct-AQLM-2Bit-1x16-hf-100000_r8_cont",
                        help="Hub model ID for pushing or LoRA weights")
    parser.add_argument("--batch_size", type=int, default=1, help="Per-device batch size")
    parser.add_argument("--gradient_accumulation_steps", type=int, default=8, help="Gradient accumulation steps")
    parser.add_argument("--max_steps", type=int, default=50000, help="Maximum number of training steps")
    parser.add_argument("--run_mmlu", action="store_true", help="Run MMLU evaluation after training")
    parser.add_argument("--lora_rank", type=int, default=8, help="Rank of LoRA adaptation")
    parser.add_argument("--only_mmlu", action="store_true", help="Only run MMLU evaluation without training")
    parser.add_argument("--launch_slurm", action="store_true", help="Launch the entire script as a SLURM job")
    parser.add_argument("--num_gpus", type=int, default=4, help="Number of GPUs to use for training")
    parser.add_argument("--commit_hash", type=str, help="Specific commit hash to evaluate (for MMLU only)")

    args = parser.parse_args()
    main(args)
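For reference, a hypothetical way to drive the script's --only_mmlu path programmatically (test_ft.py keeps its CLI behind the __main__ guard, so importing it does not start a training run; the output directory below is a placeholder):

# Hypothetical usage sketch for the helpers defined above.
from test_ft import launch_mmlu_evaluation

launch_mmlu_evaluation(
    model_path="ISTA-DASLab/Meta-Llama-3-8B-Instruct-AQLM-2Bit-1x16",  # the script's default base model
    hub_model_id="davisrbr/math-lora",  # LoRA adapter to evaluate
    output_dir="./mmlu_results",  # placeholder
)

This mirrors `python test_ft.py --only_mmlu --output_dir ./mmlu_results` and simply writes and submits the generated sbatch file.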
training_args.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:222b5a3aebccf46c6252f469180d59f06f801f667a1e0d747a680d0a72a218de
size 5176
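training_args.bin is the pickled TrainingArguments object that Trainer saves alongside its checkpoints. A small sketch for inspecting it from a local clone of this repo (assuming compatible torch/transformers versions; the file is a pickle, so only load it from a source you trust):

# Sketch: inspect the serialized TrainingArguments.
import torch

args = torch.load("training_args.bin", weights_only=False)
print(args.learning_rate, args.max_steps, args.per_device_train_batch_size)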