---
license: mit
datasets:
- Replete-AI/code_bagel
---

# Phi-nut-Butter-Codebagel-v1

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6324ce4d5d0cf5c62c6e3c5a/ayrvhUhdbawRVfNiqoOP7.png)

## Model Details

**Model Name:** Phi-nut-Butter-Codebagel-v1

**Base Model:** [microsoft/Phi-3-mini-128k-instruct](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct)

**Fine-tuning Method:** Supervised Fine-Tuning (SFT)

**Dataset:** [Code Bagel](https://huggingface.co/datasets/Replete-AI/code_bagel)

**Training Data:** 75,000 randomly selected rows from the Code Bagel dataset

**Training Duration:** 23 hours

**Hardware:** Nvidia RTX A4500

**Epochs:** 3

## Training Procedure

This model was fine-tuned to improve instruction following on code-related tasks. Training used PEFT and TRL's SFTTrainer on the Code Bagel dataset and completed in 3 epochs over 23 hours on a single Nvidia RTX A4500 GPU.

## Intended Use

This model is designed to improve instruction-following capabilities, particularly for code-related tasks.

## Getting Started

### Instruct Template

```
<|system|>
{system_message}
<|end|>
<|user|>
{prompt}
<|end|>
<|assistant|>
```

### Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name_or_path = "thesven/Phi-nut-Butter-Codebagel-v1"

# BitsAndBytesConfig for loading the model in 4-bit precision
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="float16",
)

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    device_map="auto",
    trust_remote_code=False,
    revision="main",
    quantization_config=bnb_config,
)
# Use the end-of-sequence token for padding
model.config.pad_token_id = model.config.eos_token_id

prompt_template = '''
<|system|>
You are an expert developer. Please help me with any coding questions.<|end|>
<|user|>
Create a function to get the total sum from an array of ints.<|end|>
<|assistant|>
'''

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()

output = model.generate(
    inputs=input_ids,
    temperature=0.1,
    do_sample=True,
    top_p=0.95,
    top_k=40,
    max_new_tokens=256,
)

# Decode only the newly generated tokens, skipping the prompt
generated_text = tokenizer.decode(output[0, len(input_ids[0]):], skip_special_tokens=True)
print(generated_text)
```
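Instead of writing the instruct template by hand, you can let the tokenizer render it. The sketch below assumes the tokenizer ships Phi-3's chat template and that the template accepts a `system` role; if so, it produces the same `<|system|>`/`<|user|>`/`<|assistant|>` layout shown above.

```python
messages = [
    {"role": "system", "content": "You are an expert developer. Please help me with any coding questions."},
    {"role": "user", "content": "Create a function to get the total sum from an array of ints."},
]

# Renders the messages into the instruct template and appends <|assistant|>
# so the model continues with its answer.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.cuda()
```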
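## Fine-Tuning Sketch

The exact training script is not published. The following is a minimal sketch of the setup described in Training Procedure (PEFT + SFTTrainer on 75,000 randomly selected Code Bagel rows for 3 epochs). The LoRA settings, dataset column names (`input`/`output`), prompt formatting, and output directory are illustrative assumptions, not the values used for this model.

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

base_model = "microsoft/Phi-3-mini-128k-instruct"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model, device_map="auto")

# 75,000 randomly selected rows, as described in Model Details
dataset = (
    load_dataset("Replete-AI/code_bagel", split="train")
    .shuffle(seed=42)
    .select(range(75_000))
)

def to_text(example):
    # "input"/"output" column names are assumptions about the dataset schema
    return {"text": f"<|user|>\n{example['input']}<|end|>\n<|assistant|>\n{example['output']}<|end|>"}

dataset = dataset.map(to_text, remove_columns=dataset.column_names)

# Illustrative LoRA settings, not the ones used for this model
peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    args=SFTConfig(
        output_dir="phi-nut-butter-sft",
        num_train_epochs=3,
        dataset_text_field="text",
    ),
)
trainer.train()
```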