---
base_model: amd/Instella-3B-Instruct
---

# MISHANM/amd-Instella-3B-Instruct-fp8

This model is an fp8-quantized version of amd/Instella-3B-Instruct, intended for deployment on hardware that supports fp8. Storing the weights in 8-bit floating point takes half the memory of 16-bit weights and can speed up inference, while aiming to preserve the output quality of the original model.

## Model Details

1. Tasks: Causal Language Modeling, Text Generation
2. Base Model: amd/Instella-3B-Instruct
3. Quantization Format: fp8

## Device Used

1. GPUs: 1 × AMD Instinct™ MI210 Accelerator

## Inference with Hugging Face

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the fp8-quantized model and its tokenizer.
# trust_remote_code=True may be required because Instella ships custom model code.
model_path = "MISHANM/amd-Instella-3B-Instruct-fp8"
model = AutoModelForCausalLM.from_pretrained(
    model_path, device_map="auto", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Generate a response to a single user prompt
def generate_text(prompt, max_new_tokens=1000, temperature=0.9):
    messages = [
        {"role": "system", "content": "Respond to the user's query."},
        {"role": "user", "content": prompt},
    ]
    # Format the conversation with the tokenizer's chat template and append
    # the assistant turn marker so the model starts its reply
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    # Sample a completion
    output = model.generate(
        input_ids,
        max_new_tokens=max_new_tokens,
        temperature=temperature,
        do_sample=True,
    )
    # Decode only the newly generated tokens, not the echoed prompt
    new_tokens = output[0][input_ids.shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

# Example usage
prompt = "Write a poem about LLMs."
text = generate_text(prompt)
print(text)
```

## Citation Information

```
@misc{MISHANM/amd-Instella-3B-Instruct-fp8,
  author = {Mishan Maurya},
  title = {Introducing fp8 quantized version of amd/Instella-3B-Instruct},
  year = {2025},
  publisher = {Hugging Face},
  journal = {Hugging Face repository},
}
```