---
base_model: amd/Instella-3B-Instruct

---

# MISHANM/amd-Instella-3B-Instruct-fp8

This model is an fp8-quantized version of amd/Instella-3B-Instruct, intended for deployment on hardware with fp8 support. Quantization reduces memory usage and speeds up inference while aiming to preserve the output quality of the original model.


## Model Details
1. Tasks: Causal Language Modeling, Text Generation
2. Base Model: amd/Instella-3B-Instruct
3. Quantization Format: fp8 (a quick way to check the stored quantization settings is sketched below)
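
One way to confirm what quantization settings the checkpoint actually carries is to read its config before loading the weights. This is a minimal sketch, assuming the fp8 export wrote a `quantization_config` entry into `config.json` (this depends on the tool used for quantization) and that `trust_remote_code=True` is acceptable in case the repository ships custom Instella modeling code:

```python3
from transformers import AutoConfig

# Sketch: inspect the stored quantization settings, assuming the fp8 export
# saved a `quantization_config` entry in config.json.
config = AutoConfig.from_pretrained(
    "MISHANM/amd-Instella-3B-Instruct-fp8", trust_remote_code=True
)
quant_cfg = getattr(config, "quantization_config", None)
print(quant_cfg if quant_cfg is not None else "No quantization_config found in config.json")
```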



## Device Used

1. GPUs: 1 x AMD Instinct™ MI210 Accelerator


## Inference with HuggingFace

```python3
 
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the quantized model and tokenizer
model_path = "MISHANM/amd-Instella-3B-Instruct-fp8"

# Note: add trust_remote_code=True if the checkpoint ships custom Instella modeling code.
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")

tokenizer = AutoTokenizer.from_pretrained(model_path)

# Function to generate text
def generate_text(prompt, max_new_tokens=1000, temperature=0.9):
    # Format the prompt according to the chat template
    messages = [
        {
            "role": "system",
            "content": "Give response to the user query.",
        },
        {"role": "user", "content": prompt}
    ]

    # Build the prompt using the Instella chat markup
    formatted_prompt = f"<|system|>{messages[0]['content']}<|user|>{messages[1]['content']}<|assistant|>"

    # Tokenize on the model's device and generate output
    inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)
    output = model.generate(
        **inputs, max_new_tokens=max_new_tokens, temperature=temperature, do_sample=True
    )
    return tokenizer.decode(output[0], skip_special_tokens=True)

# Example usage
prompt = "Write a poem about LLMs."
text = generate_text(prompt)
print(text)



```
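
If the tokenizer ships a chat template in its `tokenizer_config.json` (an assumption; the hard-coded `<|system|>`/`<|user|>`/`<|assistant|>` string above mirrors the Instella chat markup), the prompt can also be built with `tokenizer.apply_chat_template` instead of manual string formatting. A minimal sketch, reusing the `model` and `tokenizer` loaded above:

```python3
# Sketch: build the prompt via the tokenizer's stored chat template,
# assuming one is bundled with the repository.
messages = [
    {"role": "system", "content": "Give response to the user query."},
    {"role": "user", "content": "Write a poem about LLMs."},
]
formatted_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, temperature=0.9, do_sample=True)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Relying on the stored template keeps the prompt in sync if the upstream chat format changes.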

## Citation Information
```
@misc{MISHANM/amd-Instella-3B-Instruct-fp8,
  author = {Mishan Maurya},
  title = {Introducing fp8 quantized version of amd/Instella-3B-Instruct},
  year = {2025},
  publisher = {Hugging Face},
  journal = {Hugging Face repository}
}
```