Update README.md
README.md CHANGED

@@ -18,7 +18,7 @@ license_link: https://ai.google.dev/gemma/terms
---

# Gemma Model Card
-
+This model card is copied from the original google/gemma-2b-it, with edits covering how to run this auto-gptq quantized version of the model. The quantized model has only been tested on a CUDA GPU.

**Model Page**: [Gemma](https://ai.google.dev/gemma/docs)

@@ -54,32 +54,15 @@ state of the art AI models and helping foster innovation for everyone.

Below we share some code snippets on how to quickly get started with running the model. First make sure to `pip install -U transformers`, then copy the snippet from the section that is relevant for your use case.

-#### Running the model on a CPU
-
-
-```python
-from transformers import AutoTokenizer, AutoModelForCausalLM
-
-tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")
-model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it")
-
-input_text = "Write me a poem about Machine Learning."
-input_ids = tokenizer(input_text, return_tensors="pt")
-
-outputs = model.generate(**input_ids)
-print(tokenizer.decode(outputs[0]))
-```
-
-
#### Running the model on a single / multi GPU


```python
-# pip install accelerate
+# !pip install --upgrade -q transformers accelerate auto-gptq optimum
from transformers import AutoTokenizer, AutoModelForCausalLM

-tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")
-model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it", device_map="auto")
+tokenizer = AutoTokenizer.from_pretrained("eralFlare/gemma-2b-it")
+model = AutoModelForCausalLM.from_pretrained("eralFlare/gemma-2b-it", device_map="auto")

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
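
The second hunk ends before the generation calls, which sit below the edited lines and are untouched by this commit. For orientation, here is a minimal end-to-end sketch of the updated GPU snippet, assuming the `eralFlare/gemma-2b-it` repo id introduced in the diff and the same `generate`/`decode` calls shown in the removed CPU snippet.

```python
# Hedged end-to-end sketch of the auto-gptq quantized checkpoint (requires a CUDA GPU).
# !pip install --upgrade -q transformers accelerate auto-gptq optimum
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("eralFlare/gemma-2b-it")
# device_map="auto" lets accelerate place the quantized weights on the available GPU(s).
model = AutoModelForCausalLM.from_pretrained("eralFlare/gemma-2b-it", device_map="auto")

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

# Generation and decoding mirror the snippet in the original google/gemma-2b-it card.
outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
```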