eralFlare committed
Commit 2adbc8c · verified · 1 Parent(s): 8fa5748

Update README.md

Files changed (1)
  1. README.md +4 -21
README.md CHANGED
@@ -18,7 +18,7 @@ license_link: https://ai.google.dev/gemma/terms
  ---

  # Gemma Model Card
-
+ This model card is copied from the original google/gemma-2b-it, with edits on how to run this auto-gptq quantized version of the model. This quantized version has only been tested on a CUDA GPU.

  **Model Page**: [Gemma](https://ai.google.dev/gemma/docs)

@@ -54,32 +54,15 @@ state of the art AI models and helping foster innovation for everyone.

  Below we share some code snippets on how to get quickly started with running the model. First make sure to `pip install -U transformers`, then copy the snippet from the section that is relevant for your usecase.

- #### Running the model on a CPU
-
-
- ```python
- from transformers import AutoTokenizer, AutoModelForCausalLM
-
- tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")
- model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it")
-
- input_text = "Write me a poem about Machine Learning."
- input_ids = tokenizer(input_text, return_tensors="pt")
-
- outputs = model.generate(**input_ids)
- print(tokenizer.decode(outputs[0]))
- ```
-
-
  #### Running the model on a single / multi GPU


  ```python
- # pip install accelerate
+ # !pip install --upgrade -q transformers accelerate auto-gptq optimum
  from transformers import AutoTokenizer, AutoModelForCausalLM

- tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")
- model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it", device_map="auto")
+ tokenizer = AutoTokenizer.from_pretrained("eralFlare/gemma-2b-it")
+ model = AutoModelForCausalLM.from_pretrained("eralFlare/gemma-2b-it", device_map="auto")

  input_text = "Write me a poem about Machine Learning."
  input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
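
For reference, a minimal end-to-end sketch of the updated snippet, assuming the `eralFlare/gemma-2b-it` GPTQ checkpoint, a CUDA GPU, and the dependencies named in the install comment (`transformers`, `accelerate`, `auto-gptq`, `optimum`); the `max_new_tokens` value is an illustrative choice, not part of the original card.

```python
# Sketch only: assumes a CUDA GPU and transformers, accelerate, auto-gptq, optimum installed.
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("eralFlare/gemma-2b-it")
# device_map="auto" places the GPTQ-quantized weights on the available GPU(s).
model = AutoModelForCausalLM.from_pretrained("eralFlare/gemma-2b-it", device_map="auto")

input_text = "Write me a poem about Machine Learning."
# Move the tokenized prompt to the GPU, matching the snippet in the diff above.
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

# max_new_tokens=64 is an illustrative value, not taken from the original card.
outputs = model.generate(**input_ids, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```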