eduardo-alvarez committed on
Commit 8d9ad4b · 1 Parent(s): 46f3e87

updating deployment tips

Files changed (3)
  1. app.py +3 -3
  2. info/deployment.py +48 -108
  3. info/programs.py +0 -6
app.py CHANGED
@@ -27,9 +27,9 @@ from info.about import(
  ABOUT)
  from src.processing import filter_benchmarks_table
 
- inference_endpoint_url = os.environ['inference_endpoint_url']
- submission_form_endpoint_url = os.environ['submission_form_endpoint_url']
- inference_concurrency_limit = os.environ['inference_concurrency_limit']
+ #inference_endpoint_url = os.environ['inference_endpoint_url']
+ #submission_form_endpoint_url = os.environ['submission_form_endpoint_url']
+ #inference_concurrency_limit = os.environ['inference_concurrency_limit']
 
  demo = gr.Blocks()
 
info/deployment.py CHANGED
@@ -19,31 +19,15 @@ helps you choose the best option for your specific use case. Happy building!
  <th>Arc GPU</th>
  <th>Core Ultra</th>
  </tr>
- <tr>
- <td>Optimum Habana</td>
- <td>🚀</td>
- <td></td>
- <td></td>
- <td></td>
- <td></td>
  </tr>
- <tr>
- <td>Intel Extension for PyTorch</td>
- <td></td>
- <td>🚀</td>
+ <td>PyTorch</td>
  <td>🚀</td>
  <td>🚀</td>
- <td></td>
- </tr>
- <tr>
- <td>Intel Extension for Transformers</td>
- <td></td>
  <td>🚀</td>
  <td>🚀</td>
  <td>🚀</td>
- <td></td>
  </tr>
- <tr>
+ <tr>
  <td>OpenVINO</td>
  <td></td>
  <td>🚀</td>
@@ -52,53 +36,20 @@ helps you choose the best option for your specific use case. Happy building!
  <td>🚀</td>
  </tr>
  <tr>
- <td>BigDL</td>
- <td></td>
- <td>🚀</td>
- <td>🚀</td>
- <td>🚀</td>
+ <td>Hugging Face</td>
  <td>🚀</td>
- </tr>
- <tr>
- <td>NPU Acceleration Library</td>
- <td></td>
- <td></td>
- <td></td>
- <td></td>
  <td>🚀</td>
- </tr>
- </tr>
- <tr>
- <td>PyTorch</td>
  <td>🚀</td>
  <td>🚀</td>
- <td></td>
- <td></td>
  <td>🚀</td>
  </tr>
- </tr>
- <tr>
- <td>Tensorflow</td>
- <td>🚀</td>
- <td>🚀</td>
- <td></td>
- <td></td>
- <td>🚀</td>
- </tr>
  </table>
  </div>
 
  <hr>
 
  # Intel® Gaudi® Accelerators
- The Intel Gaudi 2 accelerator is Intel's most capable deep learning chip. You can learn about Gaudi 2 [here](https://habana.ai/products/gaudi2/).
-
- Intel Gaudi Software supports PyTorch and DeepSpeed for accelerating LLM training and inference.
- The Intel Gaudi Software graph compiler will optimize the execution of the operations accumulated in the graph
- (e.g. operator fusion, data layout management, parallelization, pipelining and memory management,
- and graph-level optimizations).
-
- Optimum Habana provides covenient functionality for various tasks. Below is a command line snippet to run inference on Gaudi with meta-llama/Llama-2-7b-hf.
+ Gaudi is Intel's most capable deep learning chip. You can learn about Gaudi [here](https://habana.ai/products/gaudi2/).
 
  👍[Optimum Habana GitHub](https://github.com/huggingface/optimum-habana)
 
@@ -118,40 +69,7 @@ python run_generation.py \
 
  <hr>
 
- # Intel® Max Series GPU
- The Intel® Data Center GPU Max Series is Intel's highest performing, highest density, general-purpose discrete GPU, which packs over 100 billion transistors into one package and contains up to 128 Xe Cores--Intel's foundational GPU compute building block. You can learn more about this GPU [here](https://www.intel.com/content/www/us/en/products/details/discrete-gpus/data-center-gpu/max-series.html).
-
- ### INT4 Inference (GPU) with Intel Extension for Transformers and Intel Extension for Python
- Intel® Extension for Transformers is an innovative toolkit designed to accelerate GenAI/LLM everywhere with the optimal performance of Transformer-based models on various Intel platforms, including Intel Gaudi2, Intel CPU, and Intel GPU.
-
- 👍 [Intel Extension for Transformers GitHub](https://github.com/intel/intel-extension-for-transformers)
-
- Intel® Extension for PyTorch* extends PyTorch* with up-to-date features optimizations for an extra performance boost on Intel hardware. Optimizations take advantage of Intel® Advanced Vector Extensions 512 (Intel® AVX-512) Vector Neural Network Instructions (VNNI) and Intel® Advanced Matrix Extensions (Intel® AMX) on Intel CPUs as well as Intel Xe Matrix Extensions (XMX) AI engines on Intel discrete GPUs. Moreover, Intel® Extension for PyTorch* provides easy GPU acceleration for Intel discrete GPUs through the PyTorch* xpu device.
-
- 👍 [Intel Extension for PyTorch GitHub](https://github.com/intel/intel-extension-for-pytorch)
-
- ```python
- import intel_extension_for_pytorch as ipex
- from intel_extension_for_transformers.transformers.modeling import AutoModelForCausalLM
- from transformers import AutoTokenizer
-
- device_map = "xpu"
- model_name ="Qwen/Qwen-7B"
- tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
- prompt = "When winter becomes spring, the flowers..."
- inputs = tokenizer(prompt, return_tensors="pt").input_ids.to(device_map)
-
- model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True,
- device_map=device_map, load_in_4bit=True)
-
- model = ipex.optimize_transformers(model, inplace=True, dtype=torch.float16, woq=True, device=device_map)
-
- output = model.generate(inputs)
- ```
- <hr>
-
  # Intel® Xeon® CPUs
- The Intel® Xeon® CPUs have the most built-in accelerators of any CPU on the market, including Advanced Matrix Extensions (AMX) to accelerate matrix multiplication in deep learning training and inference. Learn more about the Xeon CPUs [here](https://www.intel.com/content/www/us/en/products/details/processors/xeon.html).
 
  ### Optimum Intel and Intel Extension for PyTorch (no quantization)
  🤗 Optimum Intel is the interface between the 🤗 Transformers and Diffusers libraries and the different tools and libraries provided by Intel to accelerate end-to-end pipelines on Intel architectures.
@@ -205,12 +123,53 @@ outputs = model.generate(inputs)
 
  <hr>
 
+ # Intel® Max Series GPU
+
+ ### INT4 Inference (GPU) with Intel Extension for Transformers and Intel Extension for PyTorch
+ 👍 [Intel Extension for PyTorch GitHub](https://github.com/intel/intel-extension-for-pytorch)
+
+ ```python
+ import intel_extension_for_pytorch as ipex
+ from intel_extension_for_transformers.transformers.modeling import AutoModelForCausalLM
+ from transformers import AutoTokenizer
+
+ device_map = "xpu"
+ model_name ="Qwen/Qwen-7B"
+ tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
+ prompt = "When winter becomes spring, the flowers..."
+ inputs = tokenizer(prompt, return_tensors="pt").input_ids.to(device_map)
+
+ model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True,
+ device_map=device_map, load_in_4bit=True)
+
+ model = ipex.optimize_transformers(model, inplace=True, dtype=torch.float16, woq=True, device=device_map)
+
+ output = model.generate(inputs)
+ ```
+
+ <hr>
+
  # Intel® Core Ultra (NPUs and iGPUs)
- Intel® Core™ Ultra Processors are optimized for premium thin and powerful laptops, featuring 3D performance hybrid architecture, advanced AI capabilities, and available with built-in Intel® Arc™ GPU. Learn more about Intel Core Ultra [here](https://www.intel.com/content/www/us/en/products/details/processors/core-ultra.html). For now, there is support for smaller models like [TinyLama-1.1B](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0).
 
- ### Intel® NPU Acceleration Library
- The Intel® NPU Acceleration Library is a Python library designed to boost the efficiency of your applications by leveraging the power of the Intel Neural Processing Unit (NPU) to perform high-speed computations on compatible hardware.
+ ### OpenVINO Tooling with Optimum Intel
+
+ 👍 [OpenVINO GitHub](https://github.com/openvinotoolkit/openvino)
+
+ ```python
+ from optimum.intel import OVModelForCausalLM
+ from transformers import AutoTokenizer, pipeline
+
+ model_id = "helenai/gpt2-ov"
+ model = OVModelForCausalLM.from_pretrained(model_id)
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+
+ pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
+
+ pipe("In the spring, beautiful flowers bloom...")
+
+ ```
 
+ ### Intel® NPU Acceleration Library
  👍 [Intel NPU Acceleration Library GitHub](https://github.com/intel/intel-npu-acceleration-library)
 
  ```python
@@ -244,25 +203,6 @@ print("Run inference")
  _ = model.generate(**generation_kwargs)
  ```
 
- ### OpenVINO Tooling with Optimum Intel
- OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference.
-
- 👍 [OpenVINO GitHub](https://github.com/openvinotoolkit/openvino)
-
- ```python
- from optimum.intel import OVModelForCausalLM
- from transformers import AutoTokenizer, pipeline
-
- model_id = "helenai/gpt2-ov"
- model = OVModelForCausalLM.from_pretrained(model_id)
- tokenizer = AutoTokenizer.from_pretrained(model_id)
-
- pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
-
- pipe("In the spring, beautiful flowers bloom...")
-
- ```
-
  <hr>
 
  # Intel® Arc GPUs
info/programs.py CHANGED
@@ -41,10 +41,4 @@ others in the community and within Intel
 
  Learn more and apply through the program at https://www.intel.com/content/www/us/en/developer/community/innovators/oneapi-innovator.html
 
- <hr>
-
- ## Intel DevHub Discord
-
- Join 5000+ developers on the [Intel DevHub Discord](https://discord.gg/yNYNxK2k) to get support with your submission and talk about everything from GenAI, HPC, to Quantum Computing.
-
  """