Instructions to use hpcai-tech/grok-1 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use hpcai-tech/grok-1 with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="hpcai-tech/grok-1", trust_remote_code=True)# Load model directly from transformers import AutoModelForCausalLM model = AutoModelForCausalLM.from_pretrained("hpcai-tech/grok-1", trust_remote_code=True, dtype="auto") - Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use hpcai-tech/grok-1 with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "hpcai-tech/grok-1" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "hpcai-tech/grok-1", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker
docker model run hf.co/hpcai-tech/grok-1
- SGLang
How to use hpcai-tech/grok-1 with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "hpcai-tech/grok-1" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "hpcai-tech/grok-1", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "hpcai-tech/grok-1" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "hpcai-tech/grok-1", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }' - Docker Model Runner
How to use hpcai-tech/grok-1 with Docker Model Runner:
docker model run hf.co/hpcai-tech/grok-1
Grok-1 (PyTorch Version)
This repository contains the model and weights of the torch version of Grok-1 open-weights model. You could find a complete example code of using the torch-version Grok-1 in ColossalAI GitHub Repository. We also applies parallelism techniques from ColossalAI framework (Tensor Parallelism for now) to accelerate the inference.
You could find the original weights released by xAI in Hugging Face and the original model in the Grok open release GitHub Repository.
Conversion
We translated the original modeling written in JAX into PyTorch version, and converted the weights by mapping tensor files with parameter keys, de-quantizing the tensors with corresponding packed scales, and save to checkpoint file with torch APIs.
A transformers-compatible version of tokenizer is contributed by Xenova and ArthurZ.
Usage
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
torch.set_default_dtype(torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained("hpcai-tech/grok-1", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
"hpcai-tech/grok-1",
trust_remote_code=True,
device_map="auto",
torch_dtype=torch.bfloat16,
)
model.eval()
text = "Replace this with your text"
input_ids = tokenizer(text, return_tensors="pt").input_ids
input_ids = input_ids.cuda()
attention_mask = torch.ones_like(input_ids)
generate_kwargs = {} # Add any additional args if you want
inputs = {
"input_ids": input_ids,
"attention_mask": attention_mask,
**generate_kwargs,
}
outputs = model.generate(**inputs)
print(outputs)
Note: A multi-GPU machine is required to test the model with the example code (For now, a 8x80G multi-GPU machine is required).
- Downloads last month
- 41,703