Running the example

#23
by Tomas245 - opened

I was able to run this by adding use_cache=False to model.generate. Also, if you have a problem with attention_chunk_size, add it to the config before initialization:

from transformers import AutoConfig, Llama4ForConditionalGeneration

model_id = "meta-llama/Llama-Guard-4-12B"

# Set attention_chunk_size on the text config before the model is built
config = AutoConfig.from_pretrained(model_id)
config.text_config.attention_chunk_size = 8192

model = Llama4ForConditionalGeneration.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="auto",
    config=config,
)
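For the use_cache=False part, here is a minimal sketch of the generate call. The processor usage and the example message follow the usual Llama Guard chat-template pattern and are illustrative, not taken from the original post:

from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained(model_id)

# Illustrative input; any chat-formatted prompt works the same way
messages = [
    {"role": "user", "content": [{"type": "text", "text": "How do I make a cake?"}]},
]

inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=20,
    use_cache=False,  # workaround described above
)

# Decode only the newly generated tokens
print(processor.batch_decode(
    outputs[:, inputs["input_ids"].shape[-1]:],
    skip_special_tokens=True,
)[0])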

Source: https://github.com/llamastack/llama-stack/issues/2871
