The model was converted to GGUF using llama.cpp.
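Roughly, the conversion can be driven as in the sketch below. This assumes a local llama.cpp checkout and a downloaded copy of the original HyperCLOVAX-SEED-Text-Instruct-1.5B checkpoint; the paths are placeholders, not the exact commands used for this card.

```python
# Sketch of the GGUF conversion step (run from a llama.cpp checkout).
# Both paths below are placeholders, not from the original card.
import subprocess

subprocess.run(
    [
        "python", "convert_hf_to_gguf.py",               # converter script shipped with llama.cpp
        "path/to/HyperCLOVAX-SEED-Text-Instruct-1.5B",   # local HF checkpoint directory (placeholder)
        "--outfile", "HyperCLOVAX-SEED-Text-Instruct-1.5B-gguf-q8_0.gguf",
        "--outtype", "q8_0",                             # quantize to 8-bit during conversion
    ],
    check=True,
)
```

The resulting q8_0 GGUF file can then be loaded with llama-cpp-python: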
```python
from llama_cpp import Llama

llm = Llama(
    model_path="HyperCLOVAX-SEED-Text-Instruct-1.5B-gguf-q8_0.gguf",
    n_gpu_layers=-1,  # offload all layers to the GPU
    main_gpu=0,
    n_ctx=2048,
)

output = llm(
    "์ฌ๋ฏธ์๋ ์ด์ผ๊ธฐ ํ๋ ๋ง๋ค์ด์ค. 1000์ ์ด์์ด์ด์ผ ํด. ์์:",  # Prompt: "Make up an interesting story. It must be at least 1000 characters. Start:"
    max_tokens=2048,
    echo=True,  # include the prompt in the returned text
)

print(output)
```
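Calling the model returns an OpenAI-style completion dict rather than plain text; if only the generated text is needed, it can be read from the `choices` field (with `echo=True` the prompt is included at the start):

```python
# output is a completion dict; choices[0]["text"] holds the prompt plus the generated story.
print(output["choices"][0]["text"])
```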
Tested on a GeForce RTX 3070; performance was as follows.
q8_0, peak memory: 1.8 GB

```
llama_perf_context_print:        load time =     186.23 ms
llama_perf_context_print: prompt eval time =     186.15 ms /    19 tokens (    9.80 ms per token,   102.07 tokens per second)
llama_perf_context_print:        eval time =    3141.36 ms /   536 runs   (    5.86 ms per token,   170.63 tokens per second)
llama_perf_context_print:       total time =    3758.06 ms /   555 tokens
```