## Description
This repository contains a GGUF-format model, a quantized version of [ngoantech/Llama-2-7b-vietnamese-20k](https://huggingface.co/ngoantech/Llama-2-7b-vietnamese-20k).

# Inference Code Example (LangChain + Python)

```python
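# Requires the langchain and llama-cpp-python packages.
# In recent LangChain releases, LlamaCpp is imported from langchain_community.llms instead.
from langchain.llms import LlamaCpp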
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

template = """Transcript of a dialog, where the User interacts with an Assistant named Bob. Bob is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision.

User: Chào Bob.
Bob: Chào bạn. Tôi có thể giúp gì cho bạn?
User: Thủ đô của Việt Nam là thành phố nào?
Bob: Hà Nội là thủ đô của Việt Nam
User: {question}
Bob:"""

# Alternative templates in the Llama-2 chat style; uncomment one to use it instead.
# template = """<<SYS>>\nYou are a helpful assistant. Bạn là một trợ lí hữu ích.\n<</SYS>>\n\n[INST] {question} [/INST] """

# template = """[INST] <<SYS>>
# You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
# <</SYS>>
#
# {question} [/INST]
# """

prompt = PromptTemplate(template=template, input_variables=["question"])

# Callbacks support token-wise streaming
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

# Make sure the model path is correct for your system!
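llm = LlamaCpp(
    model_path="/path/to/model/gguf-model-q4_0.bin",
    temperature=0.1,  # low temperature keeps answers close to deterministic
    max_tokens=1024,  # upper bound on the number of generated tokens
    top_p=1,  # 1 disables nucleus (top-p) filtering
    callback_manager=callback_manager,
    verbose=True,  # Verbose is required to pass to the callback manager
)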

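llm_chain = LLMChain(prompt=prompt, llm=llm)
# English: "On what date is Vietnam's National Day?"
question = "Quốc khánh của Việt Nam diễn ra vào ngày nào?"
# Show the fully formatted prompt, then generate; tokens stream to stdout via the callback.
print(prompt.format(question=question))
llm_chain.run(question)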
```
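
With the transcript-style template, the model may keep writing further `User:` and `Bob:` turns after its answer. The following is a minimal sketch of one way to keep only the first reply. It assumes the chain output is a plain string; `extract_first_reply` is a hypothetical helper (not part of LangChain or llama.cpp), and the `raw` string is made-up sample text, not real model output.

```python
def extract_first_reply(generated: str, speaker: str = "Bob:", stop: str = "User:") -> str:
    """Return only the assistant's first turn from raw generated text."""
    text = generated.strip()
    # Drop the leading speaker tag if the model emitted one.
    if text.startswith(speaker):
        text = text[len(speaker):]
    # Cut everything from the next user turn onward.
    cut = text.find(stop)
    if cut != -1:
        text = text[:cut]
    return text.strip()


# Hypothetical sample of raw output, for illustration only.
raw = "\nBob: Ngày 2 tháng 9.\nUser: Cảm ơn Bob.\nBob: Không có gì."
print(extract_first_reply(raw))  # prints "Ngày 2 tháng 9."
```

Trimming after generation like this is the simplest option; the wrapper's own stop-sequence setting, if available in your version, avoids generating the extra turns in the first place.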

# Inference Code Example (llama.cpp)

```bash
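# Fetch and build llama.cpp
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp/ && make -j
# Text completion from an inline prompt (English: "VNG Corporation is a leading technology company ")
./main -m /path/to/model/gguf-model-q4_0.bin --temp 0.1 -t 8 -n 1024 --color -p "VNG Corporation là công ty công nghệ hàng đầu "
# Chat-style run: the prompt is read from a file (-f) and "User:" is the reverse prompt (-r)
./main -m /path/to/model/gguf-model-q4_0.bin --temp 0.1 -t 8 -n 1024 --color -r "User:" -f /path/to/chat/prompt/chat.txt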
```
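
The second command reads its prompt from a file that is not included here. Below is a minimal sketch for creating one, assuming the file mirrors the transcript template from the LangChain example and ends on an open `User:` turn (the same convention as llama.cpp's bundled `prompts/chat-with-bob.txt`). The output path `chat.txt` and the helper name `write_chat_prompt` are hypothetical; point `-f` at wherever you save the file.

```python
from pathlib import Path

# Assumed file content: the same few-shot transcript used in the LangChain
# example, ending on an open "User:" turn to match the -r "User:" reverse prompt.
CHAT_PROMPT = """Transcript of a dialog, where the User interacts with an Assistant named Bob. Bob is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision.

User: Chào Bob.
Bob: Chào bạn. Tôi có thể giúp gì cho bạn?
User: Thủ đô của Việt Nam là thành phố nào?
Bob: Hà Nội là thủ đô của Việt Nam
User:"""


def write_chat_prompt(path: str = "chat.txt") -> Path:
    """Write the transcript prompt as UTF-8 and return the file path."""
    target = Path(path)
    target.write_text(CHAT_PROMPT, encoding="utf-8")
    return target


prompt_file = write_chat_prompt()
print(f"Wrote {prompt_file}")
```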

## License

apache-2.0