---
license: apache-2.0
---

## Description

This repo contains a GGUF-format model, which is a quantization of https://huggingface.co/ngoantech/Llama-2-7b-vietnamese-20k

# Inference Code Example (Langchain+Python)

```python
from langchain.llms import LlamaCpp
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

# Transcript-style prompt. The few-shot turns are in Vietnamese:
# a greeting, then "What is the capital of Vietnam?" / "Hanoi is the capital of Vietnam".
template = """Transcript of a dialog, where the User interacts with an Assistant named Bob. Bob is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision.

User: Chào Bob.
Bob: Chào bạn. Tôi có thể giúp gì cho bạn?
User: Thủ đô của Việt Nam là thành phố nào?
Bob: Hà Nội là thủ đô của Việt Nam
User: {question}"""

# Alternative templates in the Llama 2 chat format:
# template = """<<SYS>>\nYou are a helpful assistant. Bạn là một trợ lí hữu ích.\n<</SYS>>\n\n[INST] {question} [/INST] """

# template = """[INST] <<SYS>>
# You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
# <</SYS>>
# {question} [/INST]
# """

prompt = PromptTemplate(template=template, input_variables=["question"])

# Callbacks support token-wise streaming
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

# Make sure the model path is correct for your system!
llm = LlamaCpp(
    model_path="/path/to/model/gguf-model-q4_0.bin",
    temperature=0.1,
    max_tokens=1024,
    top_p=1,
    callback_manager=callback_manager,
    verbose=True,  # Verbose is required to pass to the callback manager
)

llm_chain = LLMChain(prompt=prompt, llm=llm)

# "On what date is Vietnam's National Day?"
question = "Quốc khánh của Việt Nam diễn ra vào ngày nào?"

print(prompt.format(question=question))
llm_chain.run(question)
```

# Inference Code Example (Llama.cpp)

```bash
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp/ && make -j

# Plain text completion from an inline prompt
./main -m /path/to/model/gguf-model-q4_0.bin --temp 0.1 -t 8 -n 1024 --color -p "VNG Corporation là công ty công nghệ hàng đầu "

# Chat-style run: prompt read from a file, with "User:" as the reverse prompt
./main -m /path/to/model/gguf-model-q4_0.bin --temp 0.1 -t 8 -n 1024 --color -r "User:" -f /path/to/chat/prompt/chat.txt
```
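
The second `./main` command above reads its prompt from the file passed with `-f`, but this README does not show that file's contents. The sketch below is one plausible way to generate it, assuming the file should hold the same transcript used as `template` in the LangChain example and end on an open `User:` line to mirror the `-r "User:"` reverse prompt. The helper names `build_chat_prompt` and `write_chat_prompt` and the output name `chat.txt` are hypothetical, not part of this repo or of llama.cpp.

```python
from pathlib import Path

# Preamble copied from the transcript template in the LangChain example.
PREAMBLE = (
    "Transcript of a dialog, where the User interacts with an Assistant named Bob. "
    "Bob is helpful, kind, honest, good at writing, and never fails to answer "
    "the User's requests immediately and with precision."
)

# Few-shot turns, also taken from that template.
EXAMPLE_TURNS = [
    ("User", "Chào Bob."),
    ("Bob", "Chào bạn. Tôi có thể giúp gì cho bạn?"),
    ("User", "Thủ đô của Việt Nam là thành phố nào?"),
    ("Bob", "Hà Nội là thủ đô của Việt Nam"),
]


def build_chat_prompt(preamble, turns):
    """Return the transcript text, ending with an open 'User:' turn."""
    lines = [preamble, ""]
    lines += [f"{speaker}: {text}" for speaker, text in turns]
    lines.append("User:")  # assumption: mirrors the -r "User:" reverse prompt
    return "\n".join(lines)


def write_chat_prompt(path, preamble=PREAMBLE, turns=EXAMPLE_TURNS):
    """Write the prompt file as UTF-8 so the Vietnamese text is stored intact."""
    text = build_chat_prompt(preamble, turns)
    Path(path).write_text(text, encoding="utf-8")
    return text


if __name__ == "__main__":
    # "chat.txt" is a placeholder name; pass its real location to ./main with -f.
    print(write_chat_prompt("chat.txt"))
```

Keeping the preamble and turns as data makes it easy to swap in other few-shot examples without touching the file-writing logic.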