AmtIa committed (verified)
Commit 095bf0c · 1 Parent(s): ec04096

Upload folder using huggingface_hub
README.md ADDED
@@ -0,0 +1,143 @@
+ ---
+ base_model:
+ - cosimoiaia/Loquace-7B-Mistral
+ license: apache-2.0
+ datasets:
+ - cosimoiaia/Loquace-102k
+ language:
+ - it
+ tags:
+ - bnb-my-repo
+ - Italian
+ - Qlora
+ - Mistral
+ - finetuning
+ - Text Generation
+ pipeline_tag: text-generation
+ ---
+ # cosimoiaia/Loquace-7B-Mistral (Quantized)
+
+ ## Description
+ This model is a quantized version of the original model [`cosimoiaia/Loquace-7B-Mistral`](https://huggingface.co/cosimoiaia/Loquace-7B-Mistral).
+
+ It was quantized to 4-bit with the BitsAndBytes library, using the [bnb-my-repo](https://huggingface.co/spaces/bnb-community/bnb-my-repo) space.
+
+ ## Quantization Details
+ - **Quantization Type**: int4
+ - **bnb_4bit_quant_type**: nf4
+ - **bnb_4bit_use_double_quant**: True
+ - **bnb_4bit_compute_dtype**: float16
+ - **bnb_4bit_quant_storage**: uint8
+
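+ Because the 4-bit weights and the `quantization_config` above are stored in this repository, `from_pretrained` should pick the settings up automatically. A minimal loading sketch (the repo id below is an assumption for this upload; the explicit `BitsAndBytesConfig` merely mirrors the values listed above):
+
+ ```python
+ # Hedged sketch: load this 4-bit checkpoint. The quantization_config in
+ # config.json already encodes these settings; passing them explicitly is
+ # only for documentation.
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
+
+ bnb_config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_quant_type="nf4",             # NF4, as listed above
+     bnb_4bit_use_double_quant=True,        # double quantization enabled
+     bnb_4bit_compute_dtype=torch.float16,  # fp16 compute dtype
+ )
+
+ repo_id = "AmtIa/Loquace-7B-Mistral-bnb-4bit"  # hypothetical repo id for this upload
+ model = AutoModelForCausalLM.from_pretrained(
+     repo_id,
+     device_map="auto",
+     quantization_config=bnb_config,
+ )
+ tokenizer = AutoTokenizer.from_pretrained(repo_id)
+ ```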
+
+
+ # 📄 Original Model Information
+
+
+ Model Card for Loquace-7B-Mistral [(Versione in Italiano tradotta da Loquace)](https://huggingface.co/cosimoiaia/Loquace-7B-Mistral/blob/main/Readme-ITA.md)
+
+ # 🇮🇹 Loquace-7B-Mistral v0.1 🇮🇹
+
+ Loquace is an Italian-speaking, instruction-finetuned large language model. 🇮🇹
+
+ Loquace-7B-Mistral's distinctive features:
+
+ - It is pretty good at following instructions in Italian.
+ - It responds well to prompt engineering.
+ - It works well in a RAG (Retrieval-Augmented Generation) setup; see the sketch after this list.
+ - It was trained on a relatively raw dataset, [Loquace-102K](https://huggingface.co/datasets/cosimoiaia/Loquace-102k), using QLoRA with Mistral-7B-Instruct as the base model.
+ - Training took only 4 hours on a single RTX 3090 GPU on [Genesis Cloud](https://gnsiscld.co/26qhlf), costing little more than <b>1 euro</b>!
+ - It is <b><i>truly open source</i></b>: the model, dataset, and code to replicate the results are fully released.
+ - Created in a garage in the south of Italy.
+
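+ As an illustration of the RAG setup mentioned above, a minimal sketch of a retrieval-augmented prompt (the context passage and helper function are invented for this example; the prompt format follows the Inference section below):
+
+ ```python
+ # Hypothetical RAG-style prompt: retrieved context is prepended to the
+ # instruction, reusing the "### Instruction / ### Response" format.
+ def generate_rag_prompt(context: str, question: str) -> str:
+     return f"""### Instruction: Rispondi alla domanda usando solo il contesto fornito.
+
+ Contesto: {context}
+
+ Domanda: {question}
+
+ ### Response:
+ """
+
+
+ context = "La Divina Commedia fu scritta da Dante Alighieri tra il 1308 e il 1321."
+ prompt = generate_rag_prompt(context, "Chi ha scritto la Divina Commedia?")
+ print(prompt)
+ ```
+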
+ The Loquace Italian LLM models were created with the goal of democratizing AI and LLMs in the Italian landscape.
+
+ <b>There is no more need for expensive GPUs, large funding, big corporations, or ivory-tower institutions: just download the code and train on your own dataset on your own PC (or on a cheap and reliable cloud provider like [Genesis Cloud](https://gnsiscld.co/26qhlf)).</b>
+
+ ### Fine-tuning Instructions:
+ The related code can be found at:
+ https://github.com/cosimoiaia/Loquace
+
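+ The authoritative training code lives in the repository above. Purely as a hedged sketch of the QLoRA recipe the card describes (a 4-bit base model plus LoRA adapters via `peft`; every hyperparameter here is an illustrative assumption, not the value actually used):
+
+ ```python
+ # Illustrative QLoRA setup -- NOT the exact configuration used for Loquace;
+ # see https://github.com/cosimoiaia/Loquace for the real training code.
+ import torch
+ from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
+ from transformers import AutoModelForCausalLM, BitsAndBytesConfig
+
+ base = AutoModelForCausalLM.from_pretrained(
+     "mistralai/Mistral-7B-Instruct-v0.1",  # base named by the card (version assumed)
+     quantization_config=BitsAndBytesConfig(
+         load_in_4bit=True,
+         bnb_4bit_quant_type="nf4",
+         bnb_4bit_compute_dtype=torch.bfloat16,
+     ),
+     device_map="auto",
+ )
+ base = prepare_model_for_kbit_training(base)
+
+ lora = LoraConfig(  # illustrative hyperparameters, not the originals
+     r=16,
+     lora_alpha=32,
+     lora_dropout=0.05,
+     target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
+     task_type="CAUSAL_LM",
+ )
+ model = get_peft_model(base, lora)
+ model.print_trainable_parameters()
+ ```
+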
+ ### GGUF Version for CPU Inference:
+ An 8-bit quantized GGUF version of Loquace can be found [here](https://huggingface.co/cosimoiaia/Loquace-7B-Mistral-GGUF); a minimal CPU-inference sketch follows the list below.
+
+ Here is an incomplete list of clients and libraries that are known to support GGUF (thanks to [TheBloke](https://huggingface.co/TheBloke) for this list and his awesome work):
+
+ * [llama.cpp](https://github.com/ggerganov/llama.cpp). The source project for GGUF. Offers a CLI and a server option.
+ * [text-generation-webui](https://github.com/oobabooga/text-generation-webui), the most widely used web UI, with many features and powerful extensions. Supports GPU acceleration.
+ * [KoboldCpp](https://github.com/LostRuins/koboldcpp), a fully featured web UI, with GPU acceleration across all platforms and GPU architectures. Especially good for storytelling.
+ * [LM Studio](https://lmstudio.ai/), an easy-to-use and powerful local GUI for Windows and macOS (Silicon), with GPU acceleration.
+ * [LoLLMS Web UI](https://github.com/ParisNeo/lollms-webui), a great web UI with many interesting and unique features, including a full model library for easy model selection.
+ * [Faraday.dev](https://faraday.dev/), an attractive and easy-to-use character-based chat GUI for Windows and macOS (both Silicon and Intel), with GPU acceleration.
+ * [ctransformers](https://github.com/marella/ctransformers), a Python library with GPU acceleration, LangChain support, and an OpenAI-compatible AI server.
+ * [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), a Python library with GPU acceleration, LangChain support, and an OpenAI-compatible API server.
+ * [candle](https://github.com/huggingface/candle), a Rust ML framework with a focus on performance, including GPU support, and ease of use.
+
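+ As promised above, a minimal CPU-inference sketch using [llama-cpp-python](https://github.com/abetlen/llama-cpp-python); the GGUF file name is an assumption, so check the GGUF repository for the actual name:
+
+ ```python
+ # Minimal CPU inference with llama-cpp-python. The GGUF file name is an
+ # assumption -- verify it against the Loquace-7B-Mistral-GGUF repository.
+ from llama_cpp import Llama
+
+ llm = Llama(
+     model_path="loquace-7b-mistral.Q8_0.gguf",  # hypothetical file name
+     n_ctx=4096,     # context window
+     n_threads=8,    # CPU threads to use
+ )
+
+ out = llm(
+     "### Instruction: Chi era Dante Alighieri?\n\n### Response:\n",
+     max_tokens=256,
+     stop=["### Instruction:"],
+ )
+ print(out["choices"][0]["text"].strip())
+ ```
+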
+ #### Previous releases of the Loquace family:
+
+ The Loquace family was started at the beginning of 2023 to prove that it was possible to fine-tune a large language model in a language other than English. You can find the other members of the family here:
+
+ - https://huggingface.co/cosimoiaia/Loquace-70m - Based on pythia-70m
+ - https://huggingface.co/cosimoiaia/Loquace-410m - Based on pythia-410m
+ - https://huggingface.co/cosimoiaia/Loquace-7B - Based on Falcon-7B
+ - https://huggingface.co/cosimoiaia/Loquace-12B - Based on pythia-12B
+ - https://huggingface.co/cosimoiaia/Loquace-20B - Based on gpt-neox-20B
+
+
+ These models, although they do not produce good results, are proof that cheap fine-tuning is actually possible in a short amount of time, with a relatively raw dataset and extremely low resources.
+
+ ## Inference:
+
+ ```python
+ import torch
+ from transformers import MistralForCausalLM, AutoTokenizer
+
+
+ def generate_prompt(instruction):
+     # Loquace was finetuned on an "### Instruction / ### Response" format.
+     prompt = f"""### Instruction: {instruction}
+
+ ### Response:
+ """
+     return prompt
+
+
+ model_name = "cosimoiaia/Loquace-7B-Mistral"
+
+ model = MistralForCausalLM.from_pretrained(
+     model_name,
+     device_map="auto",
+     torch_dtype=torch.bfloat16
+ )
+
+ model.config.use_cache = True
+
+ tokenizer = AutoTokenizer.from_pretrained(model_name, add_eos_token=False)
+
+ prompt = generate_prompt("Chi era Dante Alighieri?")
+ inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
+
+ # Beam sampling; print only the text after "### Response:".
+ outputs = model.generate(**inputs, do_sample=True, num_beams=2, top_k=50, top_p=0.95, max_new_tokens=2046, early_stopping=True)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True).split("Response:")[1].strip())
+ ```
+
+ ## Examples:
+ ```
+ > Pretendi di essere Dante Alighieri e raccontami la tua vita.
+ Sono Dante Alighieri, un poeta e scrittore italiano del XIV secolo. Nacqui a Firenze nel 1265 e ho trascorso gran parte della mia vita viaggiando in Europa e scrivendo poesie. Ho anche scritto la Divina Commedia, una delle opere letterarie più famose di tutti i tempi.
+
+ > Quanto fa 7 x 14 - 98 ?
+ 7 x 14 = 98, quindi la risposta è 0.
+
+ > La radice quadrata di 144 ?
+ La radice quadrata di 144 è 12.
+
+ > Inventa una ricetta pugliese con la polenta.
+ La ricetta è chiamata "Polenta al burro di arachidi e formaggio". Per prepararla, inizia bollendo la polenta secondo le istruzioni sulla confezione. Mentre la polenta si cucina, mescola insieme 1/2 tazza di burro di arachidi, 1/4 di tazza di formaggio grattugiato e 1/4 di tazza di parmigiano grattugiato. Una volta che la polenta è cotta, versala su una teglia da forno e distribuire il composto di burro di arachidi e formaggio sopra la polenta. Metti in forno a 350 gradi per 15 minuti o fino a quando la parte superiore è dorata. Servi caldo con un'insalata di pomodoro e basilico fresco.
+ ```
+
+ ## Limitations
+
+ - Loquace-7B may not handle complex or nuanced queries well and may struggle with ambiguous or poorly formatted inputs.
+ - The model may generate responses that are factually incorrect or nonsensical. It should be used with caution, and outputs should be carefully verified.
+
+ ## Model Author:
+ Cosimo Iaia <[email protected]>
config.json ADDED
@@ -0,0 +1,42 @@
+ {
+   "_name_or_path": "cosimoiaia/Loquace-7B-Mistral",
+   "architectures": [
+     "MistralModel"
+   ],
+   "attention_dropout": 0.0,
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "head_dim": 128,
+   "hidden_act": "silu",
+   "hidden_size": 4096,
+   "initializer_range": 0.02,
+   "intermediate_size": 14336,
+   "max_position_embeddings": 32768,
+   "model_type": "mistral",
+   "num_attention_heads": 32,
+   "num_hidden_layers": 32,
+   "num_key_value_heads": 8,
+   "quantization_config": {
+     "_load_in_4bit": true,
+     "_load_in_8bit": false,
+     "bnb_4bit_compute_dtype": "float16",
+     "bnb_4bit_quant_storage": "uint8",
+     "bnb_4bit_quant_type": "nf4",
+     "bnb_4bit_use_double_quant": true,
+     "llm_int8_enable_fp32_cpu_offload": false,
+     "llm_int8_has_fp16_weight": false,
+     "llm_int8_skip_modules": null,
+     "llm_int8_threshold": 6.0,
+     "load_in_4bit": true,
+     "load_in_8bit": false,
+     "quant_method": "bitsandbytes"
+   },
+   "rms_norm_eps": 1e-05,
+   "rope_theta": 10000.0,
+   "sliding_window": 4096,
+   "tie_word_embeddings": false,
+   "torch_dtype": "float16",
+   "transformers_version": "4.49.0",
+   "use_cache": true,
+   "vocab_size": 32000
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:681d138abc599e3b6f9c533097bbd7d34e653b6bb8e4a1056b3acc4e51b90540
+ size 3863534963
special_tokens_map.json ADDED
@@ -0,0 +1,23 @@
+ {
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,45 @@
+ {
+   "add_bos_token": true,
+   "add_eos_token": false,
+   "add_prefix_space": null,
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "additional_special_tokens": [],
+   "bos_token": "<s>",
+   "chat_template": "{{ bos_token }}{% for message in messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if message['role'] == 'user' %}{{ '[INST] ' + message['content'] + ' [/INST]' }}{% elif message['role'] == 'assistant' %}{{ message['content'] + eos_token + ' ' }}{% else %}{{ raise_exception('Only user and assistant roles are supported!') }}{% endif %}{% endfor %}",
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "</s>",
+   "extra_special_tokens": {},
+   "legacy": true,
+   "model_max_length": 1000000000000000019884624838656,
+   "pad_token": null,
+   "sp_model_kwargs": {},
+   "spaces_between_special_tokens": false,
+   "tokenizer_class": "LlamaTokenizer",
+   "unk_token": "<unk>",
+   "use_default_system_prompt": true
+ }
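
The `chat_template` in tokenizer_config.json above is the standard Mistral `[INST]` format. A minimal sketch of rendering it with `transformers` (the repo id is the same assumption as in the loading sketch near the top of the README):

```python
# Hedged sketch: render the Mistral [INST] chat template shipped in
# tokenizer_config.json. The repo id is an assumption for this upload.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("AmtIa/Loquace-7B-Mistral-bnb-4bit")

messages = [{"role": "user", "content": "Chi era Dante Alighieri?"}]
text = tokenizer.apply_chat_template(messages, tokenize=False)
print(text)  # "<s>[INST] Chi era Dante Alighieri? [/INST]"
```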