Text Generation
GGUF
English
Chinese
medical
llama-cpp
gguf-my-repo
imatrix
conversational
hellork committed on
Commit 7afc79e · verified · 1 Parent(s): 8bfef2d

Update README.md

Files changed (1)
  1. README.md +21 -0
README.md CHANGED
@@ -14,6 +14,8 @@ tags:
14
  - gguf-my-repo
15
  ---
16
 
17
  # hellork/HuatuoGPT-o1-7B-IQ3_XXS-GGUF
18
 This model was converted to GGUF format from [`FreedomIntelligence/HuatuoGPT-o1-7B`](https://huggingface.co/FreedomIntelligence/HuatuoGPT-o1-7B) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
19
  Refer to the [original model card](https://huggingface.co/FreedomIntelligence/HuatuoGPT-o1-7B) for more details on the model.
@@ -25,6 +27,25 @@ Install llama.cpp through brew (works on Mac and Linux)
25
  brew install llama.cpp
26
 
27
  ```
28
  Invoke the llama.cpp server or the CLI.
29
 
30
  ### CLI:
 
14
  - gguf-my-repo
15
  ---
16
 
17
+ # TESTING...TESTING! The quantization used on this model may reduce quality, but it should be faster and may be usable with 4 GB of VRAM. TESTING...
18
+
19
  # hellork/HuatuoGPT-o1-7B-IQ3_XXS-GGUF
20
 This model was converted to GGUF format from [`FreedomIntelligence/HuatuoGPT-o1-7B`](https://huggingface.co/FreedomIntelligence/HuatuoGPT-o1-7B) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
21
  Refer to the [original model card](https://huggingface.co/FreedomIntelligence/HuatuoGPT-o1-7B) for more details on the model.
 
27
  brew install llama.cpp
28
 
29
  ```
30
+
31
+ # Compile to take advantage of NVIDIA CUDA hardware:
32
+
33
+ ```bash
34
+ git clone https://github.com/ggerganov/llama.cpp.git
35
+ cd llama.cpp
36
+ # See the llama.cpp build docs for other hardware backends and to confirm these steps are current.
37
+
38
+ cmake -B build -DGGML_CUDA=ON
39
+ cmake --build build --config Release # optionally add -j N, where N is less than your core count
40
+
41
+ # If your gcc version is newer than 12 and the build fails, use conda to install gcc-12 and activate it.
42
+ # Run the above cmake commands again.
43
+ # Then run conda deactivate and re-run the build command once more so the binary links outside of conda.
44
+
45
+ # Add the -ngl 33 flag to the commands below to offload all model layers to the GPU.
46
+ # If that uses too much VRAM and crashes, use a lower number.
47
+ ```
48
+
49
  Invoke the llama.cpp server or the CLI.
50
 
51
  ### CLI:
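
As a rough sketch of the `-ngl` advice added in this commit, here is one way the locally built binaries could be pointed at this repo. The `.gguf` file name and the prompt are illustrative assumptions, not taken from the README; check the repo's file listing for the exact file name.

```bash
# Sketch only: the .gguf file name below is an assumption; check the repo's
# file listing for the exact name. After the CUDA build, binaries are in ./build/bin/.

# Server: -ngl 33 offloads all layers to the GPU; lower the number if VRAM runs out.
./build/bin/llama-server \
  --hf-repo hellork/HuatuoGPT-o1-7B-IQ3_XXS-GGUF \
  --hf-file huatuogpt-o1-7b-iq3_xxs-imat.gguf \
  -c 2048 -ngl 33

# CLI: one-off prompt with the same offload flag.
./build/bin/llama-cli \
  --hf-repo hellork/HuatuoGPT-o1-7B-IQ3_XXS-GGUF \
  --hf-file huatuogpt-o1-7b-iq3_xxs-imat.gguf \
  -ngl 33 -p "What are common causes of chest pain?"
```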