Text Generation
GGUF
English
Chinese
medical
llama-cpp
gguf-my-repo
imatrix
conversational
hellork committed on
Commit 7afc79e · verified · 1 Parent(s): 8bfef2d

Update README.md

Files changed (1)
  1. README.md +21 -0
README.md CHANGED
@@ -14,6 +14,8 @@ tags:
14
  - gguf-my-repo
15
  ---
16
 
17
  # hellork/HuatuoGPT-o1-7B-IQ3_XXS-GGUF
18
 This model was converted to GGUF format from [`FreedomIntelligence/HuatuoGPT-o1-7B`](https://huggingface.co/FreedomIntelligence/HuatuoGPT-o1-7B) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
19
  Refer to the [original model card](https://huggingface.co/FreedomIntelligence/HuatuoGPT-o1-7B) for more details on the model.
@@ -25,6 +27,25 @@ Install llama.cpp through brew (works on Mac and Linux)
25
  brew install llama.cpp
26
 
27
  ```
28
  Invoke the llama.cpp server or the CLI.
29
 
30
  ### CLI:
 
14
  - gguf-my-repo
15
  ---
16
 
17
+ # TESTING...TESTING! The quantization used on this model may reduce quality, but it should be faster and may be usable with 4 GB of VRAM. TESTING...
18
+
19
  # hellork/HuatuoGPT-o1-7B-IQ3_XXS-GGUF
20
 This model was converted to GGUF format from [`FreedomIntelligence/HuatuoGPT-o1-7B`](https://huggingface.co/FreedomIntelligence/HuatuoGPT-o1-7B) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
21
  Refer to the [original model card](https://huggingface.co/FreedomIntelligence/HuatuoGPT-o1-7B) for more details on the model.
 
27
  brew install llama.cpp
28
 
29
  ```
30
+
31
+ # Compile to take advantage of NVIDIA CUDA hardware:
32
+
33
+ ```bash
34
+ git clone https://github.com/ggerganov/llama.cpp.git
35
+ cd llama.cpp
36
+ # See the llama.cpp build docs for other hardware backends and to confirm these steps are current.
37
+
38
+ cmake -B build -DGGML_CUDA=ON
39
+ cmake --build build --config Release # optionally add -j N, where N is less than your core count
40
+
41
+ # If your gcc version is newer than 12 and the build fails, use conda to install gcc-12 and activate it.
42
+ # Run the above cmake commands again.
43
+ # Then run conda deactivate and re-run the build command once more so the binary links outside of conda.
44
+
45
+ # Add the -ngl 33 flag to the commands below to offload all model layers to the GPU.
46
+ # If that uses too much VRAM and crashes, use a lower number.
47
+ ```
48
+
49
  Invoke the llama.cpp server or the CLI.
50
 
51
  ### CLI:
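
As a rough sketch of the `-ngl` advice added in this commit, here is one way the locally built binaries could be pointed at this repo. The `.gguf` file name and the prompt are illustrative assumptions, not taken from the README; check the repo's file listing for the exact file name.

```bash
# Sketch only: the .gguf file name below is an assumption; check the repo's
# file listing for the exact name. After the CUDA build, binaries are in ./build/bin/.

# Server: -ngl 33 offloads all layers to the GPU; lower the number if VRAM runs out.
./build/bin/llama-server \
  --hf-repo hellork/HuatuoGPT-o1-7B-IQ3_XXS-GGUF \
  --hf-file huatuogpt-o1-7b-iq3_xxs-imat.gguf \
  -c 2048 -ngl 33

# CLI: one-off prompt with the same offload flag.
./build/bin/llama-cli \
  --hf-repo hellork/HuatuoGPT-o1-7B-IQ3_XXS-GGUF \
  --hf-file huatuogpt-o1-7b-iq3_xxs-imat.gguf \
  -ngl 33 -p "What are common causes of chest pain?"
```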