Q4_K_M static quant of deepseek-ai/DeepSeek-V3.1-Base

Using llama.cpp release b6182 for quantization.

Original model: https://huggingface.co/deepseek-ai/DeepSeek-V3.1-Base
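For anyone curious, the usual llama.cpp flow for producing a quant like this is roughly the following (just a sketch: the paths are placeholders, the BF16 intermediate for a model this size is enormous, and exact script/flag names can shift between llama.cpp releases):

python convert_hf_to_gguf.py /models/DeepSeek-V3.1-Base --outtype bf16 --outfile /models/DeepSeek-V3.1-Base-BF16.gguf
./llama-quantize /models/DeepSeek-V3.1-Base-BF16.gguf /models/deepseek-ai_DeepSeek-V3.1-Base-Q4_K_M.gguf Q4_K_M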

Uploading this since I'm using it to calculate an imatrix; figured I might as well provide it in the meantime.
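For context, imatrix generation in llama.cpp looks roughly like this (a sketch: calibration.txt is a placeholder for whatever calibration text you use, and flags may differ slightly between releases):

./llama-imatrix -m /models/deepseek-ai_DeepSeek-V3.1-Base-Q4_K_M-00001-of-00011.gguf -f calibration.txt -o imatrix.dat -ngl 0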

Remember, this is a BASE model, so it likely will not chat properly unless you give it multiple turns of examples. For instance, I've had success with:

./llama-cli -m /models/deepseek-ai_DeepSeek-V3.1-Base-Q4_K_M-00001-of-00011.gguf -p "You are a helpful assistant.<User>Hello, who are you?<Assistant>I am DeepSeek, a helpful AI assistant.<User>How are you today?<Assistant>I'm doing well! Is there anything I can assist you with?<User>Can you explain the laws of thermodynamics?<Assistant>" -no-cnv -ngl 0 --reverse-prompt "<User>"

Prompt for easier viewing:

You are a helpful assistant.<User>Hello, who are you?<Assistant>I am DeepSeek, a helpful AI assistant.<User>How are you today?<Assistant>I'm doing well! Is there anything I can assist you with?<User>Can you explain the laws of thermodynamics?<Assistant>

Yes, I am using <User> and <Assistant> as opposed to the special tokens <|User|> and <|Assistant|>; for some reason this seems to be more stable.

This resulted in a completely coherent reply:

Sure, here's a brief explanation of the laws of thermodynamics:

1. Zeroth Law of Thermodynamics: If two thermodynamic systems are each in thermal equilibrium with a third system, then they are in thermal equilibrium with each other.
2. First Law of Thermodynamics: The total energy of an isolated system is constant; energy can be transformed from one form to another, but cannot be created or destroyed.
3. Second Law of Thermodynamics: The entropy of an isolated system not in equilibrium will tend to increase over time, approaching a maximum value at equilibrium.
4. Third Law of Thermodynamics: As the temperature of a system approaches absolute zero, the entropy of the system approaches a minimum value.

Would you like more details on any of these laws?

The idea is that you need to teach the base model what a conversation looks like first; base models usually aren't capable of one-shotting a conversation since they haven't been tuned to understand roles.
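If you'd rather not cram the whole few-shot conversation onto the command line, llama-cli can also read the prompt from a file, something like this (prompt.txt being a placeholder file holding the multi-turn text above):

./llama-cli -m /models/deepseek-ai_DeepSeek-V3.1-Base-Q4_K_M-00001-of-00011.gguf -f prompt.txt -no-cnv -ngl 0 --reverse-prompt "<User>"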

382 GB total size.
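Since the quant is split across 11 GGUF shards, the easiest way to grab everything is with huggingface-cli (a sketch, assuming this repo's id and that you want the files under ./models):

huggingface-cli download bartowski/deepseek-ai_DeepSeek-V3.1-Base-Q4_K_M-GGUF --include "*.gguf" --local-dir ./models

Point -m at the 00001-of-00011 file and llama.cpp will pick up the remaining shards automatically.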
