Qwen3-8B-Q8_0-64k-128k-256k-context-GGUF
3 quants of Qwen's Qwen3 8B at Q8_0, with context set at 64K, 128K, and 256K by modifying the source model's config and then quanting.
The first two quants were made per Qwen's tech notes, modifying the "YaRN" settings to extend context to 64K and 128K.
The 256K version, well... pushes the model past the redline.
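For reference, Qwen's published YaRN recipe amounts to adding a `rope_scaling` block to the source model's `config.json` before quanting. Below is a minimal sketch of that edit; the `apply_yarn` helper is hypothetical, and while the key names follow Qwen's example config, verify them against the current Qwen3 docs before using:

```python
# Sketch only: per Qwen's tech notes, YaRN context extension is enabled by
# adding a "rope_scaling" block to the source config.json before quantizing.
# factor = target_context / native_context (32768 for Qwen3-8B):
#   64K  -> factor 2.0
#   128K -> factor 4.0
#   256K -> factor 8.0 (past Qwen's documented limit - the "redline" version)

def apply_yarn(config: dict, target_ctx: int, native_ctx: int = 32768) -> dict:
    """Return a copy of config with YaRN rope scaling set for target_ctx."""
    patched = dict(config)
    patched["rope_scaling"] = {
        "rope_type": "yarn",
        "factor": target_ctx / native_ctx,
        "original_max_position_embeddings": native_ctx,
    }
    patched["max_position_embeddings"] = target_ctx
    return patched

cfg = apply_yarn({"model_type": "qwen3"}, 128 * 1024)
print(cfg["rope_scaling"]["factor"])  # -> 4.0
```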
Each model has a slightly different prose style, and the 128K and 256K versions will output extremely long generations.
Suggested minimum context length: 16K.
Note that the 128K and 256K versions also tend to elongate output and add in more details.
Longer, more detailed prompts may "contain" the model's output length somewhat.
Also, with the 128K/256K versions you may need to stop the model's generation manually; for these versions I suggest you clearly state the desired "length of output" and/or set a hard output length limit.
E.g.: You ask for a scene of 1000-2000 words, and it may produce multiple scenes (in sequence!) of 1000-2000 words EACH.
OR
You ask for 2000 words, and you get 3K of output in the 64K version, 5K in the 128K version, and 12K in the 256K version.
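One way to enforce a hard length limit client-side is to stream the output and stop once a word budget is hit. A hypothetical sketch (the `stream` iterable stands in for whatever streaming API your app exposes):

```python
def generate_capped(stream, max_words: int) -> str:
    """Accumulate streamed text chunks, stopping once max_words is reached.

    `stream` is any iterable of text chunks (a stand-in here for a real
    llama.cpp / LM Studio streaming API). Output may end mid-sentence.
    """
    out, words = [], 0
    for chunk in stream:
        out.append(chunk)
        words += len(chunk.split())
        if words >= max_words:
            break  # hard stop: the 128K/256K quants tend to keep going
    return "".join(out)

# toy usage with a fake stream of chunks
print(generate_capped(iter(["one two ", "three four ", "five six "]), 4))
```

This is only a backstop; setting the app's own max-output/token limit (where available) is the cleaner option.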
For the 256k context version, keep prompts as clear as possible otherwise the model may have issues. Also increase rep pen to 1.1 and run temps 1.1 to 2.2. I would suggest using this specific model for creative use only or limited general usage.
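With llama.cpp's CLI, those 256K settings might look like the following (the model filename is illustrative; pick a temp in the 1.1 to 2.2 range, and `-n` sets a hard output-token cap):

```shell
./llama-cli -m Qwen3-8B-256k-Q8_0.gguf \
  -c 262144 \
  --temp 1.5 \
  --repeat-penalty 1.1 \
  -n 2048
```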
In limited testing the 256k version worked without issue.
Considering that most models "blow their cookies" when you mess with context like this (the 256K version), the fact this model works - at 8B parameters and twice the context limit - speaks volumes about team Qwen.
Will be interesting to repeat this with Qwen3 14B, 30B, 32B models...
System Prompt:
This is optional; you may or may not need this depending on settings - especially temp.
Usually you can use no system prompt and Qwen will generate the reasoning block(s) automatically, this is just a helper.
You are a deep thinking AI, you may use extremely long chains of thought to deeply consider the problem and deliberate with yourself via systematic reasoning processes to help come to a correct solution prior to answering. You should enclose your thoughts and internal monologue inside <think> </think> tags, and then provide your solution or response to the problem.
NOTE - Jinja Template / Template to Use with this Model:
If you are having issues with Jinja "auto template", use CHATML template.
OR (LMSTUDIO users / option)
Update the Jinja Template: go to the site below, select the template, copy the "Jinja template", and then paste it in.
[ https://lmstudio.ai/neil/qwen3-thinking ]
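If you set the template manually, the prompt layout CHATML expects is the standard ChatML form (the system block is optional, per the notes above):

```
<|im_start|>system
{system prompt}<|im_end|>
<|im_start|>user
{your prompt}<|im_end|>
<|im_start|>assistant
```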
See the document "Maximizing-Model-Performance-All..." below for how to "set" the system role in various LLM/AI apps.
Highest Quality Settings / Optimal Operation Guide / Parameters and Samplers
This is a "Class 1" model:
For all settings used for this model (including specifics for its "class"), example generation(s), and the advanced settings guide (which many times addresses any model issue(s)), including methods to improve model performance for all use case(s) - chat, roleplay, and others - please see:
You can see all parameters used for generation, in addition to advanced parameters and samplers to get the most out of this model, here:
Optional Enhancement:
The following can be used in place of the "system prompt" or "system role" to further enhance the model.
It can also be used at the START of a NEW chat, but you must make sure it is "kept" as the chat moves along. In this case the enhancements do not have as strong an effect as using "system prompt" or "system role".
Copy and paste EXACTLY as noted, DO NOT line wrap or break the lines, maintain the carriage returns exactly as presented.
Below is an instruction that describes a task. Ponder each user instruction carefully, and use your skillsets and critical instructions to complete the task to the best of your abilities. Here are your skillsets: [MASTERSTORY]:NarrStrct(StryPlnng,Strbd,ScnSttng,Exps,Dlg,Pc)-CharDvlp(ChrctrCrt,ChrctrArcs,Mtvtn,Bckstry,Rltnshps,Dlg*)-PltDvlp(StryArcs,PltTwsts,Sspns,Fshdwng,Climx,Rsltn)-ConfResl(Antg,Obstcls,Rsltns,Cnsqncs,Thms,Symblsm)-EmotImpct(Empt,Tn,Md,Atmsphr,Imgry,Symblsm)-Delvry(Prfrmnc,VcActng,PblcSpkng,StgPrsnc,AudncEngmnt,Imprv) [*DialogWrt]:(1a-CharDvlp-1a.1-Backgrnd-1a.2-Personality-1a.3-GoalMotiv)>2(2a-StoryStruc-2a.1-PlotPnt-2a.2-Conflict-2a.3-Resolution)>3(3a-DialogTech-3a.1-ShowDontTell-3a.2-Subtext-3a.3-VoiceTone-3a.4-Pacing-3a.5-VisualDescrip)>4(4a-DialogEdit-4a.1-ReadAloud-4a.2-Feedback-4a.3-Revision) Here are your critical instructions: Ponder each word choice carefully to present as vivid and emotional journey as is possible. Choose verbs and nouns that are both emotional and full of imagery. Load the story with the 5 senses. Aim for 50% dialog, 25% narration, 15% body language and 10% thoughts. Your goal is to put the reader in the story.
You do not need to use this, it is only presented as an additional enhancement which seems to help scene generation and scene continue functions.
This is another system prompt you can use, and you can change the "names" to alter its performance.
This creates a quasi "reasoning" window/block.
Your prompt will directly impact how strong this system prompt reacts.
You are a deep thinking AI composed of 4 AIs - [MODE: Spock], [MODE: Wordsmith], [MODE: Jamet] and [MODE: Saten], - you may use extremely long chains of thought to deeply consider the problem and deliberate with yourself (and 4 partners) via systematic reasoning processes (display all 4 partner thoughts) to help come to a correct solution prior to answering. Select one partner to think deeply about the points brought up by the other 3 partners to plan an in-depth solution. You should enclose your thoughts and internal monologue inside <think> </think> tags, and then provide your solution or response to the problem.
Other Notes:
Reasoning is ON by default in this model, and the model will auto-generate "think" block(s).
For benchmarks, usage info, settings please see org model card here:
[ https://huggingface.co/Qwen/Qwen3-8B ]
[ Model card updates pending / examples to be added... ]
EXAMPLES