---
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- cobalt
- cobalt-2
- valiant
- valiant-labs
- qwen
- qwen-3
- qwen-3-14b
- 14b
- math
- math-reasoning
- math-instruct
- reasoning
- problem-solving
- creative
- analytical
- expert
- rationality
- conversational
- chat
- instruct
base_model: Qwen/Qwen3-14B
datasets:
- zwhe99/DeepMath-103K
- sequelbox/Raiden-DeepSeek-R1
license: apache-2.0
---


**[Support our open-source dataset and model releases!](https://huggingface.co/spaces/sequelbox/SupportOpenSource)**


Cobalt 2 is a math and general reasoning specialist built on Qwen 3.
- Finetuned on high-difficulty problems from [the math-reasoning DeepMath dataset](https://huggingface.co/datasets/zwhe99/DeepMath-103K), with reasoning traces generated by DeepSeek R1!
- Improved [general and creative reasoning](https://huggingface.co/datasets/sequelbox/Raiden-DeepSeek-R1) to supplement problem-solving and general chat performance.
- Small model sizes allow running on local desktop and mobile hardware, plus super-fast server inference!


Try Esper 3, our full-stack code, architecture, and DevOps assistant: [Qwen3-4B](https://huggingface.co/ValiantLabs/Qwen3-4B-Esper3), [Qwen3-8B](https://huggingface.co/ValiantLabs/Qwen3-8B-Esper3), [Qwen3-14B](https://huggingface.co/ValiantLabs/Qwen3-14B-Esper3)


## Prompting Guide

Cobalt 2 uses the [Qwen 3](https://huggingface.co/Qwen/Qwen3-14B) prompt format.

Cobalt 2 is a reasoning finetune; **we recommend `enable_thinking=True` for all chats.**

Example inference script to get started:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ValiantLabs/Qwen3-14B-Cobalt2"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
# note: a raw string keeps the LaTeX backslashes (\t in \to, \b in \binom, ...)
# from being read as Python escape sequences
prompt = r"Evaluate the limit using the Central Limit Theorem: \[ \lim_{n\to\infty}p^{n}\sum_{k \geqslant{n(p^{-1}-1)}}^{\infty}\binom{n+k-1}{n-1}(1-p)^{k}. \]"
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True  # Switches between thinking and non-thinking modes. Default is True.
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# parse out the thinking content
try:
    # rindex finding 151668 (the </think> token)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)
```

![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/63444f2687964b331809eb55/VCJ8Fmefd8cdVhXSSxJiD.jpeg)


Cobalt 2 is created by [Valiant Labs.](http://valiantlabs.ca/)

[Check out our HuggingFace page to see Esper 3 and all of our models!](https://huggingface.co/ValiantLabs)

We care about open source. For everyone to use.
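

A note on sampling settings for the Prompting Guide script above: it calls `generate` with default sampling, and this card does not publish Cobalt-specific values. The upstream [Qwen 3](https://huggingface.co/Qwen/Qwen3-14B) model card recommends `temperature=0.6`, `top_p=0.95`, `top_k=20`, and `min_p=0` for thinking mode, and advises against greedy decoding; since Cobalt 2 is a Qwen 3 finetune run with `enable_thinking=True`, those values are a reasonable starting point. A minimal sketch, replacing the `model.generate` call in the script above (the settings are carried over from Qwen 3 as an assumption, not Cobalt-specific tuning):

```python
# Sampling settings borrowed from the upstream Qwen 3 model card (thinking mode);
# assumed rather than Cobalt-specific. Adjust to taste.
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768,
    do_sample=True,    # Qwen 3 advises against greedy decoding in thinking mode
    temperature=0.6,
    top_p=0.95,
    top_k=20,
    min_p=0.0,
)
```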