Spestly committed
Commit bfdc9cd · verified · 1 Parent(s): 0191eba

Update README.md

Files changed (1)
  1. README.md +86 -78
README.md CHANGED
@@ -122,131 +122,139 @@ language:
  - sw

  ---
- ![Header](./Nous-V1-Banner.png)
- # Nous-V1 8B

- ## Overview

- **Nous-V1 2B** is a cutting-edge 8 billion parameter language model developed by Apexion AI, based on the architecture of [Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B). Designed for versatility across diverse NLP tasks, Nous-V1 4B delivers strong performance in conversational AI, knowledge reasoning, code generation, and content creation.

- **Key Features:**
-
- - **⚡ Efficient 2B Parameter Scale:** Balances model capability with practical deployment on modern hardware
- - **🧠 Enhanced Contextual Understanding:** Supports an 128k token context window, enabling complex multi-turn conversations and document analysis
- - **🌐 Multilingual & Multi-domain:** Trained on a diverse dataset for broad language and domain coverage
- - **🤖 Instruction-Following & Adaptability:** Fine-tuned to respond accurately and adaptively across tasks
- - **🚀 Optimized Inference:** Suitable for GPU environments such as NVIDIA A100, T4, and P100 for low-latency applications

  ---

- ## Why Choose Nous-V1 2B?
-
- While larger models can offer more raw power, Nous-V1 2B strikes a practical balance — optimized for deployment efficiency without significant compromise on language understanding or generation quality. It’s ideal for applications requiring:

- - Real-time conversational agents
- - Code completion and programming assistance
- - Content generation and summarization
- - Multilingual natural language understanding

  ---

- ## 🖥️ How to Run Locally

- You can easily integrate Nous-V1 2B via the Hugging Face Transformers library or deploy it on popular serving platforms.

- ### Using Hugging Face Transformers

  ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer

- model_name = "apexion-ai/Nous-1-2B"

- # load the tokenizer and the model
- tokenizer = AutoTokenizer.from_pretrained(model_name)
  model = AutoModelForCausalLM.from_pretrained(
-     model_name,
-     torch_dtype="auto",
-     device_map="auto"
  )

- # prepare the model input
- prompt = "Give me a short introduction to large language model."
  messages = [
-     {"role": "user", "content": prompt}
  ]
- text = tokenizer.apply_chat_template(
-     messages,
-     tokenize=False,
-     add_generation_prompt=True,
-     enable_thinking=True  # Switches between thinking and non-thinking modes. Default is True.
- )
- model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

- # conduct text completion
- generated_ids = model.generate(
-     **model_inputs,
-     max_new_tokens=32768
- )
- output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

- # parsing thinking content
- try:
-     # rindex finding 151668 (</think>)
-     index = len(output_ids) - output_ids[::-1].index(151668)
- except ValueError:
-     index = 0

- thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
- content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

- print("thinking content:", thinking_content)
- print("content:", content)

- ```

- ### Deployment Options

- - Compatible with [vLLM](https://github.com/vllm-project/vllm) for efficient serving
- - Works with [llama.cpp](https://github.com/ggerganov/llama.cpp) for lightweight inference

  ---

- ## Recommended Sampling Parameters

- ```yaml
- Temperature: 0.7
- Top-p: 0.9
- Top-k: 40
- Min-p: 0.0
- ```

  ---

- ## FAQ

- **Q:** Can I fine-tune Nous-V1 2B on my custom data?
- **A:** Yes, the model supports fine-tuning workflows via Hugging Face Trainer or custom scripts.

- **Q:** What hardware is recommended?
- **A:** NVIDIA GPUs with at least 16GB VRAM (e.g., A100, 3090) are optimal for inference and fine-tuning.

- **Q:** Is the model safe to use for production?
- **A:** Nous-V1 2B includes safety mitigations but should be used with human oversight and proper filtering for sensitive content.

  ---

- ## 📄 Citation

  ```bibtex
- @misc{apexion2025nousv14b,
-   title={Nous-V1 2B: Efficient Large Language Model for Versatile NLP Applications},
-   author={Apexion AI Team},
    year={2025},
-   url={https://huggingface.co/apexion-ai/Nous-V1-2B}
  }
  ```

  ---

- *Nous-V1 2B — Powering practical AI applications with intelligent language understanding.*
+ # Apollo-1-2B

+ [![Model](https://img.shields.io/badge/Model-Apollo--1--2B-blue)](https://huggingface.co/NoemaResearch/Apollo-1-2B)
+ [![Base](https://img.shields.io/badge/Base-Qwen3--1.7B-green)](https://huggingface.co/Qwen/Qwen3-1.7B)
+ [![License](https://img.shields.io/badge/License-ANVDL--1.0-yellow)](LICENSE)

+ Apollo-1-2B is a **2 billion parameter instruction-tuned model** developed by **Noema Research**.
+ It is based on [Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) and optimized for **general reasoning, language understanding, and lightweight deployment**.

+ This model is the first release in the **Apollo series**, intended as a foundation for scalable experimentation and real-world applications in constrained environments.

  ---

+ ## Model Overview

+ - **Base model:** `Qwen3-1.7B`
+ - **Architecture:** Decoder-only transformer
+ - **Parameters:** ~2B
+ - **Context length:** up to 32k tokens (inherits Qwen3 long-context support)
+ - **Domain:** General-purpose reasoning and instruction following
+ - **Primary applications:**
+   - Conversational AI
+   - Lightweight reasoning tasks
+   - Education and tutoring
+   - Prototype agents and assistants
+ - **License:** anvdl-1.0

  ---

+ ## Key Features
+
+ - **Instruction tuned**: More reliable responses in conversational and task-oriented settings
+ - **Lightweight deployment**: Optimized for environments with limited compute or memory resources
+ - **Extended context**: Inherits long-context capability from the Qwen3 base
+ - **Balanced outputs**: Improved refusal behaviors and reduced hallucinations compared to the base model
+ - **Multilingual ability**: Retains multilingual knowledge from the Qwen3 family
+
+ ---

+ ## Usage

+ The model is available in Hugging Face Transformers format. Example:

  ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+ import torch

+ model_id = "NoemaResearch/Apollo-1-2B"

+ tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
  model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     torch_dtype=torch.bfloat16,
+     device_map="auto",
+     trust_remote_code=True
  )

  messages = [
+     {"role": "system", "content": "You are Apollo, a reasoning assistant."},
+     {"role": "user", "content": "Explain the difference between supervised and unsupervised learning."}
  ]

+ # return_dict=True returns a dict of tensors, so it can be unpacked into generate();
+ # do_sample=True is required for temperature/top_p to take effect
+ inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_dict=True, return_tensors="pt").to(model.device)
+ outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7, top_p=0.9)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```

+ **Recommended settings:**

+ * `temperature=0.5–0.9`
+ * `top_p=0.85–0.95`
+ * For structured outputs (e.g., JSON), use lower temperatures for stability; see the sketch below
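
+ As a sketch of the structured-output tip above (an illustration, not part of the model card): the prompt wording and `temperature=0.2` are assumptions, and the snippet reuses `model` and `tokenizer` from the example above:
+
+ ```python
+ import json
+
+ # Ask for JSON explicitly and sample conservatively for structural stability.
+ messages = [
+     {"role": "system", "content": "Reply with a single JSON object and nothing else."},
+     {"role": "user", "content": "List three uses of a small language model in a JSON object with a 'uses' array."}
+ ]
+ inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_dict=True, return_tensors="pt").to(model.device)
+ outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.2, top_p=0.9)
+
+ # Decode only the newly generated tokens, skipping the prompt.
+ reply = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
+
+ # json.loads raises ValueError if the model still wrapped the object in prose.
+ data = json.loads(reply)
+ print(data["uses"])
+ ```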
 
+ ---

+ ## Evaluation

+ Apollo-1-2B has been evaluated internally on a range of reasoning and language tasks. Key findings:

+ * Improved **instruction following** relative to Qwen3-1.7B
+ * More **concise and accurate responses** in structured tasks
+ * Maintains **multilingual performance** from the base model
+ * Effective for **lightweight assistant applications**
+
+ Future work will include publishing comprehensive benchmark comparisons against other models in the 1–3B parameter range.

  ---

+ ## Limitations

+ * **Reasoning depth**: As a 2B parameter model, Apollo cannot match larger-scale LLMs on complex reasoning tasks
+ * **Knowledge coverage**: May lack depth in specialized or low-resource domains
+ * **Hallucinations**: Although reduced, the model may still generate incorrect or fabricated information
+ * **Sensitivity to prompts**: Outputs vary with prompt phrasing; careful prompt design is recommended

  ---

+ ## Responsible Use

+ * Do not rely on Apollo for critical decision-making without human oversight
+ * Generated outputs may contain inaccuracies; verification is required for factual or sensitive use cases
+ * Avoid providing personal, private, or sensitive information in prompts
+ * This model should not be used to generate disallowed, unsafe, or harmful content

+ ---

+ ## Model Variants

+ * **Full precision (safetensors)** — research and full-fidelity inference
+ * **bf16 / fp16** — optimized for inference on GPUs/TPUs
+ * **Quantized versions (int8 / int4)** — for deployment in constrained hardware environments; see the loading sketch below
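
+ As a sketch of the int4 option (an illustration, not part of the model card): the `BitsAndBytesConfig` values below are common defaults assumed here, not settings published for Apollo:
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
+
+ model_id = "NoemaResearch/Apollo-1-2B"
+
+ # 4-bit NF4 quantization with bf16 compute; requires the bitsandbytes package.
+ quant_config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_quant_type="nf4",
+     bnb_4bit_compute_dtype=torch.bfloat16,
+ )
+
+ tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     quantization_config=quant_config,
+     device_map="auto",
+     trust_remote_code=True,
+ )
+ ```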
 
  ---

+ ## Citation
+
+ If you use this model, please cite both Apollo-1-2B and the Qwen3 base model:

  ```bibtex
+ @misc{noema2025apollo,
+   title={Apollo-1-2B},
+   author={Noema Research},
    year={2025},
+   howpublished={\url{https://huggingface.co/NoemaResearch/Apollo-1-2B}}
  }
  ```

  ---

+ ## Acknowledgements
+
+ Apollo-1-2B builds upon the [Qwen3](https://huggingface.co/Qwen) series of models.
+ We thank the Qwen team for making their work openly available under permissive terms, which enabled this derivative research.
+
+ ---