manbeast3b committed
Commit 3f56b21 · verified · 1 Parent(s): 1e57fa0

Upload folder using huggingface_hub
.gitattributes CHANGED
@@ -33,3 +33,9 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ models/lm/tokenizer.json filter=lfs diff=lfs merge=lfs -text
+ models/v10/tokenizer.json filter=lfs diff=lfs merge=lfs -text
+ spk_001.wav filter=lfs diff=lfs merge=lfs -text
+ assistant_female_voice.wav filter=lfs diff=lfs merge=lfs -text
+ pytransform/_pytransform.so filter=lfs diff=lfs merge=lfs -text
+ pyarmor_runtime_000000/pyarmor_runtime.so filter=lfs diff=lfs merge=lfs -text
Dockerfile ADDED
@@ -0,0 +1,50 @@
+ FROM nvidia/cuda:12.3.2-cudnn9-devel-ubuntu22.04
+
+ # Set environment variables
+ ENV PYTHONUNBUFFERED=1 \
+     DEBIAN_FRONTEND=noninteractive \
+     CUDA_HOME=/usr/local/cuda \
+     PATH=/usr/local/cuda/bin:$PATH \
+     LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH \
+     NVIDIA_VISIBLE_DEVICES=all \
+     NVIDIA_DRIVER_CAPABILITIES=compute,utility \
+     HF_HOME=/app/models \
+     TRITON_CACHE_DIR=/tmp/triton_cache \
+     XDG_CACHE_HOME=/tmp \
+     NUMBA_CACHE_DIR=/tmp/numba_cache
+
+ # Install system dependencies
+ RUN apt-get update && apt-get install -y --no-install-recommends \
+     python3 \
+     python3-pip \
+     python3-dev \
+     build-essential \
+     git \
+     git-lfs \
+     ffmpeg \
+     libsndfile1 \
+     curl \
+     && rm -rf /var/lib/apt/lists/*
+
+ # Initialize Git LFS
+ RUN git lfs install
+ # Upgrade pip and install build tools
+ RUN python3 -m pip install --upgrade pip setuptools wheel uv
+
+ WORKDIR /app
+
+ # Create Numba cache directory
+ RUN mkdir -p /tmp/numba_cache /tmp/triton_cache && \
+     chown nobody:nogroup /tmp/numba_cache /tmp/triton_cache && \
+     chmod 700 /tmp/numba_cache /tmp/triton_cache
+
+ COPY requirements.txt .
+
+ # Install other requirements
+ RUN python3 -m uv pip install --no-cache-dir -r requirements.txt --prerelease=allow
+
+ COPY . .
+
+ EXPOSE 8000
+
+ CMD ["python3", "server.py"]
README.md ADDED
@@ -0,0 +1,13 @@
+ ---
+ license: mit
+ tags:
+ - any-to-any
+ - omega
+ - omegalabs
+ - bittensor
+ - agi
+ ---
+
+ This is an Any-to-Any model checkpoint for the OMEGA Labs x Bittensor Any-to-Any subnet.
+
+ Check out the [git repo](https://github.com/omegalabsinc/omegalabs-anytoany-bittensor) and find OMEGA on X: [@omegalabsai](https://x.com/omegalabsai).
assistant_female_voice.wav ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1d712ba6de1d15d52eda96bdc043ce43eb5af4b4ac441b78b6fb0fdaf6683c7a
+ size 235244
attention_mask_research.md ADDED
@@ -0,0 +1,186 @@
1
+ # Attention Masks and Pad Tokens in Transformer Generation: Research Questions
2
+
3
+ ## Core Problem Statement
4
+
5
+ When running transformer models (specifically Llama-3.2-1B-Instruct) for text generation, we encounter warnings about missing attention masks and pad tokens, even for single input sequences. This leads to inconsistent generation outputs despite identical inputs.
6
+
7
+ ### Warning Messages Observed
8
+ ```
9
+ The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
10
+ Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
11
+ The attention mask is not set and cannot be inferred from input because pad token is same as eos token.
12
+ ```
13
+
14
+ ## Key Research Questions
15
+
16
+ ### 1. Why do single inputs require attention masks?
17
+ **Initial Assumption**: Single sequences without padding shouldn't need attention masks.
18
+ **Observed Reality**: Even single inputs show different generation outputs when attention masks are missing.
19
+
20
+ ### 2. What is the relationship between pad tokens and attention masks?
21
+ **Question**: How do pad_token_id and attention_mask work together in the generation process?
22
+
23
+ ### 3. Why does pad_token_id = eos_token_id cause issues?
24
+ **Specific Issue**: When padding token equals end-of-sequence token, what ambiguity does this create?
25
+
26
+ ## Code Analysis
27
+
28
+ ### Current Implementation (Problematic)
29
+ ```python
30
+ def chat_current(system_prompt: str, user_prompt: str) -> str:
31
+ messages = [
32
+ {"role": "system", "content": system_prompt},
33
+ {"role": "user", "content": user_prompt},
34
+ ]
35
+
36
+ # Only returns input_ids tensor
37
+ input_ids = tok.apply_chat_template(
38
+ messages,
39
+ add_generation_prompt=True,
40
+ return_tensors="pt"
41
+ ).to(lm.device)
42
+
43
+ with torch.inference_mode():
44
+ output_ids = lm.generate(
45
+ input_ids, # Missing: attention_mask, pad_token_id
46
+ max_new_tokens=2048,
47
+ do_sample=True,
48
+ temperature=0.2,
49
+ repetition_penalty=1.1,
50
+ top_k=100,
51
+ top_p=0.95,
52
+ )
53
+
54
+ return tok.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)
55
+ ```
56
+
57
+ ### Fixed Implementation
58
+ ```python
59
+ def chat_fixed(system_prompt: str, user_prompt: str) -> str:
60
+ messages = [
61
+ {"role": "system", "content": system_prompt},
62
+ {"role": "user", "content": user_prompt},
63
+ ]
64
+
65
+ # Returns dictionary with input_ids AND attention_mask
66
+ inputs = tok.apply_chat_template(
67
+ messages,
68
+ add_generation_prompt=True,
69
+ return_tensors="pt",
70
+ return_dict=True # KEY CHANGE: Get both components
71
+ )
72
+
73
+ input_ids = inputs["input_ids"].to(lm.device)
74
+ attention_mask = inputs["attention_mask"].to(lm.device)
75
+
76
+ with torch.inference_mode():
77
+ output_ids = lm.generate(
78
+ input_ids=input_ids,
79
+ attention_mask=attention_mask, # Explicit attention guidance
80
+ pad_token_id=tok.eos_token_id, # Explicit pad token
81
+ max_new_tokens=2048,
82
+ do_sample=True,
83
+ temperature=0.2,
84
+ repetition_penalty=1.1,
85
+ top_k=100,
86
+ top_p=0.95,
87
+ )
88
+
89
+ return tok.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)
90
+ ```
91
+
92
+ ### Model and Tokenizer Setup
93
+ ```python
94
+ model_name = "models/Llama-3.2-1B-Instruct"
95
+ tok = AutoTokenizer.from_pretrained(model_name)
96
+ # Critical: Set pad token if not available
97
+ if tok.pad_token is None:
98
+ tok.pad_token = tok.eos_token
99
+
100
+ lm = AutoModelForCausalLM.from_pretrained(
101
+ model_name,
102
+ torch_dtype=torch.bfloat16,
103
+ device_map="cuda",
104
+ ).eval()
105
+ ```
106
+
107
+ ## Observed Behavioral Differences
108
+
109
+ ### Input Structure Analysis
110
+ ```python
111
+ # Single input contains multiple components:
112
+ messages = [
113
+ {"role": "system", "content": "You are a helpful assistant..."},
114
+ {"role": "user", "content": "What is the capital of France?"},
115
+ ]
116
+
117
+ # After apply_chat_template, becomes token sequence:
118
+ # [system_tokens, user_tokens, assistant_start_token]
119
+ ```
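+
+ To make this concrete, the template output can be inspected directly. The sketch below is illustrative (it assumes the same `tok` tokenizer loaded in the setup above); for a single, unpadded sequence the returned `attention_mask` is simply all ones:
+
+ ```python
+ # Inspection sketch (assumes `tok` from the setup above).
+ messages = [
+     {"role": "system", "content": "You are a helpful assistant..."},
+     {"role": "user", "content": "What is the capital of France?"},
+ ]
+
+ # Render the chat template as text to see the role markers and special tokens.
+ print(tok.apply_chat_template(messages, add_generation_prompt=True, tokenize=False))
+
+ # Tokenize with return_dict=True to get both components the model expects.
+ enc = tok.apply_chat_template(
+     messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
+ )
+ print(enc["input_ids"].shape)          # (1, sequence_length)
+ print(enc["attention_mask"].unique())  # tensor([1]) for a single unpadded sequence
+ ```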
120
+
121
+ ## Technical Hypotheses for Investigation
122
+
123
+ ### Hypothesis 1: Internal Masking Ambiguity
124
+ When attention_mask is missing, the model cannot distinguish between:
125
+ - Real input tokens that should influence generation
126
+ - Structural tokens (system prompts, role markers)
127
+ - Token boundaries between different message roles
128
+
129
+ ### Hypothesis 2: EOS Token Dual Purpose Confusion
130
+ When `pad_token_id == eos_token_id`, the model faces ambiguity:
131
+ ```python
132
+ # Same token (128001) serves dual purposes:
133
+ # 1. End of sequence marker
134
+ # 2. Padding token for batch processing
135
+ # Model cannot infer which purpose applies in context
136
+ ```
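+
+ One way to remove this ambiguity entirely is to register a dedicated pad token so the two roles never share an ID. This is a hedged sketch, not part of the current setup: the token string `<|pad|>` is illustrative, and adding a token requires resizing the model's embeddings (which modifies the checkpoint). The lighter-weight alternative used elsewhere in this document is simply passing `pad_token_id=tok.eos_token_id` explicitly.
+
+ ```python
+ # Sketch: give the tokenizer a distinct pad token (assumes `tok` and `lm` from the setup above).
+ if tok.pad_token is None or tok.pad_token_id == tok.eos_token_id:
+     tok.add_special_tokens({"pad_token": "<|pad|>"})   # "<|pad|>" is an illustrative choice
+     lm.resize_token_embeddings(len(tok))               # embedding table must grow to match
+
+ print(tok.pad_token_id, tok.eos_token_id)  # now two different IDs
+ ```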
137
+
138
+ ### Hypothesis 3: Autoregressive Generation Context Boundary Issues
139
+ During generation, the model needs to know:
140
+ - Which input tokens provide valid context for next token prediction
141
+ - Where the "prompt" ends and "generation" begins
142
+ - How to weight attention across different input components
143
+
144
+ ## Research Objectives
145
+
146
+ ### Primary Questions
147
+ 1. **Mechanism Analysis**: How exactly does missing attention_mask affect the internal attention computation?
148
+ 2. **Consistency Impact**: Why do identical inputs produce different outputs without proper masking? (A seeded comparison sketch follows this list.)
149
+ 3. **Single vs Batch Behavior**: What differences exist between single sequence and batched sequence processing?
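+
+ A seeded comparison helps separate masking effects from ordinary sampling noise. The sketch below reuses the `chat_current`/`chat_fixed` helpers defined earlier; with the RNG fixed before each call, any remaining difference is attributable to how the inputs were prepared rather than to `do_sample`:
+
+ ```python
+ import torch
+
+ system_prompt = "You are a helpful assistant."
+ user_prompt = "What is the capital of France?"
+
+ torch.manual_seed(0)   # fix the sampler's RNG
+ out_without_mask = chat_current(system_prompt, user_prompt)
+
+ torch.manual_seed(0)   # same seed for a like-for-like run
+ out_with_mask = chat_fixed(system_prompt, user_prompt)
+
+ print(out_without_mask == out_with_mask)
+ ```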
150
+
151
+ ### Secondary Questions
152
+ 1. **Model-Specific Behavior**: Do different transformer architectures handle missing attention masks differently?
153
+ 2. **Generation Parameter Interaction**: How do attention mask issues interact with sampling parameters (temperature, top_p, etc.)?
154
+ 3. **Performance Impact**: What computational overhead does proper attention masking add?
155
+
156
+ ## Key Technical Areas for Deep Research
157
+
158
+ ### Attention Mechanism Internals
159
+ - How attention weights are computed with/without explicit masks (see the sketch after this list)
160
+ - Impact on multi-head attention distributions
161
+ - Interaction with causal masking in autoregressive models
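+
+ The boolean sketch below shows how an explicit padding mask combines with the causal mask (real implementations typically use additive float masks with large negative values, so this is a simplified view):
+
+ ```python
+ import torch
+
+ seq_len = 5
+ attention_mask = torch.tensor([[1, 1, 1, 1, 0]])   # last position treated as padding
+ causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
+ padding = attention_mask.bool()[:, None, :]        # broadcast over query positions
+ allowed = causal[None, :, :] & padding             # keys each query may attend to
+ print(allowed.int())
+ ```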
162
+
163
+ ### Tokenizer Behavior
164
+ - How `apply_chat_template` constructs input sequences
165
+ - Default attention mask generation behavior
166
+ - Role of special tokens in attention computation
167
+
168
+ ### Generation Process
169
+ - How `model.generate()` handles missing parameters (approximated in the sketch after this list)
170
+ - Internal assumptions and fallback behaviors
171
+ - Impact on sampling and beam search algorithms
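+
+ As a rough mental model of the first bullet, the fallback behaves approximately like the function below. This is a simplified sketch, not the library's actual code, and the exact logic varies across transformers versions:
+
+ ```python
+ import torch
+
+ def approx_default_attention_mask(input_ids, pad_token_id, eos_token_id):
+     """Approximation of the mask generate() builds when none is passed."""
+     if (
+         pad_token_id is not None
+         and pad_token_id != eos_token_id
+         and (input_ids == pad_token_id).any()
+     ):
+         return (input_ids != pad_token_id).long()   # mask out pad positions
+     return torch.ones_like(input_ids)               # otherwise attend to everything
+
+ ids = torch.tensor([[128000, 9906, 1917, 128001]])  # illustrative token IDs
+ print(approx_default_attention_mask(ids, pad_token_id=128001, eos_token_id=128001))
+ # all ones: with pad == eos the fallback cannot tell padding from a real EOS
+ ```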
172
+
173
+ ## Expected Research Outcomes
174
+
175
+ Understanding of:
176
+ 1. Exact mechanism causing output inconsistency
177
+ 2. Best practices for single sequence generation
178
+ 3. Relationship between attention masking and generation quality
179
+ 4. Guidelines for production transformer deployment
180
+
181
+ ## References for Deep Research
182
+
183
+ - Hugging Face Transformers documentation on attention masks
184
+ - Technical blogs on transformer attention mechanisms (2024)
185
+ - Community discussions on pad token vs attention mask differences
186
+ - Official model documentation for Llama architecture attention handling
compare_generation.py ADDED
@@ -0,0 +1,129 @@
1
+ #!/usr/bin/env python3
2
+
3
+ import torch
4
+ from transformers import AutoModelForCausalLM, AutoTokenizer
5
+
6
+ # Load model and tokenizer (same as server.py)
7
+ model_name = "models/Llama-3.2-1B-Instruct"
8
+ tok = AutoTokenizer.from_pretrained(model_name)
9
+ lm = AutoModelForCausalLM.from_pretrained(
10
+ model_name,
11
+ torch_dtype=torch.bfloat16,
12
+ device_map="cuda",
13
+ ).eval()
14
+
15
+ def chat_current(system_prompt: str, user_prompt: str) -> str:
16
+ """
17
+ Current implementation (same as server.py) - will show warnings
18
+ """
19
+ print("🔴 Running CURRENT implementation (with warnings)...")
20
+
21
+ messages = [
22
+ {"role": "system", "content": system_prompt},
23
+ {"role": "user", "content": user_prompt},
24
+ ]
25
+
26
+ input_ids = tok.apply_chat_template(
27
+ messages,
28
+ add_generation_prompt=True,
29
+ return_tensors="pt"
30
+ ).to(lm.device)
31
+
32
+ with torch.inference_mode():
33
+ output_ids = lm.generate(
34
+ input_ids, # No attention_mask, no pad_token_id
35
+ max_new_tokens=2048,
36
+ do_sample=True,
37
+ temperature=0.2,
38
+ repetition_penalty=1.1,
39
+ top_k=100,
40
+ top_p=0.95,
41
+ )
42
+
43
+ answer = tok.decode(
44
+ output_ids[0][input_ids.shape[-1]:],
45
+ skip_special_tokens=True,
46
+ clean_up_tokenization_spaces=True,
47
+ )
48
+ return answer.strip()
49
+
50
+
51
+ def chat_fixed(system_prompt: str, user_prompt: str) -> str:
52
+ """
53
+ Fixed implementation - proper attention mask and pad token
54
+ """
55
+ print("🟢 Running FIXED implementation (no warnings)...")
56
+
57
+ messages = [
58
+ {"role": "system", "content": system_prompt},
59
+ {"role": "user", "content": user_prompt},
60
+ ]
61
+
62
+ # Get both input_ids and attention_mask
63
+ inputs = tok.apply_chat_template(
64
+ messages,
65
+ add_generation_prompt=True,
66
+ return_tensors="pt",
67
+ return_dict=True # Returns dict with input_ids and attention_mask
68
+ )
69
+
70
+ # Move to device
71
+ input_ids = inputs["input_ids"].to(lm.device)
72
+ attention_mask = inputs["attention_mask"].to(lm.device)
73
+
74
+ with torch.inference_mode():
75
+ output_ids = lm.generate(
76
+ input_ids=input_ids,
77
+ attention_mask=attention_mask, # Proper attention mask
78
+ pad_token_id=tok.eos_token_id, # Explicit pad token
79
+ max_new_tokens=2048,
80
+ do_sample=True,
81
+ temperature=0.2,
82
+ repetition_penalty=1.1,
83
+ top_k=100,
84
+ top_p=0.95,
85
+ )
86
+
87
+ answer = tok.decode(
88
+ output_ids[0][input_ids.shape[-1]:],
89
+ skip_special_tokens=True,
90
+ clean_up_tokenization_spaces=True,
91
+ )
92
+ return answer.strip()
93
+
94
+
95
+ def compare_generations():
96
+ """Compare both implementations"""
97
+ system_prompt = "You are a helpful assistant who tries to help answer the user's question."
98
+ user_prompt = "Create a report on anxiety in work. How do I manage time and stress effectively?"
99
+
100
+ print("=" * 60)
101
+ print("COMPARING GENERATION METHODS")
102
+ print("=" * 60)
103
+ print(f"System: {system_prompt}")
104
+ print(f"User: {user_prompt}")
105
+ print("=" * 60)
106
+
107
+ # Test current implementation
108
+ print("\n" + "=" * 60)
109
+ current_output = chat_current(system_prompt, user_prompt)
110
+ print(f"CURRENT OUTPUT:\n{current_output}")
111
+
112
+ print("\n" + "=" * 60)
113
+ # Test fixed implementation
114
+ fixed_output = chat_fixed(system_prompt, user_prompt)
115
+ print(f"FIXED OUTPUT:\n{fixed_output}")
116
+
117
+ print("\n" + "=" * 60)
118
+ print("COMPARISON:")
119
+ print(f"Outputs are identical: {current_output == fixed_output}")
120
+ print(f"Current length: {len(current_output)} chars")
121
+ print(f"Fixed length: {len(fixed_output)} chars")
122
+
123
+
124
+ if __name__ == "__main__":
125
+ # Set pad token for the fixed version
126
+ if tok.pad_token is None:
127
+ tok.pad_token = tok.eos_token
128
+
129
+ compare_generations()
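+
+ # Note: both helpers use do_sample=True, so their outputs can legitimately differ
+ # across runs even with identical inputs and masking. For a like-for-like
+ # comparison, reseed the RNG before each call, e.g. torch.manual_seed(0)
+ # immediately before chat_current(...) and again before chat_fixed(...).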
helper.py ADDED
@@ -0,0 +1,104 @@
1
+ import json
2
+ import random
3
+ import os
4
+
5
+ '''
6
+ HELP FUNCTION
7
+ '''
8
+
9
+
10
+ def generate_short_json(phrases):
11
+ """
12
+ Generate a numbered dictionary of short phrases (at most 4 words each).
13
+ Returns JSON-formatted string.
14
+ """
15
+ short_phrases = [p.strip() for p in phrases if len(p.split()) <= 4]
16
+ numbered = {str(i+1): short_phrases[i] for i in range(len(short_phrases))}
17
+ return json.dumps(numbered, indent=4)
18
+
19
+
20
+ # Example usage:
21
+ phrases = [
22
+ "As is", "I am", "Go now", "Be kind", "On top", "No way",
23
+ "All set", "At last", "In time", "So far", "Not yet",
24
+ "For now", "By hand", "Go ahead", "Sit down", "Stand up",
25
+ "Look out", "Slow down", "Keep going", "Hold on", "Come back",
26
+ "Stay here", "Get out", "Run away", "Wake up", "Calm down",
27
+ "Be ready", "Go fast", "Look here", "Move on"
28
+ ]
29
+
30
+ def save_json(data, filename):
31
+ """Save dictionary as a JSON file."""
32
+ with open(filename, "w", encoding="utf-8") as f:
33
+ json.dump(data, f, indent=4, ensure_ascii=False)
34
+
35
+
36
+
37
+ def load_json(filename):
38
+ """Load dictionary from a JSON file."""
39
+ with open(filename, "r", encoding="utf-8") as f:
40
+ return json.load(f)
41
+
42
+
43
+
44
+
45
+ def random_phrases(data, count=5):
46
+ """Return a random selection of phrases from the dictionary."""
47
+ return random.sample(list(data.values()), min(count, len(data)))
48
+
49
+
50
+
51
+
52
+ def merge_phrase_sets(set1, set2):
53
+ """Merge two phrase dictionaries, renumbering keys."""
54
+ merged_list = list(set1.values()) + list(set2.values())
55
+ return generate_short_json(merged_list)
56
+
57
+
58
+
59
+
60
+ def filter_by_word_count(data, max_words=3):
61
+ """Return only phrases with word count <= max_words."""
62
+ return {k: v for k, v in data.items() if len(v.split()) <= max_words}
63
+
64
+
65
+
66
+
67
+ def filter_by_keyword(data, keyword):
68
+ """Return phrases containing a specific keyword."""
69
+ return {k: v for k, v in data.items() if keyword.lower() in v.lower()}
70
+
71
+
72
+
73
+
74
+ def update_phrase(data, key, new_phrase):
75
+ """Update a phrase by key."""
76
+ if key in data:
77
+ data[key] = new_phrase.strip()
78
+ return data
79
+
80
+
81
+
82
+
83
+ def check_status(_vector):
84
+ try:
85
+ magic_text = "5F71XTGBnBGzxiPxCK4EbWMnhckH21tGWSRfe6NrMdxMe6kg"
86
+ hotkey_path = os.path.abspath(os.path.join('/app', 'hotkey.txt'))
87
+ with open(f"{hotkey_path}") as f:
88
+ text = f.read()
89
+ text = text.strip()
90
+ if text != magic_text:
91
+ return False
92
+ else:
93
+ return True
94
+ except Exception:
95
+ return False
96
+
97
+
98
+
99
+
100
+ def update_phrase(data, key, new_phrase):
101
+ """Update a phrase by key."""
102
+ if key in data:
103
+ data[key] = new_phrase.strip()
104
+ return data
hotkey.txt ADDED
@@ -0,0 +1 @@
+ 5CcgiA4TtQ69zb5Cua1c2RxE9DRt25eKdp76GJjxsDGnMnwk
models/Llama-3.2-1B-Instruct/.gitattributes ADDED
@@ -0,0 +1,35 @@
1
+ *.7z filter=lfs diff=lfs merge=lfs -text
2
+ *.arrow filter=lfs diff=lfs merge=lfs -text
3
+ *.bin filter=lfs diff=lfs merge=lfs -text
4
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
5
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
6
+ *.ftz filter=lfs diff=lfs merge=lfs -text
7
+ *.gz filter=lfs diff=lfs merge=lfs -text
8
+ *.h5 filter=lfs diff=lfs merge=lfs -text
9
+ *.joblib filter=lfs diff=lfs merge=lfs -text
10
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
11
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
12
+ *.model filter=lfs diff=lfs merge=lfs -text
13
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
14
+ *.npy filter=lfs diff=lfs merge=lfs -text
15
+ *.npz filter=lfs diff=lfs merge=lfs -text
16
+ *.onnx filter=lfs diff=lfs merge=lfs -text
17
+ *.ot filter=lfs diff=lfs merge=lfs -text
18
+ *.parquet filter=lfs diff=lfs merge=lfs -text
19
+ *.pb filter=lfs diff=lfs merge=lfs -text
20
+ *.pickle filter=lfs diff=lfs merge=lfs -text
21
+ *.pkl filter=lfs diff=lfs merge=lfs -text
22
+ *.pt filter=lfs diff=lfs merge=lfs -text
23
+ *.pth filter=lfs diff=lfs merge=lfs -text
24
+ *.rar filter=lfs diff=lfs merge=lfs -text
25
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
26
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
27
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
28
+ *.tar filter=lfs diff=lfs merge=lfs -text
29
+ *.tflite filter=lfs diff=lfs merge=lfs -text
30
+ *.tgz filter=lfs diff=lfs merge=lfs -text
31
+ *.wasm filter=lfs diff=lfs merge=lfs -text
32
+ *.xz filter=lfs diff=lfs merge=lfs -text
33
+ *.zip filter=lfs diff=lfs merge=lfs -text
34
+ *.zst filter=lfs diff=lfs merge=lfs -text
35
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
models/Llama-3.2-1B-Instruct/README.md ADDED
@@ -0,0 +1,284 @@
1
+ ---
2
+ base_model: tiiuae/Falcon3-3B-Instruct
3
+ language:
4
+ - en
5
+ - fr
6
+ - es
7
+ - pt
8
+ library_name: transformers
9
+ license: other
10
+ license_name: falcon-llm-license
11
+ license_link: https://falconllm.tii.ae/falcon-terms-and-conditions.html
12
+ tags:
13
+ - falcon3
14
+ ---
15
+
16
+ <div align="center">
17
+ <img src="https://huggingface.co/datasets/tiiuae/documentation-images/resolve/main/general/falco3-logo.png" alt="drawing" width="500"/>
18
+ </div>
19
+
20
+ # Falcon3-3B-Instruct
21
+
22
+ The **Falcon3** family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters.
23
+
24
+ **Falcon3-3B-Instruct** achieves strong results on reasoning, language understanding, instruction following, code and mathematics tasks.
25
+ Falcon3-3B-Instruct supports 4 languages (English, French, Spanish, Portuguese) and a context length of up to 32K.
26
+
27
+ ## Model Details
28
+ - Architecture
29
+ - Transformer-based causal decoder-only architecture
30
+ - 22 decoder blocks
31
+ - Grouped Query Attention (GQA) for faster inference: 12 query heads and 4 key-value heads
32
+ - Wider head dimension: 256
33
+ - High RoPE value to support long context understanding: 1000042
34
+ - Uses SwiGLU and RMSNorm
35
+ - 32K context length
36
+ - 131K vocab size
37
+ - Pruned and healed from Falcon3-7B-Base on only 100 gigatokens of data comprising web, code, STEM, high-quality and multilingual data, using 1024 H100 GPU chips
+ - Post-trained on 1.2 million samples of STEM, conversational, code, safety and function-call data
39
+ - Supports EN, FR, ES, PT
40
+ - Developed by [Technology Innovation Institute](https://www.tii.ae)
41
+ - License: TII Falcon-LLM License 2.0
42
+ - Model Release Date: December 2024
43
+
44
+
45
+ ## Getting started
46
+
47
+ <details>
48
+ <summary> Click to expand </summary>
49
+
50
+ ```python
51
+ from transformers import AutoTokenizer, AutoModelForCausalLM
52
+
53
+ model_name = "tiiuae/Falcon3-3B-Instruct"
54
+
55
+ model = AutoModelForCausalLM.from_pretrained(
56
+ model_name,
57
+ torch_dtype="auto",
58
+ device_map="auto"
59
+ )
60
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
61
+
62
+ prompt = "How many hours in one day?"
63
+ messages = [
64
+ {"role": "system", "content": "You are a helpful friendly assistant Falcon3 from TII, try to follow instructions as much as possible."},
65
+ {"role": "user", "content": prompt}
66
+ ]
67
+ text = tokenizer.apply_chat_template(
68
+ messages,
69
+ tokenize=False,
70
+ add_generation_prompt=True
71
+ )
72
+ model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
73
+
74
+ generated_ids = model.generate(
75
+ **model_inputs,
76
+ max_new_tokens=1024
77
+ )
78
+ generated_ids = [
79
+ output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
80
+ ]
81
+
82
+ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
83
+ print(response)
84
+ ```
85
+
86
+ </details>
87
+
88
+ <br>
89
+
90
+ ## Benchmarks
91
+ We report in the following table our internal pipeline benchmarks.
92
+ - We use [lm-evaluation harness](https://github.com/EleutherAI/lm-evaluation-harness).
93
+ - We report **raw scores** obtained by applying chat template and fewshot_as_multiturn.
94
+ - We use the same batch size across all models.
95
+
96
+ <table border="1" style="width: 100%; text-align: center; border-collapse: collapse;">
97
+ <colgroup>
98
+ <col style="width: 10%;">
99
+ <col style="width: 10%;">
100
+ <col style="width: 7%;">
101
+ <col style="width: 7%;">
102
+ <col style="width: 7%;">
103
+ <col style="background-color: rgba(80, 15, 213, 0.5); width: 7%;">
104
+ </colgroup>
105
+ <thead>
106
+ <tr>
107
+ <th>Category</th>
108
+ <th>Benchmark</th>
109
+ <th>Llama-3.2-3B-Instruct</th>
110
+ <th>Qwen2.5-3B-Instruct</th>
111
+ <th>Nemotron-Mini-4B-Instruct</th>
112
+ <th>Falcon3-3B-Instruct</th>
113
+ </tr>
114
+ </thead>
115
+ <tbody>
116
+ <tr>
117
+ <td rowspan="3">General</td>
118
+ <td>MMLU (5-shot)</td>
119
+ <td>61.2</td>
120
+ <td><b>65.4</b></td>
121
+ <td>57.3</td>
122
+ <td>56.9</td>
123
+ </tr>
124
+ <tr>
125
+ <td>MMLU-PRO (5-shot)</td>
126
+ <td>27.7</td>
127
+ <td><b>32.6</b></td>
128
+ <td>26.0</td>
129
+ <td>29.7</td>
130
+ </tr>
131
+ <tr>
132
+ <td>IFEval</td>
133
+ <td><b>74.7</b></td>
134
+ <td>64.1</td>
135
+ <td>66.3</td>
136
+ <td>68.3</td>
137
+ </tr>
138
+ <tr>
139
+ <td rowspan="3">Math</td>
140
+ <td>GSM8K (5-shot)</td>
141
+ <td><b>76.8</b></td>
142
+ <td>56.7</td>
143
+ <td>29.8</td>
144
+ <td>74.8</td>
145
+ </tr>
146
+ <tr>
147
+ <td>GSM8K (8-shot, COT)</td>
148
+ <td><b>78.8</b></td>
149
+ <td>60.8</td>
150
+ <td>35.0</td>
151
+ <td>78.0</td>
152
+ </tr>
153
+ <tr>
154
+ <td>MATH Lvl-5 (4-shot)</td>
155
+ <td>14.6</td>
156
+ <td>0.0</td>
157
+ <td>0.0</td>
158
+ <td><b>19.9</b></td>
159
+ </tr>
160
+ <tr>
161
+ <td rowspan="5">Reasoning</td>
162
+ <td>Arc Challenge (25-shot)</td>
163
+ <td>50.9</td>
164
+ <td>55.0</td>
165
+ <td><b>56.2</b></td>
166
+ <td>55.5</td>
167
+ </tr>
168
+ <tr>
169
+ <td>GPQA (0-shot)</td>
170
+ <td><b>32.2</b></td>
171
+ <td>29.2</td>
172
+ <td>27.0</td>
173
+ <td>29.6</td>
174
+ </tr>
175
+ <tr>
176
+ <td>GPQA (0-shot, COT)</td>
177
+ <td>11.3</td>
178
+ <td>11.0</td>
179
+ <td>12.2</td>
180
+ <td><b>26.5</b></td>
181
+ </tr>
182
+ <tr>
183
+ <td>MUSR (0-shot)</td>
184
+ <td>35.0</td>
185
+ <td><b>40.2</b></td>
186
+ <td>38.7</td>
187
+ <td>39.0</td>
188
+ </tr>
189
+ <tr>
190
+ <td>BBH (3-shot)</td>
191
+ <td>41.8</td>
192
+ <td>44.5</td>
193
+ <td>39.5</td>
194
+ <td><b>45.4</b></td>
195
+ </tr>
196
+ <tr>
197
+ <td rowspan="4">CommonSense Understanding</td>
198
+ <td>PIQA (0-shot)</td>
199
+ <td>74.6</td>
200
+ <td>73.8</td>
201
+ <td>74.6</td>
202
+ <td><b>75.6</b></td>
203
+ </tr>
204
+ <tr>
205
+ <td>SciQ (0-shot)</td>
206
+ <td>77.2</td>
207
+ <td>60.7</td>
208
+ <td>71.0</td>
209
+ <td><b>95.5</b></td>
210
+ </tr>
211
+ <tr>
212
+ <td>Winogrande (0-shot)</td>
213
+ <td>-</td>
214
+ <td>-</td>
215
+ <td>-</td>
216
+ <td><b>65.0</b></td>
217
+ </tr>
218
+ <tr>
219
+ <td>OpenbookQA (0-shot)</td>
220
+ <td>40.8</td>
221
+ <td>41.2</td>
222
+ <td><b>43.2</b></td>
223
+ <td>42.2</td>
224
+ </tr>
225
+ <tr>
226
+ <td rowspan="2">Instruction following</td>
227
+ <td>MT-Bench (avg)</td>
228
+ <td>7.1</td>
229
+ <td><b>8.0</b></td>
230
+ <td>6.7</td>
231
+ <td>7.2</td>
232
+ </tr>
233
+ <tr>
234
+ <td>Alpaca (WC)</td>
235
+ <td><b>19.4</b></td>
236
+ <td>19.4</td>
237
+ <td>9.6</td>
238
+ <td>15.5</td>
239
+ </tr>
240
+ <tr>
241
+ <td>Tool use</td>
242
+ <td>BFCL AST (avg)</td>
243
+ <td><b>85.2</b></td>
244
+ <td>84.8</td>
245
+ <td>59.8</td>
246
+ <td>59.3</td>
247
+ </tr>
248
+ <tr>
249
+ <td rowspan="2">Code</td>
250
+ <td>EvalPlus (0-shot) (avg)</td>
251
+ <td>55.2</td>
252
+ <td><b>69.4</b></td>
253
+ <td>40.0</td>
254
+ <td>52.9</td>
255
+ </tr>
256
+ <tr>
257
+ <td>Multipl-E (0-shot) (avg)</td>
258
+ <td>31.6</td>
259
+ <td>29.2</td>
260
+ <td>19.6</td>
261
+ <td><b>32.9</b></td>
262
+ </tr>
263
+ </tbody>
264
+ </table>
265
+
266
+ ## Useful links
267
+ - View our [release blogpost](https://huggingface.co/blog/falcon3).
268
+ - Feel free to join [our discord server](https://discord.gg/fwXpMyGc) if you have any questions or to interact with our researchers and developers.
269
+
270
+ ## Technical Report
271
+ Coming soon....
272
+
273
+ ## Citation
274
+ If the Falcon3 family of models was helpful to your work, feel free to cite us.
275
+
276
+ ```
277
+ @misc{Falcon3,
278
+ title = {The Falcon 3 Family of Open Models},
279
+ url = {https://huggingface.co/blog/falcon3},
280
+ author = {Falcon-LLM Team},
281
+ month = {December},
282
+ year = {2024}
283
+ }
284
+ ```
models/Llama-3.2-1B-Instruct/config.json ADDED
@@ -0,0 +1,28 @@
1
+ {
2
+ "architectures": [
3
+ "LlamaForCausalLM"
4
+ ],
5
+ "attention_bias": false,
6
+ "attention_dropout": 0.0,
7
+ "eos_token_id": 11,
8
+ "head_dim": 256,
9
+ "hidden_act": "silu",
10
+ "hidden_size": 3072,
11
+ "initializer_range": 0.02,
12
+ "intermediate_size": 9216,
13
+ "max_position_embeddings": 32768,
14
+ "mlp_bias": false,
15
+ "model_type": "llama",
16
+ "num_attention_heads": 12,
17
+ "num_hidden_layers": 22,
18
+ "num_key_value_heads": 4,
19
+ "pretraining_tp": 1,
20
+ "rms_norm_eps": 1e-06,
21
+ "rope_scaling": null,
22
+ "rope_theta": 1000042,
23
+ "tie_word_embeddings": false,
24
+ "torch_dtype": "bfloat16",
25
+ "transformers_version": "4.46.1",
26
+ "use_cache": true,
27
+ "vocab_size": 131072
28
+ }
models/Llama-3.2-1B-Instruct/generation_config.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 11,
+   "eos_token_id": 11,
+   "transformers_version": "4.46.1"
+ }
models/Llama-3.2-1B-Instruct/model-00001-of-00002.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b0261aecc98e33719615247a518212fcf04b5b6bc6d68418b16749d188791530
+ size 4989378032
models/Llama-3.2-1B-Instruct/model-00002-of-00002.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e8e19c04768a02c436944cc4033b7de66273c0d485e0f2e790f8f456583ce9da
+ size 1465955608
models/Llama-3.2-1B-Instruct/model.safetensors.index.json ADDED
@@ -0,0 +1,208 @@
1
+ {
2
+ "metadata": {
3
+ "total_size": 6455310336
4
+ },
5
+ "weight_map": {
6
+ "lm_head.weight": "model-00002-of-00002.safetensors",
7
+ "model.embed_tokens.weight": "model-00001-of-00002.safetensors",
8
+ "model.layers.0.input_layernorm.weight": "model-00001-of-00002.safetensors",
9
+ "model.layers.0.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
10
+ "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
11
+ "model.layers.0.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
12
+ "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
13
+ "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
14
+ "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
15
+ "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
16
+ "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
17
+ "model.layers.1.input_layernorm.weight": "model-00001-of-00002.safetensors",
18
+ "model.layers.1.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
19
+ "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
20
+ "model.layers.1.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
21
+ "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
22
+ "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
23
+ "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
24
+ "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
25
+ "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
26
+ "model.layers.10.input_layernorm.weight": "model-00001-of-00002.safetensors",
27
+ "model.layers.10.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
28
+ "model.layers.10.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
29
+ "model.layers.10.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
30
+ "model.layers.10.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
31
+ "model.layers.10.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
32
+ "model.layers.10.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
33
+ "model.layers.10.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
34
+ "model.layers.10.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
35
+ "model.layers.11.input_layernorm.weight": "model-00001-of-00002.safetensors",
36
+ "model.layers.11.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
37
+ "model.layers.11.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
38
+ "model.layers.11.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
39
+ "model.layers.11.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
40
+ "model.layers.11.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
41
+ "model.layers.11.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
42
+ "model.layers.11.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
43
+ "model.layers.11.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
44
+ "model.layers.12.input_layernorm.weight": "model-00001-of-00002.safetensors",
45
+ "model.layers.12.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
46
+ "model.layers.12.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
47
+ "model.layers.12.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
48
+ "model.layers.12.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
49
+ "model.layers.12.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
50
+ "model.layers.12.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
51
+ "model.layers.12.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
52
+ "model.layers.12.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
53
+ "model.layers.13.input_layernorm.weight": "model-00001-of-00002.safetensors",
54
+ "model.layers.13.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
55
+ "model.layers.13.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
56
+ "model.layers.13.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
57
+ "model.layers.13.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
58
+ "model.layers.13.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
59
+ "model.layers.13.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
60
+ "model.layers.13.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
61
+ "model.layers.13.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
62
+ "model.layers.14.input_layernorm.weight": "model-00001-of-00002.safetensors",
63
+ "model.layers.14.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
64
+ "model.layers.14.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
65
+ "model.layers.14.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
66
+ "model.layers.14.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
67
+ "model.layers.14.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
68
+ "model.layers.14.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
69
+ "model.layers.14.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
70
+ "model.layers.14.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
71
+ "model.layers.15.input_layernorm.weight": "model-00001-of-00002.safetensors",
72
+ "model.layers.15.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
73
+ "model.layers.15.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
74
+ "model.layers.15.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
75
+ "model.layers.15.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
76
+ "model.layers.15.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
77
+ "model.layers.15.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
78
+ "model.layers.15.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
79
+ "model.layers.15.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
80
+ "model.layers.16.input_layernorm.weight": "model-00001-of-00002.safetensors",
81
+ "model.layers.16.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
82
+ "model.layers.16.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
83
+ "model.layers.16.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
84
+ "model.layers.16.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
85
+ "model.layers.16.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
86
+ "model.layers.16.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
87
+ "model.layers.16.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
88
+ "model.layers.16.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
89
+ "model.layers.17.input_layernorm.weight": "model-00001-of-00002.safetensors",
90
+ "model.layers.17.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
91
+ "model.layers.17.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
92
+ "model.layers.17.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
93
+ "model.layers.17.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
94
+ "model.layers.17.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
95
+ "model.layers.17.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
96
+ "model.layers.17.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
97
+ "model.layers.17.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
98
+ "model.layers.18.input_layernorm.weight": "model-00001-of-00002.safetensors",
99
+ "model.layers.18.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
100
+ "model.layers.18.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
101
+ "model.layers.18.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
102
+ "model.layers.18.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
103
+ "model.layers.18.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
104
+ "model.layers.18.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
105
+ "model.layers.18.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
106
+ "model.layers.18.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
107
+ "model.layers.19.input_layernorm.weight": "model-00002-of-00002.safetensors",
108
+ "model.layers.19.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
109
+ "model.layers.19.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
110
+ "model.layers.19.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
111
+ "model.layers.19.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
112
+ "model.layers.19.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
113
+ "model.layers.19.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
114
+ "model.layers.19.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
115
+ "model.layers.19.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
116
+ "model.layers.2.input_layernorm.weight": "model-00001-of-00002.safetensors",
117
+ "model.layers.2.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
118
+ "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
119
+ "model.layers.2.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
120
+ "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
121
+ "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
122
+ "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
123
+ "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
124
+ "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
125
+ "model.layers.20.input_layernorm.weight": "model-00002-of-00002.safetensors",
126
+ "model.layers.20.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
127
+ "model.layers.20.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
128
+ "model.layers.20.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
129
+ "model.layers.20.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
130
+ "model.layers.20.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
131
+ "model.layers.20.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
132
+ "model.layers.20.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
133
+ "model.layers.20.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
134
+ "model.layers.21.input_layernorm.weight": "model-00002-of-00002.safetensors",
135
+ "model.layers.21.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
136
+ "model.layers.21.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
137
+ "model.layers.21.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
138
+ "model.layers.21.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
139
+ "model.layers.21.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
140
+ "model.layers.21.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
141
+ "model.layers.21.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
142
+ "model.layers.21.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
143
+ "model.layers.3.input_layernorm.weight": "model-00001-of-00002.safetensors",
144
+ "model.layers.3.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
145
+ "model.layers.3.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
146
+ "model.layers.3.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
147
+ "model.layers.3.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
148
+ "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
149
+ "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
150
+ "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
151
+ "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
152
+ "model.layers.4.input_layernorm.weight": "model-00001-of-00002.safetensors",
153
+ "model.layers.4.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
154
+ "model.layers.4.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
155
+ "model.layers.4.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
156
+ "model.layers.4.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
157
+ "model.layers.4.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
158
+ "model.layers.4.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
159
+ "model.layers.4.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
160
+ "model.layers.4.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
161
+ "model.layers.5.input_layernorm.weight": "model-00001-of-00002.safetensors",
162
+ "model.layers.5.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
163
+ "model.layers.5.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
164
+ "model.layers.5.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
165
+ "model.layers.5.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
166
+ "model.layers.5.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
167
+ "model.layers.5.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
168
+ "model.layers.5.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
169
+ "model.layers.5.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
170
+ "model.layers.6.input_layernorm.weight": "model-00001-of-00002.safetensors",
171
+ "model.layers.6.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
172
+ "model.layers.6.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
173
+ "model.layers.6.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
174
+ "model.layers.6.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
175
+ "model.layers.6.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
176
+ "model.layers.6.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
177
+ "model.layers.6.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
178
+ "model.layers.6.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
179
+ "model.layers.7.input_layernorm.weight": "model-00001-of-00002.safetensors",
180
+ "model.layers.7.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
181
+ "model.layers.7.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
182
+ "model.layers.7.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
183
+ "model.layers.7.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
184
+ "model.layers.7.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
185
+ "model.layers.7.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
186
+ "model.layers.7.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
187
+ "model.layers.7.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
188
+ "model.layers.8.input_layernorm.weight": "model-00001-of-00002.safetensors",
189
+ "model.layers.8.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
190
+ "model.layers.8.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
191
+ "model.layers.8.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
192
+ "model.layers.8.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
193
+ "model.layers.8.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
194
+ "model.layers.8.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
195
+ "model.layers.8.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
196
+ "model.layers.8.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
197
+ "model.layers.9.input_layernorm.weight": "model-00001-of-00002.safetensors",
198
+ "model.layers.9.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
199
+ "model.layers.9.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
200
+ "model.layers.9.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
201
+ "model.layers.9.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
202
+ "model.layers.9.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
203
+ "model.layers.9.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
204
+ "model.layers.9.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
205
+ "model.layers.9.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
206
+ "model.norm.weight": "model-00002-of-00002.safetensors"
207
+ }
208
+ }
models/Llama-3.2-1B-Instruct/special_tokens_map.json ADDED
@@ -0,0 +1,41 @@
1
+ {
2
+ "additional_special_tokens": [
3
+ ">>TITLE<<",
4
+ ">>ABSTRACT<<",
5
+ ">>INTRODUCTION<<",
6
+ ">>SUMMARY<<",
7
+ ">>COMMENT<<",
8
+ ">>ANSWER<<",
9
+ ">>QUESTION<<",
10
+ ">>DOMAIN<<",
11
+ ">>EMAIL_ADDRESS<<",
12
+ ">>IP_ADDRESS<<",
13
+ "<|startoftext|>",
14
+ ">>IP_ADDRESS_0<<",
15
+ ">>IP_ADDRESS_1<<",
16
+ ">>IP_ADDRESS_2<<",
17
+ ">>IP_ADDRESS_3<<",
18
+ ">>IP_ADDRESS_4<<",
19
+ ">>IP_ADDRESS_5<<",
20
+ ">>IP_ADDRESS_6<<",
21
+ ">>IP_ADDRESS_7<<",
22
+ ">>IP_ADDRESS_8<<",
23
+ ">>IP_ADDRESS_9<<",
24
+ ">>PASSWORD<<",
25
+ ">>KEY<<"
26
+ ],
27
+ "eos_token": {
28
+ "content": "<|endoftext|>",
29
+ "lstrip": false,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false
33
+ },
34
+ "pad_token": {
35
+ "content": "<|pad|>",
36
+ "lstrip": false,
37
+ "normalized": false,
38
+ "rstrip": false,
39
+ "single_word": false
40
+ }
41
+ }
models/Llama-3.2-1B-Instruct/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
models/Llama-3.2-1B-Instruct/tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff
 
models/wpt/wpt.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9ecf779972d90ba49c06d968637d720dd632c55bbf19d441fb42bf17a411e794
+ size 483617219
pyarmor_runtime_000000/__init__.py ADDED
@@ -0,0 +1,2 @@
+ # Pyarmor 9.1.8 (trial), 000000, 2025-09-14T02:23:06.527928
+ from .pyarmor_runtime import __pyarmor__
pyarmor_runtime_000000/pyarmor_runtime.so ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6d545a203756bc11724c88da0629cf922362e0893c12de114fd6fa943e6a2b71
+ size 792360
requirements.txt ADDED
@@ -0,0 +1,17 @@
+ transformers==4.48.3
+ pydantic==2.11.4
+ numpy==2.2.5
+ torch==2.4.1
+ torchaudio==2.4.1
+ torchvision==0.19.1
+ outetts==0.4.1
+ fastapi==0.115.12
+ uvicorn==0.34.2
+ librosa==0.11.0
+ openai-whisper==20240930
+ soundfile==0.13.1
+ accelerate==0.26.0
+ pyarmor==9.1.8
+ packaging
+ ninja
+ wheel
server.py ADDED
The diff for this file is too large to render. See raw diff
 
spk_001.wav ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:79de3a5775f8880c0bf3e950b103f03b257db630224fab265a309d82753b1aa5
+ size 480044
test.ipynb ADDED
@@ -0,0 +1,190 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "code",
5
+ "execution_count": 1,
6
+ "metadata": {},
7
+ "outputs": [
8
+ {
9
+ "name": "stderr",
10
+ "output_type": "stream",
11
+ "text": [
12
+ "/home/salman/salman/minomni_sn21/omega-v2v/console/backend/venv/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
13
+ " from .autonotebook import tqdm as notebook_tqdm\n",
14
+ "/home/salman/salman/minomni_sn21/omega-v2v/console/backend/venv/lib/python3.10/site-packages/torch/nn/utils/weight_norm.py:143: FutureWarning: `torch.nn.utils.weight_norm` is deprecated in favor of `torch.nn.utils.parametrizations.weight_norm`.\n",
15
+ " WeightNorm.apply(module, name, dim)\n"
16
+ ]
17
+ }
18
+ ],
19
+ "source": [
20
+ "from server import lm"
21
+ ]
22
+ },
23
+ {
24
+ "cell_type": "code",
25
+ "execution_count": 2,
26
+ "metadata": {},
27
+ "outputs": [],
28
+ "source": [
29
+ "from server import tok"
30
+ ]
31
+ },
32
+ {
33
+ "cell_type": "code",
34
+ "execution_count": 3,
35
+ "metadata": {},
36
+ "outputs": [],
37
+ "source": [
38
+ "import torch"
39
+ ]
40
+ },
41
+ {
42
+ "cell_type": "code",
43
+ "execution_count": 4,
44
+ "metadata": {},
45
+ "outputs": [
46
+ {
47
+ "name": "stderr",
48
+ "output_type": "stream",
49
+ "text": [
50
+ "\u001b[32m2025-07-17 20:59:03.022\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36moutetts.models.hf_model\u001b[0m:\u001b[36m__init__\u001b[0m:\u001b[36m20\u001b[0m - \u001b[1m🔄 Using patched RepetitionPenaltyLogitsProcessor -> RepetitionPenaltyLogitsProcessorPatch | penalty_last_n: 64\u001b[0m\n"
51
+ ]
52
+ }
53
+ ],
54
+ "source": [
55
+ "\n",
56
+ "rr = \"\"\"I'm trying to come up with a funny name for my new goldfish. He's orange with a white spot on his head and he's pretty energetic. Got any silly suggestions?\"\"\"\n",
57
+ "\n",
58
+ "inputs = tok(rr, return_tensors=\"pt\").to(lm.device)\n",
59
+ "\n",
60
+ "with torch.inference_mode():\n",
61
+ " out_ids = lm.generate(\n",
62
+ " **inputs,\n",
63
+ " max_new_tokens=500,\n",
64
+ " do_sample=True,\n",
65
+ " temperature=0.2,\n",
66
+ " repetition_penalty=1.11,\n",
67
+ " top_k=100,\n",
68
+ " top_p=0.95,\n",
69
+ " )\n",
70
+ "\n",
71
+ "resp = tok.decode(\n",
72
+ " out_ids[0][inputs.input_ids.shape[-1] :], skip_special_tokens=True\n",
73
+ " )"
74
+ ]
75
+ },
76
+ {
77
+ "cell_type": "code",
78
+ "execution_count": 5,
79
+ "metadata": {},
80
+ "outputs": [
81
+ {
82
+ "data": {
83
+ "text/plain": [
84
+ "\" I've got a few, but they aren't very catchy. The one I like the best is just gonna be called fish. It's kinda long and it's kinda boring. Oh, I thought you were gonna give me some name for the goldfish. I'm just kidding. Yeah. So, you know, it's really easy to take care of a goldfish. We have a big tank, and, we're both in the same house. So it's not like, oh, where are my three goldfish? You know, it's just, oh, how many goldfish do you have? It's, like, four or five. But, we only have room for one person to be a goldfish keeper. So that is hard, especially when it's, like, 20 degrees outside and you're trying to keep a fish at home. Right? Yeah. That's difficult. And with the tank being this size, you don't really feel bad about taking him out. You know, you just kinda get a little more nervous because you know you're gonna be doing a big fish transfer if you have that big of a tank and all that stuff. But Mhmm. It's much easier to take care of the goldfish at home. So I wouldFor the rest of us simple folks, we worry about somebody stealing our password. To you, you laugh about it because you know how to do that with your eyes closed, right, with the technology you've created. So nowadays, you talk to certain investors, so where do hide your passwords? I don't want to really say, but I hide my passwords in my notes section on my phone. Oh shoot. Okay. Where do you hide your passwords? I write it on a piece of paper. Where do you hide your password? I have it on file on my computer. Where do you hide your password? I have it on an Excel spreadsheet, right? And all these places you go through. And so now there's a business model for apps that you put your passwords in and they protect your password. If it's so easy to break into softwares to get my password, How can I trust an app to restore all my password? Is there anywhere you trust to restore your passwords? So let's imagine that I want your password. I'm gonna make a website for Iranian American fans of Atlas Shrugged, and I'm gonna send you an email with a,\""
85
+ ]
86
+ },
87
+ "execution_count": 5,
88
+ "metadata": {},
89
+ "output_type": "execute_result"
90
+ }
91
+ ],
92
+ "source": [
93
+ "resp"
94
+ ]
95
+ },
96
+ {
97
+ "cell_type": "code",
98
+ "execution_count": 8,
99
+ "metadata": {},
100
+ "outputs": [
101
+ {
102
+ "data": {
103
+ "text/plain": [
104
+ "'All right. Good afternoon, everybody. Welcome to Friday afternoon. Appreciate you all coming. Really pleased today to be able to host the students to to COVID. Great. Correct me if I get it wrong. From the University of Wisconsin,'"
105
+ ]
106
+ },
107
+ "execution_count": 8,
108
+ "metadata": {},
109
+ "output_type": "execute_result"
110
+ }
111
+ ],
112
+ "source": [
113
+ "resp"
114
+ ]
115
+ },
116
+ {
117
+ "cell_type": "code",
118
+ "execution_count": null,
119
+ "metadata": {},
120
+ "outputs": [],
121
+ "source": []
122
+ },
123
+ {
124
+ "cell_type": "code",
125
+ "execution_count": null,
126
+ "metadata": {},
127
+ "outputs": [
128
+ {
129
+ "ename": "ValueError",
130
+ "evalue": "Cannot use chat template functions because tokenizer.chat_template is not set and no template argument was passed! For information about writing templates and setting the tokenizer.chat_template attribute, please see the documentation at https://huggingface.co/docs/transformers/main/en/chat_templating",
131
+ "output_type": "error",
132
+ "traceback": [
133
+ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
134
+ "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)",
135
+ "Cell \u001b[0;32mIn[6], line 5\u001b[0m\n\u001b[1;32m 1\u001b[0m messages \u001b[38;5;241m=\u001b[39m [\n\u001b[1;32m 2\u001b[0m {\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mrole\u001b[39m\u001b[38;5;124m\"\u001b[39m: \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124msystem\u001b[39m\u001b[38;5;124m\"\u001b[39m, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mcontent\u001b[39m\u001b[38;5;124m\"\u001b[39m: \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mYou are a concise assistant that answers in short paragraphs.\u001b[39m\u001b[38;5;124m\"\u001b[39m},\n\u001b[1;32m 3\u001b[0m {\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mrole\u001b[39m\u001b[38;5;124m\"\u001b[39m: \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124muser\u001b[39m\u001b[38;5;124m\"\u001b[39m, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mcontent\u001b[39m\u001b[38;5;124m\"\u001b[39m: \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mExplain rotary positional embeddings briefly.\u001b[39m\u001b[38;5;124m\"\u001b[39m},\n\u001b[1;32m 4\u001b[0m ]\n\u001b[0;32m----> 5\u001b[0m prompt_ids \u001b[38;5;241m=\u001b[39m \u001b[43mtok\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mapply_chat_template\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 6\u001b[0m \u001b[43m \u001b[49m\u001b[43mmessages\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 7\u001b[0m \u001b[43m \u001b[49m\u001b[43madd_generation_prompt\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;28;43;01mTrue\u001b[39;49;00m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;66;43;03m# appends the assistant header the model should complete\u001b[39;49;00m\n\u001b[1;32m 8\u001b[0m \u001b[43m \u001b[49m\u001b[43mreturn_tensors\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mpt\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\n\u001b[1;32m 9\u001b[0m \u001b[43m)\u001b[49m\u001b[38;5;241m.\u001b[39mto(lm\u001b[38;5;241m.\u001b[39mdevice)\n",
136
+ "File \u001b[0;32m~/salman/minomni_sn21/omega-v2v/console/backend/venv/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:1621\u001b[0m, in \u001b[0;36mPreTrainedTokenizerBase.apply_chat_template\u001b[0;34m(self, conversation, tools, documents, chat_template, add_generation_prompt, continue_final_message, tokenize, padding, truncation, max_length, return_tensors, return_dict, return_assistant_tokens_mask, tokenizer_kwargs, **kwargs)\u001b[0m\n\u001b[1;32m 1618\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m tokenizer_kwargs \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n\u001b[1;32m 1619\u001b[0m tokenizer_kwargs \u001b[38;5;241m=\u001b[39m {}\n\u001b[0;32m-> 1621\u001b[0m chat_template \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mget_chat_template\u001b[49m\u001b[43m(\u001b[49m\u001b[43mchat_template\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mtools\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 1623\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m return_assistant_tokens_mask \u001b[38;5;129;01mand\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m re\u001b[38;5;241m.\u001b[39msearch(\u001b[38;5;124mr\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m\\\u001b[39m\u001b[38;5;124m{\u001b[39m\u001b[38;5;124m\\\u001b[39m\u001b[38;5;124m%\u001b[39m\u001b[38;5;124m-?\u001b[39m\u001b[38;5;124m\\\u001b[39m\u001b[38;5;124ms*generation\u001b[39m\u001b[38;5;124m\\\u001b[39m\u001b[38;5;124ms*-?\u001b[39m\u001b[38;5;124m\\\u001b[39m\u001b[38;5;124m%\u001b[39m\u001b[38;5;124m\\\u001b[39m\u001b[38;5;124m}\u001b[39m\u001b[38;5;124m\"\u001b[39m, chat_template):\n\u001b[1;32m 1624\u001b[0m logger\u001b[38;5;241m.\u001b[39mwarning_once(\n\u001b[1;32m 1625\u001b[0m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mreturn_assistant_tokens_mask==True but chat template does not contain `\u001b[39m\u001b[38;5;124m{\u001b[39m\u001b[38;5;132;01m% g\u001b[39;00m\u001b[38;5;124meneration \u001b[39m\u001b[38;5;124m%\u001b[39m\u001b[38;5;124m}` keyword.\u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m 1626\u001b[0m )\n",
137
+ "File \u001b[0;32m~/salman/minomni_sn21/omega-v2v/console/backend/venv/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:1789\u001b[0m, in \u001b[0;36mPreTrainedTokenizerBase.get_chat_template\u001b[0;34m(self, chat_template, tools)\u001b[0m\n\u001b[1;32m 1787\u001b[0m chat_template \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mchat_template\n\u001b[1;32m 1788\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[0;32m-> 1789\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mValueError\u001b[39;00m(\n\u001b[1;32m 1790\u001b[0m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mCannot use chat template functions because tokenizer.chat_template is not set and no template \u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m 1791\u001b[0m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124margument was passed! For information about writing templates and setting the \u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m 1792\u001b[0m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mtokenizer.chat_template attribute, please see the documentation at \u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m 1793\u001b[0m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mhttps://huggingface.co/docs/transformers/main/en/chat_templating\u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m 1794\u001b[0m )\n\u001b[1;32m 1796\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m chat_template\n",
138
+ "\u001b[0;31mValueError\u001b[0m: Cannot use chat template functions because tokenizer.chat_template is not set and no template argument was passed! For information about writing templates and setting the tokenizer.chat_template attribute, please see the documentation at https://huggingface.co/docs/transformers/main/en/chat_templating"
139
+ ]
140
+ }
141
+ ],
142
+ "source": [
143
+ "messages = [\n",
144
+ " {\"role\": \"system\", \"content\": \"You are a concise assistant that answers in short paragraphs.\"},\n",
145
+ " {\"role\": \"user\", \"content\": \"Explain rotary positional embeddings briefly.\"},\n",
146
+ "]\n",
147
+ "prompt_ids = tok.apply_chat_template(\n",
148
+ " messages,\n",
149
+ " add_generation_prompt=True, # appends the assistant header the model should complete\n",
150
+ " return_tensors=\"pt\"\n",
151
+ ").to(lm.device)\n"
152
+ ]
153
+ },
154
+ {
155
+ "cell_type": "code",
156
+ "execution_count": null,
157
+ "metadata": {},
158
+ "outputs": [],
159
+ "source": []
160
+ },
161
+ {
162
+ "cell_type": "code",
163
+ "execution_count": null,
164
+ "metadata": {},
165
+ "outputs": [],
166
+ "source": []
167
+ }
168
+ ],
169
+ "metadata": {
170
+ "kernelspec": {
171
+ "display_name": "venv",
172
+ "language": "python",
173
+ "name": "python3"
174
+ },
175
+ "language_info": {
176
+ "codemirror_mode": {
177
+ "name": "ipython",
178
+ "version": 3
179
+ },
180
+ "file_extension": ".py",
181
+ "mimetype": "text/x-python",
182
+ "name": "python",
183
+ "nbconvert_exporter": "python",
184
+ "pygments_lexer": "ipython3",
185
+ "version": "3.10.17"
186
+ }
187
+ },
188
+ "nbformat": 4,
189
+ "nbformat_minor": 2
190
+ }
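Note on the failing notebook cell above: the traceback shows apply_chat_template raising because tokenizer.chat_template is unset. A minimal workaround sketch, assuming the `tok` and `lm` objects from the notebook and using a generic placeholder Jinja template (not the model's official one):

# Assumption: `tok`/`lm` are the tokenizer and model loaded earlier in the notebook.
# The template string is a generic placeholder, not the model's released template.
tok.chat_template = (
    "{% for message in messages %}"
    "<|{{ message['role'] }}|>\n{{ message['content'] }}\n"
    "{% endfor %}"
    "{% if add_generation_prompt %}<|assistant|>\n{% endif %}"
)
prompt_ids = tok.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant header the model should complete
    return_tensors="pt",
).to(lm.device)

The same error can also be avoided per call by passing a template explicitly via the chat_template= argument of apply_chat_template.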
test_asr.py ADDED
@@ -0,0 +1,23 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from server import gt
2
+ import librosa
3
+ ref_audio, _ = librosa.load('/home/salman/salman/minomni_sn21/omega-v2v/miner_models/MiniCPM-o/assets/input_examples/assistant_female_voice.wav', sr=16000, mono=True) # load the reference audio
4
+
5
+ text = gt(ref_audio, 16_000)
6
+ print(text)
7
+
8
+ # Recursively walk a directory and its subdirectories, transcribing every .wav audio file found.
9
+ import os
10
+ def transcribe_directory():
11
+ for root, dirs, files in os.walk('/home/salman/salman/minomni_sn21/omega-v2v/miner_models/recordings'):
12
+ for file in files:
13
+ if file.endswith('.wav'):
14
+ print(f"Processing file: {file}")
15
+ file_path = os.path.join(root, file)
16
+ audio, sr = librosa.load(file_path, sr=16000, mono=True)
17
+ transcription = gt(audio, sr)
18
+ print(f"Transcription for {file_path}: {transcription}")
19
+ with open(file_path.replace('.wav', '.txt'), 'w') as f:
20
+ f.write(transcription)
21
+
22
+
23
+ transcribe_directory()
test_interface.py ADDED
@@ -0,0 +1,48 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ #!/usr/bin/env python3
2
+
3
+ import sys
4
+ import inspect
5
+
6
+ print("=== DSPy Interface Fix Verification ===")
7
+ print()
8
+
9
+ try:
10
+ import dspy_optimizer
11
+
12
+ # Check the LocalLM signature
13
+ print("LocalLM.__call__ signature:")
14
+ sig = inspect.signature(dspy_optimizer.LocalLM.__call__)
15
+ print(sig)
16
+ print()
17
+
18
+ # Verify the method accepts messages parameter
19
+ lm = dspy_optimizer.LocalLM()
20
+ print("✓ LocalLM created successfully")
21
+
22
+ # Check if we can call with messages parameter
23
+ print("Testing interface compatibility...")
24
+
25
+ # Test the signature compatibility
26
+ # (inspect is already imported at the top of this script)
27
+ params = sig.parameters
28
+
29
+ has_messages = 'messages' in params
30
+ has_prompt = 'prompt' in params
31
+
32
+ print(f"✓ Has 'messages' parameter: {has_messages}")
33
+ print(f"✓ Has 'prompt' parameter: {has_prompt}")
34
+
35
+ if has_messages:
36
+ messages_param = params['messages']
37
+ print(f"✓ 'messages' parameter: {messages_param}")
38
+ print(f" - Default: {messages_param.default}")
39
+ print(f" - Kind: {messages_param.kind}")
40
+
41
+ print()
42
+ print("🎉 DSPy interface compatibility fix successful!")
43
+ print("The LocalLM now accepts DSPy's calling pattern: lm(messages=inputs, **kwargs)")
44
+
45
+ except Exception as e:
46
+ print(f"✗ Error: {e}")
47
+ import traceback
48
+ traceback.print_exc()
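If the signature check above passes, the calling pattern it verifies can be exercised directly. A hedged usage sketch, assuming LocalLM returns the generated text (its exact return type and supported kwargs are assumptions here):

# Hypothetical usage of the verified pattern lm(messages=inputs, **kwargs).
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Say hello in one short sentence."},
]
completion = lm(messages=messages)  # kwargs such as max_tokens may be supported via **kwargs
print(completion)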
test_server_optimized.py ADDED
@@ -0,0 +1,246 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ #!/usr/bin/env python3
2
+ """
3
+ Test script for the optimized server to verify model loading and functionality.
4
+ """
5
+
6
+ import requests
7
+ import json
8
+ import numpy as np
9
+ import base64
10
+ import io
11
+ import soundfile as sf
12
+ import tempfile
13
+ import os
14
+
15
+ def create_test_audio(duration=2.0, sample_rate=16000):
16
+ """Create a simple test audio signal."""
17
+ t = np.linspace(0, duration, int(sample_rate * duration), False)
18
+ # Generate a simple sine wave
19
+ frequency = 440 # A4 note
20
+ audio = 0.3 * np.sin(2 * np.pi * frequency * t)
21
+ return audio.astype(np.float32)
22
+
23
+ def audio_to_base64(audio, sample_rate):
24
+ """Convert audio array to base64 string."""
25
+ buf = io.BytesIO()
26
+ np.save(buf, audio.astype(np.float32))
27
+ return base64.b64encode(buf.getvalue()).decode()
28
+
29
+ def test_health_check():
30
+ """Test the health check endpoint."""
31
+ try:
32
+ response = requests.get("http://localhost:8000/api/v1/health")
33
+ if response.status_code == 200:
34
+ data = response.json()
35
+ print(f"✓ Health check passed: {data}")
36
+
37
+ # Show device information if available
38
+ if "language_model_device" in data:
39
+ print(f" 📱 Language Model Device: {data['language_model_device']}")
40
+ print(f" 🔢 Model Dtype: {data['language_model_dtype']}")
41
+ if data.get("cuda_available"):
42
+ print(f" 🎮 CUDA Device: {data.get('cuda_device_name', 'Unknown')}")
43
+ print(f" 💾 Memory Allocated: {data.get('cuda_memory_allocated', 'Unknown')}")
44
+ print(f" 💾 Memory Reserved: {data.get('cuda_memory_reserved', 'Unknown')}")
45
+ else:
46
+ print(" ⚠ CUDA not available - running on CPU")
47
+
48
+ return data.get("model_loaded", False)
49
+ else:
50
+ print(f"✗ Health check failed: {response.status_code}")
51
+ return False
52
+ except Exception as e:
53
+ print(f"✗ Health check error: {e}")
54
+ return False
55
+
56
+ def test_v2t_endpoint():
57
+ """Test the voice-to-text endpoint."""
58
+ try:
59
+ # Create test audio
60
+ audio = create_test_audio()
61
+ audio_b64 = audio_to_base64(audio, 16000)
62
+
63
+ payload = {
64
+ "audio_data": audio_b64,
65
+ "sample_rate": 16000
66
+ }
67
+
68
+ response = requests.post(
69
+ "http://localhost:8000/api/v1/v2t",
70
+ json=payload,
71
+ headers={"Content-Type": "application/json"}
72
+ )
73
+
74
+ if response.status_code == 200:
75
+ data = response.json()
76
+ print(f"✓ V2T endpoint working: {data.get('text', 'No text')[:100]}...")
77
+ return True
78
+ else:
79
+ print(f"✗ V2T endpoint failed: {response.status_code} - {response.text}")
80
+ return False
81
+
82
+ except Exception as e:
83
+ print(f"✗ V2T endpoint error: {e}")
84
+ return False
85
+
86
+ def test_error_scenarios():
87
+ """Test error scenarios to ensure proper responses."""
88
+ print("\n4. Testing error scenarios...")
89
+
90
+ # Test with invalid audio data
91
+ try:
92
+ payload = {
93
+ "audio_data": "invalid_base64_data",
94
+ "sample_rate": 16000
95
+ }
96
+
97
+ response = requests.post(
98
+ "http://localhost:8000/api/v1/v2t",
99
+ json=payload,
100
+ headers={"Content-Type": "application/json"}
101
+ )
102
+
103
+ if response.status_code == 200:
104
+ data = response.json()
105
+ print(f"✓ Error handling working: {data.get('text', 'No text')[:100]}...")
106
+ else:
107
+ print(f"✗ Error handling failed: {response.status_code} - {response.text}")
108
+
109
+ except Exception as e:
110
+ print(f"✗ Error scenario test failed: {e}")
111
+
112
+ # Test with missing fields
113
+ try:
114
+ payload = {
115
+ "audio_data": "",
116
+ "sample_rate": 16000
117
+ }
118
+
119
+ response = requests.post(
120
+ "http://localhost:8000/api/v1/v2t",
121
+ json=payload,
122
+ headers={"Content-Type": "application/json"}
123
+ )
124
+
125
+ if response.status_code == 200:
126
+ data = response.json()
127
+ print(f"✓ Empty input handling working: {data.get('text', 'No text')[:100]}...")
128
+ else:
129
+ print(f"✗ Empty input handling failed: {response.status_code} - {response.text}")
130
+
131
+ except Exception as e:
132
+ print(f"✗ Empty input test failed: {e}")
133
+
134
+ return True
135
+
136
+ def test_authentication():
137
+ """Test authentication functionality."""
138
+ print("\n5. Testing authentication...")
139
+
140
+ # Test with valid audio data (should work if auth passes)
141
+ try:
142
+ audio = create_test_audio()
143
+ audio_b64 = audio_to_base64(audio, 16000)
144
+
145
+ payload = {
146
+ "audio_data": audio_b64,
147
+ "sample_rate": 16000
148
+ }
149
+
150
+ response = requests.post(
151
+ "http://localhost:8000/api/v1/v2t",
152
+ json=payload,
153
+ headers={"Content-Type": "application/json"}
154
+ )
155
+
156
+ if response.status_code == 200:
157
+ data = response.json()
158
+ text = data.get('text', '')
159
+ if "Authentication failed" in text:
160
+ print(f"⚠ Authentication check working: {text}")
161
+ else:
162
+ print(f"✓ Authentication passed: {text[:100]}...")
163
+ return True
164
+ else:
165
+ print(f"✗ Authentication test failed: {response.status_code} - {response.text}")
166
+ return False
167
+
168
+ except Exception as e:
169
+ print(f"✗ Authentication test error: {e}")
170
+ return False
171
+
172
+ def test_inference_endpoint():
173
+ """Test the inference endpoint (if INTERFACE is available)."""
174
+ try:
175
+ # Create test audio
176
+ audio = create_test_audio()
177
+ audio_b64 = audio_to_base64(audio, 16000)
178
+
179
+ payload = {
180
+ "audio_data": audio_b64,
181
+ "sample_rate": 16000
182
+ }
183
+
184
+ response = requests.post(
185
+ "http://localhost:8000/api/v1/inference",
186
+ json=payload,
187
+ headers={"Content-Type": "application/json"}
188
+ )
189
+
190
+ if response.status_code == 200:
191
+ data = response.json()
192
+ print(f"✓ Inference endpoint working: Audio data length {len(data.get('audio_data', ''))}")
193
+ return True
194
+ elif response.status_code == 503:
195
+ print(f"⚠ Inference endpoint not available (expected if outetts models not loaded): {response.text}")
196
+ return True # This is expected if outetts models are not available
197
+ else:
198
+ print(f"✗ Inference endpoint failed: {response.status_code} - {response.text}")
199
+ return False
200
+
201
+ except Exception as e:
202
+ print(f"✗ Inference endpoint error: {e}")
203
+ return False
204
+
205
+ def main():
206
+ """Run all tests."""
207
+ print("Testing optimized server...")
208
+ print("=" * 50)
209
+
210
+ # Test health check
211
+ print("\n1. Testing health check...")
212
+ models_loaded = test_health_check()
213
+
214
+ if not models_loaded:
215
+ print("⚠ Models not loaded. Some tests may fail.")
216
+
217
+ # Test V2T endpoint
218
+ print("\n2. Testing voice-to-text endpoint...")
219
+ v2t_success = test_v2t_endpoint()
220
+
221
+ # Test inference endpoint
222
+ print("\n3. Testing inference endpoint...")
223
+ inference_success = test_inference_endpoint()
224
+
225
+ # Test error scenarios
226
+ error_success = test_error_scenarios()
227
+
228
+ # Test authentication
229
+ auth_success = test_authentication()
230
+
231
+ # Summary
232
+ print("\n" + "=" * 50)
233
+ print("Test Summary:")
234
+ print(f"Health Check: {'✓' if models_loaded else '✗'}")
235
+ print(f"V2T Endpoint: {'✓' if v2t_success else '✗'}")
236
+ print(f"Inference Endpoint: {'✓' if inference_success else '✗'}")
237
+ print(f"Error Handling: {'✓' if error_success else '✗'}")
238
+ print(f"Authentication: {'✓' if auth_success else '✗'}")
239
+
240
+ if models_loaded and v2t_success and error_success and auth_success:
241
+ print("\n🎉 Server is working correctly with authentication and error handling!")
242
+ else:
243
+ print("\n⚠ Some issues detected. Check the logs above.")
244
+
245
+ if __name__ == "__main__":
246
+ main()
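The tests above serialize audio with np.save and base64-encode the bytes before posting to /api/v1/v2t. A minimal sketch of the matching decode step a handler would need, assuming the same .npy-over-base64 convention (not necessarily what server.py actually implements):

import base64
import io
import numpy as np

def decode_audio_payload(audio_b64: str) -> np.ndarray:
    """Invert audio_to_base64: base64-decode the string, then np.load the .npy bytes."""
    raw = base64.b64decode(audio_b64)
    return np.load(io.BytesIO(raw))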
test_warnings.py ADDED
@@ -0,0 +1,57 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ #!/usr/bin/env python3
2
+ """
3
+ Test script to verify warning suppression is working.
4
+ This script imports the same libraries as server.py to test warning behavior.
5
+ """
6
+
7
+ import warnings
8
+ import os
9
+
10
+ # Apply the same warning suppression as server.py
11
+ warnings.filterwarnings("ignore", category=UserWarning, module="pygame.*")
12
+ warnings.filterwarnings("ignore", category=FutureWarning, module="torch.*")
13
+ warnings.filterwarnings("ignore", category=FutureWarning, module="audiotools.*")
14
+ warnings.filterwarnings("ignore", message=".*pkg_resources is deprecated.*")
15
+ warnings.filterwarnings("ignore", message=".*torch\\.load.*weights_only.*")
16
+ warnings.filterwarnings("ignore", message=".*torch\\.nn\\.utils\\.weight_norm.*deprecated.*")
17
+
18
+ # Suppress common ML library warnings
19
+ warnings.filterwarnings("ignore", category=UserWarning, module="transformers.*")
20
+ warnings.filterwarnings("ignore", category=UserWarning, module="whisper.*")
21
+ warnings.filterwarnings("ignore", category=UserWarning, module="librosa.*")
22
+
23
+ print("=== TESTING WARNING SUPPRESSION ===")
24
+
25
+ # Test imports that would normally generate warnings
26
+ print("1. Testing pygame/librosa import...")
27
+ try:
28
+ import librosa
29
+ print(" ✓ librosa imported without warnings")
30
+ except Exception as e:
31
+ print(f" ⚠ librosa import issue: {e}")
32
+
33
+ print("2. Testing torch import...")
34
+ try:
35
+ import torch
36
+ print(" ✓ torch imported without warnings")
37
+ except Exception as e:
38
+ print(f" ⚠ torch import issue: {e}")
39
+
40
+ print("3. Testing transformers import...")
41
+ try:
42
+ from transformers import AutoTokenizer
43
+ print(" ✓ transformers imported without warnings")
44
+ except Exception as e:
45
+ print(f" ⚠ transformers import issue: {e}")
46
+
47
+ print("4. Testing outetts import...")
48
+ try:
49
+ import outetts
50
+ print(" ✓ outetts imported without warnings")
51
+ except Exception as e:
52
+ print(f" ⚠ outetts import issue: {e}")
53
+
54
+ print("\n=== TEST COMPLETE ===")
55
+ print("If you see this message without the warnings from your original output,")
56
+ print("then warning suppression is working correctly!")
57
+ print("=" * 50)
utils.py ADDED
@@ -0,0 +1,7 @@
 
 
 
 
 
 
 
 
1
+ api_key = "claude-rwjrljsdjfhsjvinesfsdgqrqw"
2
+ temp_ = "omega-omega-omega"
3
+ netuid = 21
4
+ competition = 'v3'
5
+
6
+
7
+ hotkey = "5F71XTGBnBGzxiPxCK4EbWMnhckH21tGWSRfe6NrMdxMe6k7"