botirk committed
Commit d2ab339 · verified · 1 Parent(s): 2e915b3

Upload quantized ONNX model

Files changed (4):
  1. .gitattributes +0 -10
  2. README.md +33 -303
  3. config.json +155 -69
  4. model_quantized.onnx +2 -2
.gitattributes CHANGED
@@ -1,11 +1 @@
  *.onnx filter=lfs diff=lfs merge=lfs -text
- *.bin filter=lfs diff=lfs merge=lfs -text
- *.safetensors filter=lfs diff=lfs merge=lfs -text
- *.pkl filter=lfs diff=lfs merge=lfs -text
- *.h5 filter=lfs diff=lfs merge=lfs -text
- *.tflite filter=lfs diff=lfs merge=lfs -text
- *.tar.gz filter=lfs diff=lfs merge=lfs -text
- *.ot filter=lfs diff=lfs merge=lfs -text
- *.arrow filter=lfs diff=lfs merge=lfs -text
- *.ftz filter=lfs diff=lfs merge=lfs -text
- *.joblib filter=lfs diff=lfs merge=lfs -text

README.md CHANGED
@@ -1,319 +1,49 @@
- # Prompt Task Complexity Classifier - Quantized
-
- 🚀 **A high-performance, quantized ONNX implementation of NVIDIA's prompt task and complexity classifier optimized for fast CPU inference.**
-
- This standalone Python package provides a quantized version of the [nvidia/prompt-task-and-complexity-classifier](https://huggingface.co/nvidia/prompt-task-and-complexity-classifier) with ~75% size reduction and 2-4x speed improvement while maintaining accuracy.
-
- ## ✨ Features
-
- - 🔥 **Fast Inference**: 2-4x faster than original model on CPU
- - 📦 **Compact Size**: ~75% smaller model footprint
- - 🎯 **Comprehensive Analysis**: 8 classification dimensions + complexity scoring
- - 🔧 **Easy Integration**: Drop-in replacement with familiar API
- - 🐍 **Production Ready**: Optimized for server deployment and batch processing
-
- ## 📊 What This Model Does
-
- The quantized classifier analyzes text prompts across **8 key dimensions**:
-
- | Dimension | Description | Classes |
- |-----------|-------------|---------|
- | **Task Type** | Primary task category | 11 types (QA, Generation, Summarization, etc.) |
- | **Creativity Scope** | Creative thinking requirements | 5 levels (0.0 - 1.0) |
- | **Reasoning** | Logical reasoning complexity | 5 levels (0.0 - 1.0) |
- | **Contextual Knowledge** | Context understanding needs | 5 levels (0.0 - 1.0) |
- | **Few-shot Learning** | Examples needed | 5 levels (0-4+ shots) |
- | **Domain Knowledge** | Specialized expertise required | 5 levels (0.0 - 1.0) |
- | **Label Reasoning** | Classification reasoning needs | 5 levels (0.0 - 1.0) |
- | **Constraint Handling** | Rule/constraint complexity | 5 levels (0.0 - 1.0) |
-
- Plus a **task-weighted complexity score** that combines all dimensions intelligently based on the detected task type.
-
- ## 🚀 Quick Start
-
- ### Installation
-
- ```bash
- # Install the package with Poetry
- cd prompt-task-complexity-classifier-quantized
- poetry install
-
- # Or install dependencies directly
- pip install torch transformers onnxruntime optimum[onnxruntime] huggingface-hub numpy
- ```
-
- ### Basic Usage
-
- ```python
- from prompt_classifier import QuantizedPromptClassifier
-
- # Load the quantized model
- classifier = QuantizedPromptClassifier.from_pretrained("./")
-
- # Classify a single prompt
- result = classifier.classify_single_prompt(
-     "Write a Python function to implement quicksort with detailed comments"
- )
-
- print(f"Task: {result['task_type_1'][0]}")                        # "Code Generation"
- print(f"Complexity: {result['prompt_complexity_score'][0]:.3f}")  # 0.652
- print(f"Reasoning: {result['reasoning'][0]:.3f}")                 # 0.750
- print(f"Creativity: {result['creativity_scope'][0]:.3f}")         # 0.250
- ```
-
- ### Batch Processing
-
- ```python
- # Process multiple prompts efficiently
- prompts = [
-     "What is the capital of France?",
-     "Explain quantum computing and write simulation code",
-     "Create a marketing strategy for eco-friendly products"
- ]
-
- results = classifier.classify_prompts(prompts)
-
- for prompt, result in zip(prompts, results):
-     task_type = result['task_type_1'][0]
-     complexity = result['prompt_complexity_score'][0]
-     print(f"{task_type}: {complexity:.3f} - {prompt[:50]}...")
- ```
-
- ### Command Line Interface
-
- ```bash
- # Quantize the original model
- prompt-classifier quantize --output-dir ./my_quantized_model
-
- # Test the quantized model
- prompt-classifier test --model-path ./my_quantized_model --benchmark
-
- # Classify prompts from command line
- prompt-classifier classify "Explain machine learning" "Write a sorting algorithm"
-
- # Get model information
- prompt-classifier info --model-path ./my_quantized_model
-
- # Upload to Hugging Face Hub
- prompt-classifier upload your-username/my-quantized-model --private
- ```
-
- ## 📦 Package Structure
-
- ```
- prompt-task-complexity-classifier-quantized/
- ├── src/prompt_classifier/
- │   ├── __init__.py             # Main package exports
- │   ├── classifier.py           # Core QuantizedPromptClassifier class
- │   ├── utils.py                # Utility functions
- │   ├── cli.py                  # Command line interface
- │   ├── testing.py              # Test and validation functions
- │   ├── examples.py             # Usage examples
- │   └── scripts/
- │       ├── quantization.py     # Model quantization script
- │       ├── upload.py           # HuggingFace upload script
- │       └── quantize_model.py   # Core quantization logic
- ├── tests/
- │   └── test_classifier.py      # Unit tests
- ├── config.json                 # Model configuration
- ├── pyproject.toml              # Poetry project configuration
- ├── README.md                   # This file
- └── .gitattributes              # Git LFS configuration
- ```
-
- ## 🛠️ Development Workflow
-
- ### 1. Setup Development Environment
-
- ```bash
- # Clone and setup
- git clone <your-repo>
- cd prompt-task-complexity-classifier-quantized
-
- # Install with development dependencies
- poetry install --with dev
-
- # Activate environment
- poetry shell
- ```
-
- ### 2. Quantize Your Own Model
-
- ```bash
- # Run quantization process
- python -m prompt_classifier.scripts.quantization \
-     --model-id nvidia/prompt-task-and-complexity-classifier \
-     --output-dir ./quantized_output
- ```
-
- ### 3. Test and Validate
-
- ```bash
- # Run comprehensive tests
- python -m prompt_classifier.testing
-
- # Or use pytest for unit tests
- pytest tests/ -v
- ```
-
- ### 4. Upload to Hugging Face
-
- ```bash
- # Login to HF Hub
- huggingface-cli login
-
- # Upload your quantized model
- python -m prompt_classifier.scripts.upload your-username/model-name
- ```
-
- ## ⚡ Performance Benchmarks
-
- | Metric | Original Model | Quantized Model | Improvement |
- |--------|---------------|-----------------|-------------|
- | **Model Size** | ~350 MB | ~89 MB | 75% smaller |
- | **Inference Speed** | 45ms/prompt | 12ms/prompt | 3.7x faster |
- | **Memory Usage** | ~1.2 GB | ~320 MB | 73% reduction |
- | **Accuracy** | Baseline | -1.2% typical | Minimal loss |
-
- *Benchmarks run on Intel i7-10700K CPU with batch size 1*
-
- ## 🔧 Advanced Usage
-
- ### Custom Model Path
-
- ```python
- # Load from custom directory
- classifier = QuantizedPromptClassifier.from_pretrained("/path/to/model")
-
- # Load from Hugging Face Hub
- classifier = QuantizedPromptClassifier.from_pretrained("username/model-name")
- ```
-
- ### Direct ONNX Runtime Usage
-
- ```python
- import numpy as np
- import onnxruntime as ort
- from transformers import AutoTokenizer
-
- # For maximum performance
- session = ort.InferenceSession("model_quantized.onnx")
- tokenizer = AutoTokenizer.from_pretrained("./")
-
- # Run inference directly
- inputs = tokenizer("Your prompt", return_tensors="np", padding=True, truncation=True)
- outputs = session.run(None, {
-     "input_ids": inputs["input_ids"].astype(np.int64),
-     "attention_mask": inputs["attention_mask"].astype(np.int64)
- })
- ```
-
- ### Integration with Existing Code
-
- ```python
- # Drop-in replacement for original CustomModel
- from prompt_classifier import QuantizedPromptClassifier
-
- # Replace this:
- # from some_module import CustomModel
- # model = CustomModel.from_pretrained("nvidia/prompt-task-and-complexity-classifier")
-
- # With this:
- model = QuantizedPromptClassifier.from_pretrained("./quantized_model")
-
- # Same API, better performance!
- results = model.classify_prompts(["Your prompts here"])
- ```
-
- ## 📝 API Reference
-
- ### `QuantizedPromptClassifier`
-
- Main class for prompt classification with quantized ONNX backend.
-
- #### Methods
-
- - `from_pretrained(model_path)` - Load model from directory or HF Hub
- - `classify_prompts(prompts: List[str])` - Classify multiple prompts
- - `classify_single_prompt(prompt: str)` - Classify one prompt
- - `get_task_types(prompts: List[str])` - Get just task types
- - `get_complexity_scores(prompts: List[str])` - Get just complexity scores
-
- #### Configuration
-
- The model uses the same configuration as the original, with additional quantization metadata:
-
- ```json
- {
-   "quantized": true,
-   "quantization_method": "dynamic",
-   "framework": "onnx",
-   "optimized_for": "cpu",
-   "file_name": "model_quantized.onnx"
- }
- ```
-
- ## 🧪 Testing
-
- ```bash
- # Run all tests
- pytest tests/ -v
-
- # Run with coverage
- pytest tests/ --cov=prompt_classifier --cov-report=html
-
- # Run only fast tests
- pytest tests/ -m "not slow"
-
- # Test specific functionality
- pytest tests/test_classifier.py::TestQuantizedPromptClassifier::test_classify_single_prompt
- ```
-
- ## 🤝 Contributing
-
- 1. Fork the repository
- 2. Create a feature branch (`git checkout -b feature/amazing-feature`)
- 3. Make your changes and add tests
- 4. Run tests (`pytest tests/`)
- 5. Run linting (`ruff check src/ && black src/`)
- 6. Commit changes (`git commit -m 'Add amazing feature'`)
- 7. Push to branch (`git push origin feature/amazing-feature`)
- 8. Open a Pull Request
-
- ## 📋 Requirements
-
- - Python 3.9+
- - PyTorch 1.9+
- - Transformers 4.21+
- - ONNX Runtime 1.12+
- - Optimum 1.12+
- - NumPy 1.21+
-
- See `pyproject.toml` for complete dependency specifications.
-
- ## 📄 License
-
- Apache 2.0 License - see [LICENSE](LICENSE) file for details.
-
- ## 🙏 Acknowledgments
-
- - **NVIDIA** for the original prompt task and complexity classifier
- - **Microsoft** for ONNX Runtime quantization framework
- - **Hugging Face** for Optimum and Transformers libraries
- - **Poetry** for modern Python dependency management
-
- ## 📞 Support
-
- - 📚 [Documentation](https://huggingface.co/nvidia/prompt-task-and-complexity-classifier)
- - 🐛 [Issues](https://github.com/your-org/prompt-task-complexity-classifier-quantized/issues)
- - 💬 [Discussions](https://github.com/your-org/prompt-task-complexity-classifier-quantized/discussions)
- - 🔗 [Original Model](https://huggingface.co/nvidia/prompt-task-and-complexity-classifier)
-
- ---
-
- **Ready to supercharge your prompt classification? 🚀**
-
- ```bash
- cd prompt-task-complexity-classifier-quantized
- poetry install
- poetry run prompt-classifier quantize
- ```
+ ---
+ license: apache-2.0
+ language: en
+ library_name: optimum
+ tags:
+ - onnx
+ - quantized
+ - text-classification
+ - nvidia
+ - nemotron
+ pipeline_tag: text-classification
+ ---
+
+ # Quantized ONNX model for botirk/tiny-prompt-task-complexity-classifier
+
+ This repository contains the quantized ONNX version of the [nvidia/prompt-task-and-complexity-classifier](https://huggingface.co/nvidia/prompt-task-and-complexity-classifier) model.
+
+ ## Model Description
+
+ This is a multi-headed model which classifies English text prompts across task types and complexity dimensions. This version has been quantized to `INT8` using dynamic quantization with the [🤗 Optimum](https://github.com/huggingface/optimum) library, resulting in a smaller footprint and faster CPU inference.
+
+ For more details on the model architecture, tasks, and complexity dimensions, please refer to the [original model card](https://huggingface.co/nvidia/prompt-task-and-complexity-classifier).
+
+ ## How to Use
+
+ You can use this model directly with `optimum.onnxruntime` for accelerated inference.
+
+ First, install the required libraries:
+ ```bash
+ pip install optimum[onnxruntime] transformers
+ ```
+
+ Then, you can use the model in a pipeline:
+
+ ```python
+ from optimum.onnxruntime import ORTModelForSequenceClassification
+ from transformers import AutoTokenizer, pipeline
+
+ repo_id = "botirk/tiny-prompt-task-complexity-classifier"
+ model = ORTModelForSequenceClassification.from_pretrained(repo_id)
+ tokenizer = AutoTokenizer.from_pretrained(repo_id)
+
+ # Note: the pipeline task is a simplification.
+ # For full multi-headed output, you need to process the logits manually.
+ classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
+
+ prompt = "Write a mystery set in a small town where an everyday object goes missing."
+ results = classifier(prompt)
+ print(results)
+ ```
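
The README's note that the single-label pipeline is a simplification invites a concrete example. Below is a minimal sketch of decoding the multi-headed output by hand. It assumes the exported graph emits one logits tensor per classification head, ordered like `target_sizes` in the updated config.json, and that the repo is checked out locally; both are assumptions rather than anything this commit documents.

```python
# A sketch of manual multi-head decoding (not part of this commit).
# ASSUMPTION: the ONNX graph returns one logits tensor per head, in the
# same order as "target_sizes" in the new config.json; check with
# session.get_outputs() before relying on this.
import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

repo_dir = "."  # hypothetical local checkout of this repository
session = ort.InferenceSession(f"{repo_dir}/model_quantized.onnx")
tokenizer = AutoTokenizer.from_pretrained(repo_dir)

target_sizes = {  # copied from the new config.json
    "constraint_ct": 2, "contextual_knowledge": 2, "creativity_scope": 3,
    "domain_knowledge": 4, "no_label_reason": 1, "number_of_few_shots": 6,
    "reasoning": 2, "task_type": 12,
}

enc = tokenizer("Explain quantum computing", return_tensors="np",
                padding=True, truncation=True)
outputs = session.run(None, {
    "input_ids": enc["input_ids"].astype(np.int64),
    "attention_mask": enc["attention_mask"].astype(np.int64),
})

# Softmax each head's logits and report its most likely class.
for (head, size), logits in zip(target_sizes.items(), outputs):
    assert logits.shape[-1] == size, f"unexpected output layout for {head}"
    exp = np.exp(logits - logits.max(axis=-1, keepdims=True))  # stable softmax
    probs = exp / exp.sum(axis=-1, keepdims=True)
    print(f"{head}: class {int(probs.argmax(-1)[0])} (p={probs.max():.3f})")
```
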
config.json CHANGED
@@ -1,81 +1,167 @@
  {
    "_name_or_path": "nvidia/prompt-task-and-complexity-classifier",
-   "architectures": [
-     "DebertaV2Model"
-   ],
-   "attention_probs_dropout_prob": 0.1,
-   "hidden_act": "gelu",
-   "hidden_dropout_prob": 0.1,
-   "hidden_size": 768,
+   "attn_config": {
+     "model_type": ""
+   },
+   "base_model": "microsoft/DeBERTa-v3-base",
+   "config_path": null,
+   "constraint_ct_map": {
+     "1.0": 0,
+     "Unknown": 1
+   },
+   "contextual_knowledge_map": {
+     "No": 0,
+     "Unknown": -1,
+     "Yes": 1
+   },
+   "creativity_score_map": {
+     "High": 0,
+     "Low": 1,
+     "No": 2
+   },
+   "d_model": 2048,
+   "divisor_map": {
+     "constraint_ct": 1,
+     "contextual_knowledge": 1,
+     "creativity_scope": 2,
+     "domain_knowledge": 3,
+     "no_label_reason": 1,
+     "number_of_few_shots": 1,
+     "reasoning": 1
+   },
+   "domain_knowledge_map": {
+     "High": 0,
+     "Low": 1,
+     "Medium": 2,
+     "No": 3
+   },
+   "drop_out": false,
+   "emb_pdrop": 0.0,
+   "embedding_fraction": 1.0,
+   "expansion_ratio": 4,
+   "fc_dropout": 0.2,
+   "init_device": "cpu",
    "initializer_range": 0.02,
-   "intermediate_size": 3072,
-   "layer_norm_eps": 1e-07,
-   "max_position_embeddings": 512,
+   "layer_norm_epsilon": 1e-05,
+   "learned_pos_emb": true,
+   "logit_scale": null,
+   "max_seq_len": 2048,
+   "model_output_type": {
+     "constraint_ct": "numeric",
+     "contextual_knowledge": "numeric",
+     "creativity_scope": "numeric",
+     "domain_knowledge": "numeric",
+     "no_label_reason": "numeric",
+     "number_of_few_shots": "numeric",
+     "prompt_complexity_score": "numeric",
+     "reasoning": "numeric",
+     "task_type_1": "string",
+     "task_type_2": "string",
+     "task_type_prob": "numeric"
+   },
    "model_type": "deberta-v2",
-   "num_attention_heads": 12,
-   "num_hidden_layers": 12,
-   "pad_token_id": 0,
-   "pooler_dropout": 0.0,
-   "pooler_hidden_act": "gelu",
-   "pooler_hidden_size": 768,
-   "position_biased_input": false,
-   "position_buckets": 256,
-   "relative_attention": true,
-   "torch_dtype": "float32",
-   "transformers_version": "4.21.0",
-   "type_vocab_size": 0,
-   "vocab_size": 128100,
+   "n_heads": 16,
+   "n_layers": 24,
+   "no_bias": true,
+   "no_label_reason_map": {
+     "Unknown": 0
+   },
+   "norm_type": "low_precision_layernorm",
+   "number_of_few_shots_map": {
+     "0.0": 0,
+     "1.0": 1,
+     "2.0": 2,
+     "3.0": 3,
+     "4.0": 4,
+     "5.0": 5
+   },
+   "pretrained": true,
+   "reasoning_map": {
+     "No": 0,
+     "Unknown": -1,
+     "Yes": 1
+   },
+   "resid_pdrop": 0.0,
    "target_sizes": {
-     "task_type": 8,
-     "creativity_scope": 5,
-     "reasoning": 5,
-     "contextual_knowledge": 5,
-     "number_of_few_shots": 5,
-     "domain_knowledge": 5,
-     "no_label_reason": 5,
-     "constraint_ct": 5
+     "constraint_ct": 2,
+     "contextual_knowledge": 2,
+     "creativity_scope": 3,
+     "domain_knowledge": 4,
+     "no_label_reason": 1,
+     "number_of_few_shots": 6,
+     "reasoning": 2,
+     "task_type": 12
    },
+   "targets": [
+     "task_type_1",
+     "task_type_2",
+     "task_type_prob",
+     "creativity_scope",
+     "reasoning",
+     "contextual_knowledge",
+     "number_of_few_shots",
+     "domain_knowledge",
+     "no_label_reason",
+     "constraint_ct",
+     "prompt_complexity_score"
+   ],
    "task_type_map": {
-     "0": "Open QA",
-     "1": "Closed QA",
-     "2": "Summarization",
-     "3": "Text Generation",
+     "0": "Brainstorming",
+     "1": "Chatbot",
+     "10": "Text Generation",
+     "11": "Unknown",
+     "2": "Classification",
+     "3": "Closed QA",
      "4": "Code Generation",
-     "5": "Chatbot",
-     "6": "Classification",
-     "7": "Rewrite",
-     "8": "Brainstorming",
-     "9": "Extraction",
-     "10": "Other"
+     "5": "Extraction",
+     "6": "Open QA",
+     "7": "Other",
+     "8": "Rewrite",
+     "9": "Summarization"
    },
+   "transformers_version": "4.46.3",
+   "use_cache": false,
+   "verbose": 0,
+   "vocab_size": 50368,
    "weights_map": {
-     "creativity_scope": [0.0, 0.25, 0.5, 0.75, 1.0],
-     "reasoning": [0.0, 0.25, 0.5, 0.75, 1.0],
-     "contextual_knowledge": [0.0, 0.25, 0.5, 0.75, 1.0],
-     "number_of_few_shots": [0.0, 1.0, 2.0, 3.0, 4.0],
-     "domain_knowledge": [0.0, 0.25, 0.5, 0.75, 1.0],
-     "no_label_reason": [0.0, 0.25, 0.5, 0.75, 1.0],
-     "constraint_ct": [0.0, 0.25, 0.5, 0.75, 1.0]
-   },
-   "divisor_map": {
-     "creativity_scope": 1.0,
-     "reasoning": 1.0,
-     "contextual_knowledge": 1.0,
-     "number_of_few_shots": 1.0,
-     "domain_knowledge": 1.0,
-     "no_label_reason": 1.0,
-     "constraint_ct": 1.0
+     "constraint_ct": [1, 0],
+     "contextual_knowledge": [0, 1],
+     "creativity_scope": [2, 1, 0],
+     "domain_knowledge": [3, 1, 2, 0],
+     "no_label_reason": [0],
+     "number_of_few_shots": [0, 1, 2, 3, 4, 5],
+     "reasoning": [0, 1]
    },
-   "quantized": true,
-   "quantization_method": "dynamic",
    "framework": "onnx",
-   "optimized_for": "cpu",
-   "file_name": "model_quantized.onnx",
-   "quantization_config": {
-     "format": "QOperator",
-     "mode": "IntegerOps",
-     "activations_dtype": "QUInt8",
-     "weights_dtype": "QInt8",
-     "is_static": false
-   }
- }
+   "tags": [
+     "onnx",
+     "quantized"
+   ]
+ }
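
A note on how the new `weights_map` and `divisor_map` fit together: in the original model card's scoring scheme, each head's numeric score is the probability-weighted sum of its class weights, divided by that head's divisor. A minimal sketch under that assumption, using the `creativity_scope` values from this config:

```python
# Sketch of the per-head scoring that weights_map/divisor_map imply,
# ASSUMING it matches the original model card's scheme: score equals
# the probability-weighted sum of class weights, scaled by the divisor.
import numpy as np

weights_map = {"creativity_scope": [2, 1, 0]}  # from the new config.json
divisor_map = {"creativity_scope": 2}

def head_score(head: str, probs: np.ndarray) -> float:
    weights = np.asarray(weights_map[head], dtype=np.float64)
    return float(probs @ weights) / divisor_map[head]

# Example: a head that is 60% confident in class 2 ("No" creativity).
probs = np.array([0.1, 0.3, 0.6])
print(head_score("creativity_scope", probs))  # (0.2 + 0.3 + 0.0) / 2 = 0.25
```
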
model_quantized.onnx CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:98a24d8927867b8c4d489ee6053d01213aa64d370175771bded2761a2954a8e8
- size 243526627
+ oid sha256:36c58a6b89d72d22c9a67caebab6356673f13e6a3c743e54552878cf1557c3e0
+ size 243965613