ByteCNN-10K: Ultra-Lightweight Toxicity Detection
Ultra-compact single-layer CNN for toxicity detection, optimized for maximum speed on edge deployment.
Model Details
- Architecture: Single-layer ByteCNN (Embedding → Conv1D + BatchNorm → Dense → Dense)
- Parameters: 10,009 (~40KB)
- Input: Raw byte sequences (max 512 bytes)
- Output: Toxicity probability (0-1)
- Optimization: 72% parameter reduction from original 36K model
Performance
- Validation Accuracy: 78.97%
- Training Dataset: Full balanced dataset (222,628 samples)
- Efficiency: 7.89 accuracy percentage points per 1K parameters (highest in the sweep)
- Inference Speed: <1ms on Cloudflare Workers
- CPU Limits: Stays within the CPU time limits of Cloudflare Workers
Architecture Configuration
- Embedding: 256 vocab → 12 dimensions
- Conv Layer: 12 → 40 filters, kernel=3
- Dense Layer: 40 → 128 hidden units
- Output: 128 → 1 (sigmoid activation)
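The layer sizes above can be checked against the stated total of 10,009 parameters. The sketch below assumes biases on the convolution and both dense layers, a learnable scale and shift per BatchNorm channel, and a pooling step that reduces the convolution output to 40 features before the first dense layer. None of these are stated explicitly in this card, but the count matches under them.

```python
# Layer sizes taken from the configuration list above.
VOCAB, EMBED_DIM = 256, 12
FILTERS, KERNEL = 40, 3
HIDDEN = 128


def count_parameters():
    """Return a per-layer parameter count (assumes biases and affine BatchNorm)."""
    return {
        "embedding": VOCAB * EMBED_DIM,                   # 3,072
        "conv": EMBED_DIM * FILTERS * KERNEL + FILTERS,   # 1,480 weights + biases
        "batchnorm": 2 * FILTERS,                         # 80 scale + shift
        "dense": FILTERS * HIDDEN + HIDDEN,               # 5,248
        "output": HIDDEN * 1 + 1,                         # 129
    }


counts = count_parameters()
total = sum(counts.values())
size_bytes = total * 4  # assumes float32 storage
print(total, size_bytes)
```

At 4 bytes per float32 weight, this comes to 40,036 bytes, consistent with the ~40KB figure.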
Training Details
- Trained on balanced dataset (50/50 toxic/safe ratio)
- 222,628 total samples from multiple sources
- AdamW optimizer with weight decay 0.01
- Learning rate: 0.001 with ReduceLROnPlateau
- 10 epochs, batch size 128
Parameter Sweep Results
Comparison across different model sizes:
| Model | Parameters | Accuracy | Efficiency (Acc/1K params) | Trade-off |
|---|---|---|---|---|
| Original | 36,257 | 81.20% | 2.24% | Baseline |
| 10K | 10,009 | 78.97% | 7.89% | 72% fewer params |
| 15K | 14,985 | 80.98% | 5.40% | 59% fewer params |
| 20K | 20,009 | 78.76% | 3.94% | 45% fewer params |
The 10K model offers the best parameter efficiency in the sweep, at a cost of 2.23 percentage points of accuracy relative to the original.
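The efficiency and trade-off columns follow directly from the parameter counts and accuracies. A minimal sketch of that arithmetic, using the figures from the table (the function names are illustrative, not part of any released code):

```python
# (parameters, validation accuracy in %) copied from the sweep table.
SWEEP = {
    "Original": (36_257, 81.20),
    "10K": (10_009, 78.97),
    "15K": (14_985, 80.98),
    "20K": (20_009, 78.76),
}


def efficiency(params, accuracy_pct):
    """Accuracy percentage points per 1,000 parameters."""
    return accuracy_pct / (params / 1000)


def reduction_pct(params, baseline=SWEEP["Original"][0]):
    """Percentage of parameters removed relative to the original model."""
    return 100 * (1 - params / baseline)


for name, (params, acc) in SWEEP.items():
    print(f"{name}: {efficiency(params, acc):.2f} per 1K params, "
          f"{reduction_pct(params):.0f}% fewer params")
```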
Contextual Understanding
Despite its compact size, the model separates superficially similar phrases. Scores below are predicted toxicity probabilities:
- "fuck you" → 87.28%, classified toxic (direct personal attack)
- "get fucked!" → 17.73%, classified safe (potentially playful/dismissive)
- "Hello everyone!" → 6.65%, classified safe (clearly benign)
Usage
Input text is converted to UTF-8 bytes, truncated/padded to 512 bytes, then processed through the CNN layers.
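The preprocessing step can be sketched as follows. The padding value (0) and right-side padding are assumptions, since the card does not say how short inputs are padded.

```python
from typing import List

MAX_LEN = 512   # context window stated in this card
PAD_BYTE = 0    # assumption: pad with zero bytes on the right


def encode(text: str, max_len: int = MAX_LEN, pad: int = PAD_BYTE) -> List[int]:
    """Convert text to a fixed-length list of byte values in the range 0-255."""
    raw = text.encode("utf-8")[:max_len]           # truncate at the byte level
    return list(raw) + [pad] * (max_len - len(raw))  # pad up to max_len


ids = encode("Hello everyone!")
print(len(ids), ids[:5])
```

Truncating at the byte level can split a multi-byte UTF-8 character at the boundary. That is harmless here because the model consumes raw byte values rather than decoded characters.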
Deployment
Optimized for edge deployment with:
- BatchNorm folding for inference
- Weights statically embedded in the deployment (211KB)
- Sub-1ms inference time
- Zero CPU timeout issues on Cloudflare Workers
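BatchNorm folding merges the normalization into the preceding convolution so that inference needs a single multiply-add per filter. The following is a generic sketch of the standard folding formula, not the project's actual export code; the function name, the flat per-filter weight layout, and `eps=1e-5` are assumptions.

```python
import math


def fold_batchnorm(weights, biases, gamma, beta, mean, var, eps=1e-5):
    """Fold BatchNorm statistics into convolution weights and biases.

    weights[f] is the flat list of weights for filter f; the other
    arguments hold one value per filter.
    """
    folded_w, folded_b = [], []
    for f in range(len(weights)):
        scale = gamma[f] / math.sqrt(var[f] + eps)
        folded_w.append([w * scale for w in weights[f]])
        folded_b.append((biases[f] - mean[f]) * scale + beta[f])
    return folded_w, folded_b
```

After folding, `sum(w * x) + b` with the folded values gives the same result as running the convolution followed by BatchNorm in inference mode.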
Live Demo
- API: https://bytecnn-demo.mitch-336.workers.dev/api/classify
- Status: https://bytecnn-demo.mitch-336.workers.dev/api/status
Limitations
- Optimized for English text
- 512 byte context window
- Binary classification only (toxic/safe)
- Accuracy is 2.23 percentage points lower than the original (78.97% vs 81.20%) in exchange for a 72% size reduction
Model Card
This model represents the optimal balance of speed and quality for production edge deployment.