ByteCNN-10K: Ultra-Lightweight Toxicity Detection

Ultra-compact single-layer CNN for toxicity detection, optimized for maximum speed on edge deployment.

Model Details

  • Architecture: Single-layer ByteCNN (Embedding β†’ Conv1D + BatchNorm β†’ Dense β†’ Dense)
  • Parameters: 10,009 (~40KB)
  • Input: Raw byte sequences (max 512 bytes)
  • Output: Toxicity probability (0-1)
  • Optimization: 72% parameter reduction from original 36K model

Performance

  • Validation Accuracy: 78.97%
  • Training Dataset: Full balanced dataset (222,628 samples)
  • Efficiency: 7.89% accuracy per 1K parameters (best efficiency in sweep)
  • Inference Speed: <1ms on Cloudflare Workers
  • CPU Limits: Guaranteed to stay under edge compute constraints

Architecture Configuration

  • Embedding: 256 vocab β†’ 12 dimensions
  • Conv Layer: 12 β†’ 40 filters, kernel=3
  • Dense Layer: 40 β†’ 128 hidden units
  • Output: 128 β†’ 1 (sigmoid activation)

Training Details

  • Trained on balanced dataset (50/50 toxic/safe ratio)
  • 222,628 total samples from multiple sources
  • AdamW optimizer with weight decay 0.01
  • Learning rate: 0.001 with ReduceLROnPlateau
  • 10 epochs, batch size 128

Parameter Sweep Results

Comparison across different model sizes:

Model Parameters Accuracy Efficiency (Acc/1K params) Trade-off
Original 36,257 81.20% 2.24% Baseline
10K 10,009 78.97% 7.89% 72% fewer params
15K 14,985 80.98% 5.40% 59% fewer params
20K 20,009 78.76% 3.94% 45% fewer params

The 10K model offers the best parameter efficiency with minimal accuracy loss.

Contextual Understanding

Despite its compact size, the model demonstrates sophisticated toxicity detection:

  • "fuck you" β†’ 87.28% toxic (direct personal attack)
  • "get fucked!" β†’ 17.73% safe (potentially playful/dismissive)
  • "Hello everyone!" β†’ 6.65% safe (clearly benign)

Usage

Input text is converted to UTF-8 bytes, truncated/padded to 512 bytes, then processed through the CNN layers.

Deployment

Optimized for edge deployment with:

  • BatchNorm folding for inference
  • Static weight embedding (211KB)
  • Sub-1ms inference time
  • Zero CPU timeout issues on Cloudflare Workers

Live Demo

Limitations

  • English text optimized
  • 512 byte context window
  • Binary classification only (toxic/safe)
  • 2.23% accuracy trade-off vs original for 72% size reduction

Model Card

This model represents the optimal balance of speed and quality for production edge deployment.

Downloads last month
9
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support