
TRIDENT

TRIDENT is a reasoning-focused 4B-parameter language model that improves its reasoning capability through algorithmic self-improvement rather than parameter scaling.

The model is built on Qwen3-4B and enhanced using the TRIDENT framework: a combination of GNN-guided Tree-of-Thoughts search, multi-agent reasoning policies, and variance-based self-training.


Overview

Traditional large language model training depends on:

  • Human-written reasoning traces
  • Manually curated preference datasets
  • Static fine-tuning pipelines

TRIDENT removes these dependencies.

Instead, the model:

  1. Explores multiple reasoning paths
  2. Evaluates them using a learned GNN policy
  3. Selects high-uncertainty problems automatically
  4. Generates its own training supervision
  5. Distills improvements back into the model using LoRA
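The selection and supervision steps above (3–5) can be sketched in a few lines. This is an illustrative reconstruction, not the framework's actual API: the function name, the rollout format `(path, reward, promise)`, and the variance threshold are all assumptions.

```python
import statistics

# Illustrative sketch of TRIDENT's self-improvement loop (steps 3-5 above).
# Names and data layout are assumptions, not the framework's real API.
# Each rollout is (reasoning_path, verifiable_reward, gnn_promise_score).

def select_and_label(problem_rollouts, variance_threshold=0.1):
    """Keep high-reward-variance problems, and for each one emit the
    most promising correct path as self-generated supervision."""
    training_data = []
    for problem, rollouts in problem_rollouts.items():
        rewards = [r for _, r, _ in rollouts]
        # Step 3: variance flags problems where the model is inconsistent.
        if statistics.pvariance(rewards) < variance_threshold:
            continue
        # Step 4: among correct paths, trust the GNN's promise score.
        correct = [(path, promise) for path, r, promise in rollouts if r > 0]
        if correct:
            best_path = max(correct, key=lambda pp: pp[1])[0]
            training_data.append((problem, best_path))
    # Step 5: a LoRA fine-tune would then distill training_data back in.
    return training_data

rollouts = {
    # consistently solved -> zero reward variance -> skipped
    "2+2": [("p1", 1.0, 0.9), ("p2", 1.0, 0.8)],
    # inconsistent -> high variance -> selected; best correct path kept
    "hard integral": [("a", 0.0, 0.7), ("b", 1.0, 0.6), ("c", 1.0, 0.95)],
}
print(select_and_label(rollouts))  # [('hard integral', 'c')]
```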

```yaml
model-index:
  - name: TRIDENT
    results:
      - task:
          type: text-generation
        dataset:
          name: GSM8K
          type: gsm8k
          split: test
        metrics:
          - type: accuracy
            value: 86.58
      - task:
          type: text-generation
        dataset:
          name: MMLU
          type: mmlu
          split: test
        metrics:
          - type: accuracy
            value: 72.61
      - task:
          type: text-generation
        dataset:
          name: GPQA
          type: gpqa
          split: test
        metrics:
          - type: accuracy
            value: 42.42
      - task:
          type: text-generation
        dataset:
          name: ARC-Challenge
          type: arc-challenge
          split: test
        metrics:
          - type: accuracy
            value: 59.0
```

Core Capabilities

GNN-Guided Tree-of-Thoughts

Reasoning is represented as a directed graph of intermediate states.
A 3-layer Graph Convolutional Network predicts a promise score for each branch, guiding exploration and pruning.
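A minimal sketch of promise-guided pruning, assuming a beam-style expansion. In the real system the promise score comes from the 3-layer GCN over the reasoning graph; here it is a stand-in lookup table, and the node names are purely illustrative.

```python
import heapq

# Promise-guided Tree-of-Thoughts pruning (sketch). `promise` stands in
# for the learned GCN's branch scores; node names are illustrative.

def expand(node, children, promise, beam_width=2):
    """Score each child branch and keep only the top `beam_width`."""
    scored = [(promise[c], c) for c in children.get(node, [])]
    return [c for _, c in heapq.nlargest(beam_width, scored)]

children = {"root": ["s1", "s2", "s3"], "s1": ["s1a", "s1b"]}
promise = {"s1": 0.9, "s2": 0.4, "s3": 0.7, "s1a": 0.6, "s1b": 0.3}

frontier = expand("root", children, promise)  # prunes the weakest branch
print(frontier)  # ['s1', 's3']
```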

Multi-Agent Reasoning

Four internal agents (Conservative, Exploratory, Balanced, Reflective) vote on reasoning actions to balance exploration and correctness.
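The four-agent vote might look like the toy below. Only the agent names come from the model card; the per-agent decision rules and the score fields are invented for illustration.

```python
from collections import Counter

# Toy four-agent vote over candidate reasoning actions. The agents'
# criteria are assumptions made for this sketch, not TRIDENT's actual rules.

def vote(candidate_actions, scores):
    """Each agent picks the action maximizing its own criterion; majority wins."""
    agents = {
        "Conservative": lambda a: scores[a]["confidence"],
        "Exploratory":  lambda a: scores[a]["novelty"],
        "Balanced":     lambda a: scores[a]["confidence"] + scores[a]["novelty"],
        "Reflective":   lambda a: scores[a]["self_check"],
    }
    ballots = [max(candidate_actions, key=rule) for rule in agents.values()]
    return Counter(ballots).most_common(1)[0][0]

scores = {
    "refine": {"confidence": 0.8, "novelty": 0.2, "self_check": 0.9},
    "branch": {"confidence": 0.4, "novelty": 0.5, "self_check": 0.3},
}
print(vote(["refine", "branch"], scores))  # 'refine' (wins 3 ballots to 1)
```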

Variance-Based Curriculum

Problems are selected for training based on reward variance, targeting examples where the model is inconsistent and learning signal is highest.

Self-Generative Reasoning Loop

No human-annotated reasoning traces are used.
The model autonomously generates, evaluates, and curates its own reasoning data.

Stable Training

A multi-layer reward stabilization mechanism prevents:

  • Reward collapse
  • Loss explosions
  • Infinite failure loops

The architecture is compatible with future GRPO-style reinforcement learning.
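One way such stabilization could be layered is sketched below. The card only names the failure modes it prevents; the specific guards here (reward clipping, an EMA baseline, a consecutive-failure cap) are plausible stand-ins, not TRIDENT's actual mechanism.

```python
# Hypothetical reward-stabilization wrapper: each guard maps to one of the
# failure modes listed above. All constants and logic are assumptions.

class StableReward:
    def __init__(self, clip=5.0, ema_decay=0.9, max_failures=3):
        self.clip = clip            # bounds raw rewards -> no loss explosions
        self.ema = 0.0              # running baseline -> resists reward collapse
        self.ema_decay = ema_decay
        self.failures = 0
        self.max_failures = max_failures

    def __call__(self, raw_reward):
        r = max(-self.clip, min(self.clip, raw_reward))
        self.ema = self.ema_decay * self.ema + (1 - self.ema_decay) * r
        self.failures = self.failures + 1 if r <= 0 else 0
        if self.failures >= self.max_failures:
            self.failures = 0       # break infinite failure loops:
            return 0.0              # stop penalizing this problem for now
        return r - self.ema         # advantage-style centered reward

stab = StableReward()
print(round(stab(100.0), 2))  # clipped to 5.0, then centered: 4.5
```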




Benchmark Results

Accuracy comparison against the base model:

| Benchmark | Qwen3-4B | TRIDENT |
|---|---|---|
| GSM8K (5-shot) | 74.14 | 86.58 |
| MMLU (5-shot) | 47.70 | 72.61 |
| ARC-C (25-shot) | 54.0 | 59.0 |
| GPQA (0-shot) | 28.28 | 42.42 |
| Winogrande (0-shot) | 59.6 | 67.08 |
| TruthfulQA (0-shot) | 54.9 | 54.7 |
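The percentage-point deltas can be recomputed directly from the table above:

```python
# Per-benchmark deltas (percentage points), using the values from the table.
base = {"GSM8K": 74.14, "MMLU": 47.70, "ARC-C": 54.0,
        "GPQA": 28.28, "Winogrande": 59.6, "TruthfulQA": 54.9}
trident = {"GSM8K": 86.58, "MMLU": 72.61, "ARC-C": 59.0,
           "GPQA": 42.42, "Winogrande": 67.08, "TruthfulQA": 54.7}
deltas = {k: round(trident[k] - base[k], 2) for k in base}
print(deltas["GPQA"])  # 14.14
```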

Highlight:
A +14.14 percentage-point improvement on GPQA (0-shot).


Intended Use

TRIDENT is suitable for:

  • Multi-step mathematical reasoning
  • Scientific and logical inference
  • Hard QA benchmarks
  • Planning and hypothesis exploration
  • Research on reasoning systems

Limitations

  • Higher inference-time compute than single-pass models
  • Not optimized for low-latency chat
  • Best used where reasoning depth matters more than speed

Ethical Considerations

  • No human-written reasoning traces used
  • No preference data collection
  • Training relies on verifiable task rewards
  • Like all LLMs, may hallucinate outside structured reasoning workflows

Paper link

https://www.shivik.in/shivik-labs/trident

Citation

@article{puri2025trident,
  title={TRIDENT: Thought-based Reasoning and Improvement through Deep Exploration of Neuronal Trees},
  author={Puri, Shivansh and Khandelwal, Abhisek and Joshi, Vedant and Yadav, Akash},
  year={2025}
}