
TRIDENT

TRIDENT is a reasoning-focused 4B-parameter language model that improves its reasoning capability through algorithmic self-improvement rather than parameter scaling.

The model is built on Qwen3-4B and enhanced using the TRIDENT framework: a combination of GNN-guided Tree-of-Thoughts search, multi-agent reasoning policies, and variance-based self-training.


Overview

Traditional large language model training depends on:

  • Human-written reasoning traces
  • Manually curated preference datasets
  • Static fine-tuning pipelines

TRIDENT removes these dependencies.

Instead, the model:

  1. Explores multiple reasoning paths
  2. Evaluates them using a learned GNN policy
  3. Selects high-uncertainty problems automatically
  4. Generates its own training supervision
  5. Distills improvements back into the model using LoRA
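The selection and supervision steps above (3–5) can be sketched in a few lines. This is an illustrative reconstruction, not the framework's actual API: the function name, the rollout format `(path, reward, promise)`, and the variance threshold are all assumptions.

```python
import statistics

# Illustrative sketch of TRIDENT's self-improvement loop (steps 3-5 above).
# Names and data layout are assumptions, not the framework's real API.
# Each rollout is (reasoning_path, verifiable_reward, gnn_promise_score).

def select_and_label(problem_rollouts, variance_threshold=0.1):
    """Keep high-reward-variance problems, and for each one emit the
    most promising correct path as self-generated supervision."""
    training_data = []
    for problem, rollouts in problem_rollouts.items():
        rewards = [r for _, r, _ in rollouts]
        # Step 3: variance flags problems where the model is inconsistent.
        if statistics.pvariance(rewards) < variance_threshold:
            continue
        # Step 4: among correct paths, trust the GNN's promise score.
        correct = [(path, promise) for path, r, promise in rollouts if r > 0]
        if correct:
            best_path = max(correct, key=lambda pp: pp[1])[0]
            training_data.append((problem, best_path))
    # Step 5: a LoRA fine-tune would then distill training_data back in.
    return training_data

rollouts = {
    # consistently solved -> zero reward variance -> skipped
    "2+2": [("p1", 1.0, 0.9), ("p2", 1.0, 0.8)],
    # inconsistent -> high variance -> selected; best correct path kept
    "hard integral": [("a", 0.0, 0.7), ("b", 1.0, 0.6), ("c", 1.0, 0.95)],
}
print(select_and_label(rollouts))  # [('hard integral', 'c')]
```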

```yaml
model-index:
  - name: TRIDENT
    results:
      - task:
          type: text-generation
        dataset:
          name: GSM8K
          type: gsm8k
          split: test
        metrics:
          - type: accuracy
            value: 86.58
      - task:
          type: text-generation
        dataset:
          name: MMLU
          type: mmlu
          split: test
        metrics:
          - type: accuracy
            value: 72.61
      - task:
          type: text-generation
        dataset:
          name: GPQA
          type: gpqa
          split: test
        metrics:
          - type: accuracy
            value: 42.42
      - task:
          type: text-generation
        dataset:
          name: ARC-Challenge
          type: arc-challenge
          split: test
        metrics:
          - type: accuracy
            value: 59.0
```

Core Capabilities

GNN-Guided Tree-of-Thoughts

Reasoning is represented as a directed graph of intermediate states.
A 3-layer Graph Convolutional Network predicts a promise score for each branch, guiding exploration and pruning.
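A minimal sketch of promise-guided pruning, assuming a beam-style expansion. In the real system the promise score comes from the 3-layer GCN over the reasoning graph; here it is a stand-in lookup table, and the node names are purely illustrative.

```python
import heapq

# Promise-guided Tree-of-Thoughts pruning (sketch). `promise` stands in
# for the learned GCN's branch scores; node names are illustrative.

def expand(node, children, promise, beam_width=2):
    """Score each child branch and keep only the top `beam_width`."""
    scored = [(promise[c], c) for c in children.get(node, [])]
    return [c for _, c in heapq.nlargest(beam_width, scored)]

children = {"root": ["s1", "s2", "s3"], "s1": ["s1a", "s1b"]}
promise = {"s1": 0.9, "s2": 0.4, "s3": 0.7, "s1a": 0.6, "s1b": 0.3}

frontier = expand("root", children, promise)  # prunes the weakest branch
print(frontier)  # ['s1', 's3']
```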

Multi-Agent Reasoning

Four internal agents (Conservative, Exploratory, Balanced, Reflective) vote on reasoning actions to balance exploration and correctness.
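The four-agent vote might look like the toy below. Only the agent names come from the model card; the per-agent decision rules and the score fields are invented for illustration.

```python
from collections import Counter

# Toy four-agent vote over candidate reasoning actions. The agents'
# criteria are assumptions made for this sketch, not TRIDENT's actual rules.

def vote(candidate_actions, scores):
    """Each agent picks the action maximizing its own criterion; majority wins."""
    agents = {
        "Conservative": lambda a: scores[a]["confidence"],
        "Exploratory":  lambda a: scores[a]["novelty"],
        "Balanced":     lambda a: scores[a]["confidence"] + scores[a]["novelty"],
        "Reflective":   lambda a: scores[a]["self_check"],
    }
    ballots = [max(candidate_actions, key=rule) for rule in agents.values()]
    return Counter(ballots).most_common(1)[0][0]

scores = {
    "refine": {"confidence": 0.8, "novelty": 0.2, "self_check": 0.9},
    "branch": {"confidence": 0.4, "novelty": 0.5, "self_check": 0.3},
}
print(vote(["refine", "branch"], scores))  # 'refine' (wins 3 ballots to 1)
```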

Variance-Based Curriculum

Problems are selected for training based on reward variance, targeting examples where the model is inconsistent and learning signal is highest.

Self-Generative Reasoning Loop

No human-annotated reasoning traces are used.
The model autonomously generates, evaluates, and curates its own reasoning data.

Stable Training

A multi-layer reward stabilization mechanism prevents:

  • Reward collapse
  • Loss explosions
  • Infinite failure loops

The architecture is compatible with future GRPO-style reinforcement learning.
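One way such stabilization could be layered is sketched below. The card only names the failure modes it prevents; the specific guards here (reward clipping, an EMA baseline, a consecutive-failure cap) are plausible stand-ins, not TRIDENT's actual mechanism.

```python
# Hypothetical reward-stabilization wrapper: each guard maps to one of the
# failure modes listed above. All constants and logic are assumptions.

class StableReward:
    def __init__(self, clip=5.0, ema_decay=0.9, max_failures=3):
        self.clip = clip            # bounds raw rewards -> no loss explosions
        self.ema = 0.0              # running baseline -> resists reward collapse
        self.ema_decay = ema_decay
        self.failures = 0
        self.max_failures = max_failures

    def __call__(self, raw_reward):
        r = max(-self.clip, min(self.clip, raw_reward))
        self.ema = self.ema_decay * self.ema + (1 - self.ema_decay) * r
        self.failures = self.failures + 1 if r <= 0 else 0
        if self.failures >= self.max_failures:
            self.failures = 0       # break infinite failure loops:
            return 0.0              # stop penalizing this problem for now
        return r - self.ema         # advantage-style centered reward

stab = StableReward()
print(round(stab(100.0), 2))  # clipped to 5.0, then centered: 4.5
```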




Benchmark Results

Accuracy comparison against the base model:

| Benchmark | Qwen3-4B | TRIDENT |
|---|---|---|
| GSM8K (5-shot) | 74.14 | 86.58 |
| MMLU (5-shot) | 47.70 | 72.61 |
| ARC-C (25-shot) | 54.0 | 59.0 |
| GPQA (0-shot) | 28.28 | 42.42 |
| Winogrande (0-shot) | 59.6 | 67.08 |
| TruthfulQA (0-shot) | 54.9 | 54.7 |
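The percentage-point deltas can be recomputed directly from the table above:

```python
# Per-benchmark deltas (percentage points), using the values from the table.
base = {"GSM8K": 74.14, "MMLU": 47.70, "ARC-C": 54.0,
        "GPQA": 28.28, "Winogrande": 59.6, "TruthfulQA": 54.9}
trident = {"GSM8K": 86.58, "MMLU": 72.61, "ARC-C": 59.0,
           "GPQA": 42.42, "Winogrande": 67.08, "TruthfulQA": 54.7}
deltas = {k: round(trident[k] - base[k], 2) for k in base}
print(deltas["GPQA"])  # 14.14
```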

Highlight:
A +14.14 percentage-point improvement on GPQA (0-shot).


Intended Use

TRIDENT is suitable for:

  • Multi-step mathematical reasoning
  • Scientific and logical inference
  • Hard QA benchmarks
  • Planning and hypothesis exploration
  • Research on reasoning systems

Limitations

  • Higher inference-time compute than single-pass models
  • Not optimized for low-latency chat
  • Best used where reasoning depth matters more than speed

Ethical Considerations

  • No human-written reasoning traces used
  • No preference data collection
  • Training relies on verifiable task rewards
  • Like all LLMs, may hallucinate outside structured reasoning workflows

Paper link

https://www.shivik.in/shivik-labs/trident

Citation

@article{puri2025trident,
  title={TRIDENT: Thought-based Reasoning and Improvement through Deep Exploration of Neuronal Trees},
  author={Puri, Shivansh and Khandelwal, Abhisek and Joshi, Vedant and Yadav, Akash},
  year={2025}
}