Patagpt-1.0 Multi-Task Address Reasoning Model
Overview
Patagpt-1.0 is a specialized 1.7B-parameter language model developed by Shiprocket AI, optimized for Indian address intelligence and geographic reasoning. Built on the Qwen3-1.7B architecture and fine-tuned with LoRA, the model handles address correction, component extraction, and geographic Q&A with Chain of Thought reasoning.
This is an experimental release focused on Indian address processing. We welcome feedback and contributions.
Key Features
- Architecture: 1.7B parameter Qwen3-based model with optimized LoRA adapters
- Multi-Task Excellence: Handles address correction, completion, and geographic Q&A in a single model
- High Accuracy: Trained on 500K+ multi-task conversations, with augmentation expanding the source data to 371% of its original size
- Token Efficiency: Optimized for 1024 token sequences with efficient reasoning
- Fast Training: 54.7 hours on NVIDIA A100 MIG with Unsloth optimizations
- Lightweight: Only 1.7B parameters, making deployment straightforward
Training Details
- Training Data: 500K+ problem-solution pairs from Indian address datasets
- Hardware: NVIDIA A100-SXM4-80GB MIG 7g.80gb
- Training Pipeline:
  - Enhanced data preparation (70% address correction + 30% geographic Q&A)
  - LoRA fine-tuning with Unsloth optimizations
  - Multi-task learning with Chain of Thought reasoning
Model Architecture
We built upon the efficient Qwen3-1.7B base model with optimized LoRA configuration:
- LoRA Rank: 32 for balanced efficiency and performance
- LoRA Alpha: 64 for stable training dynamics
- Target Modules: All projection layers (q_proj, k_proj, v_proj, o_proj, up_proj, down_proj, gate_proj)
- Dropout: 0.1 for regularization
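For reference, here is a minimal sketch of an equivalent PEFT `LoraConfig` built from the hyperparameters above. The actual run used Unsloth's wrapper, and anything not listed above (such as `task_type` or bias handling) is an assumption:

```python
from peft import LoraConfig

# Illustrative LoRA configuration mirroring the hyperparameters listed above.
# task_type and bias are assumptions; the published run used Unsloth's wrapper.
lora_config = LoraConfig(
    r=32,                      # LoRA rank
    lora_alpha=64,             # scaling factor for stable training dynamics
    lora_dropout=0.1,          # regularization
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "up_proj", "down_proj", "gate_proj",
    ],
    task_type="CAUSAL_LM",
)
```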
Data Curation Strategy
We processed comprehensive Indian address data with aggressive quality filtering:
- Source: Address NER dataset with structured geographic components
- Multi-task Enhancement: Generated Q&A pairs from address components
- Quality Control: Removed incomplete, non-English, and low-quality samples
- Augmentation: Conversation generation expanded the source data to 371% of its original size
Final Dataset:
- 500K+ enhanced conversations
- Multi-task distribution: Address correction + Geographic Q&A
- Quality assurance: Manual verification and automated filtering
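As an illustration of how such a 70/30 multi-task mix can be assembled into chat-format samples, here is a minimal sketch; the field names (`address`, `corrected`, `question`, `answer`) and prompt wording are assumptions, not the released pipeline:

```python
import random

def build_conversation(record, task):
    """Turn one source record into a chat-format training sample.
    Field names and prompt wording are illustrative assumptions."""
    if task == "correction":
        return {
            "messages": [
                {"role": "user", "content": f"Fix and extract components from this address: {record['address']}"},
                {"role": "assistant", "content": record["corrected"]},
            ]
        }
    # Geographic Q&A generated from address components
    return {
        "messages": [
            {"role": "user", "content": record["question"]},
            {"role": "assistant", "content": record["answer"]},
        ]
    }

def sample_task():
    # 70% address correction, 30% geographic Q&A (ratio from this card)
    return "correction" if random.random() < 0.7 else "qa"
```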
Training Configuration
Advanced training setup optimized for address intelligence:
- Framework: Unsloth 2025.6.12 with performance optimizations
- Precision: bfloat16 for Qwen3 compatibility
- Learning Rate Schedule: Cosine with warmup
- Optimizer: AdamW with Unsloth enhancements
- Training Throughput: 5.602 samples/second
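The run itself used Unsloth's training stack; a rough Hugging Face `TrainingArguments` equivalent is sketched below. Only the bfloat16 precision, AdamW optimizer, and cosine-with-warmup schedule come from this card; the numeric values (learning rate, batch size, warmup ratio, epochs) are placeholder assumptions:

```python
from transformers import TrainingArguments

# Sketch only: values marked as assumptions are not the published recipe.
training_args = TrainingArguments(
    output_dir="patagpt-1.0-sft",
    bf16=True,                      # bfloat16 precision (from this card)
    optim="adamw_torch",            # AdamW optimizer (from this card)
    lr_scheduler_type="cosine",     # cosine schedule with warmup (from this card)
    warmup_ratio=0.03,              # assumption
    learning_rate=2e-4,             # assumption
    per_device_train_batch_size=8,  # assumption
    num_train_epochs=1,             # assumption
    logging_steps=50,
)
```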
Performance Highlights
Core Capabilities
The model demonstrates exceptional performance across multiple address-related tasks:
Address Correction
- Fixes spelling errors and formatting inconsistencies
- Infers missing components (pincode, state, locality)
- Provides detailed reasoning for each correction
Component Extraction
- Structured extraction of building, locality, city, state, pincode
- Handles complex, unformatted address strings
- Maintains hierarchical relationships
Geographic Q&A
- Answers location-based questions with high accuracy
- Covers state-city relationships, tier classifications
- Provides contextual geographic knowledge
Training Metrics
- Final Training Loss: 0.357
- Training Duration: 54.7 hours
- Training Throughput: 5.602 samples/second
- Hardware Efficiency: Optimized A100 MIG utilization
- Framework: Unsloth 2025.6.12
Quick Start
Basic Usage
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load merged model (no PEFT needed!)
model_name = "shiprocket-ai/Patagpt-1.0"

# Initialize tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Load merged model
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)
model.eval()

def process_address(messages, max_new_tokens=400):
    """Process address with Chain of Thought reasoning"""
    inputs = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        return_tensors="pt"
    )
    device = next(model.parameters()).device
    inputs = inputs.to(device)

    with torch.no_grad():
        outputs = model.generate(
            input_ids=inputs,
            max_new_tokens=max_new_tokens,
            temperature=0.1,
            do_sample=True,
            use_cache=True,
            pad_token_id=tokenizer.eos_token_id
        )

    response = tokenizer.decode(
        outputs[0][len(inputs[0]):],
        skip_special_tokens=True
    ).strip()
    return response

# Example usage
messages = [
    {"role": "user", "content": "Fix and extract components from this address: sec 14 gurugram haryana 122001"}
]
result = process_address(messages)
print(result)
```
Task-Specific Functions
```python
def fix_address(address):
    """Address correction with reasoning"""
    messages = [{"role": "user", "content": f"Fix and extract components from this address: {address}"}]
    return process_address(messages)

def answer_geographic_question(question):
    """Geographic Q&A"""
    messages = [{"role": "user", "content": question}]
    return process_address(messages, max_new_tokens=150)

def extract_components(address):
    """Component extraction"""
    messages = [{"role": "user", "content": f"Extract all components from this address: {address}"}]
    return process_address(messages, max_new_tokens=200)

# Example usage
print(fix_address("koramangala bangalor 560095"))
print(answer_geographic_question("Which state is Mumbai in?"))
print(extract_components("Flat 203, Emerald Heights, Sector 15, Gurugram, Haryana 122001"))
```
Intended Use
Primary Use Cases:
- E-commerce address validation and correction
- Logistics address standardization and component extraction
- Customer support automation for location queries
- Geographic data processing and migration
- Address intelligence and analytics
Supported Tasks:
- Address Correction: Fix errors, complete missing information
- Component Extraction: Structure unformatted addresses
- Geographic Q&A: Answer location-based questions
- Address Standardization: Convert to consistent formats
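As a small illustration of the e-commerce and logistics use cases above, the `fix_address` helper from the Quick Start can be run over a batch of raw addresses (a sketch; the sample addresses simply reuse the examples from this card):

```python
# Batch standardization sketch reusing fix_address() from the Quick Start.
raw_addresses = [
    "sec 14 gurugram haryana 122001",
    "koramangala bangalor 560095",
]

standardized = []
for raw in raw_addresses:
    standardized.append({"input": raw, "output": fix_address(raw)})

for row in standardized:
    print(row["input"], "->", row["output"])
```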
Advanced Features
Chain of Thought Reasoning
The model provides detailed step-by-step analysis:
Input: "sec 14 gurgoan haryana 122001"
Reasoning:
1. Identifies "sec 14" as "Sector 14"
2. Corrects "gurgoan" to "Gurgaon"
3. Recognizes complete state and pincode
4. Structures into JSON format
Multi-Task Learning
Single model handles diverse address-related tasks:
- Correction: Spelling and formatting fixes
- Completion: Infer missing components
- Extraction: Structure complex addresses
- Q&A: Geographic knowledge queries
Optimization Features
- Efficient Architecture: Qwen3-1.7B base with LoRA adapters
- Memory Optimized: bfloat16 precision for efficiency
- Fast Inference: Unsloth optimizations for speed
- Scalable Deployment: Lightweight adapter-only model
Limitations & Considerations
- Geographic Scope: Optimized specifically for Indian addresses and geography
- Model Size: 1.7B parameters; efficient, but with the capability limits expected of a small model
- Training Domain: Specialized for address intelligence, may not generalize to other domains
- Framework Requirements: Requires compatible transformers, peft, and optionally unsloth
Technical Specifications
Model Architecture
- Base Model: Qwen/Qwen3-1.7B-Base
- Fine-tuning: LoRA with rank 32, alpha 64
- Adapter Size: ~216MB
- Context Length: 1024 tokens
- Precision: bfloat16
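Since the model was trained with a 1024-token context, long prompts should be truncated before generation. Below is a minimal sketch using the tokenizer loaded in the Quick Start; the `reserve_for_output` budget is an assumption:

```python
MAX_CONTEXT = 1024  # context length from the specs above

def truncate_to_context(text, reserve_for_output=400):
    """Keep the prompt within the 1024-token window, leaving room for generation."""
    max_input_tokens = MAX_CONTEXT - reserve_for_output
    ids = tokenizer(text, truncation=True, max_length=max_input_tokens)["input_ids"]
    return tokenizer.decode(ids, skip_special_tokens=True)
```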
Training Details
- Dataset: 500K+ multi-task conversations
- Hardware: NVIDIA A100-SXM4-80GB MIG
- Duration: 54.7 hours intensive training
- Framework: Unsloth 2025.6.12
- Optimization: AdamW with cosine scheduling
Citation
If you use this model in your research or applications:
```bibtex
@misc{Patagpt-1.0,
  title     = {Patagpt-1.0 Multi-Task Address Reasoning Model},
  author    = {Shiprocket AI},
  year      = {2025},
  publisher = {HuggingFace},
  note      = {Specialized model for Indian address intelligence with Chain of Thought reasoning},
  url       = {https://huggingface.co/shiprocket-ai/Patagpt-1.0}
}
```
Support
For questions, issues, or collaboration:
- Repository: Open issues for technical problems
- Contact: Shiprocket AI team
- Documentation: Comprehensive examples provided above
Based on Qwen3-1.7B architecture with Unsloth optimizations • Built for Indian address intelligence