Patagpt-1.0 Multi-Task Address Reasoning Model

Address Intelligence Multi-Task Chain of Thought

Overview

Patagpt-1.0 is a specialized 1.7B-parameter language model developed by Shiprocket AI, optimized for Indian address intelligence and geographic reasoning. Built on the Qwen3-1.7B architecture with LoRA fine-tuning, the model excels at address correction, component extraction, and geographic Q&A with Chain of Thought reasoning.

🚧 This is an experimental release focused on Indian address processing. We welcome feedback and contributions.


🎯 Key Features

  • Architecture: 1.7B-parameter Qwen3-based model with optimized LoRA adapters
  • Multi-Task Excellence: Handles address correction, completion, and geographic Q&A in a single model
  • High Accuracy: Trained on 500K+ multi-task conversations, expanded to 371% of the source data through augmentation
  • Token Efficiency: Optimized for 1024 token sequences with efficient reasoning
  • Fast Training: 54.7 hours on NVIDIA A100 MIG with Unsloth optimizations
  • Lightweight: Only 1.7B parameters, making deployment straightforward

🛠️ Training Details

  • Training Data: 500K+ problem-solution pairs from Indian address datasets
  • Hardware: NVIDIA A100-SXM4-80GB MIG 7g.80gb
  • Training Pipeline:
    • Enhanced Data Preparation (70% address correction + 30% geographic Q&A)
    • LoRA Fine-Tuning with Unsloth optimizations
    • Multi-Task Learning with Chain of Thought reasoning

🔀 Model Architecture

We built upon the efficient Qwen3-1.7B base model with an optimized LoRA configuration:

  • LoRA Rank: 32 for balanced efficiency and performance
  • LoRA Alpha: 64 for stable training dynamics
  • Target Modules: All projection layers (q_proj, k_proj, v_proj, o_proj, up_proj, down_proj, gate_proj)
  • Dropout: 0.1 for regularization
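
For reference, the same adapter configuration can be sketched with the peft library; the hyperparameters mirror the list above, while the surrounding loading code is illustrative rather than the exact training script:

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# LoRA hyperparameters taken from the configuration above
lora_config = LoraConfig(
    r=32,                 # LoRA rank
    lora_alpha=64,        # scaling factor
    lora_dropout=0.1,     # regularization
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "up_proj", "down_proj", "gate_proj",
    ],
    task_type="CAUSAL_LM",
)

base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-1.7B-Base")
peft_model = get_peft_model(base_model, lora_config)
peft_model.print_trainable_parameters()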

📚 Data Curation Strategy

We processed comprehensive Indian address data with aggressive quality filtering:

  • Source: Address NER dataset with structured geographic components
  • Multi-task Enhancement: Generated Q&A pairs from address components
  • Quality Control: Removed incomplete, non-English, and low-quality samples
  • Augmentation: Intelligent conversation generation expanded the filtered data to 371% of its original volume

Final Dataset:

  • 500K+ enhanced conversations
  • Multi-task distribution: Address correction + Geographic Q&A
  • Quality assurance: Manual verification and automated filtering
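
To make the multi-task enhancement step concrete, here is a minimal sketch of how Q&A pairs could be derived from a structured address record; the field names and question templates are assumptions for illustration, not the actual pipeline:

# Illustrative sketch of multi-task enhancement; field names and templates are assumptions.
def generate_qa_pairs(record):
    """Turn one structured address record into geographic Q&A pairs."""
    pairs = []
    if record.get("city") and record.get("state"):
        pairs.append({
            "question": f"Which state is {record['city']} in?",
            "answer": record["state"],
        })
    if record.get("locality") and record.get("city") and record.get("pincode"):
        pairs.append({
            "question": f"What is the pincode of {record['locality']}, {record['city']}?",
            "answer": record["pincode"],
        })
    return pairs

example = {"locality": "Koramangala", "city": "Bengaluru", "state": "Karnataka", "pincode": "560095"}
print(generate_qa_pairs(example))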

🎯 Training Configuration

Advanced training setup optimized for address intelligence:

  • Framework: Unsloth 2025.6.12 with performance optimizations
  • Precision: bfloat16 for Qwen3 compatibility
  • Learning Rate: Cosine scheduler with warmup
  • Optimizer: AdamW with Unsloth enhancements
  • Training Throughput: 5.602 samples/second
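
A minimal Hugging Face TrainingArguments sketch matching this setup is shown below; only the scheduler, optimizer family, and bfloat16 precision come from the configuration above, while batch size, learning rate, warmup ratio, and epoch count are illustrative assumptions:

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="patagpt-1.0-lora",
    per_device_train_batch_size=8,    # assumption
    gradient_accumulation_steps=4,    # assumption
    num_train_epochs=1,               # assumption
    learning_rate=2e-4,               # assumption
    lr_scheduler_type="cosine",       # cosine scheduler
    warmup_ratio=0.03,                # warmup (ratio is an assumption)
    optim="adamw_torch",              # AdamW optimizer
    bf16=True,                        # bfloat16 for Qwen3 compatibility
    logging_steps=50,
)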

📊 Performance Highlights

Core Capabilities

The model performs well across multiple address-related tasks:

🔧 Address Correction

  • Fixes spelling errors and formatting inconsistencies
  • Infers missing components (pincode, state, locality)
  • Provides detailed reasoning for each correction

📊 Component Extraction

  • Structured extraction of building, locality, city, state, pincode
  • Handles complex, unformatted address strings
  • Maintains hierarchical relationships

❓ Geographic Q&A

  • Answers location-based questions with high accuracy
  • Covers state-city relationships, tier classifications
  • Provides contextual geographic knowledge

Training Metrics

Final Training Loss: 0.357
Training Duration: 54.7 hours
Training Throughput: 5.602 samples/second
Hardware Efficiency: Optimized A100 MIG utilization
Framework: Unsloth 2025.6.12

💡 Quick Start

🧪 Basic Usage

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load merged model (no PEFT needed!)
model_name = "shiprocket-ai/Patagpt-1.0"

# Initialize tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Load merged model
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)
model.eval()

def process_address(messages, max_new_tokens=400):
    """Process address with Chain of Thought reasoning"""
    inputs = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        return_tensors="pt"
    )
    
    device = next(model.parameters()).device
    inputs = inputs.to(device)
    
    with torch.no_grad():
        outputs = model.generate(
            input_ids=inputs,
            max_new_tokens=max_new_tokens,
            temperature=0.1,
            do_sample=True,
            use_cache=True,
            pad_token_id=tokenizer.eos_token_id
        )
    
    response = tokenizer.decode(
        outputs[0][len(inputs[0]):],
        skip_special_tokens=True
    ).strip()
    
    return response

# Example usage
messages = [
    {"role": "user", "content": "Fix and extract components from this address: sec 14 gurugram haryana 122001"}
]

result = process_address(messages)
print(result)

⚡ Task-Specific Functions

def fix_address(address):
    """Address correction with reasoning"""
    messages = [{"role": "user", "content": f"Fix and extract components from this address: {address}"}]
    return process_address(messages)

def answer_geographic_question(question):
    """Geographic Q&A"""
    messages = [{"role": "user", "content": question}]
    return process_address(messages, max_new_tokens=150)

def extract_components(address):
    """Component extraction"""
    messages = [{"role": "user", "content": f"Extract all components from this address: {address}"}]
    return process_address(messages, max_new_tokens=200)

# Example usage
print(fix_address("koramangala bangalor 560095"))
print(answer_geographic_question("Which state is Mumbai in?"))
print(extract_components("Flat 203, Emerald Heights, Sector 15, Gurugram, Haryana 122001"))
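
For bulk processing, a plain loop over the helpers above is enough; the address list here is just example input:

# Batch example: run the correction helper over a list of addresses
addresses = [
    "koramangala bangalor 560095",
    "sec 14 gurugram haryana 122001",
]
for addr in addresses:
    print(fix_address(addr))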

🔧 Intended Use

Primary Use Cases:

  • E-commerce address validation and correction
  • Logistics address standardization and component extraction
  • Customer support automation for location queries
  • Geographic data processing and migration
  • Address intelligence and analytics

Supported Tasks:

  • Address Correction: Fix errors, complete missing information
  • Component Extraction: Structure unformatted addresses
  • Geographic Q&A: Answer location-based questions
  • Address Standardization: Convert to consistent formats

🚀 Advanced Features

Chain of Thought Reasoning

The model provides detailed step-by-step analysis:

Input: "sec 14 gurgoan haryana 122001"

Reasoning:
1. Identifies "sec 14" as "Sector 14"
2. Corrects "gurgoan" to "Gurgaon" 
3. Recognizes complete state and pincode
4. Structures into JSON format
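
Because the structured part of the answer is returned as JSON embedded in free text, a small helper can pull it out; the exact key names depend on the model's output, so treat this as a sketch rather than a guaranteed schema:

import json
import re

def extract_json(response):
    """Return the first JSON object found in a model response, or None."""
    match = re.search(r"\{.*\}", response, re.DOTALL)
    if match is None:
        return None
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return None

components = extract_json(fix_address("sec 14 gurgoan haryana 122001"))
print(components)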

Multi-Task Learning

Single model handles diverse address-related tasks:

  • Correction: Spelling and formatting fixes
  • Completion: Infer missing components
  • Extraction: Structure complex addresses
  • Q&A: Geographic knowledge queries

Optimization Features

  • Efficient Architecture: Qwen3-1.7B base with LoRA adapters
  • Memory Optimized: bfloat16 precision for efficiency
  • Fast Inference: Unsloth optimizations for speed
  • Scalable Deployment: Lightweight adapter-only model
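
If unsloth is installed, the model can also be loaded through its fast-inference path instead of plain transformers; this is an optional sketch following the standard unsloth loading pattern, so verify the arguments against your installed version:

from unsloth import FastLanguageModel

# Optional fast path via unsloth; arguments follow its standard loading pattern
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="shiprocket-ai/Patagpt-1.0",
    max_seq_length=1024,   # matches the model's tuned context length
    dtype=None,            # let unsloth choose (bfloat16 on supported GPUs)
    load_in_4bit=False,
)
FastLanguageModel.for_inference(model)  # enable unsloth's faster generation mode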

⚠️ Limitations & Considerations

  • Geographic Scope: Optimized specifically for Indian addresses and geography
  • Model Size: 1.7B parameters; efficient, but with the capability limits inherent to a small model
  • Training Domain: Specialized for address intelligence, may not generalize to other domains
  • Framework Requirements: Requires compatible transformers, peft, and optionally unsloth

📋 Technical Specifications

Model Architecture

  • Base Model: Qwen/Qwen3-1.7B-Base
  • Fine-tuning: LoRA with rank 32, alpha 64
  • Adapter Size: ~216MB
  • Context Length: 1024 tokens
  • Precision: bfloat16
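
Since the model was tuned on 1024-token sequences, long prompts are best truncated to that budget before generation; a minimal sketch:

from transformers import AutoTokenizer

MAX_CONTEXT = 1024  # context length the model was tuned on
tokenizer = AutoTokenizer.from_pretrained("shiprocket-ai/Patagpt-1.0")

def truncate_prompt(text):
    """Clip a prompt to the model's 1024-token context window."""
    ids = tokenizer(text, truncation=True, max_length=MAX_CONTEXT)["input_ids"]
    return tokenizer.decode(ids, skip_special_tokens=True)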

Training Details

  • Dataset: 500K+ multi-task conversations
  • Hardware: NVIDIA A100-SXM4-80GB MIG
  • Duration: 54.7 hours intensive training
  • Framework: Unsloth 2025.6.12
  • Optimization: AdamW with cosine scheduling

🤝 Citation

If you use this model in your research or applications:

@misc{Patagpt-1.0,
  title = {Patagpt-1.0 Multi-Task Address Reasoning Model},
  author = {Shiprocket AI},
  year = {2025},
  publisher = {HuggingFace},
  note = {Specialized model for Indian address intelligence with Chain of Thought reasoning},
  url = {https://huggingface.co/shiprocket-ai/Patagpt-1.0}
}

📞 Support

For questions, issues, or collaboration:

  • Repository: Open issues for technical problems
  • Contact: Shiprocket AI team
  • Documentation: Comprehensive examples provided above

Based on Qwen3-1.7B architecture with Unsloth optimizations • Built for Indian address intelligence
