schema-gen-llama3-8b-stage1-lora

Stage-1 Schema Induction LoRA fine-tuned from meta-llama/Meta-Llama-3.1-8B-Instruct using VERL FSDP + FlashAttention-2.

Context: up to 8,192 tokens
Training: 3 stages, 200 steps each (hard-capped)
Final eval (6,000 samples): loss=2.696815, ppl=14.8324
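
The two eval numbers can be cross-checked against each other. A minimal sketch, assuming the reported loss is the mean per-token cross-entropy in nats (the card does not say so explicitly), in which case perplexity is simply exp(loss):

```python
import math

eval_loss = 2.696815        # final eval loss reported above
reported_ppl = 14.8324      # final eval perplexity reported above

# Assumption: loss is mean per-token cross-entropy in nats, so ppl = exp(loss).
ppl = math.exp(eval_loss)
print(f"recomputed ppl = {ppl:.4f}, reported ppl = {reported_ppl}")
```

Under that assumption the recomputed value should agree with the reported one to within rounding.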

Inference (LoRA)

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

BASE = "meta-llama/Meta-Llama-3.1-8B-Instruct"
ADPT = "mohdusman001/schema-gen-llama3-8b-stage1-lora"

tok = AutoTokenizer.from_pretrained(BASE, use_fast=True)
if tok.pad_token is None:
    tok.pad_token = tok.eos_token

base = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, ADPT)

SYSTEM = "You induce minimal JSON schemas from documents. Output strictly valid JSON with no commentary."

def build_user(document_text: str) -> str:
    # Minimal prompt used during training (policy+metadata were compressed into the text)
    return document_text.strip()

messages = [
    {"role": "system", "content": SYSTEM},
    {"role": "user",   "content": build_user("Paste your unstructured text here.")}
]

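# The chat template already prepends the BOS token, so do not add special tokens again.
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)

with torch.no_grad():
    # Greedy decoding: with do_sample=False, temperature has no effect, so it is omitted.
    out = model.generate(**inputs, max_new_tokens=300, do_sample=False, pad_token_id=tok.pad_token_id)

# out[0] contains the prompt followed by the completion; decode only the new tokens.
new_tokens = out[0][inputs["input_ids"].shape[1]:]
text = tok.decode(new_tokens, skip_special_tokens=True).strip()
print(text)  # JSON object with fields: [{name, type, required, ...}]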
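
The model is prompted to emit strictly valid JSON, but decoded text can still carry stray characters around it. The following is a sketch of a tolerant parser, not part of the original card: `extract_first_json` is a hypothetical helper name, and it simply returns the first span of the text that parses as a JSON object or array.

```python
import json

def extract_first_json(text: str):
    """Return the first JSON object/array found in text, or None if there is none."""
    decoder = json.JSONDecoder()
    for start, ch in enumerate(text):
        if ch not in "{[":
            continue
        try:
            # raw_decode parses one JSON value starting at `start` and ignores what follows.
            value, _end = decoder.raw_decode(text, start)
            return value
        except json.JSONDecodeError:
            continue  # not valid JSON from this position; keep scanning
    return None

# Illustrative input only; real model output may look different.
example = 'Schema: {"fields": [{"name": "title", "type": "string", "required": true}]}'
parsed = extract_first_json(example)
print(parsed)
```

If the decoded text still includes the prompt, and the input document itself contains JSON, the helper may return that earlier span, so it is best applied to the generated portion only.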

Inference (Merged weights, no PEFT)

To skip PEFT at load time, load the merged repo directly:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

BASE = "mohdusman001/schema-gen-llama3-8b-stage1-merged"  # your merged repo id on the Hub

SYSTEM = "You induce minimal JSON schemas from documents. Output strictly valid JSON with no commentary."

def build_user(document_text: str) -> str:
    return document_text.strip()

def generate_schema(text: str, max_new_tokens: int = 320, base_repo: str = BASE) -> str:
    tok = AutoTokenizer.from_pretrained(base_repo, use_fast=True)
    if tok.pad_token is None:
        tok.pad_token = tok.eos_token

    model = AutoModelForCausalLM.from_pretrained(
        base_repo, torch_dtype=torch.bfloat16, device_map="auto"
    )

    messages = [
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": build_user(text)}
    ]

    if hasattr(tok, "apply_chat_template"):
        prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    else:
        prompt = f"<s>[SYSTEM]\n{SYSTEM}\n[/SYSTEM]\n[USER]\n{text}\n[/USER]\n"

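    # The chat template already includes BOS, so avoid adding special tokens again.
    inputs = tok(prompt, return_tensors="pt", add_special_tokens=False)
    inputs = {k: v.to(model.device) for k, v in inputs.items()}

    with torch.no_grad():
        out = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,  # greedy decoding; temperature has no effect and is omitted
            eos_token_id=tok.eos_token_id,
            pad_token_id=tok.pad_token_id,
        )

    # Return only the generated continuation, not the echoed prompt.
    new_tokens = out[0][inputs["input_ids"].shape[1]:]
    return tok.decode(new_tokens, skip_special_tokens=True).strip()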
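
Once the returned string has been parsed into a Python object, a light shape check can catch malformed schemas before they are used downstream. This is a sketch under assumptions: the card only hints at the output shape ("fields: [{name, type, required, ...}]"), so the top-level `fields` key and the three per-field keys checked here are inferred from that hint, and `check_schema` is a hypothetical helper name.

```python
def check_schema(schema, required_keys=("name", "type", "required")):
    """Return a list of problems; an empty list means the assumed shape is satisfied."""
    # Assumption: the schema is a dict with a top-level "fields" list.
    fields = schema.get("fields") if isinstance(schema, dict) else None
    if not isinstance(fields, list):
        return ["top-level 'fields' list is missing"]

    problems = []
    for i, field in enumerate(fields):
        if not isinstance(field, dict):
            problems.append(f"fields[{i}] is not an object")
            continue
        for key in required_keys:
            if key not in field:
                problems.append(f"fields[{i}] lacks '{key}'")
    return problems

# Illustrative data only, not actual model output.
good = {"fields": [{"name": "title", "type": "string", "required": True}]}
bad = {"fields": [{"name": "title"}]}
print(check_schema(good))
print(check_schema(bad))
```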

Repository layout

lora_adapter/ – LoRA weights and PEFT config
eval/final_eval.json – final eval metrics (loss, ppl)
samples/generations.jsonl – 128 sample generations
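
To inspect the sample generations from a local clone of the repo, a small JSONL reader is enough. A minimal sketch: `read_jsonl` is a hypothetical helper, and the keys inside each record are not documented in this card, so the code makes no assumption about them.

```python
import json
from pathlib import Path

def read_jsonl(path):
    """Read a JSON Lines file into a list, skipping blank lines."""
    records = []
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records
```

Usage, assuming the repo has been cloned into the current directory: `samples = read_jsonl("samples/generations.jsonl")`. The same `json` module can load `eval/final_eval.json` with `json.load`.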

Roadmap / Future slots

This repo is organized to accommodate a 2-stage pipeline and later RL (GRPO):

stage2_table_gen/ (placeholder) – Stage-2 text→table generation using the Stage-1 schema
rl_grpo/ (placeholder) – Reinforcement learning (Mona/GRPO) training artifacts and configs

Evaluation results (internal-eval, self-reported)

loss: 2.697
perplexity: 14.832