schema-gen-llama3-8b-stage1-lora
Stage-1 Schema Induction LoRA fine-tuned from meta-llama/Meta-Llama-3.1-8B-Instruct using VERL FSDP + FlashAttention-2.
- Context: up to 8,192 tokens
- Training: 3 stages, 200 steps each (hard-capped)
- Final eval (6,000 samples): loss=2.696815, ppl=14.8324
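The two reported figures are consistent with the usual definition of perplexity as the exponential of the mean token-level cross-entropy loss, measured in nats. A quick check, assuming that is how the eval script derived `ppl`:

```python
import math

eval_loss = 2.696815     # final eval loss reported above
reported_ppl = 14.8324   # final eval perplexity reported above

# Perplexity is exp(mean cross-entropy) when the loss is in nats.
ppl = math.exp(eval_loss)
print(f"{ppl:.4f}")

# The recomputed value should agree with the reported one up to rounding.
assert abs(ppl - reported_ppl) < 1e-3
```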
Inference (LoRA)
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

BASE = "meta-llama/Meta-Llama-3.1-8B-Instruct"
ADPT = "mohdusman001/schema-gen-llama3-8b-stage1-lora"

tok = AutoTokenizer.from_pretrained(BASE, use_fast=True)
if tok.pad_token is None:
    tok.pad_token = tok.eos_token

base = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, ADPT)
model.eval()

SYSTEM = "You induce minimal JSON schemas from documents. Output strictly valid JSON with no commentary."

def build_user(document_text: str) -> str:
    # Minimal prompt used during training (policy+metadata were compressed into the text)
    return document_text.strip()

messages = [
    {"role": "system", "content": SYSTEM},
    {"role": "user", "content": build_user("Paste your unstructured text here.")},
]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# The chat template already emits the BOS token, so do not add special tokens again.
inputs = tok(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)

with torch.no_grad():
    # Greedy decoding; temperature has no effect when do_sample=False, so it is omitted.
    out = model.generate(**inputs, max_new_tokens=300, do_sample=False)

# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = out[0][inputs["input_ids"].shape[1]:]
text = tok.decode(new_tokens, skip_special_tokens=True)
print(text)  # JSON object with fields: [{name, type, required, ...}]
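The snippet above tokenizes the document without truncation, so a long input can exceed the 8,192-token context listed earlier. The sketch below trims a document to a token budget before it is placed in the chat messages. It assumes the 8,192 figure covers prompt plus generated tokens; `fit_to_context` and the `prompt_overhead` allowance are hypothetical names and values chosen for illustration, not part of this repo.

```python
from typing import Callable, List

CONTEXT_LEN = 8192  # context length stated in this model card


def fit_to_context(
    text: str,
    encode: Callable[[str], List[int]],
    decode: Callable[[List[int]], str],
    max_new_tokens: int = 300,
    prompt_overhead: int = 64,  # assumed allowance for system prompt + chat template
) -> str:
    """Truncate `text` so document + overhead + generation fit in CONTEXT_LEN."""
    budget = CONTEXT_LEN - max_new_tokens - prompt_overhead
    if budget <= 0:
        raise ValueError("max_new_tokens and prompt_overhead leave no room for the document")
    ids = encode(text)
    if len(ids) <= budget:
        return text  # already fits; return unchanged
    # Keep the leading tokens; the tail of the document is dropped.
    return decode(ids[:budget])
```

With the tokenizer loaded above, this could be called as `fit_to_context(doc, lambda s: tok.encode(s, add_special_tokens=False), tok.decode)`. Keeping the head of the document is only one possible policy; documents whose key fields appear late may need a different one.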
Inference (Merged weights, no PEFT)
If you prefer to use the merged model (no PEFT at load time), use the merged repo:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

BASE = "mohdusman001/schema-gen-llama3-8b-stage1-merged"  # your merged repo id on the Hub
SYSTEM = "You induce minimal JSON schemas from documents. Output strictly valid JSON with no commentary."

def build_user(document_text: str) -> str:
    return document_text.strip()

def generate_schema(text: str, max_new_tokens: int = 320, base_repo: str = BASE) -> str:
    # Note: the tokenizer and model are loaded on every call; load them once
    # outside the function if you generate schemas for many documents.
    tok = AutoTokenizer.from_pretrained(base_repo, use_fast=True)
    if tok.pad_token is None:
        tok.pad_token = tok.eos_token
    model = AutoModelForCausalLM.from_pretrained(
        base_repo, torch_dtype=torch.bfloat16, device_map="auto"
    )
    model.eval()

    messages = [
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": build_user(text)},
    ]
    if hasattr(tok, "apply_chat_template"):
        prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
        add_special_tokens = False  # the chat template already emits BOS
    else:
        prompt = f"<s>[SYSTEM]\n{SYSTEM}\n[/SYSTEM]\n[USER]\n{text}\n[/USER]\n"
        add_special_tokens = True

    inputs = tok(prompt, return_tensors="pt", add_special_tokens=add_special_tokens)
    inputs = {k: v.to(model.device) for k, v in inputs.items()}

    with torch.no_grad():
        # Greedy decoding; temperature has no effect when do_sample=False, so it is omitted.
        out = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,
            eos_token_id=tok.eos_token_id,
            pad_token_id=tok.pad_token_id,
        )

    # Return only the newly generated tokens, not the echoed prompt.
    new_tokens = out[0][inputs["input_ids"].shape[1]:]
    return tok.decode(new_tokens, skip_special_tokens=True).strip()
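`generate_schema` returns a raw string. The system prompt asks for strictly valid JSON, but nothing enforces it, so it is worth parsing the output before using it. The helper below is a minimal sketch: `extract_json` is a hypothetical name, and the `fields` key in the sample string is an assumption about the output shape based on the comment in the LoRA snippet.

```python
import json
from typing import Any


def extract_json(text: str) -> Any:
    """Return the first JSON object or array that parses inside `text`."""
    decoder = json.JSONDecoder()
    for start, ch in enumerate(text):
        if ch not in "{[":
            continue
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores whatever follows it.
            value, _end = decoder.raw_decode(text, start)
            return value
        except json.JSONDecodeError:
            continue  # not valid JSON from this position; keep scanning
    raise ValueError("no valid JSON object or array found in model output")


# Hypothetical model output, for illustration only.
sample = 'assistant\n\n{"fields": [{"name": "invoice_id", "type": "string", "required": true}]}'
schema = extract_json(sample)
print(schema["fields"][0]["name"])
```

Scanning for the first parseable value tolerates stray text before or after the JSON, at the cost of possibly picking up a bracketed fragment from the surrounding text; a stricter pipeline could call `json.loads` on the whole string and reject anything else.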
Repository layout
- lora_adapter/ – LoRA weights and PEFT config
- eval/final_eval.json – final eval metrics (loss, ppl)
- samples/generations.jsonl – 128 sample generations
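To inspect the sample generations after downloading the repo, a generic JSON Lines reader is enough. This sketch makes no assumption about the field names inside `samples/generations.jsonl`, which are not documented here; it only assumes one JSON value per line. `read_jsonl` is a hypothetical helper name.

```python
import json
from pathlib import Path
from typing import Any, Iterator


def read_jsonl(path: str) -> Iterator[Any]:
    """Yield one parsed JSON value per non-empty line of a JSON Lines file."""
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            yield json.loads(line)


if __name__ == "__main__":
    sample_path = Path("samples/generations.jsonl")  # path from the layout above
    if sample_path.exists():
        records = list(read_jsonl(str(sample_path)))
        print(f"{len(records)} records")
        if records and isinstance(records[0], dict):
            # Print the keys of the first record to discover the schema.
            print(sorted(records[0].keys()))
```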
Roadmap / Future slots
This repo is organized to accommodate a 2-stage pipeline and later RL (GRPO):
- stage2_table_gen/ (placeholder) – Stage-2 text→table generation using the Stage-1 schema
- rl_grpo/ (placeholder) – Reinforcement learning (Mona/GRPO) training artifacts and configs
Evaluation results
- loss on internal-eval (self-reported): 2.697
- perplexity on internal-eval (self-reported): 14.832