Model Card

fineinstructions/query_templatizer_s1 is a PEFT adapter for meta-llama/Llama-3.2-1B-Instruct. Given an input query, it generates a structured JSON output containing a generalized template, a compatible document description, and annotations (task type, difficulty, reasoning, domain tags, and estimates such as compatibility and query frequency), as illustrated in the example below.

Example Usage

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from peft import PeftModel

tokenizer = AutoTokenizer.from_pretrained('fineinstructions/query_templatizer_s1', revision=None) # Load tokenizer
tokenizer.padding_side = 'left' # Left-pad inputs for decoder-only batched generation
base_model = AutoModelForCausalLM.from_pretrained('meta-llama/Llama-3.2-1B-Instruct', revision=None) # Load base model
model = PeftModel.from_pretrained(base_model, model_id='fineinstructions/query_templatizer_s1', revision=None) # Apply adapter
pipe = pipeline('text-generation', model=model, tokenizer=tokenizer, pad_token_id=tokenizer.pad_token_id, return_full_text=False)

inputs = ['Analyze the input query and generate a detailed, structured JSON output with a generalized template, a compatible document description, and comprehensive annotations (including task type, difficulty, reasoning, and domain tags) along with precise explanations for each. Include annotations for the query\'s `realistic` nature (mark as "true" if the query is something that a user would plausibly input into ChatGPT in a real-world scenario), `conversational` (mark as "true" if the query resembles casual dialogue, seeks informal interaction, and explicitly does not ask for any question to be answered or task to be completed), and `task_type_closed` (specify `text_generation` if the response requires generating natural language text or `code_generation` if it involves generating programming code). Additionally, provide explanations for the following annotations: `compatibility` refers to the estimate of how many documents in CommonCrawl would align with this template, which should be expressed in a logarithmic range. Values could range from 0% (no documents) up to 100% (billions of pages), with intermediate values like 0.0000001% (ones of pages), 0.000001% (tens of pages), 0.0001% (thousands of pages), 0.01% (hundreds of thousands of pages), or 1% (tens of millions of pages). `query_frequency` should estimate the percentage of queries to ChatGPT or the ChatGPT API that would resemble the query in this template, also in a logarithmic range. Values could range from 0% (no requests) up to 100% (billions of requests), with intermediate values like 0.0000001% (ones of requests), 0.00001% (hundreds of requests), or 0.001% (tens of thousands of requests). Finally, annotate `is_few_shot` (mark as "true" if the query is designed to include examples or prompts for generating outputs based on patterns).\n\nInput Query:\nok now can you give 3 very speculative ideas on how to achieve unidirectional movement that results in more energy than input using magnets and/or ZPF extraction, as simple setups?']
prompts = [tokenizer.apply_chat_template([{'role': 'user', 'content': i}], tokenize=False, add_generation_prompt=True) for i in inputs] # Format each input with the model's chat template
print(pipe(prompts, max_length=131072, do_sample=False)) # Greedy decoding up to the 131,072-token context limit
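
Since the prompt asks the model to emit a structured JSON object, the generated text can normally be parsed directly. A minimal sketch, assuming the adapter reliably produces valid JSON (the pipeline returns one list of generation dicts per prompt, and with return_full_text=False each generated_text contains only the completion; the exact field names in the parsed object are not documented by this card):

import json

for result in pipe(prompts, max_length=131072, do_sample=False):
    generated = result[0]['generated_text']  # first (and only) sequence for this prompt
    record = json.loads(generated)  # raises json.JSONDecodeError if the output is not valid JSON
    print(record)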

This model was trained on a synthetic dataset with DataDreamer 🤖💤. The synthetic dataset card and model card can be found here. The training arguments can be found here.
