Model Card

fineinstructions/query_templatizer_s1 is a PEFT adapter for meta-llama/Llama-3.2-1B-Instruct. Given an input query, it generates a structured JSON output containing a generalized template, a compatible document description, and annotations (task type, difficulty, reasoning, domain tags, and estimates such as compatibility and query frequency), as illustrated in the example below.

Example Usage

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from peft import PeftModel

tokenizer = AutoTokenizer.from_pretrained('fineinstructions/query_templatizer_s1', revision=None) # Load tokenizer
tokenizer.padding_side = 'left' # Left-pad inputs for decoder-only batched generation
base_model = AutoModelForCausalLM.from_pretrained('meta-llama/Llama-3.2-1B-Instruct', revision=None) # Load base model
model = PeftModel.from_pretrained(base_model, model_id='fineinstructions/query_templatizer_s1', revision=None) # Apply adapter
pipe = pipeline('text-generation', model=model, tokenizer=tokenizer, pad_token_id=tokenizer.pad_token_id, return_full_text=False)

inputs = ['Analyze the input query and generate a detailed, structured JSON output with a generalized template, a compatible document description, and comprehensive annotations (including task type, difficulty, reasoning, and domain tags) along with precise explanations for each. Include annotations for the query\'s `realistic` nature (mark as "true" if the query is something that a user would plausibly input into ChatGPT in a real-world scenario), `conversational` (mark as "true" if the query resembles casual dialogue, seeks informal interaction, and explicitly does not ask for any question to be answered or task to be completed), and `task_type_closed` (specify `text_generation` if the response requires generating natural language text or `code_generation` if it involves generating programming code). Additionally, provide explanations for the following annotations: `compatibility` refers to the estimate of how many documents in CommonCrawl would align with this template, which should be expressed in a logarithmic range. Values could range from 0% (no documents) up to 100% (billions of pages), with intermediate values like 0.0000001% (ones of pages), 0.000001% (tens of pages), 0.0001% (thousands of pages), 0.01% (hundreds of thousands of pages), or 1% (tens of millions of pages). `query_frequency` should estimate the percentage of queries to ChatGPT or the ChatGPT API that would resemble the query in this template, also in a logarithmic range. Values could range from 0% (no requests) up to 100% (billions of requests), with intermediate values like 0.0000001% (ones of requests), 0.00001% (hundreds of requests), or 0.001% (tens of thousands of requests). Finally, annotate `is_few_shot` (mark as "true" if the query is designed to include examples or prompts for generating outputs based on patterns).\n\nInput Query:\nok now can you give 3 very speculative ideas on how to achieve unidirectional movement that results in more energy than input using magnets and/or ZPF extraction, as simple setups?']
prompts = [tokenizer.apply_chat_template([{'role': 'user', 'content': i}], tokenize=False, add_generation_prompt=True) for i in inputs] # Format each input with the model's chat template
print(pipe(prompts, max_length=131072, do_sample=False)) # Greedy decoding up to the 131,072-token context limit
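
Since the prompt asks the model to emit a structured JSON object, the generated text can normally be parsed directly. A minimal sketch, assuming the adapter reliably produces valid JSON (the pipeline returns one list of generation dicts per prompt, and with return_full_text=False each generated_text contains only the completion; the exact field names in the parsed object are not documented by this card):

import json

for result in pipe(prompts, max_length=131072, do_sample=False):
    generated = result[0]['generated_text']  # first (and only) sequence for this prompt
    record = json.loads(generated)  # raises json.JSONDecodeError if the output is not valid JSON
    print(record)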

This model was trained on a synthetic dataset with DataDreamer 🤖💤. The synthetic dataset card and model card can be found here. The training arguments can be found here.
