shellzero/gemma2-2b-ft-law-data-tag-generation
This model was converted to MLX format from google/gemma-7b-it.
Refer to the original model card for more details on the model.
pip install mlx-lm
The model was LoRA fine-tuned on the ymoslem/Law-StackExchange dataset and on synthetic data generated with
GPT-4o and GPT-35-Turbo, using the prompt format below, for 1500 steps with mlx.
This fine-tune was one of our best runs on this data and achieved a high F1 score on our eval dataset. It was produced as part of the Nvidia hackathon.
def format_prompt(system_prompt: str, title: str, question: str) -> str:
    "Format the title and question to match the prompt format of the dataset we fine-tuned on."
    return """<bos><start_of_turn>user
## Instructions
{}
## User
TITLE:
{}
QUESTION:
{}<end_of_turn>
<start_of_turn>model
""".format(
        system_prompt, title, question
    )
Here's an example of the system_prompt from the dataset:
Read the following title and question about a legal issue and assign the most appropriate tag to it. All tags must be in lowercase, ordered lexicographically and separated by commas.
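As a minimal sketch of how the template and the system prompt above fit together, the snippet below builds one complete prompt. The function is repeated so the snippet runs on its own; the `title` and `question` values are hypothetical and made up for illustration, not taken from the dataset.

```python
def format_prompt(system_prompt: str, title: str, question: str) -> str:
    """Same template as format_prompt in this card, repeated so this runs alone."""
    return """<bos><start_of_turn>user
## Instructions
{}
## User
TITLE:
{}
QUESTION:
{}<end_of_turn>
<start_of_turn>model
""".format(system_prompt, title, question)


# System prompt quoted from the dataset example in this card.
system_prompt = (
    "Read the following title and question about a legal issue and assign "
    "the most appropriate tag to it. All tags must be in lowercase, ordered "
    "lexicographically and separated by commas."
)

# Hypothetical title and question, for illustration only.
title = "Can my landlord keep my security deposit?"
question = "I moved out last month and left the apartment clean."

prompt = format_prompt(system_prompt, title, question)
print(prompt)
```

The resulting string ends with the opening of the model turn, so generation continues directly with the tags.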
Loading the model using mlx_lm
from mlx_lm import generate, load
model, tokenizer = load("shellzero/gemma2-2b-ft-law-data-tag-generation")
# system_prompt, title, and question are strings you supply (see format_prompt above)
response = generate(
    model,
    tokenizer,
    prompt=format_prompt(system_prompt, title, question),
    verbose=True,  # Set to True to see the prompt and response
    temp=0.0,
    max_tokens=32,
)
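Since the system prompt asks for lowercase, lexicographically ordered, comma-separated tags, the generated text can be turned into a tag list and compared with reference tags. The sketch below is one possible way to do that. The helper names `parse_tags` and `tag_f1` are hypothetical, and the per-example, set-based F1 is an assumption: this card does not state how the F1 score on the eval dataset was computed.

```python
def parse_tags(response: str) -> list[str]:
    """Split a comma-separated tag string into a sorted list of unique lowercase tags."""
    tags = {part.strip().lower() for part in response.split(",")}
    tags.discard("")  # drop empties caused by trailing commas or blank output
    return sorted(tags)


def tag_f1(predicted: list[str], reference: list[str]) -> float:
    """Set-based F1 between predicted and reference tags for one example.

    Assumption: this is only one plausible definition of the metric.
    """
    pred, ref = set(predicted), set(reference)
    overlap = len(pred & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical model output and reference tags, for illustration only.
raw_response = "Contract-Law, landlord ,tenant,"
predicted = parse_tags(raw_response)
reference = ["landlord", "rental-property", "tenant"]

print(predicted)
print(tag_f1(predicted, reference))
```

Normalizing before scoring keeps stray whitespace, casing, or a trailing comma in the output from being counted as a wrong tag.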
Model tree for shellzero/gemma2-2b-ft-law-data-tag-generation
Base model
google/gemma-2-2b