You need to agree to share your contact information to access this model

Privacy Notice: Your data will be stored securely and used only for research and communication purposes related to this model. You may request deletion at any time. The data collected in this form will be processed in accordance with the General Data Protection Law (LGPD) and Neuralmind's Privacy Policy. By clicking Submit, I declare that I have read and understood the provisions contained in Neuralmind's Privacy Policy and accept the terms of the license By submitting this form, I confirm that I have read and understood the above.

Model Access Request Form

You are requesting access to a model fine-tuned from Qwen2.5-7B-Instruct, originally licensed under the Apache License, Version 2.0. The model is being released for research and development purposes, and while the license permits broad usage, we encourage responsible, ethical, and transparent application, especially given its potential legal domain impact. Please confirm the following:
I understand that this model is licensed under the Apache 2.0 License, which permits use, modification, redistribution, and commercial use. I acknowledge that any use of this model should comply with applicable laws and regulations, including those relating to data protection, discrimination, and unauthorized practice of regulated professions (e.g., law, medicine, finance). I understand that the model may generate outputs that require critical human evaluation, and I agree not to use it as a substitute for professional or legal advice without proper oversight. I agree not to use this model to develop or deploy systems intended to deceive, manipulate, discriminate, or violate human rights. I agree to the Acceptable Use Guidelines provided by the authors. I agree to share the following information for logging and research purposes (this will not restrict my use of the model after download):

🇧🇷 Versão em Português | 🇺🇸 English version below

Descrição do Modelo

Jurema-7B é um LLM especializado no domínio jurídico brasileiro, criado a partir do ajuste fino do modelo Qwen2.5-7B-Instruct. O ajuste fino foi realizado com a utilização de um dataset sintético, majoritariamente com exemplos no formato de perguntas e respostas (Q&A), embora também inclua outros estilos de tarefas. Os exemplos foram derivados de uma coleção diversificada e curada de documentos jurídicos de alta qualidade, selecionados por sua representatividade, qualidade e diversidade.

Avaliação

O Jurema-7B foi avaliado em três dos principais benchmarks para o português brasileiro: BLUEX, ENEM e OAB. O dataset OAB, por sua natureza jurídica, fornece uma avaliação particularmente representativa da capacidade do modelo em compreender e responder a questões do direito brasileiro.

Dataset	BLUEX	ENEM	OAB	OAB 2023
Qwen2.5-7B-Instruct	0.6412	0.7480	0.5326	0.5765
Jurema-7B	0.6426	0.7768	0.6679	0.6840

Intenções de Uso

Este modelo é disponibilizado para fins de pesquisa e desenvolvimento. Embora a licença permita um uso amplo, encorajamos fortemente uma aplicação responsável, ética e transparente, especialmente considerando seu potencial impacto no domínio jurídico. Esta é uma versão inicial do modelo, e os usuários devem estar cientes de que ele pode produzir resultados imprecisos ou incompletos. O domínio jurídico brasileiro é vasto e complexo, e este modelo não pretende alcançar cobertura total ou perfeição em todas as áreas do direito. Ele não substitui o aconselhamento jurídico profissional ou o julgamento especializado.

Limitações e Uso Aceitável

O modelo pode alucinar sobre informações e fatos jurídicos ou interpretar normas legais de forma incorreta. Pode imputar crimes ou fazer inferências legais sem embasamento. O modelo não se destina ao aconselhamento jurídico oficial e suas respostas devem ser revisadas por profissionais qualificados. Consulte nossas Diretrizes de Uso Aceitável para mais informações.

Como usar

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained("Jurema-br/Jurema-7B", device_map="auto", torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained("Jurema-br/Jurema-7B")

prompt = "O que significa fazer justiça em um Estado Democrático de Direito como o brasileiro?"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
output = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output[0], skip_special_tokens=True))

Equipe

NeuralMind e Escavador.

Desenvolvido com apoio e financiamento da FINEP.

🇺🇸 English version

Model Description

Jurema-7B is a fine-tuned version of Qwen2.5-7B-Instruct, specialized for the Brazilian legal domain.
The fine-tuning was done using a dataset, selected for representativeness, quality, and diversity.

The model was fine-tuned on a synthetically generated dataset primarily built in a Q&A format, though it also includes other task styles. The examples were derived from a diverse and curated collection of high-quality legal documents.

Evaluation

Jurema-7B was evaluated on three benchmark datasets for Brazilian Portuguese: BLUEX, ENEM, and OAB. Notably, the OAB dataset is specifically tailored for the legal domain, allowing us to measure the model’s performance in tasks closely aligned with its intended use.

Dataset	BLUEX	ENEM	OAB	OAB 2023
Qwen2.5-7B-Instruct	0.6412	0.7480	0.5326	0.5765
Jurema-7B	0.6426	0.7768	0.6679	0.6840

Intended Use

This model is released for research and development purposes. Although the license permits broad usage, we strongly encourage responsible, ethical, and transparent application, especially given its potential impact within the legal domain. This is an initial version of the model, and users should be aware that it may produce inaccurate or incomplete outputs. The Brazilian legal domain is vast and complex, and this model does not aim to achieve full coverage or perfection across all legal areas. It is not a substitute for professional legal advice or judgment.

Limitations and Acceptable Use

The model may hallucinate legal facts or misinterpret legal norms. It may impute crimes or make unsupported legal inferences. It does not represent official legal advice and must be reviewed by qualified professionals. See our Acceptable Use Guidelines for more.

How to Use

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained("Jurema-br/Jurema-7B", device_map="auto", torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained("Jurema-br/Jurema-7B")

prompt = "O que significa fazer justiça em um Estado Democrático de Direito como o brasileiro?"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
output = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output[0], skip_special_tokens=True))

Team

NeuralMind and Escavador.

Supported and financed by FINEP.

Downloads last month: 443

Safetensors

Model size

7.62B params

Tensor type

BF16

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for Jurema-br/Jurema-7B

Base model

Qwen/Qwen2.5-7B

Finetuned

Qwen/Qwen2.5-7B-Instruct

Finetuned

(2493)

this model

Quantizations

1 model