metadata

license: mit
language:
  - pt
base_model:
  - Qwen/Qwen2.5-0.5B-Instruct
pipeline_tag: text-generation
datasets:
  - adalbertojunior/openHermes_portuguese
  - cnmoro/smoltalk-555k-ptbr
  - cnmoro/RagMixPTBR-Legal-Alpaca-2M
model-index:
  - name: Qwen2.5-0.5B-Portuguese-v1
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: ENEM Challenge (No Images)
          type: eduagarcia/enem_challenge
          split: train
          args:
            num_few_shot: 3
        metrics:
          - type: acc
            value: 37.86
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Qwen2.5-0.5B-Portuguese-v1
          name: Open Portuguese LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: BLUEX (No Images)
          type: eduagarcia-temp/BLUEX_without_images
          split: train
          args:
            num_few_shot: 3
        metrics:
          - type: acc
            value: 34.63
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Qwen2.5-0.5B-Portuguese-v1
          name: Open Portuguese LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: OAB Exams
          type: eduagarcia/oab_exams
          split: train
          args:
            num_few_shot: 3
        metrics:
          - type: acc
            value: 33.12
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Qwen2.5-0.5B-Portuguese-v1
          name: Open Portuguese LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Assin2 RTE
          type: assin2
          split: test
          args:
            num_few_shot: 15
        metrics:
          - type: f1_macro
            value: 86.3
            name: f1-macro
        source:
          url: >-
            https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Qwen2.5-0.5B-Portuguese-v1
          name: Open Portuguese LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Assin2 STS
          type: eduagarcia/portuguese_benchmark
          split: test
          args:
            num_few_shot: 15
        metrics:
          - type: pearson
            value: 54.3
            name: pearson
        source:
          url: >-
            https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Qwen2.5-0.5B-Portuguese-v1
          name: Open Portuguese LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: FaQuAD NLI
          type: ruanchaves/faquad-nli
          split: test
          args:
            num_few_shot: 15
        metrics:
          - type: f1_macro
            value: 65.33
            name: f1-macro
        source:
          url: >-
            https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Qwen2.5-0.5B-Portuguese-v1
          name: Open Portuguese LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: HateBR Binary
          type: ruanchaves/hatebr
          split: test
          args:
            num_few_shot: 25
        metrics:
          - type: f1_macro
            value: 44.06
            name: f1-macro
        source:
          url: >-
            https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Qwen2.5-0.5B-Portuguese-v1
          name: Open Portuguese LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: PT Hate Speech Binary
          type: hate_speech_portuguese
          split: test
          args:
            num_few_shot: 25
        metrics:
          - type: f1_macro
            value: 55.1
            name: f1-macro
        source:
          url: >-
            https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Qwen2.5-0.5B-Portuguese-v1
          name: Open Portuguese LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: tweetSentBR
          type: eduagarcia/tweetsentbr_fewshot
          split: test
          args:
            num_few_shot: 25
        metrics:
          - type: f1_macro
            value: 45.96
            name: f1-macro
        source:
          url: >-
            https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Qwen2.5-0.5B-Portuguese-v1
          name: Open Portuguese LLM Leaderboard

A fine-tune of Qwen2.5-0.5B-Instruct aimed at fluent Portuguese and stronger general capability on Portuguese-language tasks.

Also available on Ollama: https://ollama.com/cnmoro/Qwen2.5-0.5B-Portuguese-v1

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "cnmoro/Qwen2.5-0.5B-Portuguese-v1"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Escreva uma breve introdução sobre LLMs (Large Language Models) e suas aplicações."

# The chat template automatically injects a hardcoded system prompt
# tuned for Portuguese, so there is no need to add a system message here.
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Drop the prompt tokens so that only the newly generated tokens are decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
# LLM significa Large Language Models, que são modelos de linguagem computacional
# projetados para simular a inteligência humana no processamento e geração de texto.
# Esses modelos usam técnicas avançadas de aprendizado de máquina e redes neurais para
# compreender e gerar texto com base em dados de entrada. As aplicações de LLM incluem
# tradução automática, análise de sentimento, modelagem de tópicos e resposta a perguntas
# automatizadas. Eles estão sendo cada vez mais utilizados em diversas áreas, como
# saúde, educação e finanças, para melhorar a comunicação, as experiências dos clientes
# e os resultados da pesquisa.
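
The slicing step in the example above can be seen in isolation with plain Python lists. For decoder-only models, `generate` returns each prompt followed by the new tokens, so cutting off the first `len(input_ids)` entries leaves only the completion. The token IDs below are made up for illustration and do not come from the real tokenizer.

```python
# Hypothetical token IDs, used only to illustrate the slicing step.
prompt_batch = [[101, 102, 103]]            # stands in for model_inputs.input_ids
output_batch = [[101, 102, 103, 7, 8, 9]]   # stands in for the result of model.generate

# Keep only the tokens that come after each prompt.
new_tokens = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(prompt_batch, output_batch)
]

print(new_tokens)  # [[7, 8, 9]]
```

Without this step, the decoded response would start with the full chat-formatted prompt.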

Overall Results

Task	Metric	Value	Stdev
assin2_rte	f1_macro	0.391	0.006
assin2_rte	acc	0.527	0.007
assin2_sts	pearson	0.115	0.014
assin2_sts	mse	1.011	N/A
bluex	acc	0.349	0.010
enem_challenge	acc	0.363	0.007
faquad_nli	f1_macro	0.595	0.017
faquad_nli	acc	0.791	0.011
hatebr_offensive	f1_macro	0.338	0.005
hatebr_offensive	acc	0.502	0.009
oab_exams	acc	0.326	0.006
portuguese_hate_speech	f1_macro	0.412	0.004
portuguese_hate_speech	acc	0.702	0.011
tweetsentbr	f1_macro	0.455	0.005
tweetsentbr	acc	0.594	0.008

Detailed Results

assin2_rte

Metric	Value	Stdev
f1_macro	0.391	0.006
acc	0.527	0.007

assin2_sts

Metric	Value	Stdev
pearson	0.115	0.014
mse	1.011	N/A

bluex

Exam ID	Metric	Value	Stdev
all	acc	0.349	0.010
USP_2019	acc	0.225	0.038
USP_2024	acc	0.293	0.041
USP_2021	acc	0.423	0.040
UNICAMP_2018	acc	0.241	0.034
UNICAMP_2024	acc	0.444	0.043
USP_2020	acc	0.393	0.038
UNICAMP_2020	acc	0.291	0.035
UNICAMP_2021_1	acc	0.326	0.040
UNICAMP_2022	acc	0.487	0.046
USP_2022	acc	0.388	0.040
UNICAMP_2019	acc	0.280	0.037
UNICAMP_2021_2	acc	0.294	0.037
UNICAMP_2023	acc	0.558	0.044
USP_2023	acc	0.364	0.042
USP_2018	acc	0.278	0.035

enem_challenge

Exam ID	Metric	Value	Stdev
all	acc	0.363	0.007
2016_2	acc	0.390	0.025
2015	acc	0.319	0.025
2011	acc	0.410	0.026
2013	acc	0.398	0.027
2017	acc	0.319	0.025
2022	acc	0.376	0.024
2009	acc	0.226	0.023
2010	acc	0.444	0.026
2012	acc	0.345	0.025
2014	acc	0.339	0.026
2016	acc	0.397	0.026
2023	acc	0.385	0.024

faquad_nli

Metric	Value	Stdev
f1_macro	0.595	0.017
acc	0.791	0.011

hatebr_offensive

Metric	Value	Stdev
f1_macro	0.338	0.005
acc	0.502	0.009

oab_exams

Exam ID	Metric	Value	Stdev
all	acc	0.326	0.006
2018-25	acc	0.400	0.032
2016-20a	acc	0.238	0.027
2011-05	acc	0.400	0.032
2012-08	acc	0.325	0.030
2012-09	acc	0.260	0.029
2014-13	acc	0.325	0.030
2011-03	acc	0.313	0.027
2016-20	acc	0.275	0.029
2012-06a	acc	0.325	0.030
2017-22	acc	0.338	0.031
2015-16	acc	0.325	0.030
2013-12	acc	0.300	0.030
2017-24	acc	0.250	0.028
2012-06	acc	0.238	0.027
2014-14	acc	0.325	0.030
2013-11	acc	0.325	0.030
2013-10	acc	0.413	0.032
2010-02	acc	0.390	0.028
2016-21	acc	0.375	0.031
2015-18	acc	0.300	0.030
2015-17	acc	0.282	0.029
2016-19	acc	0.333	0.031
2012-07	acc	0.388	0.031
2017-23	acc	0.325	0.030
2011-04	acc	0.350	0.031
2010-01	acc	0.282	0.028
2014-15	acc	0.385	0.032

portuguese_hate_speech

Metric	Value	Stdev
f1_macro	0.412	0.004
acc	0.702	0.011

tweetsentbr

Metric	Value	Stdev
f1_macro	0.455	0.005
acc	0.594	0.008

Model Meta Information

Truncated Samples: 3863
Non-Truncated Samples: 10287
Padded Samples: 0
Non-Padded Samples: 14150
Fewshots Truncated: 3863
Has Chat Template: True
Chat Type: system_user_assistant
Number of GPUs: 1
Accelerate Number of Processes: N/A
Model SHA: None
Model Data Type: torch.bfloat16
Model Memory Footprint: 988065664 bytes
Model Number of Parameters: 494032768
Model is Loaded in 4bit: N/A
Model is Loaded in 8bit: N/A
Model is Quantized: N/A
Model Device: cuda:0
Batch Size: 1
Max Length: 512
Max Context Length: 480
Max Generation Tokens: 32
Effective Batch Size: 1.0
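
Several of the figures above are related by simple arithmetic. The sketch below checks them, assuming 2 bytes per parameter for `torch.bfloat16`. The variable names are illustrative, and reading the small leftover in the footprint as non-parameter buffers is an assumption, not something the evaluation log states.

```python
# Values copied from the meta information above.
num_parameters = 494_032_768
reported_footprint_bytes = 988_065_664
truncated, non_truncated = 3863, 10287
max_context_length, max_generation_tokens, max_length = 480, 32, 512

# bfloat16 stores each parameter in 2 bytes.
weight_bytes = num_parameters * 2
extra_bytes = reported_footprint_bytes - weight_bytes  # possibly buffers (assumption)

total_samples = truncated + non_truncated
truncated_share = truncated / total_samples

print(f"weights: {weight_bytes} bytes, extra: {extra_bytes} bytes")
print(f"samples: {total_samples}, truncated: {truncated_share:.1%}")
print(f"context + generation = {max_context_length + max_generation_tokens}")
```

Roughly 27% of the samples had their few-shot examples truncated to fit the 480-token context, which is worth keeping in mind when reading the scores in this section.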

Open Portuguese LLM Leaderboard Evaluation Results

Detailed results can be found on the 🚀 Open Portuguese LLM Leaderboard: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Qwen2.5-0.5B-Portuguese-v1

Metric	Value
Average	50.74
ENEM Challenge (No Images)	37.86
BLUEX (No Images)	34.63
OAB Exams	33.12
Assin2 RTE	86.30
Assin2 STS	54.30
FaQuAD NLI	65.33
HateBR Binary	44.06
PT Hate Speech Binary	55.10
tweetSentBR	45.96
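
The reported average appears to be the unweighted mean of the nine task scores. A minimal check, with the values copied from the table above (the dictionary name is illustrative):

```python
# Per-task scores from the leaderboard table.
leaderboard_scores = {
    "ENEM Challenge (No Images)": 37.86,
    "BLUEX (No Images)": 34.63,
    "OAB Exams": 33.12,
    "Assin2 RTE": 86.30,
    "Assin2 STS": 54.30,
    "FaQuAD NLI": 65.33,
    "HateBR Binary": 44.06,
    "PT Hate Speech Binary": 55.10,
    "tweetSentBR": 45.96,
}

# Unweighted mean across tasks.
average = sum(leaderboard_scores.values()) / len(leaderboard_scores)
print(f"{average:.2f}")  # 50.74
```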