---
datasets:
- ReDiX/italian-filtered-corpus
language:
- it
- en
base_model:
- Qwen/Qwen3-0.6B-Base
library_name: transformers
license: cc
---

# Qwen3 0.6B Base - Ita 🇮🇹

This model is a further-pretrained version of [Qwen3-0.6B-Base](https://huggingface.co/Qwen/Qwen3-0.6B-Base) 🚀, specifically trained on 2 billion Italian tokens. The training data includes educational content 📚 carefully filtered from multilingual pre-training datasets, which gives the model a strong grasp of the Italian language and its nuances. It also ships with an extended tokenizer ✍️ optimized for Italian (see the tokenizer comparison at the end of this card).

⚠️ **Important note**: this is an experimental model. It may generate unsafe content or content that includes personal information. Please use it with caution.

## Base Model (Not Instruct) 🤖

This is not an instruct model: it does not follow a chat template. Instead, it is designed to be fine-tuned for your specific Italian-language use case 🎯 (a fine-tuning sketch is included at the end of this card).

## Evaluation Results 📊

Here's a breakdown of the model's performance on various tasks:

| Tasks        | Version | Filter | n-shot | Metric   |   | Value  |   | Stderr |
|--------------|--------:|--------|-------:|----------|---|-------:|---|-------:|
| arc_it       |       2 | none   |      0 | acc      | ↑ | 0.2566 | ± | 0.0128 |
|              |         | none   |      0 | acc_norm | ↑ | 0.2840 | ± | 0.0132 |
| hellaswag_it |       1 | none   |      0 | acc      | ↑ | 0.3363 | ± | 0.0049 |
|              |         | none   |      0 | acc_norm | ↑ | 0.3994 | ± | 0.0051 |
| m_mmlu_it    |       0 | none   |      5 | acc      | ↑ | 0.2699 | ± | 0.0039 |

## How to use this model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "ReDiX/Qwen-0.6B-Base-ITA"

# Load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
).eval()

# Plain text completion: no chat template, since this is a base model
text = "La principale causa del raffreddore"
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=128
)

# Keep only the newly generated tokens, dropping the prompt
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
content = tokenizer.decode(output_ids, skip_special_tokens=True).strip("\n")

print("content:", content)
```
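## Tokenizer comparison ✍️

The card above mentions an extended tokenizer optimized for Italian. A minimal sketch to inspect this yourself is below: it loads both the original Qwen3 tokenizer and this model's tokenizer and compares vocabulary sizes and token counts on a sample Italian sentence. The sample sentence is an arbitrary illustration; an extended Italian tokenizer should generally need fewer tokens per Italian sentence, but the exact numbers depend on your text.

```python
from transformers import AutoTokenizer

# Arbitrary Italian sample sentence, chosen for illustration only
text = "La principale causa del raffreddore è un'infezione virale delle vie respiratorie superiori."

base_tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B-Base")
ita_tok = AutoTokenizer.from_pretrained("ReDiX/Qwen-0.6B-Base-ITA")

for name, tok in [("Qwen3 base", base_tok), ("ITA extended", ita_tok)]:
    ids = tok(text)["input_ids"]
    # Fewer tokens for the same text means a more efficient encoding
    print(f"{name}: vocab size = {len(tok)}, tokens for sample = {len(ids)}")
```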
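## Fine-tuning sketch 🎯

Since this is a base model intended for fine-tuning, here is a minimal sketch of continued training with the plain `transformers` `Trainer` and a standard causal-LM objective. The dataset name (`your-org/your-italian-dataset`), the assumption of a `text` column, and all hyperparameters are placeholders to adapt to your own task, not recommendations from the model authors.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "ReDiX/Qwen-0.6B-Base-ITA"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Placeholder dataset: any dataset with a "text" column works here
dataset = load_dataset("your-org/your-italian-dataset", split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="qwen3-0.6b-ita-sft",
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
        num_train_epochs=1,
        learning_rate=2e-5,
        bf16=True,
        logging_steps=10,
    ),
    train_dataset=tokenized,
    # mlm=False selects the standard next-token (causal LM) objective
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```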