# mistral-300m-base

## Overview
Welcome to my model card!
Features of this model:
- Suppression of unknown-word (`<unk>`) generation by using byte fallback in the SentencePiece tokenizer, which is converted to the Hugging Face Tokenizers format (see the quick check below)
- Pretrained on the Wikipedia and CC-100 datasets
- Based on the Mistral architecture with roughly 300M parameters
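
As a quick check of the byte-fallback behavior mentioned above, the sketch below tokenizes a character that is unlikely to exist as a vocabulary piece. The exact byte pieces printed depend on the tokenizer's vocabulary, but with byte fallback you should see `<0x..>` pieces rather than `<unk>`:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ce-lery/mistral-300m-base", use_fast=False)

# A character that is unlikely to exist as a single vocabulary piece.
text = "🦙"
print(tokenizer.tokenize(text))
# With byte fallback, expect byte pieces such as ['<0xF0>', '<0x9F>', ...] instead of '<unk>'.

# Byte fallback also keeps encode/decode lossless for such characters.
ids = tokenizer.encode(text, add_special_tokens=False)
print(tokenizer.decode(ids))  # should round-trip back to the original text
```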
 
Take it easy and enjoy!
## How to use the model
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
import torch

MODEL_NAME = "ce-lery/mistral-300m-base"
torch.set_float32_matmul_precision('high')

# Use a GPU if one is available, otherwise fall back to CPU.
if torch.cuda.is_available():
    print("cuda")
    DEVICE = "cuda"
else:
    print("cpu")
    DEVICE = "cpu"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    trust_remote_code=True,
).to(DEVICE)

# streamer = TextStreamer(tokenizer)

prompt = "自然言語処理とは、"  # "Natural language processing is,"
inputs = tokenizer(prompt, add_special_tokens=False, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        inputs["input_ids"],
        max_new_tokens=1024,
        do_sample=True,
        early_stopping=False,
        top_p=0.95,
        top_k=50,
        temperature=0.1,
        # streamer=streamer,
        no_repeat_ngram_size=2,
        num_beams=3
    )

print(outputs.tolist()[0])               # generated token IDs
outputs_txt = tokenizer.decode(outputs[0])
print(outputs_txt)                       # decoded text
```
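
To stream the output to stdout as it is generated, enable the commented-out `TextStreamer` lines. Beam search is not supported together with a streamer, so `num_beams` is dropped in this variation (the `skip_prompt=True` flag is a convenience and an assumption on my part, not part of the original example):

```python
streamer = TextStreamer(tokenizer, skip_prompt=True)  # don't re-print the prompt

with torch.no_grad():
    model.generate(
        inputs["input_ids"],
        max_new_tokens=1024,
        do_sample=True,
        top_p=0.95,
        top_k=50,
        temperature=0.1,
        no_repeat_ngram_size=2,
        streamer=streamer,  # tokens are printed as soon as they are produced
    )
```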
## Recipe

If you want to reproduce this model, please refer to this GitHub repository.
If you find any mistakes or errors, please create an issue. Pull requests are also very welcome!
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after this list):
- learning_rate: 0.0003
- train_batch_size: 12
- eval_batch_size: 12
- seed: 42
- gradient_accumulation_steps: 20
- total_train_batch_size: 240
- optimizer: `ADAMW_BNB` with betas=(0.9, 0.95) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine_with_min_lr
- lr_scheduler_warmup_steps: 1000
- num_epochs: 1.0
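
For reference, here is a minimal sketch of how these values map onto `transformers.TrainingArguments`. The `output_dir` and the `min_lr_rate` scheduler argument are illustrative assumptions, not values taken from the actual run; see the repository above for the real recipe.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="mistral-300m-base",            # assumed output path (illustrative)
    learning_rate=3e-4,
    per_device_train_batch_size=12,
    per_device_eval_batch_size=12,
    seed=42,
    gradient_accumulation_steps=20,            # 12 * 20 = 240 total train batch size
    optim="adamw_bnb_8bit",                    # OptimizerNames.ADAMW_BNB
    adam_beta1=0.9,
    adam_beta2=0.95,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine_with_min_lr",
    lr_scheduler_kwargs={"min_lr_rate": 0.1},  # assumed minimum-LR ratio
    warmup_steps=1000,
    num_train_epochs=1.0,
)
```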
 
### Training results
| Training Loss | Epoch | Step | Validation Loss | 
|---|---|---|---|
| 3.7969 | 0.2212 | 10000 | 3.4418 | 
| 3.659 | 0.4424 | 20000 | 3.2704 | 
| 3.5721 | 0.6635 | 30000 | 3.1969 | 
| 3.5678 | 0.8847 | 40000 | 3.1757 | 
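
Assuming the reported validation loss is the mean per-token cross-entropy in nats, the final value corresponds to a perplexity of roughly exp(3.1757) ≈ 23.9:

```python
import math

# Perplexity implied by the final validation loss (assuming mean per-token cross-entropy).
print(math.exp(3.1757))  # ≈ 23.9
```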
### Framework versions

- Transformers 4.55.2
- Pytorch 2.8.0+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4
 