
Built with Axolotl

See axolotl config

axolotl version: 0.13.0.dev0

```yaml
base_model: /mnt/shared/p02/alex/gpt-oss-vl/gpt-oss-20b-vl-sft-output-7/checkpoint-16254
model_type: AutoModelForImageTextToText
processor_type: AutoProcessor
trust_remote_code: true
skip_prepare_dataset: true
remove_unused_columns: false
sample_packing: false
unfrozen_parameters:
  - ^visual.merger.[\s\S]+$
  - ^visual.merger_list.[\s\S]+$
  - ^model.model.embed_tokens.weight$[200009:200012]
datasets:
  - path: /mnt/shared/p02/alex/gpt-oss-vl/mscoco
    type: chat_template
    split: train
#  - path: AlexHung29629/openimages_objdet
#    type: chat_template
#    split: train
message_property_mappings:
  role: role
  content: content
  thinking: thinking
dataset_prepared_path: last_run_prepared
val_set_size: 0
sequence_len: 4096
pad_to_sequence_len: true
dataloader_pin_memory: true
dataloader_num_workers: 16
gradient_accumulation_steps: 1
#gradient_checkpointing: true
micro_batch_size: 16
output_dir: ./gpt-oss-20b-vl-sft-output-10
num_epochs: 1
warmup_ratio: 0.01
torch_compile: false
torch_compile_backend: inductor
torch_compile_mode: reduce-overhead
max_grad_norm: 1.0
evals_per_epoch: 1
saves_per_epoch: 1
optimizer: adamw_torch_fused
lr_scheduler: cosine
learning_rate: 0.0002
logging_steps: 1
bf16: true

fsdp_version: 2
fsdp_config:
  activation_checkpointing: true
  offload_params: false
  state_dict_type: FULL_STATE_DICT
  auto_wrap_policy: TRANSFORMER_BASED_WRAP
  transformer_layer_cls_to_wrap: GptOssDecoderLayer
  reshard_after_forward: false
  cpu_ram_efficient_loading: true

flash_attention: true
seed: 42
use_tensorboard: true
use_wandb: true
image_size: 1024
```
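The `unfrozen_parameters` entries above limit training to the vision merger modules plus a slice of the embedding matrix (the trailing `[200009:200012]` selects specific embedding rows). Below is a rough illustration of regex-based selective unfreezing; this is not Axolotl's actual implementation, just a sketch of the idea.

```python
# Illustrative only: freeze every parameter, then re-enable gradients for the
# names matched by the unfrozen_parameters regexes from the config above.
# (The row-slice [200009:200012] on embed_tokens is handled by Axolotl itself
# and is not reproduced here.)
import re

UNFROZEN_PATTERNS = [
    r"^visual.merger.[\s\S]+$",
    r"^visual.merger_list.[\s\S]+$",
    r"^model.model.embed_tokens.weight$",
]

def apply_selective_unfreeze(model):
    for name, param in model.named_parameters():
        param.requires_grad = any(re.match(pat, name) for pat in UNFROZEN_PATTERNS)
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"trainable parameters: {trainable:,}")
```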

gpt-oss-20b-vl-sft-output-10

This model was fine-tuned from the checkpoint gpt-oss-20b-vl-sft-output-7/checkpoint-16254 on an MSCOCO-based image-text dataset (see the Axolotl config above).

Model description

More information needed

Intended uses & limitations

More information needed
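
The card leaves this section open, but the config describes an image-text-to-text chat setup. The sketch below is a hypothetical usage example, assuming the published checkpoint loads through the standard Transformers auto classes named in the config; the checkpoint path, image file, and prompt are placeholders, and the exact processor behavior depends on the custom code shipped with the model (`trust_remote_code: true`).

```python
# Hypothetical usage sketch; paths and prompt are placeholders.
import torch
from PIL import Image
from transformers import AutoModelForImageTextToText, AutoProcessor

ckpt = "./gpt-oss-20b-vl-sft-output-10"  # output_dir from the config

processor = AutoProcessor.from_pretrained(ckpt, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    ckpt,
    torch_dtype=torch.bfloat16,  # matches bf16: true in the config
    device_map="auto",
    trust_remote_code=True,
)

# Chat-style prompt with a single image, mirroring the chat_template dataset format.
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe this image."},
    ]},
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
image = Image.open("example.jpg")  # placeholder image

inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
generated = model.generate(**inputs, max_new_tokens=128)
new_tokens = generated[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(new_tokens, skip_special_tokens=True)[0])
```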

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 128
  • total_eval_batch_size: 128
  • optimizer: adamw_torch_fused with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 44
  • training_steps: 4428
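
The derived values follow directly from the config: the total batch size is micro_batch_size × num_devices × gradient_accumulation_steps, and the warmup steps come from warmup_ratio × training_steps. A quick arithmetic check:

```python
# Re-derive the reported totals from the Axolotl config values.
micro_batch_size = 16              # micro_batch_size in the config
num_devices = 8                    # multi-GPU (FSDP) run
gradient_accumulation_steps = 1

total_train_batch_size = micro_batch_size * num_devices * gradient_accumulation_steps
print(total_train_batch_size)      # 128

training_steps = 4428
warmup_ratio = 0.01                # warmup_ratio in the config
warmup_steps = int(training_steps * warmup_ratio)
print(warmup_steps)                # 44 (44.28 rounded down)
```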

Training results

No evaluation results are reported: the run held out no validation split (val_set_size: 0).

Framework versions

  • Transformers 4.56.1
  • Pytorch 2.8.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.22.1