
Built with Axolotl

See axolotl config

axolotl version: 0.13.0.dev0

```yaml
base_model: /mnt/shared/p02/alex/gpt-oss-vl/gpt-oss-20b-vl-sft-output-7/checkpoint-16254
model_type: AutoModelForImageTextToText
processor_type: AutoProcessor
trust_remote_code: true
skip_prepare_dataset: true
remove_unused_columns: false
sample_packing: false
unfrozen_parameters:
  - ^visual.merger.[\s\S]+$
  - ^visual.merger_list.[\s\S]+$
  - ^model.model.embed_tokens.weight$[200009:200012]
datasets:
  - path: /mnt/shared/p02/alex/gpt-oss-vl/mscoco
    type: chat_template
    split: train
#  - path: AlexHung29629/openimages_objdet
#    type: chat_template
#    split: train
message_property_mappings:
  role: role
  content: content
  thinking: thinking
dataset_prepared_path: last_run_prepared
val_set_size: 0
sequence_len: 4096
pad_to_sequence_len: true
dataloader_pin_memory: true
dataloader_num_workers: 16
gradient_accumulation_steps: 1
#gradient_checkpointing: true
micro_batch_size: 16
output_dir: ./gpt-oss-20b-vl-sft-output-10
num_epochs: 1
warmup_ratio: 0.01
torch_compile: false
torch_compile_backend: inductor
torch_compile_mode: reduce-overhead
max_grad_norm: 1.0
evals_per_epoch: 1
saves_per_epoch: 1
optimizer: adamw_torch_fused
lr_scheduler: cosine
learning_rate: 0.0002
logging_steps: 1
bf16: true

fsdp_version: 2
fsdp_config:
  activation_checkpointing: true
  offload_params: false
  state_dict_type: FULL_STATE_DICT
  auto_wrap_policy: TRANSFORMER_BASED_WRAP
  transformer_layer_cls_to_wrap: GptOssDecoderLayer
  reshard_after_forward: false
  cpu_ram_efficient_loading: true

flash_attention: true
seed: 42
use_tensorboard: true
use_wandb: true
image_size: 1024
```
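The `unfrozen_parameters` entries above limit training to the vision merger modules plus a slice of the embedding matrix (the trailing `[200009:200012]` selects specific embedding rows). Below is a rough illustration of regex-based selective unfreezing; this is not Axolotl's actual implementation, just a sketch of the idea.

```python
# Illustrative only: freeze every parameter, then re-enable gradients for the
# names matched by the unfrozen_parameters regexes from the config above.
# (The row-slice [200009:200012] on embed_tokens is handled by Axolotl itself
# and is not reproduced here.)
import re

UNFROZEN_PATTERNS = [
    r"^visual.merger.[\s\S]+$",
    r"^visual.merger_list.[\s\S]+$",
    r"^model.model.embed_tokens.weight$",
]

def apply_selective_unfreeze(model):
    for name, param in model.named_parameters():
        param.requires_grad = any(re.match(pat, name) for pat in UNFROZEN_PATTERNS)
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"trainable parameters: {trainable:,}")
```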

gpt-oss-20b-vl-sft-output-10

This model was fine-tuned from the checkpoint gpt-oss-20b-vl-sft-output-7/checkpoint-16254 on an MSCOCO-based image-text dataset (see the Axolotl config above).

Model description

More information needed

Intended uses & limitations

More information needed
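
The card leaves this section open, but the config describes an image-text-to-text chat setup. The sketch below is a hypothetical usage example, assuming the published checkpoint loads through the standard Transformers auto classes named in the config; the checkpoint path, image file, and prompt are placeholders, and the exact processor behavior depends on the custom code shipped with the model (`trust_remote_code: true`).

```python
# Hypothetical usage sketch; paths and prompt are placeholders.
import torch
from PIL import Image
from transformers import AutoModelForImageTextToText, AutoProcessor

ckpt = "./gpt-oss-20b-vl-sft-output-10"  # output_dir from the config

processor = AutoProcessor.from_pretrained(ckpt, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    ckpt,
    torch_dtype=torch.bfloat16,  # matches bf16: true in the config
    device_map="auto",
    trust_remote_code=True,
)

# Chat-style prompt with a single image, mirroring the chat_template dataset format.
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe this image."},
    ]},
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
image = Image.open("example.jpg")  # placeholder image

inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
generated = model.generate(**inputs, max_new_tokens=128)
new_tokens = generated[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(new_tokens, skip_special_tokens=True)[0])
```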

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 128
  • total_eval_batch_size: 128
  • optimizer: adamw_torch_fused with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 44
  • training_steps: 4428
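
The derived values follow directly from the config: the total batch size is micro_batch_size × num_devices × gradient_accumulation_steps, and the warmup steps come from warmup_ratio × training_steps. A quick arithmetic check:

```python
# Re-derive the reported totals from the Axolotl config values.
micro_batch_size = 16              # micro_batch_size in the config
num_devices = 8                    # multi-GPU (FSDP) run
gradient_accumulation_steps = 1

total_train_batch_size = micro_batch_size * num_devices * gradient_accumulation_steps
print(total_train_batch_size)      # 128

training_steps = 4428
warmup_ratio = 0.01                # warmup_ratio in the config
warmup_steps = int(training_steps * warmup_ratio)
print(warmup_steps)                # 44 (44.28 rounded down)
```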

Training results

No evaluation results are reported: the run held out no validation split (val_set_size: 0).

Framework versions

  • Transformers 4.56.1
  • Pytorch 2.8.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.22.1