## Model Details

## Training Datasets

  • Pretrain: LLaVA 595k
  • Fine-tune: LLaVA 665k

## Evaluation

So far, we have evaluated RWKV7 SigLIP2 on four benchmarks proposed for instruction-following LMMs. More benchmark results will be released soon.

### Benchmarks

| Encoder | LLM | VQAv2 | TextVQA | GQA | ScienceQA |
|---------|-----|-------|---------|-----|-----------|
| SigLIP2 | RWKV7-0.4B | 72.04 | 38.75 | 55.52 | 43.32 |
### Inference

```python
from infer.worldmodel import Worldinfer
from PIL import Image

llm_path = 'WorldRWKV/RWKV7-0.4B-siglip2/rwkv-0'  # local model path
encoder_path = 'google/siglip2-base-patch16-384'
encoder_type = 'siglip'

model = Worldinfer(model_path=llm_path, encoder_type=encoder_type, encoder_path=encoder_path)

# Load the image and make sure it is in RGB mode before encoding.
img_path = './docs/03-Confusing-Pictures.jpg'
image = Image.open(img_path).convert('RGB')

text = '\x16User: What is unusual about this image?\x17Assistant:'

result = model.generate(text, image)
print(result)
```
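The prompt above wraps the user turn in the control characters `\x16` and `\x17`. A minimal sketch of a helper that builds prompts in this format, assuming (based on the example above; check the WorldRWKV repo for the authoritative template) that every user turn follows the same `\x16User: ...\x17Assistant:` pattern:

```python
# Sketch of a prompt builder for the chat format shown above.
# Assumption: \x16 opens a user turn and \x17 closes it, immediately
# followed by the "Assistant:" tag where generation begins.
def build_prompt(question: str) -> str:
    return f"\x16User: {question}\x17Assistant:"

prompt = build_prompt("What is unusual about this image?")
print(repr(prompt))
```

`build_prompt` is a hypothetical convenience wrapper, not part of the `infer.worldmodel` API; the raw string in the inference example works just as well.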
    