So many bugs!

#3 by guiuiui007 - opened

The code can't run completely.

Beijing Academy of Artificial Intelligence org

The code is fine. Could you specify your problem first? 😐

why?
OSError: BAAI/Video-XL-2 does not appear to have a file named multimodal_encoder.builder.py. Checkout 'https://huggingface.co/BAAI/Video-XL-2/tree/main' for available files.

Beijing Academy of Artificial Intelligence org

> why?
> OSError: BAAI/Video-XL-2 does not appear to have a file named multimodal_encoder.builder.py. Checkout 'https://huggingface.co/BAAI/Video-XL-2/tree/main' for available files.

Thanks for your feedback; we will fix it today.

Beijing Academy of Artificial Intelligence org

> why?
> OSError: BAAI/Video-XL-2 does not appear to have a file named multimodal_encoder.builder.py. Checkout 'https://huggingface.co/BAAI/Video-XL-2/tree/main' for available files.

Hi Mr. Liu, we've fixed the problem. Please try again using the following steps:

  1. Update the inference code (a Python alternative is sketched below this list): huggingface-cli download BAAI/Video-XL-2 --include "*.py" --local-dir /root/Models/Video-XL-2
  2. Run the updated demo code:
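A Python alternative to the CLI command in step 1: a minimal sketch using huggingface_hub.snapshot_download (the local path simply mirrors the example above):

from huggingface_hub import snapshot_download

# Download only the remote-code files, skipping the large weight shards;
# equivalent to: huggingface-cli download BAAI/Video-XL-2 --include "*.py" ...
snapshot_download(
    repo_id="BAAI/Video-XL-2",
    allow_patterns=["*.py"],
    local_dir="/root/Models/Video-XL-2",
)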

1. Inference w/o Efficiency Optimization

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# load model
model_path = '/root/Models/Video-XL-2'
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    trust_remote_code=True,
    device_map=device,
    quantization_config=None,
    attn_implementation="sdpa",
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
)

gen_kwargs = {
    "do_sample": False,
    "temperature": 0.01,
    "top_p": 0.001,
    "num_beams": 1,
    "use_cache": True,
    "max_new_tokens": 256
}

model.config.enable_sparse = False  # disable the efficiency optimization for this baseline run

# input data
video_path = "/asset/demo.mp4"
question1 = "How many people are in the video? (A) 3 people (B) 6 people. Please respond with only the letter."

# params
max_num_frames = 150
sample_fps = 1  # sample frames at 1 fps
max_sample_fps = 4

with torch.inference_mode():
    response = model.chat(
        video_path,
        tokenizer,
        question1,
        chat_history=None,
        return_history=False,
        max_num_frames=max_num_frames,
        sample_fps=sample_fps,
        max_sample_fps=max_sample_fps,
        generation_config=gen_kwargs,
    )

print(response)
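For multi-turn use, the demo above passes chat_history=None and return_history=False. The sketch below assumes that with return_history=True, model.chat returns a (response, history) pair that can be fed back through chat_history; please verify this against the model's remote code, as it is an assumption, not a documented guarantee:

# Hypothetical multi-turn usage; assumes (response, history) is returned
# when return_history=True.
with torch.inference_mode():
    response, history = model.chat(
        video_path, tokenizer, question1,
        chat_history=None, return_history=True,
        max_num_frames=max_num_frames, sample_fps=sample_fps,
        max_sample_fps=max_sample_fps, generation_config=gen_kwargs,
    )
    follow_up = "What are the people in the video doing?"
    response2, history = model.chat(
        video_path, tokenizer, follow_up,
        chat_history=history, return_history=True,
        max_num_frames=max_num_frames, sample_fps=sample_fps,
        max_sample_fps=max_sample_fps, generation_config=gen_kwargs,
    )
print(response2)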

2. Inference w/ Chunk-based Pre-filling

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

torch.cuda.reset_peak_memory_stats()

# load model
model_path = '/root/Models/Video-XL-2'
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    trust_remote_code=True,
    device_map=device,
    quantization_config=None,
    attn_implementation="sdpa",
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
)

gen_kwargs = {
    "do_sample": False,
    "temperature": 0.01,
    "top_p": 0.001,
    "num_beams": 1,
    "use_cache": True,
    "max_new_tokens": 128
}

model.config.enable_chunk_prefill = True
prefill_config = {
    'chunk_prefill_mode': 'streaming',   # pre-fill the video context chunk by chunk
    'chunk_size': 4,
    'step_size': 1,
    'offload': True,                     # offload to reduce peak GPU memory
    'chunk_size_for_vision_tower': 24,
}
model.config.prefill_config = prefill_config

# input data
video_path = "/asset/demo.mp4"
question1 = "How many people are in the video? (A) 3 people (B) 6 people. Please respond with only the letter."

# params
max_num_frames = 1300
sample_fps = None  # uniform sampling
max_sample_fps = None

with torch.inference_mode():
    response = model.chat(
        video_path,
        tokenizer,
        question1,
        chat_history=None,
        return_history=False,
        max_num_frames=max_num_frames,
        sample_fps=sample_fps,
        max_sample_fps=max_sample_fps,
        generation_config=gen_kwargs,
    )

peak_memory_allocated = torch.cuda.max_memory_allocated()
print(f"Memory Peak: {peak_memory_allocated / (1024**3):.2f} GB")
print(response)
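If you want to see how chunk_size trades peak memory against speed on your GPU, a small sweep like the sketch below can help; the chunk sizes are arbitrary examples, and the call is the same one used in the demo above:

import time

for cs in (2, 4, 8):  # arbitrary example values
    model.config.prefill_config = {**prefill_config, 'chunk_size': cs}
    torch.cuda.reset_peak_memory_stats()
    start = time.time()
    with torch.inference_mode():
        response = model.chat(
            video_path, tokenizer, question1,
            chat_history=None, return_history=False,
            max_num_frames=max_num_frames, sample_fps=sample_fps,
            max_sample_fps=max_sample_fps, generation_config=gen_kwargs,
        )
    peak_gb = torch.cuda.max_memory_allocated() / (1024 ** 3)
    print(f"chunk_size={cs}: {time.time() - start:.1f}s, peak {peak_gb:.2f} GB")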

New trouble:

[screenshot: image.png]

My download code:

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("BAAI/Video-XL-2", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("BAAI/Video-XL-2", trust_remote_code=True)

Beijing Academy of Artificial Intelligence org

> New trouble:
>
> [screenshot: image.png]
>
> My download code:
>
> from transformers import AutoTokenizer, AutoModelForCausalLM
>
> tokenizer = AutoTokenizer.from_pretrained("BAAI/Video-XL-2", trust_remote_code=True)
> model = AutoModelForCausalLM.from_pretrained("BAAI/Video-XL-2", trust_remote_code=True)

OK, I will check it.

Beijing Academy of Artificial Intelligence org

> New trouble:
>
> [screenshot: image.png]
>
> My download code:
>
> from transformers import AutoTokenizer, AutoModelForCausalLM
>
> tokenizer = AutoTokenizer.from_pretrained("BAAI/Video-XL-2", trust_remote_code=True)
> model = AutoModelForCausalLM.from_pretrained("BAAI/Video-XL-2", trust_remote_code=True)

The download code works on my local machine, so I suspect the issue might be related to the transformers version. Please try these steps:

  1. Align the transformers version by running pip install transformers==4.43.0, then try the download again.
  2. If the issue persists, please remove all .py files from the download cache (typically located at /root/.cache/huggingface/hub/models--BAAI--Video-XL-2) and resume the download; a sketch of this cleanup follows this list.
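For step 2, a minimal sketch of clearing the cached code files (assuming the default cache location quoted above; entries under snapshots/ are symlinks, and removing them makes the next download fetch fresh copies):

from pathlib import Path

cache_dir = Path("/root/.cache/huggingface/hub/models--BAAI--Video-XL-2")
# Delete every cached .py file (or symlink) so the next download pulls them fresh.
for py_file in cache_dir.rglob("*.py"):
    print(f"removing {py_file}")
    py_file.unlink()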

Please let me know if anything new comes up, and I'll address it as soon as possible.

Thank you for your patience, but it seems that some files are missing again:

OSError: BAAI/Video-XL-2 does not appear to have a file named multimodal_resampler.builder.py. Checkout 'https://huggingface.co/BAAI/Video-XL-2/tree/main' for available files.

Beijing Academy of Artificial Intelligence org

> Thank you for your patience, but it seems that some files are missing again:
>
> OSError: BAAI/Video-XL-2 does not appear to have a file named multimodal_resampler.builder.py. Checkout 'https://huggingface.co/BAAI/Video-XL-2/tree/main' for available files.


This is a bit odd: the file multimodal_resampler.builder.py is no longer required after we updated the inference code, so this OSError should not occur. My guess is that the Hugging Face cache still contains old .py files from a previous download.

This issue may be solved in one of two ways:

  1. Re-download Video-XL-2 into a new directory by specifying a new cache_dir:
AutoModelForCausalLM.from_pretrained("BAAI/Video-XL-2", trust_remote_code=True, cache_dir=cache_dir)

This method ensures that both the model weights and the code are freshly downloaded. However, it may take some time since the entire model has to be re-downloaded.

  2. Move the existing weights to a new cache directory:
    To avoid re-downloading the large model weights, you can manually move them from the old Hugging Face cache directory to your new cache directory. Here's how:
  • Navigate to the current HF cache directory:

    cd /root/.cache/huggingface/hub/models--BAAI--Video-XL-2
    
  • Move or copy the weight files into your new cache_dir.

  • Then load the model using the new cache directory:

    AutoModelForCausalLM.from_pretrained("BAAI/Video-XL-2", trust_remote_code=True, cache_dir=cache_dir)
    

This second approach will only download the latest version of the inference code, without re-downloading the full model weights.
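For the move step in option 2, here is a rough sketch. The layout shown (snapshots/<commit>/ with symlinks into blobs/) is the typical hub cache structure, but the paths are examples; check them on your machine first, and note that from_pretrained may still re-verify files whose metadata does not match:

import shutil
from pathlib import Path

old_cache = Path("/root/.cache/huggingface/hub/models--BAAI--Video-XL-2")
new_cache = Path("/root/hf_cache")  # pass this as cache_dir to from_pretrained

# Copy only the large weight shards; the stale .py files stay behind, so the
# latest inference code is downloaded fresh on the next from_pretrained call.
for snapshot in (old_cache / "snapshots").iterdir():
    for pattern in ("*.safetensors", "*.bin"):
        for shard in snapshot.glob(pattern):
            target = new_cache / "models--BAAI--Video-XL-2" / "snapshots" / snapshot.name / shard.name
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy(shard, target)  # shutil.copy follows the symlink into blobs/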

OK, I reset my server to make sure the environment is clean.
The transformers version is 4.43.0.
I still encountered an error, and I don't know what to do.
My download code:

from transformers import AutoTokenizer, AutoModelForCausalLM
download_path = "/root/model"
tokenizer = AutoTokenizer.from_pretrained("BAAI/Video-XL-2", trust_remote_code=True, cache_dir=download_path)
model = AutoModelForCausalLM.from_pretrained("BAAI/Video-XL-2", trust_remote_code=True, cache_dir=download_path)

[screenshot: image.png]
