How to eval the video/image sequences?

#69
by lkllkl - opened

Since the model can eval the single image perfectly, I wonder how to eval the video/image sequences. I change the message where the type is video and its not working.

Good day. As a starting point:

  1. Split the video into frames (e.g. jpg pictures) in an appropriate format.
  2. Pass each frame sequentially to the model with the corresponding prompt (iteratively). Save the result (can be in a list or dataframe).

Great discussion, once you have the video into frames, is there a way to process a batch of images together OR we can only process 1 image at a time?

@vibhu There's a pretty good example in this discussion. https://huggingface.co/google/gemma-3-27b-it/discussions/73
Examine the link to Google Collab

Sign up or log in to comment