Track points in a video
https://huggingface.co/papers/2501.03006
Transform video frames using text instructions