Reconstruct 3D Gaussians from unposed images
In-browser speech recognition with word-level timestamps
Identify speakers in an audio file
Identify sound sources in images using audio