Fundamental Research
Paper • 2408.11029 • Published • 3
Note
1. The random search algorithm is to blame: its unconstrained search space accelerates the convergence of the search toward the verifier's biases.
2. Search loop (sketched below): start -> sample noise i.i.d. -> add noise -> denoise -> verify -> Best-of-N -> start.
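A minimal sketch of the Best-of-N random-search loop from item 2, assuming a `denoise` callable that maps an initial noise tensor to a sample and a `verifier` callable that scores samples (both names are illustrative, not from the paper's code); the intermediate "add noise" step of the full loop is omitted for brevity:

```python
import torch

def best_of_n_random_search(denoise, verifier, shape, n=8, generator=None):
    """Sample n i.i.d. noises, denoise each, score with the verifier,
    and keep the best candidate. `denoise` and `verifier` are assumed
    callables; nothing here comes from a specific library."""
    best_sample, best_score = None, float("-inf")
    for _ in range(n):
        noise = torch.randn(shape, generator=generator)  # sample noise i.i.d.
        sample = denoise(noise)                          # run the denoising chain
        score = verifier(sample)                         # verify
        if score > best_score:                           # Best-of-N selection
            best_sample, best_score = sample, score
    return best_sample, best_score
```

Because nothing constrains which noises are proposed, always keeping the verifier's top pick is exactly what drives the convergence toward verifier bias noted in item 1.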
Token Turing Machines
Paper • 2211.09119 • Published • 1
Note
1. The result of a memory "read" is fed to the processing unit; the output of the processing unit is "written" back to the memory.
2. Token summarisation is implemented as a weighted summation over all tokens in memory: R^{k×p} · R^{p×d} = R^{k×d}, with the k×p weight matrix made learnable (a sketch follows this list).
3. Positional embeddings are added to distinguish tokens coming from memory vs. tokens coming from the inputs.
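A minimal sketch of the learnable token summarisation in item 2, assuming PyTorch; `TokenSummarizer` and its content-based scoring head are illustrative names, not the paper's implementation:

```python
import torch
import torch.nn as nn

class TokenSummarizer(nn.Module):
    """Compress p input tokens of dim d into k summary tokens via a
    learned weighted summation (illustrative sketch)."""

    def __init__(self, d, k):
        super().__init__()
        # Content-based scores: one importance weight per summary token.
        self.score = nn.Linear(d, k)

    def forward(self, tokens):
        # tokens: (batch, p, d)
        w = self.score(tokens).softmax(dim=1)  # (batch, p, k), normalised over p
        # R^{k×p} · R^{p×d} = R^{k×d}: each summary token is a weighted sum.
        return w.transpose(1, 2) @ tokens      # (batch, k, d)
```

Deriving the k×p weights from token content (rather than a fixed matrix) keeps the summarisation learnable while letting it adapt to whatever currently sits in memory.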
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
Paper • 2203.12602 • Published
Note
1. It has an upgraded version: https://arxiv.org/pdf/2303.16727
1.1. Progressive fine-tuning of the pre-trained models can contribute to higher performance.
1.2. The decoder takes the encoder's visible tokens as input and reconstructs only the tokens visible under the decoder mask (a masking sketch follows this list).
1.3. Supervision applies only to decoder output tokens that were invisible to the encoder.
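A minimal sketch of the dual-masking index selection described in items 1.2 and 1.3, assuming PyTorch; the function name and masking ratios are illustrative, not taken from the paper:

```python
import torch

def dual_mask_indices(n_tokens, enc_keep=0.1, dec_keep=0.5, generator=None):
    """Pick which tokens the encoder sees and which hidden tokens the
    decoder reconstructs (ratios illustrative)."""
    perm = torch.randperm(n_tokens, generator=generator)
    n_enc = int(n_tokens * enc_keep)
    enc_visible = perm[:n_enc]   # fed to the encoder
    hidden = perm[n_enc:]        # invisible to the encoder
    # Decoder mask: reconstruct only a subset of the hidden tokens, so
    # supervision lands only on tokens the encoder never saw (item 1.3).
    n_dec = int(hidden.numel() * dec_keep)
    sub = torch.randperm(hidden.numel(), generator=generator)[:n_dec]
    return enc_visible, hidden[sub]
```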
Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design
Paper • 2305.13035 • Published
Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps
Paper • 2501.09732 • Published • 38