Edward J
HideOnBush
1 follower · 5 following
AI & ML interests
None yet
Recent Activity
Updated a dataset 6 days ago: G-A-I/GraphOmni
Published a dataset 9 days ago: G-A-I/GraphOmni
Reacted to ahmed-masry's post 3 months ago:
Happy to announce AlignVLM, a novel approach to bridging vision and language latent spaces for multimodal understanding in Vision-Language Models (VLMs).

Read the paper: https://huggingface.co/papers/2502.01341

What's the challenge? Aligning visual features with language embeddings remains a major bottleneck in VLMs. Existing connectors such as multi-layer perceptrons (MLPs) often introduce noise that degrades performance.

Our solution: the ALIGN connector. We propose AlignVLM, a method that maps vision features into a weighted average of LLM text embeddings, ensuring they remain in a space that the LLM can effectively interpret.

How does it perform? We compared ALIGN against common connectors like MLPs, Perceiver Resampler, and Ovis trained under similar configurations. The results? ALIGN outperforms them all on diverse document understanding tasks.

Meet the AlignVLM model family! We trained Llama 3.1 (1B, 3B, 8B) using our connector and benchmarked them against various models. The results:
AlignVLM surpasses all Base VLMs trained under similar configurations.
Our models also perform competitively against Instruct VLMs such as Qwen2-VL and InternVL-2.5.

What about robustness to noise? We injected Gaussian noise (μ=0, σ=3) into the vision encoder's outputs before feeding them to the connector:
ALIGN connector: minimal drop (1.67%), proving its high robustness!
MLP connector: severe degradation (25.54%), struggling with noisy inputs.

Code & model weights coming soon! Stay tuned!
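The idea of mapping a vision feature to a weighted average of LLM text embeddings, and the noise-injection test, can be illustrated with a small sketch. This is not the authors' code (the post says it has not been released yet). It assumes the weights come from a softmax over a linear projection of the vision feature, which the post does not state, and every name here (`align_connector`, `projection`, `add_gaussian_noise`) is hypothetical. The toy embedding table has only three entries, so the convex-combination property is easy to check.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def align_connector(vision_feature, projection, text_embeddings):
    """Map one vision feature to a weighted average of text embeddings.

    projection: one row per vocabulary entry (assumed linear layer, no bias).
    text_embeddings: one embedding row per vocabulary entry.
    """
    # One score per vocabulary entry.
    logits = [sum(w * x for w, x in zip(row, vision_feature)) for row in projection]
    # Assumption: softmax turns the scores into non-negative weights summing to 1.
    weights = softmax(logits)
    d_text = len(text_embeddings[0])
    # The weighted average stays inside the convex hull of the text embeddings.
    return [
        sum(p * emb[j] for p, emb in zip(weights, text_embeddings))
        for j in range(d_text)
    ]


def add_gaussian_noise(feature, sigma, rng):
    """Add zero-mean Gaussian noise to each component, as in the post's test."""
    return [x + rng.gauss(0.0, sigma) for x in feature]


rng = random.Random(0)
# Toy vocabulary of three 2-dimensional text embeddings (illustrative values).
text_embeddings = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
projection = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(3)]
feature = [0.5, -1.0, 2.0, 0.1]

clean = align_connector(feature, projection, text_embeddings)
noisy = align_connector(add_gaussian_noise(feature, 3.0, rng), projection, text_embeddings)
print("clean:", clean)
print("noisy:", noisy)
```

Under these assumptions the output is a convex combination of embedding rows, so even a heavily perturbed input cannot leave the region spanned by the text embeddings. This matches the post's description of features that "remain in a space that the LLM can effectively interpret", although the sketch says nothing about the accuracy figures reported.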
Organizations
Models (18)
HideOnBush/BERTModified-finetuned-wikitext-full • Updated Apr 6, 2023
HideOnBush/syn • Updated Apr 6, 2023
HideOnBush/wiki • Updated Apr 6, 2023
HideOnBush/syn_ • Updated Apr 6, 2023
HideOnBush/wiki_ • Updated Apr 6, 2023
HideOnBush/BERTModified-finetuned-wikitext-full-wiki • Updated Apr 6, 2023
HideOnBush/BERTModified-finetuned-wikitext-full-syn • Updated Apr 6, 2023
HideOnBush/BERTModified-finetuned-wikitext-full-1 • Updated Apr 6, 2023
HideOnBush/BERTModified-finetuned-wikitext-test • Updated Mar 7, 2023
HideOnBush/BERTModified-fullsize-alt-trans-finetuned-wikitext-test • Updated Dec 11, 2022
Datasets (3)
HideOnBush/try1000 • Viewer • Updated Sep 7, 2024 • 1k • 15
HideOnBush/try_from_load_script • Viewer • Updated Apr 6, 2024 • 60 • 17
HideOnBush/vlm_try • Viewer • Updated Apr 6, 2024 • 199 • 19