Hugging Face
Stoney Kang
sikang99
1 follower · 13 following
AI & ML interests
Remote Control based on Vision
Recent Activity
upvoted an article about 18 hours ago
We now support VLMs in smolagents!
liked a model 21 days ago
microsoft/phi-4
reacted to merve's post about 2 months ago
Last week we were blessed with open-source models! A recap: https://huggingface.co/collections/merve/nov-29-releases-674ccc255a57baf97b1e2d31

Multimodal
> At Hugging Face we released SmolVLM, a performant and efficient smol vision language model
> Show Lab released ShowUI-2B: new vision-language-action model to build GUI/web automation agents
> Rhymes AI has released the base model of Aria: Aria-Base-64K and Aria-Base-8K with their respective context length
> ViDoRe team released ColSmolVLM: A new ColPali-like retrieval model based on SmolVLM
> Dataset: Llava-CoT-o1-Instruct: new dataset labelled using Llava-CoT multimodal reasoning model
> Dataset: LLaVA-CoT-100k dataset used to train Llava-CoT released by creators of Llava-CoT

LLMs
> Qwen team released QwQ-32B-Preview, state-of-the-art open-source reasoning model, broke the internet
> AliBaba has released Marco-o1, a new open-source reasoning model
> NVIDIA released Hymba 1.5B Base and Instruct, the new state-of-the-art SLMs with hybrid architecture (Mamba + transformer)

Image/Video Generation
> Qwen2VL-Flux: new image generation model based on Qwen2VL image encoder, T5 and Flux for generation
> Lightricks released LTX-Video, a new DiT-based video generation model that can generate 24 FPS videos at 768x512 res
> Dataset: Image Preferences is a new image generation preference dataset made with DIBT community effort of Argilla

Audio
> OuteAI released OuteTTS-0.2-500M new multilingual text-to-speech model based on Qwen-2.5-0.5B trained on 5B audio prompt tokens
Organizations
sikang99's activity
liked a model 21 days ago
microsoft/phi-4 • Text Generation • Updated 22 days ago • 284k • 1.62k
liked a model 4 months ago
rain1011/pyramid-flow-sd3 • Text-to-Video • Updated Oct 30, 2024 • 809
liked 4 models 5 months ago
Qwen/Qwen2-VL-7B-Instruct • Image-Text-to-Text • Updated 19 days ago • 1.81M • 1.1k
NVEagle/Eagle-X5-7B • Image-Text-to-Text • Updated Sep 16, 2024 • 254 • 24
Xenova/modnet • Image Segmentation • Updated Sep 2, 2024 • 5.34k • 39
microsoft/Phi-3.5-vision-instruct • Image-Text-to-Text • Updated Sep 26, 2024 • 346k • 651
liked a model 6 months ago
remyxai/SpaceLLaVA • Text Generation • Updated Oct 3, 2024 • 332 • 21
liked a model 7 months ago
Vision-CAIR/MiniGPT4-Video • Updated Jul 24, 2024 • 34 • 31