TRELLIS Appreciation Community

AI & ML interests

None defined yet.

Recent Activity

multimodalart posted an update about 1 year ago
The first open model with a Stable Diffusion 3-like architecture is JUST out 💣 - but it is not SD3! 🤔

It is Tencent-Hunyuan/HunyuanDiT by Tencent, a 1.5B-parameter DiT (diffusion transformer) text-to-image model 🖼️✨, trained with multilingual CLIP + multilingual T5 text encoders for English 🤝 Chinese understanding

Try it out yourself here ▶️ https://huggingface.co/spaces/multimodalart/HunyuanDiT
(a bit slow, as the model is chunky and the research code isn't super optimized for inference speed yet)
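If the Space is too busy, local inference is an option too. A minimal sketch, assuming the Diffusers-format checkpoint (Tencent-Hunyuan/HunyuanDiT-Diffusers) and the HunyuanDiTPipeline that diffusers later shipped; both postdate this post, so treat names and availability as assumptions:

```python
# Minimal local-inference sketch. Assumes a recent diffusers release with
# HunyuanDiTPipeline and the Diffusers-format checkpoint; at the time of
# the post, only the research code was available.
import torch
from diffusers import HunyuanDiTPipeline

pipe = HunyuanDiTPipeline.from_pretrained(
    "Tencent-Hunyuan/HunyuanDiT-Diffusers", torch_dtype=torch.float16
)
pipe.to("cuda")

# The multilingual text encoders mean both English and Chinese prompts work.
image = pipe("一只可爱的小猫 (a cute kitten)").images[0]
image.save("kitten.png")
```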

In the paper, they claim to be state of the art among open-source models, based on human preference evaluation!
pcuenq posted an update about 1 year ago
OpenELM in Core ML

Apple recently released a set of efficient LLMs in sizes ranging from 270M to 3B parameters. Their quality, according to benchmarks, is similar to OLMo models of comparable size, but they required half the pre-training tokens because they use layer-wise scaling, where the number of attention heads increases in deeper layers.

I converted these models to Core ML, for use on Apple Silicon, using this script: https://gist.github.com/pcuenca/23cd08443460bc90854e2a6f0f575084. The converted models were uploaded to this community on the Hub for anyone who wants to integrate them into their apps: corenet-community/openelm-core-ml-6630c6b19268a5d878cfd194

The conversion was done with the following parameters:
- Precision: float32.
- Sequence length: fixed to 128.
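
For reference, a simplified sketch of what a conversion with those parameters could look like. The linked gist is the actual script; the wrapper below (returning plain tensors so tracing works) and the exact call shapes are assumptions of this sketch:

```python
# Simplified Core ML conversion sketch (the linked gist is the real script).
# Assumptions: the Hub model id, and a wrapper returning only logits so that
# torch.jit.trace produces a clean graph from the Hugging Face model.
import numpy as np
import torch
import coremltools as ct
from transformers import AutoModelForCausalLM

class LogitsWrapper(torch.nn.Module):
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, input_ids):
        # Return only the logits tensor, not the full model output dict.
        return self.model(input_ids).logits

model = AutoModelForCausalLM.from_pretrained(
    "apple/OpenELM-270M", trust_remote_code=True
).eval()

seq_len = 128  # fixed sequence length, matching the parameters above
example = torch.randint(0, 32000, (1, seq_len))  # Llama 2 vocab size
traced = torch.jit.trace(LogitsWrapper(model), example)

mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(name="input_ids", shape=(1, seq_len), dtype=np.int32)],
    compute_precision=ct.precision.FLOAT32,  # float32, matching the post
    convert_to="mlprogram",
)
mlmodel.save("OpenELM-270M.mlpackage")
```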

With swift-transformers (https://github.com/huggingface/swift-transformers), I'm getting about 56 tok/s with the 270M model on my M1 Max, and 6.5 tok/s with the largest 3B model. These speeds could be improved by converting to float16. However, there's some precision loss somewhere and generation doesn't work in float16 mode yet. I'm looking into it and will keep you posted! Or take a look at this issue if you'd like to help: https://github.com/huggingface/swift-transformers/issues/95

I'm also looking at optimizing inference using an experimental KV cache in swift-transformers. It's a bit tricky because the layers have varying numbers of attention heads, but I'm curious to see how much this feature can speed up generation for this model family :)
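
For intuition, here is a hypothetical Python sketch of a cache with per-layer head counts; this is not swift-transformers' implementation, just the shape of the idea:

```python
# Hypothetical illustration of a KV cache for layer-wise-scaled models:
# each layer gets buffers sized to its own number of attention heads.
# Names and shapes are assumptions, not swift-transformers internals.
import torch

def make_kv_cache(heads_per_layer, head_dim, max_len, batch=1):
    return [
        (
            torch.zeros(batch, n_heads, max_len, head_dim),  # keys
            torch.zeros(batch, n_heads, max_len, head_dim),  # values
        )
        for n_heads in heads_per_layer
    ]

# Example: head counts growing with depth, as in OpenELM's layer-wise scaling.
cache = make_kv_cache(heads_per_layer=[12, 16, 20, 24], head_dim=64, max_len=128)
print([k.shape for k, _ in cache])
```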

Regarding the instruct fine-tuned models, I don't know which chat template was used. The models use the Llama 2 tokenizer, but neither the Llama 2 chat template nor the default Alignment Handbook one used for training is recognized. Any ideas on this are welcome!
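
For anyone experimenting, this is roughly how one would try a candidate template with transformers; which template is actually correct is exactly the open question:

```python
# Trying a candidate chat template; which one OpenELM-Instruct expects is
# the open question above. The Llama 2 tokenizer repo is gated on the Hub,
# so access may require approval.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

messages = [{"role": "user", "content": "Hello, who are you?"}]

# Renders the tokenizer's own chat template (Llama 2's [INST] format here);
# if the model was trained with a different template, outputs will degrade.
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```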
multimodalart posted an update over 1 year ago
The Stable Diffusion 3 research paper broken down, including some overlooked details! 📝

Model
πŸ“ 2 base model variants mentioned: 2B and 8B sizes

πŸ“ New architecture in all abstraction levels:
- πŸ”½ UNet; ⬆️ Multimodal Diffusion Transformer, bye cross attention πŸ‘‹
- πŸ†• Rectified flows for the diffusion process
- 🧩 Still a Latent Diffusion Model

📄 3 text encoders: 2 CLIPs, one T5-XXL; plug-and-play: removing the largest one maintains competitiveness

πŸ—ƒοΈ Dataset was deduplicated with SSCD which helped with memorization (no more details about the dataset tho)

Variants
πŸ” A DPO fine-tuned model showed great improvement in prompt understanding and aesthetics
✏️ An Instruct Edit 2B model was trained, and learned how to do text-replacement

Results
✅ State of the art in automated evals for composition and prompt understanding
✅ Best win rate in human preference evaluation for prompt understanding, aesthetics, and typography (details on the number of participants and the experiment design are missing)

Paper: https://stabilityai-public-packages.s3.us-west-2.amazonaws.com/Stable+Diffusion+3+Paper.pdf
multimodalart posted an update over 1 year ago
It seems February started with a fully open source AI renaissance 🌟

Models released with fully open datasets, training code, and weights ✅

LLM - allenai/olmo-suite-65aeaae8fe5b6b2122b46778 🧠
Embedding - nomic-ai/nomic-embed-text-v1 📚 (SOTA! usage sketch below)
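
A quick usage sketch for the embedding model; the task prefixes follow its model card, and everything else is standard sentence-transformers:

```python
# Quick usage sketch for nomic-embed-text-v1 via sentence-transformers.
# The "search_document:"/"search_query:" prefixes follow the model card.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("nomic-ai/nomic-embed-text-v1", trust_remote_code=True)

docs = ["search_document: Fully open LLMs and embedding models landed in February."]
query = ["search_query: What open models were released?"]

doc_emb = model.encode(docs)
query_emb = model.encode(query)
print(doc_emb.shape, query_emb.shape)  # 768-dimensional embeddings
```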

And it's literally February 1st - can't wait to see what else the community will bring 👀