xianbao (Tiezhen WANG)

reacted to MonsterMMORPG's post with 🤯 7 months ago

Post

3894

WAN 2.1 FusionX + Self Forcing LoRA are the New Best of Local Video Generation with Only 8 Steps + FLUX Upscaling Guide : https://www.youtube.com/watch?v=Xbn93GRQKsQ

Tutorial : https://www.youtube.com/watch?v=Xbn93GRQKsQ

Video Chapters

0:00 Introduction to the New FusionX Video Model & FLUX Upscaling
0:30 One-Click Presets & The SwarmUI Model Downloader Explained
1:07 Achieving Hyper-Realism with the FLUX 2x Latent Upscale Preset
1:58 How to Download & Install the SwarmUI Model Downloader
2:49 Downloading Full Models vs. Downloading Just The LoRAs
3:48 Final Setup: Updating SwarmUI & Importing The New Presets
4:32 Generating a Video: Applying the FusionX Image-to-Video Preset
5:03 Critical Step: Correcting The Model's Native Resolution Metadata
5:55 Finalizing Image-to-Video Settings (Frame Count & RIFE Interpolation)
6:49 Troubleshooting Performance: Identifying Low GPU Usage & Shared VRAM Bug
8:35 The Solution: Disabling Sage Attention for Image-to-Video Models
10:02 Final Result: Showcasing The Amazing HD Quality Animation
10:40 How to Use the FusionX Text-to-Video Model with Presets
11:49 Text-to-Video Result & Quality Comparison
12:08 How to Use the FusionX LoRA with the Base Wan 2.1 Model
13:07 FLUX Tutorial: Downloading The Required Upscaler & Face Models
13:48 Generating a High-Quality Image with The Official FLUX Preset
14:50 Using Automatic Face Segmentation & Inpainting with FLUX
16:05 The Ultimate Upgrade: Applying The FLUX 2x Latent Upscaler Preset
16:32 Final Result: Comparing Standard vs. 2x Upscaled Image Quality
16:50 Outro & Sneak Peek of The New Ultimate Video Processing App

6 replies

·

reacted to merve's post with 🔥 12 months ago

Post

5593

Oof, what a week! 🥵 So many things have happened, let's recap! merve/jan-24-releases-6793d610774073328eac67a9

Multimodal 💬
- We have released SmolVLM -- tiniest VLMs that come in 256M and 500M, with its retrieval models ColSmol for multimodal RAG 💗
- UI-TARS are new models by ByteDance to unlock agentic GUI control 🤯 in 2B, 7B and 72B
- Alibaba DAMO lab released VideoLlama3, new video LMs that come in 2B and 7B
- MiniMaxAI released Minimax-VL-01, where decoder is based on MiniMax-Text-01 456B MoE model with long context
- Dataset: Yale released a new benchmark called MMVU
- Dataset: CAIS released Humanity's Last Exam (HLE) a new challenging MM benchmark

LLMs 📖
- DeepSeek-R1 & DeepSeek-R1-Zero: gigantic 660B reasoning models by DeepSeek, and six distilled dense models, on par with o1 with MIT license! 🤯
- Qwen2.5-Math-PRM: new math models by Qwen in 7B and 72B
- NVIDIA released AceMath and AceInstruct, new family of models and their datasets (SFT and reward ones too!)

Audio 🗣️
- Llasa is a new speech synthesis model based on Llama that comes in 1B, 3B, and 8B
- TangoFlux is a new audio generation model trained from scratch and aligned with CRPO

Image/Video/3D Generation ⏯️
- Flex.1-alpha is a new 8B pre-trained diffusion model by ostris similar to Flux
- tencent released Hunyuan3D-2, new 3D asset generation from images

7 replies

·

replied to nroggendorff's post about 1 year ago

1.9EB!

posted an update over 1 year ago

Post

2694

With the open-weight release of CogVideoX-5B from THUDM, i.e. the GLM team, the Video Generation Model (how about calling it VGM) field has officially become the next booming "LLM"

What does the landscape look like? What are the other video generation models? The collection below is all you need.

xianbao/video-generation-models-66c350163c74f60f5c412af6

The above video is generated by @a-r-r-o-w with CogVideoX-5B, taken from a nice lookout for the field!

reacted to not-lain's post with 🔥 over 1 year ago

Post

1975

I will be delivering an introductory coding session about Hugging Face this Sunday at 7 PM GMT+1. If you are new to HF and don't know where to begin, you are welcome to join us 🤗
📌Place: huggingface discord server
🔗Link : https://discord.gg/hugging-face-879548962464493619?event=1245406127668203541

2 replies

·

reacted to clem's post with 👍 over 1 year ago

Post

1564

I would pick @ylecun over @elonmuskceo every single day of the week.

Despite getting much less $$, recognition & visibility than entrepreneurs, the scientists who publish their groundbreaking research openly are the cornerstone of technological progress & massively contribute to making the world a better place!

1 reply

·

posted an update over 1 year ago

Post

2064

Why Apache 2.0 Matters for LLMs 🤔

@01AI_Yi recently switched from a permissive & commercially friendly license, to Apache 2.0. And the community loved it! 🚀

@JustinLin610 also had a poll on model license and the majority votes for Apache 2.0.

Why is it a Big Deal? ⬇️

📚 Legal Simplicity: Custom licenses need costly & time-consuming legal review. Apache 2.0 is well-known & easier for legal teams to handle.

👩‍💻 Developer-Friendly: Legal docs are a pain for devs! Apache 2.0 is well-known and tech-friendly, making it easier for non-native developers to understand the implications too.

🔗 Easier Integration: Apache 2.0 is compatible with many other licenses, simplifying tasks like model merging with models of different licensing requirements.

🚫 No Permission Needed: Custom licenses often require explicit permission and additional documentation work of filling forms, creating barriers. Apache 2.0 removes this hurdle, letting devs focus on innovation.

There are a lot of interesting discussions in
@JustinLin610's poll: https://x.com/JustinLin610/status/1793559737482764375 which inspired this thread.

Any other thoughts? Let me know ^^

1 reply

·

posted an update over 1 year ago

Post

1331

DeepSeek V2 is a big deal, and not only because of its significant improvements to both key components of the Transformer: the attention layer and the FFN layer.

It has also completely disrupted the Chinese LLM market, forcing competitors to drop their prices to 1% of the original price.

---

There are two key components in Transformer architecture: the self-attention layer, which captures relationships between tokens in context, and the Feed-Forward Network (FFN) layer, which stores knowledge.

DeepSeek V2 introduces optimizations to both:

The attention layer normally uses a KV cache to avoid repeated computation, but the cache consumes significant GPU RAM, which limits the number of concurrent requests. DeepSeek V2 introduces Multi-head Latent Attention (MLA), which stores only a small latent representation per token, resulting in substantial RAM savings.
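To make the RAM argument concrete, here is a rough back-of-the-envelope sketch. All of the dimensions below (layer count, head count, head size, latent size, sequence length) are hypothetical values chosen for illustration, not DeepSeek V2's actual configuration, and the latent-cache formula is a simplification that ignores any extra per-token state a real implementation may keep.

```python
def kv_cache_bytes(n_layers, n_heads, head_dim, seq_len, bytes_per_value=2):
    """Standard cache: one key and one value vector per head, per layer, per token."""
    return 2 * n_layers * n_heads * head_dim * seq_len * bytes_per_value


def latent_cache_bytes(n_layers, latent_dim, seq_len, bytes_per_value=2):
    """Simplified latent cache: one compressed vector per layer, per token."""
    return n_layers * latent_dim * seq_len * bytes_per_value


# Hypothetical model shape (assumption, not taken from the post).
full = kv_cache_bytes(n_layers=32, n_heads=32, head_dim=128, seq_len=4096)
latent = latent_cache_bytes(n_layers=32, latent_dim=512, seq_len=4096)
ratio = full / latent

print(f"full KV cache:  {full / 2**20:.0f} MiB per request")
print(f"latent cache:   {latent / 2**20:.0f} MiB per request")
print(f"reduction:      {ratio:.0f}x")
```

With these made-up numbers, the same GPU memory budget would hold roughly 16 times as many concurrent requests, which is the kind of effect the paragraph above describes.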

In the FFN layer, DeepSeek V2 utilizes 162 experts instead of the usual 8 as in Mixtral. This approach segments experts into finer granularity for higher specialization and more accurate knowledge acquisition. Activating only a small subset of experts for each token keeps processing efficient.
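The idea of activating only a few experts per token can be sketched as generic top-k routing. This is a minimal illustration, not DeepSeek V2's actual router: the function name `route_token`, the eight example logits, and `k=2` are all assumptions made up for the example.

```python
import math


def route_token(router_logits, k):
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i],
                 reverse=True)[:k]
    # Subtract the max logit before exponentiating for numerical stability.
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Hypothetical router scores for one token over 8 experts.
logits = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3]
chosen = route_token(logits, k=2)

for expert_id, weight in chosen:
    print(f"expert {expert_id}: weight {weight:.3f}")
```

Only the selected experts run for that token, so compute per token scales with `k` rather than with the total number of experts.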

It disrupted the market by dropping API prices to $0.14 per 1M tokens. This dramatic reduction forced competitors like GLM, Ernie, and Qwen to follow suit, lowering their prices to 1% of their original prices. Now, users can access these APIs at 1/35th the cost of GPT-4o.
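A quick sketch of the arithmetic behind those figures. The two inputs come from the post itself; the 50M-token workload is a hypothetical example, and the "implied" reference price is simply what the 1/35 ratio works out to, not a quoted price.

```python
PRICE_PER_MILLION_USD = 0.14   # from the post
COST_RATIO = 35                # "1/35th the cost", from the post

# Price per 1M tokens that the 1/35 ratio implies for the comparison model.
implied_reference_price = PRICE_PER_MILLION_USD * COST_RATIO

# Hypothetical workload (assumption): 50M tokens.
workload_millions = 50
cost_cheap = workload_millions * PRICE_PER_MILLION_USD
cost_reference = workload_millions * implied_reference_price

print(f"implied reference price: ${implied_reference_price:.2f} per 1M tokens")
print(f"50M tokens: ${cost_cheap:.2f} vs ${cost_reference:.2f}")
```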

reacted to JustinLin610's post with 🚀🔥 over 1 year ago

Post

5280

Finally, Qwen1.5-110B is out! With weights and demo!

Blog: https://qwenlm.github.io/blog/qwen1.5-110b/
Demo: https://huggingface.co/spaces/Qwen/Qwen1.5-110B-Chat-demo
Base: Qwen/Qwen1.5-110B
Chat: Qwen/Qwen1.5-110B-Chat

This model has some specific features:
* GQA
* 32K token context length
* Multilingual support

We feel good about its performance on benchmarks, including those for base models and chat models, but we still need more of your testing and feedback to help us know its capabilities and limitations!

Additionally, the base model has not learned chatml tokens. Yeah if you use chatml format, you need to be careful about it!

Enjoy and stay tuned for Qwen2!

1 reply

·

posted an update over 1 year ago

Post

1933

So hard to keep up with the pace!!! Lots of new Chinese fine-tunes are being released on HF

So I asked my agent to create a collection
xianbao/llama3-zh-662ba8503bdfe51948a28403

code: https://colab.research.google.com/drive/1ap6fP-VytZE367Nqk26DeQqgQkYaf-cD#scrollTo=eljRbYb4c92M

Would be nice to run them regularly. Any thoughts / suggestions on where to host this cron job?

1 reply

·

reacted to abhishek's post with 🚀🔥👀 over 1 year ago

Post

3486

With AutoTrain, you can already finetune the latest llama3 models without writing a single line of code. Here's an example finetune of llama3 8b model: https://huggingface.co/abhishek/autotrain-llama3-no-robots

2 replies

·

reacted to WizardLM's post with 🚀 over 1 year ago

Post

49174

🔥🔥🔥 Introducing WizardLM-2!

📙Release Blog: https://wizardlm.github.io/WizardLM2
✅Model Weights: microsoft/wizardlm-661d403f71e6c8257dbd598a
🐦Twitter: https://twitter.com/WizardLM_AI/status/1779899325868589372

We introduce and opensource WizardLM-2, our next generation state-of-the-art large language models, which have improved performance on complex chat, multilingual, reasoning and agent. New family includes three cutting-edge models: WizardLM-2 8x22B, WizardLM-2 70B, and WizardLM-2 7B.

WizardLM-2 8x22B is our most advanced model, and the best opensource LLM in our internal evaluation on highly complex tasks. WizardLM-2 70B reaches top-tier reasoning capabilities and is the first choice in the same size. WizardLM-2 7B is the fastest and achieves comparable performance with existing 10x larger opensource leading models.

🤗 WizardLM 2 Capacities:

1. MT-Bench (Figure-1)
The WizardLM-2 8x22B even demonstrates highly competitive performance compared to the most advanced proprietary works such as GPT-4-Turbo and Claude-3. Meanwhile, WizardLM-2 7B and WizardLM-2 70B are all the top-performing models among the other leading baselines at 7B to 70B model scales.

2. Human Preferences Evaluation (Figure 2)
Through this human preferences evaluation, WizardLM-2's capabilities are very close to the cutting-edge proprietary models such as GPT-4-1106-preview, and significantly ahead of all the other open source models.

🔍Method Overview:
As the natural world's human-generated data becomes increasingly exhausted through LLM training, we believe that: the data carefully created by AI and the model step-by-step supervised by AI will be the sole path towards more powerful AI.

Over the past year, we built a fully AI-powered synthetic training system (as shown in Figure 3).

36 replies

·

reacted to chiphuyen's post with ❤️🚀 almost 2 years ago

Post

Huggingface is carrying the AI open source ecosystem https://huyenchip.com/2024/03/14/ai-oss.html

4 replies

·

posted an update almost 2 years ago

Post

Welcome Bunny! A family of lightweight but powerful multimodal models from BAAI

With detailed work on dataset curation, the Bunny-3B model built upon SigLIP and Phi-2 achieves performance on par with 13B models.

Model: BAAI/bunny-phi-2-siglip-lora

2 replies

·

posted an update almost 2 years ago

Post

There appears to be a huge misunderstanding regarding the licensing requirements for open-sourced Chinese-speaking LLMs on
@huggingface

I initially shared this misconception too, but after conducting some research, I came up with the list below.

Very impressive!

replied to victor's post almost 2 years ago

🔥

Tiezhen WANG

AI & ML interests

Recent Activity

Organizations

xianbao's activity