Add sample usage to model card
This PR enhances the model card for MMaDA-8B-MixCoT by adding a "Sample Usage" section. The new section provides a Python code snippet, derived from the project's GitHub repository, that demonstrates how to perform image generation with the model, making it easier for users to get started with the model's capabilities.
README.md CHANGED

````diff
@@ -1,8 +1,9 @@
 ---
-license: mit
 library_name: transformers
+license: mit
 pipeline_tag: any-to-any
 ---
+
 # MMaDA-8B-MixCoT
 
 We introduce MMaDA, a novel class of multimodal diffusion foundation models designed to achieve superior performance across diverse domains such as textual reasoning, multimodal understanding, and text-to-image generation. MMaDA is distinguished by three key innovations:
@@ -15,6 +16,40 @@ Compared to [MMaDA-8B-Base](https://huggingface.co/Gen-Verse/MMaDA-8B-Base), MMa
 
 [Paper](https://arxiv.org/abs/2505.15809) | [Code](https://github.com/Gen-Verse/MMaDA) | [Demo](https://huggingface.co/spaces/Gen-Verse/MMaDA)
 
+## Sample Usage
+
+You can use the provided `FlexARInferenceSolver` from the [GitHub repository](https://github.com/Gen-Verse/MMaDA) to easily perform various tasks, such as image generation.
+
+First, ensure you have cloned the repository and installed the necessary dependencies as per the GitHub repository's instructions (`pip install -r requirements.txt`).
+
+```python
+from MMaDA.inference_solver import FlexARInferenceSolver
+from PIL import Image
+
+# ******************** Image Generation ********************
+inference_solver = FlexARInferenceSolver(
+    model_path="Gen-Verse/MMaDA-8B-MixCoT",
+    precision="bf16",
+    target_size=768,
+)
+
+q1 = f"Generate an image of 768x768 according to the following prompt:\n" \
+     f"Image of a dog playing water, " \
+     f"and a waterfall is in the background."
+
+# generated: tuple of (generated response, list of generated images)
+generated = inference_solver.generate(
+    images=[],
+    qas=[[q1, None]],
+    max_gen_len=8192,
+    temperature=1.0,
+    logits_processor=inference_solver.create_logits_processor(cfg=4.0, image_top_k=2000),
+)
+
+a1, new_image = generated[0], generated[1][0]
+new_image.show()  # Display the generated image
+```
+
 # Citation
 
 ```
````
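Since the snippet's `qas` argument takes `[question, answer]` pairs and the returned image is a standard `PIL.Image.Image`, the result can also be saved to disk and fed back in for a follow-up turn. Below is a minimal sketch under those assumptions; it simply mirrors the `generate` signature shown in the diff, and `q2` plus the output file name are hypothetical, not taken from the repository:

```python
# Continuing from the snippet above (hypothetical follow-up, not from the repo).
new_image.save("dog_waterfall.png")  # new_image is a PIL image, so save() works

# Assumed multi-round usage: each qas entry is a [question, answer] pair, so the
# first exchange (q1, a1) can serve as context for a new question q2. The other
# parameters just mirror the call shown in the diff.
q2 = "Describe the generated image in detail."
generated2 = inference_solver.generate(
    images=[new_image],
    qas=[[q1, a1], [q2, None]],
    max_gen_len=8192,
    temperature=1.0,
    logits_processor=inference_solver.create_logits_processor(cfg=4.0, image_top_k=2000),
)
a2 = generated2[0]  # text answer for the second round
print(a2)
```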