nvidia/audio-flamingo-3

Audio-Text-to-Text

audio understanding


SreyanG-NVIDIA committed on Jul 11

Commit 529704c · verified · 1 Parent(s): 2c92866

Update README.md

Files changed (1)

README.md +3 -4

README.md CHANGED

@@ -72,15 +72,14 @@ Extensive evaluations confirm AF3’s effectiveness, setting new benchmarks on o
 **This model is for non-commercial research purposes only.**
 ## Model Architecture:
 Audio Flamingo 3 uses AF-Whisper unified audio encoder, MLP-based audio adaptor, Decoder-only LLM backbone (Qwen2.5-7B), and Streaming TTS module (AF3-Chat). Audio Flamingo 3 can take up to 10 minutes of audio inputs.
-<center><img src="static/af3_radial-1.png" width="400"></center>
-## Results:
 <center><img src="static/af3_main_diagram-1.png" width="800"></center>
 ## License / Terms of Use
 The model is released under the [NVIDIA OneWay Noncommercial License](static/NVIDIA_OneWay_Noncommercial_License.docx). Portions of the dataset generation are also subject to the [Qwen Research License](https://huggingface.co/Qwen/Qwen2.5-3B/blob/main/LICENSE) and OpenAI’s [Terms of Use](https://openai.com/policies/terms-of-use).

 **This model is for non-commercial research purposes only.**
+## Results:
+<center><img src="static/af3_radial-1.png" width="400"></center>
 ## Model Architecture:
 Audio Flamingo 3 uses AF-Whisper unified audio encoder, MLP-based audio adaptor, Decoder-only LLM backbone (Qwen2.5-7B), and Streaming TTS module (AF3-Chat). Audio Flamingo 3 can take up to 10 minutes of audio inputs.
 <center><img src="static/af3_main_diagram-1.png" width="800"></center>
 ## License / Terms of Use
 The model is released under the [NVIDIA OneWay Noncommercial License](static/NVIDIA_OneWay_Noncommercial_License.docx). Portions of the dataset generation are also subject to the [Qwen Research License](https://huggingface.co/Qwen/Qwen2.5-3B/blob/main/LICENSE) and OpenAI’s [Terms of Use](https://openai.com/policies/terms-of-use).