Martin Viewegger

Viewegger

AI & ML interests

None yet

Recent Activity

Organizations

None yet

Viewegger's activity

reacted to merve's post with 🔥 5 days ago
New foundation model on image and video captioning just dropped by NVIDIA AI 🔥

Describe Anything Model (DAM) is a 3B vision language model that generates detailed captions with localized references 😮

The team released the models, the dataset, a new benchmark and a demo 🤩 nvidia/describe-anything-680825bb8f5e41ff0785834c

Most vision LMs focus on the image as a whole: they lack localized references in captions and don't take in visual prompts (points, boxes, drawings around objects)

DAM addresses this on two levels: a new vision backbone that takes in focal crops and the image itself, and a large-scale dataset 👀

They generate the dataset by extending existing segmentation and referring expression generation datasets like RefCOCO: the images and classes are passed to VLMs, which generate the captions.

Lastly, they also release a new benchmark, again with self-supervision: they use an LLM to evaluate the detailed captions, focusing on localization
New activity in kadirnar/Orpheus-TTS-Starrail 9 days ago
New activity in kadirnar/Orpheus-TTS-Starrail 13 days ago
reacted to alibabasglab's post with 👍 3 months ago
reacted to alibabasglab's post with 👍 4 months ago
🎉 ClearerVoice-Studio New Feature: Speech Super-Resolution with MossFormer2! 🚀
We’re excited to announce that ClearerVoice-Studio now supports speech super-resolution, powered by our latest MossFormer2-based model!
What’s New?

🔊 Convert Low-Resolution to High-Resolution Audio:
Transform low-resolution audio (effective sampling rate ≥ 16 kHz) into crystal-clear, high-resolution audio at 48 kHz.

🤖 Cutting-Edge Technology:
Leverages the MossFormer2 model plus HiFi-GAN, optimised for generating high-quality audio with enhanced perceptual clarity.

🎧 Enhanced Listening Experience:
Perfect for speech enhancement, content restoration, and high-fidelity audio applications.

🌟 Try It Out!
Upgrade to the latest version of ClearerVoice-Studio (https://github.com/modelscope/ClearerVoice-Studio) to experience this powerful feature. Check out the updated documentation and examples in our repository.

Let us know your thoughts, feedback, or feature requests in the Issues section.
New activity in jpgallegoar/F5-Spanish 5 months ago
New activity in marduk-ra/F5-TTS-German 5 months ago

Training process details

4
#2 opened 5 months ago by Nils11