MrDragonFox

AI & ML interests

LLMs + audio I/O, (un)alignment

Recent Activity

liked a dataset about 23 hours ago: moonshotai/Kimi-Audio-GenTest
updated a model 3 days ago: SynthoCraft/Whisper-large-v3
published a model 3 days ago: SynthoCraft/Whisper-large-v3

Organizations

DeepGHS, Blog-explorers, SynthoCraft Ai, FoxEngineAi, Social Post Explorers, Mistral AI Game Jam

MrDragonFox's activity

posted an update 7 days ago
As a few of you know, I am working on a rather more elaborate TTS that can produce more interesting sounds in the context of RP.

An early sneak peek is here:

MrDragonFox/mOrpheus_3B-1Base_early_preview-v1-25000

It's based on Orpheus, but really the base model is irrelevant, as I focus mostly on data augmentation / prep / pipelining; the release is just a way to show progress.

It should be able to express itself fine even in an SFW context.

This is probably the last release for a few weeks, as I go back to the data pipeline and improve things there.

In the meantime, please do test it and report problems or enjoyable generations you find. We have a growing Discord community, and I'd love to see what you get out of this early release!

(A small Colab is provided on the model page if you don't have the GPU to run it yourself.)
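
If you would rather poke at the checkpoint locally than use the Colab, a rough sketch along these lines should get you started, assuming the preview loads as a standard causal LM via transformers; the prompt template and the audio-token decoding are assumptions here, and the model card / Colab has the real pipeline:

```python
# Minimal sketch, not the official pipeline: load the preview checkpoint as a
# plain causal LM. The prompt template and the step that turns generated audio
# tokens back into a waveform (e.g. a SNAC-style decoder) are NOT shown here;
# follow the model card / Colab for those.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MrDragonFox/mOrpheus_3B-1Base_early_preview-v1-25000"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Hypothetical prompt text; the real voice/emotion tag format is defined
# on the model page, not here.
prompt = "hey there <sigh> this is just a placeholder line"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
print(tokenizer.decode(out[0], skip_special_tokens=False))
```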
posted an update 15 days ago
Yet another audio dataset, pre-classified for events + audio aesthetics.

This time for German: 680 h sampled from Emilia YODAS.

Timestamps for ASR training or other fancier things are available under NC (non-commercial) terms in the raw repo.

MrDragonFox/DE_Emilia_Yodas_680h

CC BY 4.0, as per Emilia YODAS.

Raw events / transcriptions are CC BY-NC 4.0.

MrDragonFox/DE_Emilia_Yodas_680h_raw_timestamps

In the coming days I should push about 600 h of English + some Japanese too, in the same format.
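
To take a quick look at the data without downloading the full 680 h, something like this should work, assuming the standard datasets library layout (the split name and column layout are assumptions; check the dataset card):

```python
# Minimal sketch: stream a few rows of the German split to inspect the schema.
# The split name "train" and the column layout are assumptions; see the
# dataset card for the actual structure.
from datasets import load_dataset

ds = load_dataset("MrDragonFox/DE_Emilia_Yodas_680h", split="train", streaming=True)

for i, row in enumerate(ds):
    print(sorted(row.keys()))   # show the actual column names
    if i >= 2:
        break
```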
posted an update about 1 month ago
I made a small emotion-classified test dataset for all the TTS tuners out there.

MrDragonFox/Elise

3 h total, MIT - single-speaker voice.

The dataset is a copy of an existing one; I just added emotional tags over the 1200 samples. It should be good enough to test whether emotional tags stick in your finetune.
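
If you want to sanity-check that the tags are there before training, a quick pass like this should do, assuming the usual datasets layout (the "text" column name and the <tag> markup format are assumptions; check the dataset card):

```python
# Minimal sketch: count inline emotion tags in the transcripts.
# The "text" column and the <tag> markup are assumptions; adjust to the
# actual schema shown on the dataset card.
import re
from collections import Counter

from datasets import load_dataset

ds = load_dataset("MrDragonFox/Elise", split="train")
print(ds)  # rows / features overview

tag_counts = Counter()
for row in ds:
    tag_counts.update(re.findall(r"<[^>]+>", str(row.get("text", ""))))

print(tag_counts.most_common(10))
```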
replied to Reality123b's post 2 months ago

Bro, I'm not that popular. It's OK to do this, I guess?

Also, no, I'm not saying OpenAI's products are bad, nor am I trying to, like, offend ANYONE or ANY company or organization. I'm just trying to promote my product.

My problem isn't with you trying to promote a product; it's you asking whether you should open-source something or not, without showing anything working, while stating it's not clickbait.

Show the goods, man.
Otherwise no one cares; that's not how marketing works. There are too many snake-oil salesmen in this industry. If you have something, show it off; if not, well, you know where this goes.

replied to Reality123b's post 2 months ago

The point where you are mistaken is that I don't care whether he open-sources it or not; it's the engagement farming for no reason while implying it's not clickbait. But you do you.

replied to Reality123b's post 2 months ago

"This is not clickbait" - if you want to open-source it, you open-source it; if not, you don't. It's that simple, and there are already other approaches to this out there.

If that's not for farming, what did you post it for?

replied to mitkox's post 4 months ago

With 250 GB of RAM used ^^ probably running it at a 2-bit quant.

reacted to danielhanchen's post with 🔥 5 months ago
replied to davanstrien's post 8 months ago
replied to takeraparterer's post 9 months ago
reacted to merve's post with 🤗 10 months ago
Fine-tune Florence-2 on any task 🔥

Today we release a notebook and a walkthrough blog on fine-tuning Florence-2 on the DocVQA dataset @andito @SkalskiP

Blog: https://huggingface.co/blog 📕
Notebook: https://colab.research.google.com/drive/1hKDrJ5AH_o7I95PtZ9__VlCTNAo1Gjpf?usp=sharing 📖
Florence-2 is a great vision-language model thanks to its massive dataset and small size!

This model requires conditioning through task prefixes, and it's not as generalist: it requires fine-tuning on a new task, such as DocVQA 📝

We fine-tuned the model on an A100 (one can also use a smaller GPU with a smaller batch size) and saw that the model picks up new tasks 🥹

See below how it looks before and after FT 🤩
Play with the demo here: andito/Florence-2-DocVQA 🏄‍♀️
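
For reference, task-prefix conditioning on a stock Florence-2 checkpoint looks roughly like this; a sketch based on the public microsoft/Florence-2 model cards, and the fine-tuned DocVQA demo may expect a different prompt format:

```python
# Minimal sketch: Florence-2 inference conditioned on a task prefix.
# Model id and prompts follow the public Florence-2 cards; the fine-tuned
# DocVQA checkpoint may use a different prompt format.
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Florence-2-base-ft"
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype=dtype
).to(device)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("document.png")  # replace with your own image
task_prompt = "<OCR>"               # task prefix, e.g. <CAPTION>, <OD>, <OCR>

inputs = processor(text=task_prompt, images=image, return_tensors="pt").to(device, dtype)
generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=256,
    num_beams=3,
)
raw = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
parsed = processor.post_process_generation(
    raw, task=task_prompt, image_size=(image.width, image.height)
)
print(parsed)
```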