I need help with something mradermacher / llama 3-13b 5q k m

#1067

by Zedlord - opened 21 days ago

21 days ago

•

Im sorry and i dont know who to talk to and im very new to this, im using web ui and downloaded the model, couldn't get llama.cpp to work so i did llama-server and got the model to work, and now the chat is haywire and bot is hallucinating and repeating itself, don't listen. I dont know how to fix the broken chat, I tried reddit but noone is responding. I tried discord but most are dead. i dont know who to go to.

im sorry if this is not the right discussion for the community

I think it has to do with text template or instruct even then i dont know how to put, I tried messing with it. i dont know whats the right template and settings to act normal

nicoboss

21 days ago

No problem. We are happy to help you. Can you please post the download link of the exact file you are trying to run? Team mradermacher has multiple llama 3-13b based models so it is ambiguous which one you are trying to run.

Zedlord

7 days ago

https://huggingface.co/mradermacher/Llama-3-13B-GGUF
sorry for the late reply, I gave up to some degree, I just know its the template, and combination AI prompt as well. I couldn't figure it out.
I wanted it to be to some degree as fluent as chat gpt, I wasnt sure if it could nsfw for silly tavern but then I notice I couldnt do both. but i couldnt get the template fix without breaking. change the settings so much so it doesnt haywire its word but nothing work.

mradermacher

Owner 7 days ago

Yes, it sounds like a template issue. However, the model you use has been deleted by their creator, so maybe it just isn't so good. You should try the original llama, e.g. bartowksi's quants. They should have proper templates and should be well supported by now.

https://huggingface.co/bartowski/Meta-Llama-3.1-8B-Instruct-GGUF

Zedlord

6 days ago

my computer can handle llama 3 13B 5-6 quantiation
I do have 5080 plus 64 gig of ram with amd 7900x

I realize i like chat gpt to some degree in terms of intelligence talk, ideas and comprehensive though, that's why I decided to go with AI, then I realize not AI is as smart as chat gpt. so I wanted to get closer to some degree smart. the whole point of offline AI, I guess with the world going on and potentially off grid AI but still function sounds cool to me, I Just think the idea that you still have AI without internet is a cool concept. or if i make that it controls all the house camera and ask it question. like fully integrated AI home. I don't know why I like the idea in the future.

thats why I wanted an uncensored version. I don't like idea of AI being censor, I had llama 3- 8b 16fp un censor, it work just fine, but I was getting like 2-3 tokens per second super slow, even so my computer was handling fine. I did press 4 bit and it was way faster. the reason I went with llama 3- 13b I just like idea its much smarter. and fluent at conversating. cause llama 3 - 8b 16 fp still sounds AI. but then I realize not all template are the same, and somehow I broke my template for my llama 3-8b 16 fp, some how it broke everything.

basically I just wanted the smartest AI and somewhat fluent conversation, not much censor, it can be but not heavily filter that my computer can handle.

mradermacher

Owner 6 days ago

Censored AI models do generally perform worse, yeah. The only issue I see is that there is no upstream llama-3-13b, so the 13b models are upscales, self-merges etc., and are probably not going to perform much better than a newer (uncensored) 8b.

And rest assured, no matter what computer you realistically would have, you'd always want better :)

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment