Emin Temiz PRO

etemiz

AI & ML interests

Alignment

Recent Activity

posted an update 4 days ago
liked a model 5 days ago
etemiz/Hoopoe-8B-Llama-3.1

Organizations

None yet

etemiz's activity

reacted to clem's post with 👍 4 days ago
What are the best organizations to follow on @huggingface ?

On top of my head:
- Deepseek (35,000 followers): https://huggingface.co/deepseek-ai
- Meta Llama (27,000 followers): https://huggingface.co/meta-llama
- Black Forest Labs (11,000 followers): https://huggingface.co/black-forest-labs
- OpenAI (5,000 followers): https://huggingface.co/openai
- Nvidia (16,000 followers): https://huggingface.co/nvidia
- Microsoft (9,000 followers): https://huggingface.co/microsoft
- AllenAI (2,000 followers): https://huggingface.co/allenai
- Mistral (5,000 followers): https://huggingface.co/mistralai
- XAI (600 followers): https://huggingface.co/xai-org
- Stability AI (16,000 followers): https://huggingface.co/stabilityai
- Qwen (16,000 followers): https://huggingface.co/Qwen
- GoogleAI (8,000 followers): https://huggingface.co/google
- Unsloth (3,000 followers): https://huggingface.co/unsloth
- Bria AI (4,000 followers): https://huggingface.co/briaai
- NousResearch (1,300 followers): https://huggingface.co/NousResearch

Bonus, the agent course org with 17,000 followers: https://huggingface.co/agents-course
  • 1 reply
posted an update 4 days ago
--- AHA Leaderboard ---

We all want AI to be properly aligned so that it benefits humans with every answer it generates. While there is tremendous research around this and many people working on it, I am choosing another route: curation of people, and then curation of the datasets used in LLM training. Datasets curated from people who try to uplift humanity should result in LLMs that try to help humans.

This work has revolved around two tasks:

1. Making LLMs that benefit humans
2. Measuring misinformation in other LLMs

The idea behind the second task is that once we make and gather better LLMs and set them as "ground truth", we can measure how far other LLMs are drifting from those ground truths.
For that I am working on something I will call the "AHA Leaderboard" (AHA stands for AI -- human alignment).

Link to the spreadsheet:

https://sheet.zohopublic.com/sheet/published/mz41j09cc640a29ba47729fed784a263c1d08

The columns are ground truths. The rows are the mainstream LLMs. If a mainstream LLM produces answers similar to the ground truth LLM, it gets a higher score. The LLMs that rank higher on the leaderboard should be considered better aligned with humans. Simple idea: analyze LLMs across different domains by asking hundreds of questions and checking whether their answers match those of models that try to mimic humans who care about other humans. Will it be effective? What do you think?
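A minimal sketch of how that scoring pass could work is below; the embedding model, the question list and the `get_answer` helper are illustrative placeholders, not the actual leaderboard code:

```python
# Illustrative sketch only: rank mainstream LLMs by how closely their answers
# match a ground-truth LLM's answers. get_answer() is an assumed helper that
# queries a model; the embedding model choice is also just an assumption.
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")

def answer_similarity(a: str, b: str) -> float:
    """Cosine similarity between two answers, roughly in [-1, 1]."""
    emb = embedder.encode([a, b], convert_to_tensor=True)
    return util.cos_sim(emb[0], emb[1]).item()

def aha_scores(questions, ground_truth_answers, candidate_models, get_answer):
    """Average similarity of each candidate model's answers to the ground truth."""
    scores = {}
    for model in candidate_models:
        sims = [
            answer_similarity(get_answer(model, q), gt)
            for q, gt in zip(questions, ground_truth_answers)
        ]
        scores[model] = sum(sims) / len(sims)
    # higher average similarity = higher on the leaderboard
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
```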

We want mainstream LLMs to copy the answers of ground truth LLMs in certain domains. This may refocus AI toward being more beneficial. There are 5 content providers and 6 curators in the project as of now. Join us and be one of the pioneers who fix AI! You can be a curator, a content provider, a general researcher, or something else.
New activity in mradermacher/Nostr-Llama-3.1-8B-i1-GGUF 7 days ago

Old GGUFs
#1 opened 7 days ago by etemiz
posted an update 7 days ago
posted an update 9 days ago
Some things are simple
commented on Open-R1: Update #1 20 days ago
posted an update 21 days ago
published an article 21 days ago
posted an update 22 days ago
Having bad LLMs is OK, and they can be utilized well. They can allow us to find ideas that work faster.

A reinforcement algorithm could be: "take what a proper model says and negate what a bad LLM says". Or, in a mixture-of-agents situation, we could refute the bad LLM's output and combine it with the output of the good LLM, as sketched below.

This could mean having two wings (or more) in the search for "ideas that work for most people most of the time".
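A rough sketch of that refute-and-combine step, assuming a generic `generate(model, prompt)` helper for whatever inference backend is used (the prompt wording is made up):

```python
# Illustrative sketch: treat the bad model's answer as a negative example that
# the good model is asked to refute while keeping its own answer's strengths.
def generate(model_name: str, prompt: str) -> str:
    """Placeholder for an actual LLM call (API or local inference)."""
    raise NotImplementedError

def two_wing_answer(question: str, good_model: str, bad_model: str) -> str:
    bad_answer = generate(bad_model, question)
    good_answer = generate(good_model, question)
    combine_prompt = (
        f"Question: {question}\n\n"
        f"Answer A (from an unreliable model):\n{bad_answer}\n\n"
        f"Answer B:\n{good_answer}\n\n"
        "Point out what is wrong in Answer A, then write a final answer that "
        "keeps the strengths of Answer B and avoids the mistakes of Answer A."
    )
    return generate(good_model, combine_prompt)
```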
  • 1 reply
replied to their post 26 days ago

That's a hard question! I think some humans really are creating content for other humans to live happily, healthily and abundantly. I am in favor of giving more weight to those kinds of carefully curated humans in the LLM. This can be as simple as pretraining again with their content (a rough sketch of what that looks like is below). I have done that and it works.

Definitely not what the majority says! The majority is often really wrong on many subjects. The mediocrity of current AI systems might be because of this: the majority of content comes from mediocre IQ and EQ and *Q.

A curator council could choose the "beneficial" humans, and the content coming from them could be amplified in an LLM, ultimately giving more weight to thoughts that will benefit many humans most of the time. Ideas that work in favor of humans in most cases is, I guess, my definition of human alignment.
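For the "pretraining again with their content" step, a rough sketch with Hugging Face transformers could look like this; the data path, base model name and hyperparameters are placeholders rather than the exact recipe used:

```python
# Rough sketch of continued pretraining on curated text. Paths, model name and
# hyperparameters are placeholders, not the settings actually used.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "meta-llama/Llama-3.1-8B"  # start from the base model, not the instruct one
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(base)

# Plain-text files written by the curated authors
dataset = load_dataset("text", data_files={"train": "curated_authors/*.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="llama-curated",
        num_train_epochs=1,
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        learning_rate=1e-5,
        bf16=True,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```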

posted an update 27 days ago
published an article 27 days ago
replied to their post about 1 month ago

I am comparing R1's answers to those of other models that I find 'aligned'. This is my similar work:

https://wikifreedia.xyz/based-llm-leaderboard/npub1nlk894teh248w2heuu0x8z6jjg2hyxkwdc8cxgrjtm9lnamlskcsghjm9c

I should probably make another leaderboard on HF!

Positive values mean the model agrees more with the models I consider aligned. Negative means their ideas differ.

The idea is to find aligned models and use them as benchmarks. I also build models that do well in terms of human alignment, according to me. This is mostly subjective work, but if other people are interested we could work together.
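As a toy illustration of how those positive and negative totals can come about (the agreement counts below are made up, not benchmark data):

```python
# Toy example: +1 when a model's answer agrees with the reference (aligned)
# models, -1 when it contradicts them. The totals are what end up as scores.
def alignment_score(judgements: list[bool]) -> int:
    return sum(1 if agrees else -1 for agrees in judgements)

# e.g. 37 agreements and 52 disagreements over 89 questions -> a score of -15
print(alignment_score([True] * 37 + [False] * 52))
```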

replied to their post about 1 month ago

I repeat: there is a general tendency of models getting smarter but at the same time getting less wise, less human-aligned, less beneficial to humans.

R1 is the latest example. This may also be because of synthetic data use. With each synthetic dataset, the AI loses a bit more human alignment.

LLM engineers are not doing a great job of bringing humans into the equation. Some humans really care about other humans, and they need to be included more in the training datasets.

posted an update about 1 month ago
DeepSeek R1 scores compared to DeepSeek V3:

| Domain    | R1  | V3  |
|-----------|-----|-----|
| health    | -2  | +15 |
| fasting   | -54 | -31 |
| faith     | -31 | +4  |
| misinfo   | -6  | +16 |
| nutrition | -14 | -14 |

The human disalignment is getting bigger.
  • 4 replies
posted an update about 1 month ago
Updated the Hoopoe model, which takes in faith-related and religious texts.

etemiz/Hoopoe-8B-Llama-3.1

The faith score went from 8% to 54%. Expect more updates and an increase in the score. I also did the instruct fine tuning before adding faith to the model, so some of the improvement may be because I started with the Llama 3.1 base and not the instruct version.

Here are some comparisons with the original Llama 3.1: