🚩 Report: Ethical issue(s)

#51
by LauOverload - opened

Perplexity claims that their post-trained model is unbiased, but why are all the examples on their official blog about controversial topics? Moreover, the answers after post-training clearly exhibit a certain political bias. If the purpose of post-training is truly to reduce bias, why not release the entire dataset of over a thousand examples as open source?

If the goal is genuinely to reduce bias in the model, there should be a collaborative, cross-cultural effort to co-lead an open-source project with fully accessible data, rather than fine-tuning the open-source DeepSeek model on private data and then monetizing it through APIs on their own website. Doesn't this essentially turn large models into ideological propaganda tools?

A call to action: Let’s all contribute to an open-source project that is fully accessible, allowing everyone to oversee the process. This project should aim to remove biases from models trained by Chinese enterprises and labs, ensuring transparency and accountability. Such a project would be a reasonable and fair solution for both Chinese and Western stakeholders.

You're right: while this model may have removed censorship, it has actually introduced bias. When you ask about issues like China and Taiwan, or China and the US, it consistently takes a US-centric viewpoint, prioritizing American interests and even assuming China will resort to unfair tactics to gain an advantage. It doesn't offer a neutral, objective analysis that considers both sides. This subtly promotes a particular ideology, which is arguably more insidious than outright censorship.

I know Hugging Face already launched such an initiative. Actually I don't know their position on politics, but at least the datasets are/will be open source!
https://github.com/huggingface/open-r1

Actually I don't know their position on politics

Their Twitter account keeps retweeting the perslopxity CEO about this model, and the Hugging Face CEO has met with them in person and has even been participating in some of the (non-criticism) discussions on this model here. Take your guess as to which side they're currying favor with at the moment.

Hmm, thanks for sharing, I just saw it now.
I'm telling myself anyway: if this is the start of a new trend among corps to include political bias in their datasets, their models will be dumb compared to the new ones trained on data filtered of any human subjectivity, as subjectivity has been shown to severely undermine the RL methods that made those new reasoning models a reality. Let the models find their own purely logical paths, and they will have far better value whatever political views they end up with!
So I think, and I hope, this issue around big tech's closed models will solve itself... because everyone wants the BEST model, so they will have to deliver.

The assumption you're making is that the examples are "biased".

The examples are there to show there is no bias; the Chinese model is the one that's biased, since it is heavily censored on subjects that are sensitive in China. These subjects are not sensitive in Japan, nor in the USA, nor in the land of Oz, nor in any European country. You're trying to export a tool of propaganda. Now the propaganda part has been removed, and you have some kind of CCP army here trying to use pink slime on Perplexity. Just cut it out; it won't work.

If I ask a normal (not-CCP) model right now about the 1989 Tiananmen Square protests and massacre, I get a truthful answer. I tried with OpenAI's ChatGPT and with Mistral. Give it a whirl, my Chinese friends.

The assumption you're making is that the examples are "biased".

Thanks for enriching the discussion.

Nobody will agree to remove the 1989 refusals in China, and no Western model will define a woman. What you ask for is fantasy. The models were already ideological propaganda tools.

| Model | Open Weights | Open Data | Open Code |
|---|---|---|---|
| OpenThinker-7B | ✅ | ✅ | ✅ |
| Bespoke-Stratos-7B | ✅ | ✅ | ✅ |
| DeepSeek-R1-Distill-Qwen-7B | ✅ | ❌ | ❌ |
| gpt-4o-0513 | ❌ | ❌ | ❌ |
| o1-mini | ❌ | ❌ | ❌ |

https://huggingface.co/open-thoughts/OpenThinker-7B

The assumption you're making is that the examples are "biased".

Thanks for enriching the discussion.

First of all, I said the examples are controversial, not necessarily biased. As for the Tiananmen Square incident mentioned in this discussion, I believe Chinese people need a real, unhidden answer even more than foreigners do. However, we cannot ignore the fact that the Xinjiang human-rights example brought up by Perplexity is indeed highly controversial. I grew up near Xinjiang in northwestern China, and over the past two decades the economic and livelihood conditions in the Xinjiang Uyghur Autonomous Region have changed tremendously (in a positive sense), without forced labor. That is why I view Perplexity's fine-tuning data as controversial. On such issues we should keep an open-minded approach, as ChatGPT, Claude, and Wikipedia do, rather than directly providing politically biased answers. (Of course, the original R1 model's avoidance of these questions is not a good solution either, but it's clear that Perplexity's approach is more unsettling than security-review processes.)

https://framerusercontent.com/images/BgzLvr8KwDI2hPco5SenobeMrg8.png?scale-down-to=4096
