AI & ML interests

WIP LLM datasets and models.

Recent Activity

PJMixers-Dev 's collections 19

Prepped for Re-Gen
Dataset flattened, S4 removed, and split into 4 even splits. Ready to have final turn regenerated with another model.
[Old-ish] QwQ RP/Co-Writing Datasets
https://github.com/xzuyn/axolotl/blob/came-plus-formatters/src/axolotl/prompt_strategies/customchatml-regex-last-only.py
LLaMa-Guard-3 Classified Datasets
null classification means it was skipped due to the turns not correctly converting within apply_chat_template
gemini-2.0-flash-thinking-exp-1219 Datasets
Existing datasets with responses regenerated using gemini-2.0-flash-thinking-exp-1219. Currently only single-turn.
Salesforce Writing Quality Reward Models Datasets
⁣https://github.com/salesforce/creativity_eval/tree/main/WritingRewards/training/Llama/data
Length Filtered Thinking Datasets
Filtered to remove thinking or responses which are too long compared to the average distribution. Also tried to clean some stuff.
gemini-2.0-flash-exp Datasets
Existing datasets with responses regenerated using gemini-2.0-flash-exp. Currently only single-turn.
Thinking/Reasoning Datasets
Salesforce Writing Quality Reward Models Datasets
⁣https://github.com/salesforce/creativity_eval/tree/main/WritingRewards/training/Llama/data
Prepped for Re-Gen
Dataset flattened, S4 removed, and split into 4 even splits. Ready to have final turn regenerated with another model.
[Old-ish] QwQ RP/Co-Writing Datasets
https://github.com/xzuyn/axolotl/blob/came-plus-formatters/src/axolotl/prompt_strategies/customchatml-regex-last-only.py
Length Filtered Thinking Datasets
Filtered to remove thinking or responses which are too long compared to the average distribution. Also tried to clean some stuff.
LLaMa-Guard-3 Classified Datasets
null classification means it was skipped due to the turns not correctly converting within apply_chat_template
gemini-2.0-flash-thinking-exp-1219 Datasets
Existing datasets with responses regenerated using gemini-2.0-flash-thinking-exp-1219. Currently only single-turn.
gemini-2.0-flash-exp Datasets
Existing datasets with responses regenerated using gemini-2.0-flash-exp. Currently only single-turn.
Thinking/Reasoning Datasets