AI & ML interests

None defined yet.

cfahlgren1ย 
posted an update 28 days ago
view post
Post
351
I ran the Anthropic Misalignment Framework for a few top models and added it to a dataset: cfahlgren1/anthropic-agentic-misalignment-results

You can read the reasoning traces of the models trying to blackmail the user and perform other actions. It's very interesting!!

cfahlgren1ย 
posted an update about 2 months ago
cfahlgren1ย 
posted an update 2 months ago
view post
Post
1712
Yesterday, we dropped a new conversational viewer for datasets on the hub! ๐Ÿ’ฌ

Actually being able to view and inspect your data is extremely important. This is a big step in making data more accessible and actionable for everyone.

Here's some datasets you can try it out on:
โ€ข mlabonne/FineTome-100k
โ€ข Salesforce/APIGen-MT-5k
โ€ข open-thoughts/OpenThoughts2-1M
โ€ข allenai/tulu-3-sft-mixture

Any other good ones?
  • 1 reply
ยท
cfahlgren1ย 
posted an update 6 months ago
view post
Post
2343
If you haven't seen yet, we just released Inference Providers ๐Ÿ”€

> 4 new serverless inference providers on the Hub ๐Ÿคฏ
> Use your HF API key or personal key with all providers ๐Ÿ”‘
> Chat with Deepseek R1, V3, and more on HF Hub ๐Ÿ‹
> We support Sambanova, TogetherAI, Replicate, and Fal.ai ๐Ÿ’ช

Best of all, we don't charge any markup on top of the provider ๐Ÿซฐ Have you tried it out yet? HF Pro accounts get $2 of free usage for the provider inference.
cfahlgren1ย 
posted an update 6 months ago
view post
Post
1780
Wow, I just added Langfuse tracing to the Deepseek Artifacts app and it's really nice ๐Ÿ”ฅ

It allows me to visualize and track more things along with the cfahlgren1/react-code-instructions dataset.

It was just added as a one click Docker Space template, so it's super easy to self host ๐Ÿ’ช
cfahlgren1ย 
posted an update 7 months ago
view post
Post
2267
You'll notice the AI in the SQL Console is much better at working with chatml conversations:

Here's example of unnesting the cfahlgren1/react-code-instructions in less than 10 seconds by asking it. Check it out here: cfahlgren1/react-code-instructions

- "show me the average assistant response length"
- "extract user, system, and assistant messages into separate columns"

It's super easy to work with conversational datasets now with natural language ๐Ÿ—ฃ๏ธ





  • 2 replies
ยท
cfahlgren1ย 
posted an update 7 months ago
cfahlgren1ย 
posted an update 8 months ago
view post
Post
1940
You can just ask things ๐Ÿ—ฃ๏ธ

"show me messages in the coding category that are in the top 10% of reward model scores"

Download really high quality instructions from the Llama3.1 405B synthetic dataset ๐Ÿ”ฅ

argilla/magpie-ultra-v1.0

cfahlgren1ย 
posted an update 8 months ago
view post
Post
3057
We just dropped an LLM inside the SQL Console ๐Ÿคฏ

The amazing, new Qwen/Qwen2.5-Coder-32B-Instruct model can now write SQL for any Hugging Face dataset โœจ

It's 2025, you shouldn't be hand writing SQL! This is a big step in making it where anyone can do in depth analysis on a dataset. Let us know what you think ๐Ÿค—
cfahlgren1ย 
posted an update 8 months ago
view post
Post
930
observers ๐Ÿ”ญ - automatically log all OpenAI compatible requests to a dataset๐Ÿ’ฝ

โ€ข supports any OpenAI compatible endpoint ๐Ÿ’ช
โ€ข supports DuckDB, Hugging Face Datasets, and Argilla as stores

> pip install observers

No complex framework. Just a few lines of code to start sending your traces somewhere. Let us know what you think! @davidberenstein1957 and I will continue iterating!

Here's an example dataset that was logged to Hugging Face from Ollama: cfahlgren1/llama-3.1-awesome-chatgpt-prompts
cfahlgren1ย 
posted an update 8 months ago
view post
Post
879
You can create charts, leaderboards, and filters on top of any Hugging Face dataset in less than a minute

โ€ข ASCII Bar Charts ๐Ÿ“Š
โ€ข Powered by DuckDB WASM โšก
โ€ข Download results to Parquet ๐Ÿ’ฝ
โ€ข Embed and Share results with friends ๐Ÿ“ฌ

Do you have any interesting queries?
cfahlgren1ย 
posted an update 8 months ago
cfahlgren1ย 
posted an update 8 months ago
view post
Post
3259
You can clean and format datasets entirely in the browser with a few lines of SQL.

In this post, I replicate the process @mlabonne used to clean the new microsoft/orca-agentinstruct-1M-v1 dataset.

The cleaning process consists of:
- Joining the separate splits together / add split column
- Converting string messages into list of structs
- Removing empty system prompts

https://huggingface.co/blog/cfahlgren1/the-beginners-guide-to-cleaning-a-dataset

Here's his new cleaned dataset: mlabonne/orca-agentinstruct-1M-v1-cleaned
  • 1 reply
ยท
cfahlgren1ย 
posted an update 8 months ago
view post
Post
2253
Why use Google Drive when you can have:

โ€ข Free storage with generous limits๐Ÿ†“
โ€ข Dataset Viewer (Sorting, Filtering, FTS) ๐Ÿ”
โ€ข Third Party Library Support
โ€ข SQL Console ๐ŸŸง
โ€ข Security ๐Ÿ”’
โ€ข Community, Reach, and Visibility ๐Ÿ“ˆ

It's a no brainer!

Check out our post on what you get instantly out of the box when you create a dataset.
https://huggingface.co/blog/researcher-dataset-sharing
  • 1 reply
ยท
cfahlgren1ย 
posted an update 9 months ago
view post
Post
1164
If you are like me, I like to find up and coming datasets and spaces before everyone else.

I made a trending repo space cfahlgren1/trending-repos where it shows:

- New up and coming Spaces in the last day
- New up and coming Datasets in the last 2 weeks

It's a really good way to find some new gems before they become popular. For example, someone is working on a way to dynamically create assets inside a video game here: gptcall/AI-Game-Creator

cfahlgren1ย 
posted an update 10 months ago
view post
Post
1892
Have you tried the new SQL Console yet?

Would love to know any queries you've tried or general feedback! If you haven't go try it out and let us know ๐Ÿค—

If you have some interesting queries feel free to share the URLs as well!
  • 1 reply
ยท
cfahlgren1ย 
posted an update 11 months ago
cfahlgren1ย 
posted an update 12 months ago
view post
Post
1057
You can now embed your heatmap anywhere with a simple change :)

Currently just supports model creation! You can duplicate the space and create your own here:

cfahlgren1/my-heatmap
  • 2 replies
ยท