I appreciate your interest in the data I've uploaded, but I'd prefer that you not use macros to download it.
When I see over 100 accounts from the mechanicspedia.com domain registering and downloading in a single day, it looks like macro activity.
My purpose for uploading training data to Hugging Face Hub is as follows:
1. To prevent others from experiencing the inconveniences I've faced
In AI projects I waste a lot of time on data preprocessing, and figuring that others were wasting time on the same thing, I started uploading data to Hugging Face Hub to help cut that time down.
2. To improve the convenience of data preprocessing
I wanted to make column names and data consistent for image, audio, and text data to reduce the time spent on data preprocessing.
For example, when mixing datasets A and B together, the column names of dataset A wouldn't match those of dataset B, so I had to handle each case with if statements and branching. I didn't like that the preprocessing code in most projects ended up being single-use code.
So for pretraining data I used conventions like 'corpus' and 'sentence_ls', and for ASR data 'audio' and 'sentence', to minimize meaningless coding work in projects in those fields.
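The column-name mismatch described above can be sketched like this (the source column names here are hypothetical examples, not the actual datasets; only the target convention, 'audio' and 'sentence', comes from the post):

```python
# Two hypothetical ASR-style datasets whose column names don't match,
# which would normally force per-source if/else branching when merging.
dataset_a = [{"audio_path": "a1.wav", "transcript": "hello"}]
dataset_b = [{"wav": "b1.wav", "text": "world"}]

# With a shared convention ('audio', 'sentence'), one rename map per
# known variant replaces the single-use branching code.
RENAMES = {
    "audio_path": "audio", "wav": "audio",
    "transcript": "sentence", "text": "sentence",
}

def normalize(rows):
    """Rename known column variants to the shared convention."""
    return [{RENAMES.get(k, k): v for k, v in row.items()} for row in rows]

# Once normalized, the datasets concatenate without special cases.
merged = normalize(dataset_a) + normalize(dataset_b)
```

With the Hugging Face `datasets` library, the same idea would use `rename_column` followed by `concatenate_datasets`; the plain-list version above just keeps the sketch self-contained.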
Yes... of course, many people probably wonder how to use these datasets, since I haven't written detailed docstrings.
I'm aware of this, and I'll be adding docstrings soon.
The purpose of uploading to Hugging Face Hub is partly for my own use, but also because I wanted to reduce stress about data in AI projects.
However, using a public server domain like mechanicspedia.com to register 104 accounts and download data in a short period doesn't look like normal usage.
So please refrain from such macro activity.