I appreciate your interest in the data I've uploaded, but I'd prefer that you not use macros to download it.
When I see over 100 accounts from the mechanicspedia.com domain registering and downloading in a single day, it looks like macro activity.
My purpose for uploading training data to Hugging Face Hub is as follows:
1. To prevent others from experiencing the inconveniences I've faced
In AI projects I waste a lot of time on data preprocessing, and figuring that others were wasting time on the same thing, I started uploading data to Hugging Face Hub to help cut that time down.
2. To improve the convenience of data preprocessing
I wanted to make column names and data consistent for image, audio, and text data to reduce the time spent on data preprocessing.
For example, when mixing datasets A and B together, the column names of dataset A wouldn't match those of dataset B, so I had to handle each case with if statements and branching. I didn't like that the preprocessing code in most projects ended up being single-use code.
So for pretraining data I used conventions like 'corpus' and 'sentence_ls', and for ASR data 'audio' and 'sentence', to minimize meaningless coding work in projects in those fields.
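The column-name mismatch described above can be sketched like this (the source column names here are hypothetical examples, not the actual datasets; only the target convention, 'audio' and 'sentence', comes from the post):

```python
# Two hypothetical ASR-style datasets whose column names don't match,
# which would normally force per-source if/else branching when merging.
dataset_a = [{"audio_path": "a1.wav", "transcript": "hello"}]
dataset_b = [{"wav": "b1.wav", "text": "world"}]

# With a shared convention ('audio', 'sentence'), one rename map per
# known variant replaces the single-use branching code.
RENAMES = {
    "audio_path": "audio", "wav": "audio",
    "transcript": "sentence", "text": "sentence",
}

def normalize(rows):
    """Rename known column variants to the shared convention."""
    return [{RENAMES.get(k, k): v for k, v in row.items()} for row in rows]

# Once normalized, the datasets concatenate without special cases.
merged = normalize(dataset_a) + normalize(dataset_b)
```

With the Hugging Face `datasets` library, the same idea would use `rename_column` followed by `concatenate_datasets`; the plain-list version above just keeps the sketch self-contained.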
Yes... of course, many people probably wonder how to use these datasets, since I haven't written detailed docstrings.
I'm aware of this, and I'll be adding docstrings soon.
The purpose of uploading to Hugging Face Hub is partly for my own use, but also because I wanted to reduce stress about data in AI projects.
However, using a public server domain like mechanicspedia.com to register 104 accounts and download data in a short period doesn't look like normal usage.
So please refrain from such macro activity.