Xiang Zhang
fancyzhx
AI & ML interests
None yet

Organizations
None yet
Video Datasets

Text Datasets

- TxT360: Trillion Extracted Text 📖. Explore and utilize a large, deduplicated text dataset for LLM training.
- CASIA-LM/ChineseWebText2.0
- HPLT/HPLT2.0_cleaned
- TrevorDohm/Pile_Tokenized

Audio Datasets

Robotic Datasets

Image Datasets