Thireus

AI & ML interests

None yet

Organizations

None yet

Thireus's activity

liked a dataset 25 minutes ago

Salesforce/wikitext

Viewer • Updated Jan 4, 2024 • 3.71M • 741k • 461

liked a model about 6 hours ago

anikifoss/DeepSeek-R1-0528-DQ4_K_R4

Text Generation • Updated 7 days ago • 102 • 6

New activity in Qwen/Qwen3-Embedding-0.6B about 7 hours ago

ONNX version planned?

#17 opened about 17 hours ago by

liked a model about 7 hours ago

Qwen/Qwen3-Embedding-0.6B

Feature Extraction • Updated 3 days ago • 42k • 176

New activity in anikifoss/DeepSeek-R1-0528-DQ4_K_R4 about 8 hours ago

Metrics for 110k context size?

#4 opened about 8 hours ago by

New activity in ubergarm/DeepSeek-R1-0528-GGUF 1 day ago

Scripts to produce PPL and KLD diagrams?

#10 opened 1 day ago by

New activity in anikifoss/DeepSeek-R1-0528-DQ4_K_R4 5 days ago

DeepSeek-R1-0528-DQ2_K_R4

#3 opened 6 days ago by

liked 2 models 5 days ago

ubergarm/DeepSeek-R1-0528-GGUF

Text Generation • Updated 2 days ago • 2.37k • 18

unsloth/DeepSeek-R1-0528-GGUF

Text Generation • Updated about 2 hours ago • 91.1k • 141

New activity in kalomaze/Qwen3-16B-A3B 5 days ago

DeepSeek R1 0528?

#15 opened 5 days ago by

This model almost completely loses Chinese ablities

#14 opened 29 days ago by

liked a model about 1 month ago

DavidAU/Qwen3-4B-Q8_0-64k-128k-256k-context-GGUF

Text Generation • Updated May 2 • 452 • 2

New activity in unsloth/Qwen3-32B-GGUF about 1 month ago

Potentially still broken?

#8 opened about 1 month ago by

liked a model about 1 month ago

unsloth/Qwen3-32B-128K-GGUF

Text Generation • Updated 17 days ago • 12.2k • 21

New activity in kalomaze/Qwen3-16B-A3B about 1 month ago

Brainstorming

#6 opened about 1 month ago by

New activity in unsloth/Qwen3-32B-128K-GGUF about 1 month ago

UD version not doing great with YaRN compared to non-UD of the same size

#4 opened about 1 month ago by

New activity in unsloth/Qwen3-32B-GGUF about 1 month ago

PPL vs model size - safe to assume larger size == better accuracy regardless of UD vs non-UD?

#6 opened about 1 month ago by

New activity in Qwen/Qwen3-32B about 1 month ago

Potential issue with large context sizes - can someone confirm?

#18 opened about 1 month ago by

New activity in Qwen/Qwen3-235B-A22B about 1 month ago

Qwen is loosing broad knowledge since Qwen2.

#16 opened about 1 month ago by

New activity in unsloth/Qwen3-235B-A22B-128K-GGUF about 1 month ago

YaRN not enabled correctly

#3 opened about 1 month ago by