Benhao Tang PRO

benhaotang

AI & ML interests

Master's student in theoretical particle physics at Universität Heidelberg, actively exploring possibilities for integrating AI into future physics research.

Organizations

None yet

benhaotang's activity

upvoted an article about 10 hours ago

πŸΊπŸ¦β€β¬› LLM Comparison/Test: Phi-4, Qwen2 VL 72B Instruct, Aya Expanse 32B in my updated MMLU-Pro CS benchmark

By wolfram
New activity in huggingface/HuggingDiscussions 1 day ago
reacted to mitkox's post with 👍 5 days ago
llama.cpp is 26.8% faster than ollama.
I have upgraded both and, using the same settings, I am running the same DeepSeek R1 Distill 1.5B model on the same hardware. It's an apples-to-apples comparison.

Total duration:
llama.cpp 6.85 sec <- 26.8% faster
ollama 8.69 sec

Breakdown by phase:
Model loading
llama.cpp 241 ms <- 2x faster
ollama 553 ms

Prompt processing
llama.cpp 416.04 tokens/s with an eval time of 45.67 ms <- 10x faster
ollama 42.17 tokens/s with an eval time of 498 ms

Token generation
llama.cpp 137.79 tokens/s with an eval time of 6.62 sec <- 13% faster
ollama 122.07 tokens/s with an eval time of 7.64 sec

llama.cpp is LLM inference in C/C++; ollama adds abstraction layers and marketing.

Make sure you own your AI. AI in the cloud is not aligned with you; it's aligned with the company that owns it.
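
The headline figure is just the ratio of the two totals (8.69 s / 6.85 s ≈ 1.27). For anyone wanting to reproduce this kind of end-to-end comparison, here is a minimal sketch, assuming llama.cpp's llama-server is listening on its default port 8080 and ollama on its default 11434, both exposing an OpenAI-compatible /v1/chat/completions endpoint; the model names and prompt are placeholders to adapt.

```python
# Minimal sketch of an end-to-end llama.cpp vs. ollama timing run.
# Assumptions (adjust to your setup): llama-server on its default port 8080,
# ollama on its default 11434, both serving the same quant of the same model
# through their OpenAI-compatible chat endpoints; model names are placeholders.
import time
import requests

PROMPT = "Explain the Pythagorean theorem in one paragraph."

def time_completion(base_url: str, model: str) -> None:
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": PROMPT}],
        "max_tokens": 256,
        "stream": False,
    }
    start = time.perf_counter()
    resp = requests.post(f"{base_url}/v1/chat/completions", json=payload, timeout=300)
    resp.raise_for_status()
    elapsed = time.perf_counter() - start
    usage = resp.json().get("usage", {})
    completion_tokens = usage.get("completion_tokens", 0)
    rate = completion_tokens / elapsed if elapsed else 0.0
    print(f"{base_url}: {elapsed:.2f} s total, {completion_tokens} tokens, "
          f"~{rate:.1f} tok/s end-to-end")

time_completion("http://localhost:8080", "deepseek-r1-distill-qwen-1.5b")  # llama-server
time_completion("http://localhost:11434", "deepseek-r1:1.5b")              # ollama
```

A wrapper like this only sees total wall time and an approximate generation rate; the per-phase split above (model loading, prompt eval, token generation) comes from each tool's own timing output, e.g. ollama run --verbose prints exactly those fields.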
reacted to sometimesanotion's post with 🔥 6 days ago
I've managed a #1 score of 41.22% average for 14B parameter models on the Open LLM Leaderboard. As of this writing, sometimesanotion/Lamarck-14B-v0.7 is #8 for all models up to 70B parameters.

It took a custom toolchain around Arcee AI's mergekit to manage the complex merges, gradients, and LoRAs required to make this happen. I really like seeing features of many quality finetunes in one solid generalist model.
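
This is not the Lamarck recipe itself, but as a rough sketch of the basic building block such a toolchain orchestrates, here is a minimal two-model SLERP merge in mergekit's YAML schema, driven from Python; the model names, layer ranges, and the t gradient are placeholder assumptions.

```python
# Rough illustration only; NOT sometimesanotion's actual Lamarck recipe.
# Writes a minimal two-model SLERP config in mergekit's YAML schema and runs
# it via the mergekit-yaml CLI (pip install mergekit). Model names, layer
# ranges, and the t gradient are placeholder assumptions.
import subprocess
from pathlib import Path

CONFIG = """\
slices:
  - sources:
      - model: org/model-a-14b          # placeholder parent model
        layer_range: [0, 48]
      - model: org/model-b-14b          # placeholder parent model
        layer_range: [0, 48]
merge_method: slerp
base_model: org/model-a-14b
parameters:
  t:
    - value: [0.0, 0.5, 1.0]            # per-layer interpolation gradient
dtype: bfloat16
"""

Path("merge-config.yaml").write_text(CONFIG)
subprocess.run(["mergekit-yaml", "merge-config.yaml", "./merged-model"], check=True)
```

The per-layer t gradient is what the "gradients" refer to: the merged weights can lean toward one parent in early layers and the other in later ones, and a leaderboard-grade recipe presumably chains many such steps together with LoRA extraction and finetune-specific weighting.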
New activity in benhaotang/phi4-qwq-sky-t1 8 days ago