While that may be one reason, it doesn't fully explain why there are still many quantized models available for LLaMA 3.1 and LLaMA 3.3.
wenhua cheng
wenhuach
AI & ML interests
Model Compression, CV
wenhuach's activity
replied to their post 19 days ago
posted an update 26 days ago
Are we the only providers of INT4 quantized models for Llama 3.2 VL?
OPEA/Llama-3.2-90B-Vision-Instruct-int4-sym-inc
OPEA/Llama-3.2-11B-Vision-Instruct-int4-sym-inc
replied to their post about 1 month ago
You can try running auto-round-fast xxx, which costs a slight drop in accuracy, or auto-round-fast xxx --nsamples 1 --iters 1 for very fast execution without algorithm tuning.
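The two invocations in this reply can be sketched as a small command builder. This is only an illustration: the helper name autoround_fast_command and its skip_tuning parameter are hypothetical, and "xxx" is kept exactly as the reply leaves it, standing in for whatever model arguments you would normally pass. The reading of --nsamples as the calibration sample count and --iters as the tuning iteration count is an assumption, not something the reply states.

```python
import shlex


def autoround_fast_command(model_args, skip_tuning=False):
    """Build the argv list for an auto-round-fast run (hypothetical helper).

    model_args is whatever the reply elides as "xxx"; it is passed through untouched.
    """
    cmd = ["auto-round-fast", *model_args]
    if skip_tuning:
        # Flags quoted from the reply: assumed to mean one calibration
        # sample and one tuning iteration, i.e. effectively no tuning.
        cmd += ["--nsamples", "1", "--iters", "1"]
    return cmd


default_cmd = autoround_fast_command(["xxx"])
quick_cmd = autoround_fast_command(["xxx"], skip_tuning=True)

# Print the commands instead of running them, since "xxx" is a placeholder.
print(shlex.join(default_cmd))
print(shlex.join(quick_cmd))
```

Once "xxx" is replaced with real arguments, either list could be handed to subprocess.run to launch the quantization.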
replied to their post about 1 month ago
Thank you for your suggestion. As our focus is on algorithm development and our computational resources are limited, we currently lack the bandwidth to support a large number of models. If you come across any models that would benefit from quantization, feel free to comment on any models under OPEA. We will make an effort to prioritize and quantize them if resources allow.
posted an update about 1 month ago
AutoRound has demonstrated strong results even at 2-bit precision for VLMs such as QWEN2-VL-72B. Check it out here: OPEA/Qwen2-VL-72B-Instruct-int2-sym-inc.
replied to their post about 1 month ago
Sure, we will give it a try.
posted an update about 1 month ago
This week, OPEA Space released several new INT4 models, including:
nvidia/Llama-3.1-Nemotron-70B-Instruct-HF
allenai/OLMo-2-1124-13B-Instruct
THUDM/glm-4v-9b
AIDC-AI/Marco-o1
and several others.
Let us know which models you'd like prioritized for quantization, and we'll do our best to make it happen!
https://huggingface.co/OPEA
posted an update about 2 months ago
OPEA Space just released nearly 20 INT4 models, for example QWQ-32B-Preview, Llama-3.2-11B-Vision-Instruct, Qwen2.5, and Llama3.1. Check out https://huggingface.co/OPEA
posted an update 6 months ago
Looking for a better INT4 algorithm for LLAMA3.1? For the 8B model, AutoRound improves the average score across 10 zero-shot tasks to 63.93, versus 63.15 for AWQ. Notably, on the MMLU task it achieved 66.72 compared to 65.25, and on the ARC-C task it scored 52.13 against 50.94. For further details and comparisons, visit the leaderboard at Intel/low_bit_open_llm_leaderboard.
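To make the size of those gaps explicit, here is a short sketch that computes the AutoRound minus AWQ difference for each figure quoted in the post. The numbers are copied from the post; the dictionary layout and variable names are only illustrative.

```python
# (AutoRound, AWQ) scores for the LLAMA3.1 8B model, as quoted in the post.
scores = {
    "average (10 zero-shot tasks)": (63.93, 63.15),
    "MMLU": (66.72, 65.25),
    "ARC-C": (52.13, 50.94),
}

# Difference in points, rounded to two decimals to hide float noise.
deltas = {
    task: round(autoround - awq, 2)
    for task, (autoround, awq) in scores.items()
}

for task, delta in deltas.items():
    print(f"{task}: +{delta:.2f} points for AutoRound")
```

On these figures the largest gap is on MMLU and the smallest is on the 10-task average.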
posted an update 7 months ago
Check out AutoRound, a SOTA LLM quantization algorithm across 2-4 bits that adds no inference overhead to any model.
paper: https://arxiv.org/abs/2309.05516
github: https://github.com/intel/auto-round
lowbits leaderboard: https://huggingface.co/spaces/Intel/low-bit-leaderboard