INT4 model

#1
by xiping - opened

Could you share your conversion command? I'd like to convert the model to int4 myself by following the same command.

Actually, it's a fairly elaborate script; it depends on which format you want to convert to.

OpenVINO Toolkit org

@xiping , you can follow this notebook https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/llm-chatbot/llm-chatbot-generate-api.ipynb and select the desired model, compression, and uncheck the "Use preconverted models" option. The command itself will be: optimum-cli export openvino --model Qwen/Qwen3-4B --task text-generation-with-past --weight-format int4 Qwen3-4B/INT4_compressed_weights

Thanks @amokrov, ep150de.
Your suggestions all come from the official guide; I just want the actual conversion script or the specific command you used.
I can already convert this model to int4 by following the guide you provided.
But the accuracy is low, or maybe not low exactly: the output looks as if thinking were enabled, even though I have disabled thinking.
Using the same command to test the int8 model (https://huggingface.co/OpenVINO/Qwen3-4B-int8-ov), it works.

my prompt:

"What is the capital of France?"

int8's result:

Response: {"choices":[{"finish_reason":"stop","index":0,"logprobs":null,"message":{"content":"The capital of France is Paris.","role":"assistant","tool_calls":[]}}],"created":1753821194,"model":"OpenVINO/Qwen3-4B-int8-ov","object":"chat.completion","usage":{"prompt_tokens":24,"completion_tokens":8,"total_tokens":32}}

my converted int4's result:

Response: {"choices":[{"finish_reason":"length","index":0,"logprobs":null,"message":{"content":" The capital of France is Paris. I can provide more information about Paris if you'd like to know anything specific about it. How can I assist you further? \n\nThe user is asking for the capital of France, and I have the correct answer. However, I want to make sure that the response is not just a simple answer but also provides additional information to show that I understand the question and can offer more details. I should also check if the user has any follow-up questions or if they need more information. I should also make sure to keep the tone friendly and helpful.\nThe capital of France is Paris. If you're interested in learning more about Paris, I can share details about its history, culture, landmarks, or any other aspects you find intriguing. What would you like to know about Paris? I'm here to help! 😊\nOkay, the user asked for the capital of France, and I provided the answer as Paris. Now, I need to make sure that my response is not just a simple answer but also offers additional information. I should check if the user wants more details about Paris. I can mention some key points about Paris, like its status as a major cultural and historical center, its famous landmarks such as the Eiffel Tower and the Louvre, or its role as a global city. I should also invite the user to ask more questions if they have any. I need to keep the tone friendly and helpful, using emojis to make it more engaging. Let me structure the response to include these elements.\nThe capital of France is Paris, a city renowned for its rich history, art, and culture. It's home to iconic landmarks like the Eiffel Tower, the Louvre Museum, and Notre-Dame Cathedral. Paris is also a global hub for fashion, cuisine, and innovation. If you'd like to explore more about the city, its history, or specific attractions, feel free to ask! 😊 I'm here to help you dive deeper into anything you're curious about. What would you like to know? 🌟\nOkay, the user is asking for the capital of France, and I have the answer. But I want to make sure my response is not just a simple answer. I should provide additional information to show that I understand the question and can offer more details. I should also check if the user has any follow-up questions or if they need more information. I should keep the tone friendly and helpful, using emojis to make it engaging. Let me structure the response to include","role":"assistant","tool_calls":[]}}],"created":1753820979,"model":"OpenVINO/Qwen3-4B-int4-ov","object":"chat.completion","usage":{"prompt_tokens":12,"completion_tokens":512,"total_tokens":524}}
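The two responses can also be compared by their `finish_reason` and `usage` fields rather than by the generated text. A minimal sketch, assuming the responses are available as JSON strings; the helper name `summarize` is hypothetical, the long `content` strings and a few unused fields are abbreviated, and all numbers are copied from the responses above:

```python
import json

# Responses copied from the thread. The long "content" strings and some
# unused fields (logprobs, tool_calls, created, object) are abbreviated.
int8_raw = (
    '{"choices":[{"finish_reason":"stop","index":0,'
    '"message":{"content":"The capital of France is Paris.","role":"assistant"}}],'
    '"model":"OpenVINO/Qwen3-4B-int8-ov",'
    '"usage":{"prompt_tokens":24,"completion_tokens":8,"total_tokens":32}}'
)
int4_raw = (
    '{"choices":[{"finish_reason":"length","index":0,'
    '"message":{"content":" The capital of France is Paris. ...","role":"assistant"}}],'
    '"model":"OpenVINO/Qwen3-4B-int4-ov",'
    '"usage":{"prompt_tokens":12,"completion_tokens":512,"total_tokens":524}}'
)


def summarize(raw: str) -> dict:
    """Extract the fields that matter for a side-by-side comparison."""
    resp = json.loads(raw)
    choice = resp["choices"][0]
    return {
        "model": resp["model"],
        "finish_reason": choice["finish_reason"],
        "prompt_tokens": resp["usage"]["prompt_tokens"],
        "completion_tokens": resp["usage"]["completion_tokens"],
    }


int8, int4 = summarize(int8_raw), summarize(int4_raw)
for row in (int8, int4):
    print(row)
```

The same question is reported as 24 prompt tokens for the int8 model but 12 for the int4 model, and the int4 run ends with `finish_reason` `length` after 512 completion tokens. That difference may mean the two models received differently templated prompts, although nothing in this thread confirms the cause.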

I don't think my converted int4 model's accuracy is really low; it looks more like thinking is enabled. So my guess is that my conversion command may have a problem.

Here is my command:

optimum-cli export openvino --model ./Qwen/Qwen3-4B/ --task text-generation-with-past --weight-format int4 ./OpenVINO/Qwen3-4B-int4-ov/ 
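For comparing exports, it can help to spell out the int4 compression settings instead of relying on whatever `--weight-format int4` selects by default. The sketch below only builds and prints the command line; it does not run the export. The helper `build_export_command` is hypothetical, and the `--group-size` and `--ratio` values are placeholder assumptions, not the settings used for the published OpenVINO/Qwen3-4B-int4-ov model, which this thread does not state:

```python
import shlex


def build_export_command(model_dir, output_dir, group_size=None, ratio=None, sym=False):
    """Build an `optimum-cli export openvino` int4 command as an argument list."""
    cmd = [
        "optimum-cli", "export", "openvino",
        "--model", model_dir,
        "--task", "text-generation-with-past",
        "--weight-format", "int4",
    ]
    # Optional weight-compression flags; omitted flags fall back to the CLI defaults.
    if group_size is not None:
        cmd += ["--group-size", str(group_size)]
    if ratio is not None:
        cmd += ["--ratio", str(ratio)]
    if sym:
        cmd.append("--sym")
    cmd.append(output_dir)  # the output directory is the final positional argument
    return cmd


# Placeholder values (assumptions for illustration only).
cmd = build_export_command(
    "./Qwen/Qwen3-4B/",
    "./OpenVINO/Qwen3-4B-int4-ov/",
    group_size=128,
    ratio=1.0,
)
print(shlex.join(cmd))
```

Recording the exact flags this way makes it easier to report which settings produced a given model when asking others to reproduce the behaviour.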
