INT4 model
Could you share your conversion command? I'd like to convert a model to INT4 and would simply follow your command.
Actually, it is a fairly elaborate script; it depends on which format you want to convert to.
@xiping, you can follow this notebook: https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/llm-chatbot/llm-chatbot-generate-api.ipynb. Select the desired model and compression, and uncheck the "Use preconverted models" option. The command itself will be: optimum-cli export openvino --model Qwen/Qwen3-4B --task text-generation-with-past --weight-format int4 Qwen3-4B/INT4_compressed_weights
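For anyone who wants to script the export instead of typing it, here is a minimal Python sketch that assembles the same command as an argument list. The helper name `build_export_command` is hypothetical. It uses only the flags quoted above, and the actual `subprocess` call is left commented out because it needs optimum-intel with OpenVINO support installed.

```python
import shlex


def build_export_command(model, output_dir, weight_format="int4",
                         task="text-generation-with-past"):
    """Return the optimum-cli export command as an argument list.

    Hypothetical helper: it only mirrors the flags quoted in this thread.
    """
    return [
        "optimum-cli", "export", "openvino",
        "--model", model,
        "--task", task,
        "--weight-format", weight_format,
        output_dir,
    ]


cmd = build_export_command("Qwen/Qwen3-4B", "Qwen3-4B/INT4_compressed_weights")

# Print the command exactly as it would be typed in a shell.
print(shlex.join(cmd))

# To actually run the export (assumes optimum-intel is installed):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing a list rather than a single string avoids shell quoting problems when the model path or output directory contains spaces.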
Thanks
@amokrov, ep150de
Your suggestions are all from the official guide; I just want your actual conversion script or the specific command you used.
I can already convert this model to int4 by following the guide you provided.
But the accuracy seems low. Or perhaps not low: the output just looks as if thinking were enabled, even though I have disabled thinking.
Using the same command to test the int8 model (https://huggingface.co/OpenVINO/Qwen3-4B-int8-ov), it works.
My prompt:
"What is the capital of France?"
Result from the int8 model:
Response: {"choices":[{"finish_reason":"stop","index":0,"logprobs":null,"message":{"content":"The capital of France is Paris.","role":"assistant","tool_calls":[]}}],"created":1753821194,"model":"OpenVINO/Qwen3-4B-int8-ov","object":"chat.completion","usage":{"prompt_tokens":24,"completion_tokens":8,"total_tokens":32}}
Result from my converted int4 model:
Response: {"choices":[{"finish_reason":"length","index":0,"logprobs":null,"message":{"content":" The capital of France is Paris. I can provide more information about Paris if you'd like to know anything specific about it. How can I assist you further? \n\nThe user is asking for the capital of France, and I have the correct answer. However, I want to make sure that the response is not just a simple answer but also provides additional information to show that I understand the question and can offer more details. I should also check if the user has any follow-up questions or if they need more information. I should also make sure to keep the tone friendly and helpful.\nThe capital of France is Paris. If you're interested in learning more about Paris, I can share details about its history, culture, landmarks, or any other aspects you find intriguing. What would you like to know about Paris? I'm here to help! ๐\nOkay, the user asked for the capital of France, and I provided the answer as Paris. Now, I need to make sure that my response is not just a simple answer but also offers additional information. I should check if the user wants more details about Paris. I can mention some key points about Paris, like its status as a major cultural and historical center, its famous landmarks such as the Eiffel Tower and the Louvre, or its role as a global city. I should also invite the user to ask more questions if they have any. I need to keep the tone friendly and helpful, using emojis to make it more engaging. Let me structure the response to include these elements.\nThe capital of France is Paris, a city renowned for its rich history, art, and culture. It's home to iconic landmarks like the Eiffel Tower, the Louvre Museum, and Notre-Dame Cathedral. Paris is also a global hub for fashion, cuisine, and innovation. If you'd like to explore more about the city, its history, or specific attractions, feel free to ask! 
๐ I'm here to help you dive deeper into anything you're curious about. What would you like to know? ๐\nOkay, the user is asking for the capital of France, and I have the answer. But I want to make sure my response is not just a simple answer. I should provide additional information to show that I understand the question and can offer more details. I should also check if the user has any follow-up questions or if they need more information. I should keep the tone friendly and helpful, using emojis to make it engaging. Let me structure the response to include","role":"assistant","tool_calls":[]}}],"created":1753820979,"model":"OpenVINO/Qwen3-4B-int4-ov","object":"chat.completion","usage":{"prompt_tokens":12,"completion_tokens":512,"total_tokens":524}}
I don't think my converted int4 model's accuracy is really low; it looks as though thinking is enabled. So my guess is that my conversion command may have a problem.
For example, here is my command:
optimum-cli export openvino --model ./Qwen/Qwen3-4B/ --task text-generation-with-past --weight-format int4 ./OpenVINO/Qwen3-4B-int4-ov/
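The two responses quoted above can also be compared field by field, which may help narrow down where the runs diverge. The sketch below is only a diagnostic aid, not a confirmed explanation. The helper names `summarize` and `differing_fields` are hypothetical, the int8 response is copied from the thread, and the int4 response keeps only its `finish_reason`, `model`, and `usage` values with the long `content` elided.

```python
import json

# int8 response, copied verbatim from the thread.
INT8_RESPONSE = json.loads(
    '{"choices":[{"finish_reason":"stop","index":0,"logprobs":null,'
    '"message":{"content":"The capital of France is Paris.",'
    '"role":"assistant","tool_calls":[]}}],"created":1753821194,'
    '"model":"OpenVINO/Qwen3-4B-int8-ov","object":"chat.completion",'
    '"usage":{"prompt_tokens":24,"completion_tokens":8,"total_tokens":32}}'
)

# int4 response: metadata copied from the thread, message content elided.
INT4_RESPONSE = {
    "choices": [{"finish_reason": "length",
                 "message": {"content": "...", "role": "assistant"}}],
    "model": "OpenVINO/Qwen3-4B-int4-ov",
    "usage": {"prompt_tokens": 12, "completion_tokens": 512,
              "total_tokens": 524},
}


def summarize(response):
    """Pull out the fields that describe how generation ended."""
    usage = response["usage"]
    return {
        "finish_reason": response["choices"][0]["finish_reason"],
        "prompt_tokens": usage["prompt_tokens"],
        "completion_tokens": usage["completion_tokens"],
    }


def differing_fields(a, b):
    """Return the sorted names of summary fields whose values differ."""
    return sorted(key for key in a if a[key] != b[key])


int8 = summarize(INT8_RESPONSE)
int4 = summarize(INT4_RESPONSE)

for field in differing_fields(int8, int4):
    print(f"{field}: int8={int8[field]!r} int4={int4[field]!r}")
```

Besides the `length` versus `stop` finish reason, this shows that `prompt_tokens` is 24 for the int8 run and 12 for the int4 run. If both requests really sent the same prompt, that difference could mean the prompt was templated or tokenized differently in the two setups. This is an assumption that nothing in the thread confirms, but it may be worth checking alongside the conversion command.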