This is the [inceptionai/jais-family-13b](https://huggingface.co/inceptionai/jais-family-13b) model converted to [OpenVINO](https://docs.openvino.ai/2025/index.html) with INT4 weight compression.

## Download the model

- Install huggingface-hub

```sh
pip install "huggingface-hub[cli]"
```

- Download the model

```sh
huggingface-cli download helenai/jais-family-13b-ov-int4-sym --local-dir helenai/jais-family-13b-ov-int4-sym
```

## Run inference

- Install/upgrade OpenVINO GenAI nightly (this is the only requirement for inference; there is no need to install transformers or PyTorch)

```sh
pip install --pre --upgrade openvino-genai openvino openvino-tokenizers --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/pre-release
```

- Download a sample inference script (`curl -O` works on Windows Command Prompt and most Linux terminals). Note that this is not a chat/instruct model: it is not finetuned for answering questions, and the script does not keep chat history. The script is only for testing model outputs.

```sh
curl -O https://raw.githubusercontent.com/helena-intel/snippets/refs/heads/main/llm_chat/python/llm_test.py
```

- Run the script with the path to the model and the device as parameters. Change GPU to CPU to run on CPU; NPU is not yet supported for this model. If you prefer to stay in Python, see the sketch at the end of this page.

```sh
python llm_test.py helenai/jais-family-13b-ov-int4-sym GPU
```

Check out the [OpenVINO GenAI documentation](https://docs.openvino.ai/2025/openvino-workflow-generative/inference-with-genai.html) for more information.

## Model compression parameters

```
openvino_version       : 2025.2.0-18660-3ceeeb52d64
advanced_parameters    : {'statistics_path': None, 'awq_params': {'subset_size': 32, 'percent_to_apply': 0.002, 'alpha_min': 0.0, 'alpha_max': 1.0, 'steps': 100}, 'scale_estimation_params': {'subset_size': 64, 'initial_steps': 5, 'scale_steps': 5, 'weight_penalty': -1.0}, 'gptq_params': {'damp_percent': 0.1, 'block_size': 128, 'subset_size': 128}, 'lora_correction_params': {'adapter_rank': 8, 'num_iterations': 3, 'apply_regularization': True, 'subset_size': 128, 'use_int8_adapters': True}}
all_layers             : False
awq                    : False
backup_mode            : int8_asym
gptq                   : False
group_size             : -1
ignored_scope          : []
lora_correction        : False
mode                   : int4_sym
ratio                  : 1.0
scale_estimation       : False
sensitivity_metric     : weight_quantization_error
optimum_intel_version  : 1.22.0
optimum_version        : 1.24.0
pytorch_version        : 2.5.1+cpu
transformers_version   : 4.48.3
```
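For reference, `mode: int4_sym` with `group_size: -1` means symmetric INT4 weight quantization applied channel-wise (no grouping), with `int8_asym` as the backup precision for weights that are not compressed to INT4. A model with similar settings could be produced with `optimum-cli` from optimum-intel; this is a sketch based on the parameters above, not necessarily the exact command used to create this model, and the available flags depend on your optimum-intel version:

```sh
optimum-cli export openvino --model inceptionai/jais-family-13b --trust-remote-code \
  --weight-format int4 --sym --group-size -1 --ratio 1.0 --backup-precision int8_asym \
  jais-family-13b-ov-int4-sym
```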
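If you prefer to do everything from Python instead of the CLI steps above, the download and a quick generation test can be scripted with `huggingface_hub` and the OpenVINO GenAI API. A minimal sketch, assuming the packages from the steps above are installed; the prompt and the `max_new_tokens` value are arbitrary examples:

```python
from huggingface_hub import snapshot_download
import openvino_genai

# Download the model files (equivalent to the huggingface-cli step above)
model_dir = snapshot_download(
    repo_id="helenai/jais-family-13b-ov-int4-sym",
    local_dir="helenai/jais-family-13b-ov-int4-sym",
)

# Load the model on GPU; use "CPU" to run on CPU (NPU is not yet supported)
pipe = openvino_genai.LLMPipeline(model_dir, "GPU")

# This is a base model, not a chat model, so use a plain continuation prompt
print(pipe.generate("The three largest cities in the UAE are", max_new_tokens=100))
```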