janni-t/qwen3-embedding-0.6b-tei-onnx · Error: Could not create backend, Weights not found

2025-07-01 03:18:58.601 | 2025-07-01T00:18:58.601275Z  INFO text_embeddings_router: router/src/main.rs:189: Args { model_id: "jan**-*/*****-*********-*.**-***-*nnx", revision: None, tokenization_workers: None, dtype: Some(Float16), pooling: Some(Mean), max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 512, auto_truncate: false, default_prompt_name: None, default_prompt: None, hf_api_token: None, hf_token: None, hostname: "8dd3050edaef", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, disable_spans: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", prometheus_port: 9000, cors_allow_origin: None }
2025-07-01 03:18:58.673 | 2025-07-01T00:18:58.673021Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:20: Starting download
2025-07-01 03:19:00.142 | 2025-07-01T00:19:00.142784Z  INFO download_artifacts:download_new_st_config: text_embeddings_core::download: core/src/download.rs:77: Downloading `config_sentence_transformers.json`
2025-07-01 03:19:00.142 | 2025-07-01T00:19:00.142839Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:40: Downloading `config.json`
2025-07-01 03:19:00.142 | 2025-07-01T00:19:00.142847Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:43: Downloading `tokenizer.json`
2025-07-01 03:19:00.142 | 2025-07-01T00:19:00.142854Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:47: Model artifacts downloaded in 1.469837314s
2025-07-01 03:19:00.400 | 2025-07-01T00:19:00.400321Z  WARN text_embeddings_router: router/src/lib.rs:189: Could not find a Sentence Transformers config
2025-07-01 03:19:00.400 | 2025-07-01T00:19:00.400353Z  INFO text_embeddings_router: router/src/lib.rs:193: Maximum number of tokens per request: 32768
2025-07-01 03:19:00.400 | 2025-07-01T00:19:00.400480Z  INFO text_embeddings_core::tokenization: core/src/tokenization.rs:38: Starting 20 tokenization workers
2025-07-01 03:19:01.014 | 2025-07-01T00:19:01.014496Z  INFO text_embeddings_router: router/src/lib.rs:235: Starting model backend
2025-07-01 03:19:01.015 | 2025-07-01T00:19:01.015497Z  INFO text_embeddings_backend: backends/src/lib.rs:510: Downloading `model.safetensors`
2025-07-01 03:19:01.191 | 2025-07-01T00:19:01.190924Z  WARN text_embeddings_backend: backends/src/lib.rs:513: Could not download `model.safetensors`: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/janni-t/qwen3-embedding-0.6b-tei-onnx/resolve/main/model.safetensors)
2025-07-01 03:19:01.191 | 2025-07-01T00:19:01.190945Z  INFO text_embeddings_backend: backends/src/lib.rs:518: Downloading `model.safetensors.index.json`
2025-07-01 03:19:01.359 | 2025-07-01T00:19:01.359068Z  WARN text_embeddings_backend: backends/src/lib.rs:386: safetensors weights not found. Using `pytorch_model.bin` instead. Model loading will be significantly slower.
2025-07-01 03:19:01.359 | 2025-07-01T00:19:01.359094Z  INFO text_embeddings_backend: backends/src/lib.rs:387: Downloading `pytorch_model.bin`
2025-07-01 03:19:01.622 | Error: Could not create backend
2025-07-01 03:19:01.622 | 
2025-07-01 03:19:01.623 | Caused by:
2025-07-01 03:19:01.623 |     Weights not found: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/janni-t/qwen3-embedding-0.6b-tei-onnx/resolve/main/pytorch_model.bin)

Hi, why do I get this error with the following settings?

services:
  embedding:
    image: ghcr.io/huggingface/text-embeddings-inference:cuda-1.7.3
    ports:
      - "8089:80"
    volumes:
      - embedding_cache:/data
    command: ["--model-id", "janni-t/qwen3-embedding-0.6b-tei-onnx", "--dtype", "float16", "--max-client-batch-size", "512", "--pooling", "mean"]
    environment:
      - USE_FLASH_ATTENTION=True
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
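
For anyone reading the log above: the backend asks for `model.safetensors`, then `model.safetensors.index.json`, then `pytorch_model.bin`, and it never requests an ONNX file. The sketch below only mirrors that lookup order as it appears in the log; it is not taken from the TEI source. The function name and the repository file listing are hypothetical, chosen to illustrate why a repository without any of those three files would end in "Weights not found".

```python
# Lookup order as observed in the log above (an assumption drawn from the
# log lines, not from the text-embeddings-inference source code).
WEIGHT_CANDIDATES = [
    "model.safetensors",
    "model.safetensors.index.json",
    "pytorch_model.bin",
]


def first_available_weight(repo_files):
    """Return the first candidate weight file present in repo_files, or None."""
    present = set(repo_files)
    for name in WEIGHT_CANDIDATES:
        if name in present:
            return name
    return None


# Hypothetical listing for an ONNX-only repository; the real repository
# contents are not shown in this post.
onnx_only_repo = ["config.json", "tokenizer.json", "onnx/model.onnx"]

print(first_available_weight(onnx_only_repo))  # None
```

If the real repository looks like the hypothetical listing, every candidate is missing, which would match the two 404 responses and the final error in the log.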