---
license: apache-2.0
language:
- en
base_model: JusteLeo/Qwen3-0.6B-T5-xxl
tags:
- split
- encoder
- embedding
- Text Generation
---

# Qwen3-0.6B-T5-xxl-split

## Model Description

This repository provides the components of the `Qwen3-0.6B-T5-xxl` model, split into two parts: the fine-tuned Qwen3 body and its projection head. It is intended for advanced users who want to perform custom operations, such as GGUF conversion or other model architecture modifications.

Both components are provided in **float32** format to ensure maximum precision for downstream tasks like quantization.
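
As a quick sanity check, you can confirm the dtype after downloading the repository (a minimal sketch; the `./qwen_body` path assumes a local copy, and the cast at the end is only an illustration):

```python
import torch
from transformers import AutoModel

# Load the body and report its parameter dtype
body = AutoModel.from_pretrained("./qwen_body")
print(body.dtype)  # expected: torch.float32

# Cast down only if you explicitly want lower precision,
# e.g. before a half-precision export
body_fp16 = body.to(torch.float16)
```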

## Repository Contents

- **/qwen_body/**: the fine-tuned `Qwen3-0.6B` model body, stored as a standard Hugging Face model directory with weights in `float32`.
- **/projection_head/**: the fine-tuned projection head, stored as a single `projection_head.pth` file containing a PyTorch state dictionary (see the sketch below).
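
Because the projection head is a plain state dictionary, you can inspect it before rebuilding the module (a minimal sketch; the `0.*`/`3.*` key names follow from the `nn.Sequential` layout shown in the usage example below):

```python
import torch

# Load the raw state dict on CPU and list its parameters
state_dict = torch.load("./projection_head/projection_head.pth", map_location="cpu")
for name, tensor in state_dict.items():
    print(name, tuple(tensor.shape), tensor.dtype)

# Expected entries for the head defined below:
#   0.weight (2048, 1024), 0.bias (2048,)
#   3.weight (4096, 2048), 3.bias (4096,)
```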

## How to Use

To use these components, load them separately and combine them in a two-step inference process: the body produces token-level hidden states, which are mean-pooled and then passed through the projection head to produce the final embedding.

```python
import torch
from torch import nn
from transformers import AutoTokenizer, AutoModel

# --- 1. Load Components ---
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the model body
body_model = AutoModel.from_pretrained("./qwen_body").to(device)
tokenizer = AutoTokenizer.from_pretrained("./qwen_body")

# Load the projection head
# First, re-create the architecture
input_dim = body_model.config.hidden_size  # 1024
hidden_dim = 2048
output_dim = 4096
head_model = nn.Sequential(
    nn.Linear(input_dim, hidden_dim),
    nn.GELU(),
    nn.Dropout(0.1),
    nn.Linear(hidden_dim, output_dim)
).to(device)
# Then, load the saved weights (map_location keeps this working on CPU-only machines)
head_model.load_state_dict(torch.load("./projection_head/projection_head.pth", map_location=device))

body_model.eval()
head_model.eval()

# --- 2. Create a unified inference function ---
def get_final_embedding(text: str):
    # a) Tokenize the input text
    inputs = tokenizer(text, return_tensors="pt").to(device)

    # b) Get the token-level hidden states from the body model
    with torch.no_grad():
        outputs_body = body_model(**inputs)
        last_hidden_state = outputs_body.last_hidden_state

    # c) Perform mean pooling over non-padding tokens
    attention_mask = inputs["attention_mask"]
    mask_expanded = attention_mask.unsqueeze(-1).expand(last_hidden_state.size()).float()
    sum_embeddings = torch.sum(last_hidden_state * mask_expanded, 1)
    sum_mask = torch.clamp(mask_expanded.sum(1), min=1e-9)
    pooled_embedding = sum_embeddings / sum_mask

    # d) Pass the pooled embedding through the projection head
    with torch.no_grad():
        final_embedding = head_model(pooled_embedding)

    return final_embedding

# --- 3. Test the pipeline ---
prompt = "A high-tech laboratory with glowing vials and holographic displays."
embedding = get_final_embedding(prompt)

print("Inference successful!")
print(f"Output shape: {embedding.shape}")
# Expected output shape: (1, 4096)
```
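
As a quick usage example, the resulting embeddings can be compared directly, for instance with cosine similarity (a small sketch reusing `get_final_embedding` from the script above; the prompts are illustrative):

```python
import torch.nn.functional as F

emb_a = get_final_embedding("A high-tech laboratory with glowing vials.")
emb_b = get_final_embedding("A futuristic lab full of holographic displays.")

# Cosine similarity between the two (1, 4096) embeddings
score = F.cosine_similarity(emb_a, emb_b)
print(f"Cosine similarity: {score.item():.4f}")
```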

## License

This repository is licensed under the **Apache License 2.0**.