---
license: apache-2.0
language:
- en
base_model: JusteLeo/Qwen3-0.6B-T5-xxl
tags:
- split
- encoder
- embedding
- Text Generation
---

# Qwen3-0.6B-T5-xxl-split

## Model Description

This repository provides the components of the `Qwen3-0.6B-T5-xxl` model, split into two parts: the fine-tuned Qwen3 body and its projection head. It is intended for advanced users who want to perform custom operations, such as GGUF conversion or other model architecture modifications.

Both components are provided in **float32** format to ensure maximum precision for downstream tasks like quantization.
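
As a quick sanity check, you can confirm the dtype after downloading the repository (a minimal sketch; the `./qwen_body` path assumes a local copy, and the cast at the end is only an illustration):

```python
import torch
from transformers import AutoModel

# Load the body and report its parameter dtype
body = AutoModel.from_pretrained("./qwen_body")
print(body.dtype)  # expected: torch.float32

# Cast down only if you explicitly want lower precision,
# e.g. before a half-precision export
body_fp16 = body.to(torch.float16)
```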

## Repository Contents

- **/qwen_body/**: the fine-tuned `Qwen3-0.6B` model body, stored as a standard Hugging Face model directory with weights in `float32`.
- **/projection_head/**: the fine-tuned projection head, stored as a single `projection_head.pth` file containing a PyTorch state dictionary (see the sketch below).
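
Because the projection head is a plain state dictionary, you can inspect it before rebuilding the module (a minimal sketch; the `0.*`/`3.*` key names follow from the `nn.Sequential` layout shown in the usage example below):

```python
import torch

# Load the raw state dict on CPU and list its parameters
state_dict = torch.load("./projection_head/projection_head.pth", map_location="cpu")
for name, tensor in state_dict.items():
    print(name, tuple(tensor.shape), tensor.dtype)

# Expected entries for the head defined below:
#   0.weight (2048, 1024), 0.bias (2048,)
#   3.weight (4096, 2048), 3.bias (4096,)
```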

## How to Use

To use these components, load them separately and combine them in a two-step inference process: the body produces token-level hidden states, which are mean-pooled and then passed through the projection head to produce the final embedding.

```python
import torch
from torch import nn
from transformers import AutoTokenizer, AutoModel

# --- 1. Load Components ---
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the model body
body_model = AutoModel.from_pretrained("./qwen_body").to(device)
tokenizer = AutoTokenizer.from_pretrained("./qwen_body")

# Load the projection head
# First, re-create the architecture
input_dim = body_model.config.hidden_size  # 1024
hidden_dim = 2048
output_dim = 4096
head_model = nn.Sequential(
    nn.Linear(input_dim, hidden_dim),
    nn.GELU(),
    nn.Dropout(0.1),
    nn.Linear(hidden_dim, output_dim)
).to(device)
# Then, load the saved weights (map_location keeps this working on CPU-only machines)
head_model.load_state_dict(torch.load("./projection_head/projection_head.pth", map_location=device))

body_model.eval()
head_model.eval()

# --- 2. Create a unified inference function ---
def get_final_embedding(text: str):
    # a) Tokenize the input text
    inputs = tokenizer(text, return_tensors="pt").to(device)

    # b) Get the token-level hidden states from the body model
    with torch.no_grad():
        outputs_body = body_model(**inputs)
        last_hidden_state = outputs_body.last_hidden_state

    # c) Perform mean pooling over non-padding tokens
    attention_mask = inputs["attention_mask"]
    mask_expanded = attention_mask.unsqueeze(-1).expand(last_hidden_state.size()).float()
    sum_embeddings = torch.sum(last_hidden_state * mask_expanded, 1)
    sum_mask = torch.clamp(mask_expanded.sum(1), min=1e-9)
    pooled_embedding = sum_embeddings / sum_mask

    # d) Pass the pooled embedding through the projection head
    with torch.no_grad():
        final_embedding = head_model(pooled_embedding)

    return final_embedding

# --- 3. Test the pipeline ---
prompt = "A high-tech laboratory with glowing vials and holographic displays."
embedding = get_final_embedding(prompt)

print("Inference successful!")
print(f"Output shape: {embedding.shape}")
# Expected output shape: (1, 4096)
```
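
As a quick usage example, the resulting embeddings can be compared directly, for instance with cosine similarity (a small sketch reusing `get_final_embedding` from the script above; the prompts are illustrative):

```python
import torch.nn.functional as F

emb_a = get_final_embedding("A high-tech laboratory with glowing vials.")
emb_b = get_final_embedding("A futuristic lab full of holographic displays.")

# Cosine similarity between the two (1, 4096) embeddings
score = F.cosine_similarity(emb_a, emb_b)
print(f"Cosine similarity: {score.item():.4f}")
```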

## License

This repository is licensed under the **Apache License 2.0**.