Note: this repository is gated on the Hugging Face Hub. It is publicly accessible, but you must accept its access conditions before you can download the model files.

NMIXX-BGE

This repository contains a SentenceTransformer model based on bge-large-en-v1.5, fine-tuned with a triplet-loss objective on the nmixx-fin/NMIXX_train dataset. It produces sentence embeddings for Korean financial text, optimized for semantic similarity tasks in the finance domain.
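
Since this is a SentenceTransformer model, it can also be loaded directly with the sentence-transformers library. The snippet below is a minimal sketch and assumes the repository ships the standard SentenceTransformer configuration files; a lower-level transformers recipe follows under How to Use.

from sentence_transformers import SentenceTransformer

# Assumption: the repo includes the SentenceTransformer config (modules.json, pooling settings).
model = SentenceTransformer("nmixx-fin/nmixx-bge")
embeddings = model.encode(
    [
        "이 모델은 한국 금융 도메인에 특화된 임베딩을 제공합니다.",
        "NMIXX 데이터셋으로 fine-tuning된 sentence transformer입니다."
    ],
    normalize_embeddings=True,  # L2-normalize, matching the manual recipe below
)
print(embeddings.shape)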


How to Use

from transformers import AutoTokenizer, AutoModel
import torch
import torch.nn.functional as F

# 1. Load tokenizer & model from Hugging Face Hub
repo_name = "nmixx-fin/nmixx-bge"
tokenizer = AutoTokenizer.from_pretrained(repo_name)
model = AutoModel.from_pretrained(repo_name)

# 2. Move to GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()

# 3. Prepare input sentences
sentences = [
    "์ด ๋ชจ๋ธ์€ ํ•œ๊ตญ ๊ธˆ์œต ๋„๋ฉ”์ธ์— ํŠนํ™”๋œ ์ž„๋ฒ ๋”ฉ์„ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค.",
    "NMIXX ๋ฐ์ดํ„ฐ์…‹์œผ๋กœ fine-tuning๋œ sentence transformer์ž…๋‹ˆ๋‹ค."
]

# 4. Tokenize
encoded_input = tokenizer(
    sentences,
    padding=True,
    truncation=True,
    max_length=512,
    return_tensors="pt"
)
input_ids = encoded_input["input_ids"].to(device)
attention_mask = encoded_input["attention_mask"].to(device)

# 5. Forward pass (token embeddings)
with torch.no_grad():
    model_output = model(input_ids=input_ids, attention_mask=attention_mask)

# 6. CLS Pooling (BGE models use CLS token)
sentence_embeddings = model_output[0][:, 0]  # Use CLS token (first token)

# 7. L2 Normalization
sentence_embeddings = F.normalize(sentence_embeddings, p=2, dim=1)

print("Sentence embeddings shape:", sentence_embeddings.shape)
print(sentence_embeddings.cpu())
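
Because the embeddings are L2-normalized in step 7, cosine similarity between sentences is just the dot product of their embeddings. A short continuation of the snippet above:

# 8. Pairwise cosine similarity (embeddings are unit-length, so a dot product suffices)
similarity_matrix = sentence_embeddings @ sentence_embeddings.T
print("Cosine similarity matrix:\n", similarity_matrix.cpu())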
Model Details

335M parameters, stored in Safetensors format with BF16 tensors.
