CoDi-Embedding-V1
CoDi-Embedding-V1 is a bilingual embedding model that supports both Chinese and English retrieval, with particularly strong performance on Chinese retrieval: it achieved state-of-the-art results on the Chinese MTEB benchmark as of August 20, 2025. Built on the MiniCPM-Embedding model, CoDi-Embedding-V1 extends the maximum sequence length from 512 to 4,096 tokens, significantly improving long-document retrieval. The model uses a mean pooling strategy in which instruction tokens are excluded from pooling to improve retrieval effectiveness.
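To make the pooling scheme concrete, here is a minimal PyTorch sketch of instruction-excluded mean pooling. This is only an illustration of the idea described above, not the model's actual implementation; the function name and the instruction_len parameter are hypothetical.

import torch

def mean_pool_excluding_instruction(hidden_states, attention_mask, instruction_len):
    # hidden_states:   (batch, seq_len, dim) last-layer token embeddings
    # attention_mask:  (batch, seq_len), 1 for real tokens, 0 for padding
    # instruction_len: number of leading instruction tokens to exclude (hypothetical)
    mask = attention_mask.clone()
    mask[:, :instruction_len] = 0             # drop instruction tokens from the pool
    mask = mask.unsqueeze(-1).float()         # (batch, seq_len, 1)
    summed = (hidden_states * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1e-9)  # avoid division by zero
    return summed / counts                    # (batch, dim) mean-pooled embeddings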
Model Description
- Maximum Sequence Length: 4096 tokens
- Output Dimensionality: 2304
- Model Size: 2.4B
Requirements
transformers>=4.37.2
Usage
Sentence Transformers
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load the model and run inference:
from sentence_transformers import SentenceTransformer

# Replace model_name_or_path with the local path or Hugging Face ID of CoDi-Embedding-V1.
model = SentenceTransformer(model_name_or_path)

queries = ["结算业务系统用户使用"]  # "Used by users of the settlement business system"
documents = [
    # "Based on the entered unfreeze-date range, query the list of account freezes expiring within that period."
    "根据解冻日输入范围,查询出该时间范围内到期的账户冻结列表。",
    # "The 'handling when a smart time deposit matures on a holiday' setting can advance or postpone maturity; advance/postpone maturity reminders for smart time-deposit certificates are supported."
    "“智能定期存款到期日为节假日时处理”设置提前或顺延,支持智能定期证实书提前或顺延到期提醒。",
    # "An account maturity date is set at account opening; maturity reminders follow the institution-wide system parameter settings."
    "账户开户时设置了账户到期日,账户到期提醒是根据全机构系统参数设置",
]

# Encode queries with the model's "query" prompt; documents are encoded without a prompt.
query_embeddings = model.encode(queries, prompt_name="query")
document_embeddings = model.encode(documents)

# Get the similarity scores for the embeddings
similarity = model.similarity(query_embeddings, document_embeddings)
print(similarity)
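The result is a 1×3 similarity matrix with one row per query and one column per document; higher scores indicate closer semantic matches. Because the model supports a 4,096-token context window, long documents can be encoded the same way. The snippet below is a small sketch showing how to check (or cap) the window via the Sentence Transformers max_seq_length attribute; the expected value comes from the model description above.

# The context window is exposed as a model attribute.
print(model.max_seq_length)  # expected: 4096, per the model description

long_document = "..."  # any Chinese or English document up to 4,096 tokens
long_embedding = model.encode([long_document])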