Update README.md
---
language:
- en
- zh
base_model:
- openbmb/MiniCPM-Embedding
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---

## CoDi-Embedding-V1
CoDi-Embedding-V1 is an embedding model that supports both Chinese and English retrieval, with particularly strong performance on Chinese retrieval. It achieved state-of-the-art (SOTA) results on the Chinese MTEB benchmark as of August 20, 2025. Built on [MiniCPM-Embedding](https://huggingface.co/openbmb/MiniCPM-Embedding), CoDi-Embedding-V1 extends the maximum sequence length from 512 to 4,096 tokens, significantly improving long-document retrieval. The model uses a mean-pooling strategy in which instruction tokens are excluded from the pool to improve retrieval effectiveness.
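The instruction-excluded pooling can be pictured as masking out the instruction prefix before averaging token embeddings. The following is a minimal sketch of the idea, not the model's actual implementation; the function name and the `instruction_len` argument are hypothetical:

```python
import torch

def mean_pool_excluding_instruction(
    hidden_states: torch.Tensor,   # (batch, seq_len, dim)
    attention_mask: torch.Tensor,  # (batch, seq_len), 1 for real tokens
    instruction_len: int,
) -> torch.Tensor:
    """Mean-pool token embeddings while skipping the instruction prefix."""
    mask = attention_mask.clone()
    mask[:, :instruction_len] = 0                  # exclude instruction tokens
    mask = mask.unsqueeze(-1).to(hidden_states.dtype)
    summed = (hidden_states * mask).sum(dim=1)     # (batch, dim)
    counts = mask.sum(dim=1).clamp(min=1e-9)       # (batch, 1), avoid div by zero
    return summed / counts
```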
### Model Description
- **Maximum Sequence Length:** 4096 tokens
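To confirm the 4,096-token limit at runtime, sentence-transformers exposes it as a model property. A minimal sketch, with the model ID as a placeholder assumption:

```python
from sentence_transformers import SentenceTransformer

# The model ID is a placeholder assumption; point it at the actual checkpoint.
model = SentenceTransformer("CoDi/CoDi-Embedding-V1", trust_remote_code=True)

# Inputs longer than max_seq_length are truncated before encoding.
print(model.max_seq_length)  # expected: 4096
```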
### Usage

A minimal, runnable sketch assuming the standard sentence-transformers API; the model ID, query prompt, and example texts are placeholders:

```python
from sentence_transformers import SentenceTransformer

# Model ID and query prompt are assumed placeholders, not the card's exact values.
model = SentenceTransformer("CoDi/CoDi-Embedding-V1", trust_remote_code=True)

# Queries typically carry an instruction prompt; documents are encoded as-is.
query_embeddings = model.encode(["中国的首都是哪里？"], prompt="Query: ")
document_embeddings = model.encode(["北京是中华人民共和国的首都。", "巴黎是法国的首都。"])

# Get the similarity scores for the embeddings
similarity = model.similarity(query_embeddings, document_embeddings)
print(similarity)
```
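`model.similarity` returns a score matrix of shape `(len(queries), len(documents))`, computed with the model's configured similarity function (cosine by default in sentence-transformers).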