sudongpo committed
Commit 9ee4337 · verified · 1 Parent(s): 4dddd40

Update README.md

Files changed (1)
  1. README.md +11 -2
README.md CHANGED
@@ -1,5 +1,14 @@
+---
+language:
+- en
+- zh
+base_model:
+- openbmb/MiniCPM-Embedding
+pipeline_tag: sentence-similarity
+library_name: sentence-transformers
+---
 ## CoDi-Embedding-V1
-CoDi-Embedding-V1 is an embedding model that supports both Chinese and English retrieval, with particularly strong performance in Chinese retrieval. It has achieved SOTA results on the Chinese MTEB benchmark as of August 20, 2025. Based on the ![MiniCPM-Embedding](https://huggingface.co/openbmb/MiniCPM-Embedding) model, CoDi-Embedding-V1 extends the maximum sequence length from 512 to 4,096 tokens, significantly enhancing its capability for long-document retrieval. The model employs a mean pooling strategy in which instruction tokens are excluded during pooling to optimize retrieval effectiveness.
+CoDi-Embedding-V1 is an embedding model that supports both Chinese and English retrieval, with particularly strong performance in Chinese retrieval. It has achieved SOTA results on the Chinese MTEB benchmark as of August 20, 2025. Based on the [MiniCPM-Embedding](https://huggingface.co/openbmb/MiniCPM-Embedding) model, CoDi-Embedding-V1 extends the maximum sequence length from 512 to 4,096 tokens, significantly enhancing its capability for long-document retrieval. The model employs a mean pooling strategy in which instruction tokens are excluded during pooling to optimize retrieval effectiveness.
 
 ### Model Description
 - **Maximum Sequence Length:** 4096 tokens
@@ -39,4 +48,4 @@ document_embeddings = model.encode(documents)
 # Get the similarity scores for the embeddings
 similarity = model.similarity(query_embeddings, document_embeddings)
 print(similarity)
-```
+```
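
For context, here is a minimal sketch of the full usage example whose tail appears in the second hunk, assuming the standard sentence-transformers API (`encode` / `similarity`). The repo ID, the query, and the document strings are assumptions for illustration; only the last three lines are shown in the diff itself.

```python
# Minimal sketch reconstructed from the fragments visible in the diff.
# The model ID below is hypothetical; check the model card for the real one.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sudongpo/CoDi-Embedding-V1", trust_remote_code=True)

# Illustrative query and documents (not from the README).
queries = ["How long a context can CoDi-Embedding-V1 encode?"]
documents = [
    "CoDi-Embedding-V1 extends the maximum sequence length from 512 to 4,096 tokens.",
    "Mean pooling excludes instruction tokens to improve retrieval quality.",
]

# Encode queries and documents; the README's model card may additionally
# specify an instruction prompt for queries, which encode() would accept
# via its prompt= argument.
query_embeddings = model.encode(queries)
document_embeddings = model.encode(documents)

# Get the similarity scores for the embeddings (the lines shown in the hunk)
similarity = model.similarity(query_embeddings, document_embeddings)
print(similarity)
```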