hpprc committed on
Commit
6a30c5a
·
verified ·
1 Parent(s): 7a9f3a0

Update README.md

  1. README.md +12 -1
README.md CHANGED
@@ -26,6 +26,17 @@ Ruri v3 offers several key technical advantages:
26
  - **Tokenizer based solely on SentencePiece**
27
  - Unlike previous versions, which relied on Japanese-specific BERT tokenizers and required pre-tokenized input, Ruri v3 performs tokenization with SentencePiece only—no external word segmentation tool is required.
28
29
 
30
  ## Usage
31
 
@@ -51,7 +62,7 @@ model = SentenceTransformer("cl-nagoya/ruri-v3-310m")
51
 
52
  # Ruri v3 employs a 1+3 prefix scheme to distinguish between different types of text inputs:
53
  # "" (empty string) is used for encoding semantic meaning.
54
- # "トピック: " is used for encoding topical information.
55
  # "検索クエリ: " is used for queries in retrieval tasks.
56
  # "検索文書: " is used for documents to be retrieved.
57
  sentences = [
 
26
  - **Tokenizer based solely on SentencePiece**
27
  - Unlike previous versions, which relied on Japanese-specific BERT tokenizers and required pre-tokenized input, Ruri v3 performs tokenization with SentencePiece only—no external word segmentation tool is required.
28
 
29
+ ## Model Series
30
+
31
+ We provide Ruri-v3 in several model sizes. Below is a summary of each model.
32
+
33
+ |ID| #Param. | #Param.<br>w/o Emb.|Dim.|#Layers|Avg. JMTEB|
34
+ |-|-|-|-|-|-|
35
+ |[cl-nagoya/ruri-v3-30m](https://huggingface.co/cl-nagoya/ruri-v3-30m)|37M|10M|256|10|74.51|
36
+ |[cl-nagoya/ruri-v3-70m](https://huggingface.co/cl-nagoya/ruri-v3-70m)|70M|31M|384|13|75.48|
37
+ |[cl-nagoya/ruri-v3-130m](https://huggingface.co/cl-nagoya/ruri-v3-130m)|132M|80M|512|19|76.55|
38
+ |[**cl-nagoya/ruri-v3-310m**](https://huggingface.co/cl-nagoya/ruri-v3-310m)|315M|236M|768|25|**77.24**|
39
+
40
 
41
  ## Usage
42
 
 
62
 
63
  # Ruri v3 employs a 1+3 prefix scheme to distinguish between different types of text inputs:
64
  # "" (empty string) is used for encoding semantic meaning.
65
+ # "トピック: " is used for classification, clustering, and encoding topical information.
66
  # "検索クエリ: " is used for queries in retrieval tasks.
67
  # "検索文書: " is used for documents to be retrieved.
68
  sentences = [
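
The prefix scheme described in the comments of this hunk can be sketched as a small helper. This is a minimal sketch and not part of the README: the names `PREFIXES`, `add_prefix`, and `encode_with_prefix`, as well as the dictionary keys, are hypothetical. Only the four prefix strings come from the diff ("トピック: " is the topic prefix, "検索クエリ: " the search-query prefix, "検索文書: " the search-document prefix). `encode_with_prefix` assumes `model` is a `SentenceTransformer` instance like the one loaded in the README, whose `encode` method accepts a list of strings.

```python
# Hypothetical helper; only the prefix strings themselves come from the README.
PREFIXES = {
    "semantic": "",           # encoding general semantic meaning
    "topic": "トピック: ",     # classification, clustering, topical information
    "query": "検索クエリ: ",   # queries in retrieval tasks
    "document": "検索文書: ",  # documents to be retrieved
}


def add_prefix(texts, kind):
    """Prepend the Ruri v3 prefix for `kind` to every text in `texts`."""
    if kind not in PREFIXES:
        raise ValueError(f"unknown input kind: {kind!r}")
    prefix = PREFIXES[kind]
    return [prefix + text for text in texts]


def encode_with_prefix(model, texts, kind):
    """Prefix the texts, then encode them.

    Assumes `model` exposes `encode(list_of_str)`, as SentenceTransformer does.
    """
    return model.encode(add_prefix(texts, kind))
```

In a retrieval setting, queries would go through `add_prefix(..., "query")` and candidate passages through `add_prefix(..., "document")`, so that each side is encoded with its matching prefix.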