Update README.md
README.md
CHANGED
@@ -26,6 +26,17 @@ Ruri v3 offers several key technical advantages:
 - **Tokenizer based solely on SentencePiece**
   - Unlike previous versions, which relied on Japanese-specific BERT tokenizers and required pre-tokenized input, Ruri v3 performs tokenization with SentencePiece only—no external word segmentation tool is required.

+## Model Series
+
+We provide Ruri-v3 in several model sizes. Below is a summary of each model.
+
+|ID| #Param. | #Param.<br>w/o Emb.|Dim.|#Layers|Avg. JMTEB|
+|-|-|-|-|-|-|
+|[cl-nagoya/ruri-v3-30m](https://huggingface.co/cl-nagoya/ruri-v3-30m)|37M|10M|256|10|74.51|
+|[cl-nagoya/ruri-v3-70m](https://huggingface.co/cl-nagoya/ruri-v3-70m)|70M|31M|384|13|75.48|
+|[cl-nagoya/ruri-v3-130m](https://huggingface.co/cl-nagoya/ruri-v3-130m)|132M|80M|512|19|76.55|
+|[**cl-nagoya/ruri-v3-310m**](https://huggingface.co/cl-nagoya/ruri-v3-310m)|315M|236M|768|25|**77.24**|
+

 ## Usage

@@ -51,7 +62,7 @@ model = SentenceTransformer("cl-nagoya/ruri-v3-310m")

 # Ruri v3 employs a 1+3 prefix scheme to distinguish between different types of text inputs:
 # "" (empty string) is used for encoding semantic meaning.
-# "トピック: " is used for encoding topical information.
+# "トピック: " is used for classification, clustering, and encoding topical information.
 # "検索クエリ: " is used for queries in retrieval tasks.
 # "検索文書: " is used for documents to be retrieved.
 sentences = [
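
The 1+3 prefix scheme in the second hunk amounts to prepending a fixed string to each input before encoding. A minimal sketch of that step follows. The `PREFIXES` table, its role names (`semantic`, `topic`, `query`, `document`), and the `with_prefix` helper are hypothetical and not part of the README; only the four prefix strings come from the diff.

```python
# Hypothetical helper, not part of the README: the role names below are
# illustrative labels; only the prefix strings themselves come from the diff.
PREFIXES = {
    "semantic": "",            # encoding semantic meaning
    "topic": "トピック: ",       # classification, clustering, topical information
    "query": "検索クエリ: ",     # queries in retrieval tasks
    "document": "検索文書: ",    # documents to be retrieved
}


def with_prefix(role: str, texts: list[str]) -> list[str]:
    """Return a new list with the prefix for `role` prepended to each text."""
    if role not in PREFIXES:
        raise ValueError(f"unknown role {role!r}; expected one of {sorted(PREFIXES)}")
    prefix = PREFIXES[role]
    return [prefix + text for text in texts]


# Example inputs: "What kind of color is ruri-iro?" and a one-line answer.
queries = with_prefix("query", ["瑠璃色はどんな色?"])
documents = with_prefix("document", ["瑠璃色は紫みを帯びた濃い青。"])
```

The resulting lists can then presumably be passed to `model.encode(...)` on the `SentenceTransformer` instance shown in the hunk header, with queries and documents encoded under their respective prefixes.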