cl-nagoya/ruri-v3-310m

Sentence Similarity

feature-extraction

hpprc committed on Apr 14 · Commit 6a30c5a · verified · 1 Parent(s): 7a9f3a0

Update README.md

Files changed (1)

README.md +12 -1

@@ -26,6 +26,17 @@ Ruri v3 offers several key technical advantages:
 - **Tokenizer based solely on SentencePiece**
   - Unlike previous versions, which relied on Japanese-specific BERT tokenizers and required pre-tokenized input, Ruri v3 performs tokenization with SentencePiece only—no external word segmentation tool is required.
+## Model Series
+We provide Ruri-v3 in several model sizes. Below is a summary of each model.
+|ID| #Param. | #Param.<br>w/o Emb.|Dim.|#Layers|Avg. JMTEB|
+|-|-|-|-|-|-|
+|[cl-nagoya/ruri-v3-30m](https://huggingface.co/cl-nagoya/ruri-v3-30m)|37M|10M|256|10|74.51|
+|[cl-nagoya/ruri-v3-70m](https://huggingface.co/cl-nagoya/ruri-v3-70m)|70M|31M|384|13|75.48|
+|[cl-nagoya/ruri-v3-130m](https://huggingface.co/cl-nagoya/ruri-v3-130m)|132M|80M|512|19|76.55|
+|[**cl-nagoya/ruri-v3-310m**](https://huggingface.co/cl-nagoya/ruri-v3-310m)|315M|236M|768|25|**77.24**|
 ## Usage
@@ -51,7 +62,7 @@ model = SentenceTransformer("cl-nagoya/ruri-v3-310m")
 # Ruri v3 employs a 1+3 prefix scheme to distinguish between different types of text inputs:
 # "" (empty string) is used for encoding semantic meaning.
-# "トピック: " is used for encoding topical information.
+# "トピック: " is used for classification, clustering, and encoding topical information.
 # "検索クエリ: " is used for queries in retrieval tasks.
 # "検索文書: " is used for documents to be retrieved.
 sentences = [
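
The 1+3 prefix scheme described in the README comments can be sketched as a small helper that prepends the right prefix before encoding. This is only an illustration, not code from the README: `PREFIXES`, `add_prefix`, and `embed` are hypothetical names, and the short keys (`semantic`, `topic`, `query`, `document`) are assumed labels for the four cases. The prefix strings and the `SentenceTransformer("cl-nagoya/ruri-v3-310m")` call are taken from the diff itself.

```python
# Prefix strings as listed in the README comments for Ruri v3.
# The dictionary keys are assumed labels chosen for this sketch.
PREFIXES = {
    "semantic": "",            # encoding semantic meaning
    "topic": "トピック: ",       # classification, clustering, topical information
    "query": "検索クエリ: ",     # queries in retrieval tasks
    "document": "検索文書: ",    # documents to be retrieved
}


def add_prefix(text: str, kind: str) -> str:
    """Prepend the prefix for `kind` (hypothetical helper, not a library API)."""
    if kind not in PREFIXES:
        raise ValueError(f"unknown kind {kind!r}; expected one of {sorted(PREFIXES)}")
    return PREFIXES[kind] + text


def embed(texts, kind, model_name="cl-nagoya/ruri-v3-310m"):
    """Encode `texts` after prefixing them.

    Requires the sentence-transformers package and downloads the model
    on first use, so the import is deferred until the function is called.
    """
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    return model.encode([add_prefix(t, kind) for t in texts])
```

In a retrieval setting, one would presumably call `embed(queries, "query")` and `embed(passages, "document")` so that each side carries its own prefix, and use `"topic"` for classification or clustering inputs as the updated comment suggests.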