hpprc committed
Commit 0c147ea · verified · 1 Parent(s): ad4df54

Update README.md

Files changed (1)
  1. README.md +6 -133
README.md CHANGED
@@ -6,136 +6,9 @@ tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
- ---
-
- # SentenceTransformer based on tohoku-nlp/bert-large-japanese-v2
-
- This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [tohoku-nlp/bert-large-japanese-v2](https://huggingface.co/tohoku-nlp/bert-large-japanese-v2). It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
-
- ## Model Details
-
- ### Model Description
- - **Model Type:** Sentence Transformer
- - **Base model:** [tohoku-nlp/bert-large-japanese-v2](https://huggingface.co/tohoku-nlp/bert-large-japanese-v2) <!-- at revision 75b828083735e953e3ed13e2ad6ea945c1fdb390 -->
- - **Maximum Sequence Length:** 512 tokens
- - **Output Dimensionality:** 1024 tokens
- - **Similarity Function:** Cosine Similarity
- <!-- - **Training Dataset:** Unknown -->
- <!-- - **Language:** Unknown -->
- <!-- - **License:** Unknown -->
-
- ### Model Sources
-
- - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
-
- ### Full Model Architecture
-
- ```
- MySentenceTransformer(
- (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
- (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
- )
- ```
-
- ## Usage
-
- ### Direct Usage (Sentence Transformers)
-
- First install the Sentence Transformers library:
-
- ```bash
- pip install -U sentence-transformers
- ```
-
- Then you can load this model and run inference.
- ```python
- from sentence_transformers import SentenceTransformer
-
- # Download from the 🤗 Hub
- model = SentenceTransformer("hpprc/ruri-v2-pt-large")
- # Run inference
- sentences = [
-     'The weather is lovely today.',
-     "It's so sunny outside!",
-     'He drove to the stadium.',
- ]
- embeddings = model.encode(sentences)
- print(embeddings.shape)
- # [3, 1024]
-
- # Get the similarity scores for the embeddings
- similarities = model.similarity(embeddings, embeddings)
- print(similarities.shape)
- # [3, 3]
- ```
-
- <!--
- ### Direct Usage (Transformers)
-
- <details><summary>Click to see the direct usage in Transformers</summary>
-
- </details>
- -->
-
- <!--
- ### Downstream Usage (Sentence Transformers)
-
- You can finetune this model on your own dataset.
-
- <details><summary>Click to expand</summary>
-
- </details>
- -->
-
- <!--
- ### Out-of-Scope Use
-
- *List how the model may foreseeably be misused and address what users ought not to do with the model.*
- -->
-
- <!--
- ## Bias, Risks and Limitations
-
- *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
- -->
-
- <!--
- ### Recommendations
-
- *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
- -->
-
- ## Training Details
-
- ### Framework Versions
- - Python: 3.10.13
- - Sentence Transformers: 3.1.1
- - Transformers: 4.45.1
- - PyTorch: 2.4.1+cu124
- - Accelerate: 0.34.2
- - Datasets: 2.19.1
- - Tokenizers: 0.20.0
-
- ## Citation
-
- ### BibTeX
-
- <!--
- ## Glossary
-
- *Clearly define terms in order to be accessible across audiences.*
- -->
-
- <!--
- ## Model Card Authors
-
- *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
- -->
-
- <!--
- ## Model Card Contact
-
- *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
- -->
+ license: apache-2.0
+ datasets:
+ - cl-nagoya/ruri-dataset-v2-pt
+ language:
+ - ja
+ ---
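
The metadata added above points the card at the `cl-nagoya/ruri-dataset-v2-pt` pretraining dataset. A minimal sketch of inspecting that dataset with the Hugging Face `datasets` library follows; the `train` split name and the default configuration are assumptions, not something this diff specifies.

```python
from datasets import load_dataset

# Load the pretraining dataset declared in the new `datasets:` frontmatter.
# NOTE: the default config and the "train" split are assumed here.
ds = load_dataset("cl-nagoya/ruri-dataset-v2-pt", split="train")

print(ds)     # summary of features and row count
print(ds[0])  # first example, with whatever fields the dataset defines
```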