Commit 217ea97 (verified) committed by tomaarsen (HF Staff)
Parent(s): 89c8245

Add new SparseEncoder model

1_Pooling/config.json ADDED
```json
{
    "word_embedding_dimension": 1024,
    "pooling_mode_cls_token": true,
    "pooling_mode_mean_tokens": false,
    "pooling_mode_max_tokens": false,
    "pooling_mode_mean_sqrt_len_tokens": false,
    "pooling_mode_weightedmean_tokens": false,
    "pooling_mode_lasttoken": false,
    "include_prompt": true
}
```
2_CSRSparsity/config.json ADDED
```json
{
    "input_dim": 1024,
    "hidden_dim": 4096,
    "k": 256,
    "k_aux": 512,
    "normalize": false,
    "dead_threshold": 30
}
```
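These fields describe a top-k sparse head: the 1024-dimensional dense embedding is projected up to `hidden_dim` (4096) and only the `k` (256) strongest activations are kept, while `k_aux` and `dead_threshold` appear to configure the auxiliary loss that revives inactive latents during training. A minimal sketch of the top-k step in plain PyTorch, as an illustration only (the real `CSRSparsity` module also handles reconstruction and auxiliary-loss bookkeeping, which this omits):

```python
import torch

# Illustrative stand-in for the configured module; not the library's code.
input_dim, hidden_dim, k = 1024, 4096, 256

encode = torch.nn.Linear(input_dim, hidden_dim)  # stand-in encoder weights
dense = torch.randn(2, input_dim)                # two dense CLS embeddings

latent = torch.relu(encode(dense))
top = torch.topk(latent, k=k, dim=-1)            # keep the k largest activations
sparse = torch.zeros_like(latent).scatter_(-1, top.indices, top.values)

print((sparse != 0).sum(dim=-1))                 # at most k active dims per row
```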
2_CSRSparsity/model.safetensors ADDED
```
version https://git-lfs.github.com/spec/v1
oid sha256:f12554991ef4abfe313c0083d030b23fd265f9278e398afe7a89ae212d1bffd7
size 16830864
```
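The entry above is a Git LFS pointer: the repository stores only the hash and size, while the ~16.8 MB weights blob lives in LFS storage. If the raw file is needed rather than the loaded model, a small sketch using `huggingface_hub` (the repo id is taken from the usage section below):

```python
from huggingface_hub import hf_hub_download

# Resolves the LFS pointer and downloads the actual safetensors file
path = hf_hub_download(
    repo_id="tomaarsen/csr-mxbai-embed-large-v1-nq",
    filename="2_CSRSparsity/model.safetensors",
)
print(path)
```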
README.md ADDED
---
language:
- en
license: apache-2.0
tags:
- sentence-transformers
- sparse-encoder
- sparse
- csr
- generated_from_trainer
- dataset_size:99000
- loss:CSRLoss
- loss:SparseMultipleNegativesRankingLoss
base_model: mixedbread-ai/mxbai-embed-large-v1
widget:
- source_sentence: what is the difference between uae and saudi arabia
  sentences:
  - 'Monopoly Junior Players take turns in order, with the initial player determined by age before the game: the youngest player goes first. Players are dealt an initial amount Monopoly money depending on the total number of players playing: 20 in a two-player game, 18 in a three-player game or 16 in a four-player game. A typical turn begins with the rolling of the die and the player advancing their token clockwise around the board the corresponding number of spaces. When the player lands on an unowned space they must purchase the space from the bank for the amount indicated on the board, and places a sold sign on the coloured band at the top of the space to denote ownership. If a player lands on a space owned by an opponent the player pays the opponent rent in the amount written on the board. If the opponent owns both properties of the same colour the rent is doubled.'
  - Saudi Arabia–United Arab Emirates relations However, the UAE and Saudi Arabia continue to take somewhat differing stances on regional conflicts such the Yemeni Civil War, where the UAE opposes Al-Islah, and supports the Southern Movement, which has fought against Saudi-backed forces, and the Syrian Civil War, where the UAE has disagreed with Saudi support for Islamist movements.[4]
  - Governors of states of India The governors and lieutenant-governors are appointed by the President for a term of five years.
- source_sentence: who came up with the seperation of powers
  sentences:
  - Separation of powers Aristotle first mentioned the idea of a "mixed government" or hybrid government in his work Politics where he drew upon many of the constitutional forms in the city-states of Ancient Greece. In the Roman Republic, the Roman Senate, Consuls and the Assemblies showed an example of a mixed government according to Polybius (Histories, Book 6, 11–13).
  - Economy of New Zealand New Zealand's diverse market economy has a sizable service sector, accounting for 63% of all GDP activity in 2013.[17] Large scale manufacturing industries include aluminium production, food processing, metal fabrication, wood and paper products. Mining, manufacturing, electricity, gas, water, and waste services accounted for 16.5% of GDP in 2013.[17] The primary sector continues to dominate New Zealand's exports, despite accounting for 6.5% of GDP in 2013.[17]
  - John Dalton John Dalton FRS (/ˈdɔːltən/; 6 September 1766 – 27 July 1844) was an English chemist, physicist, and meteorologist. He is best known for proposing the modern atomic theory and for his research into colour blindness, sometimes referred to as Daltonism in his honour.
- source_sentence: who was the first president of indian science congress meeting held in kolkata in 1914
  sentences:
  - Nobody to Blame "Nobody to Blame" is a song recorded by American country music artist Chris Stapleton. The song was released in November 2015 as the singer's third single overall. Stapleton co-wrote the song with Barry Bales and Ronnie Bowman. It became Stapleton's first top 10 single on the US Country Airplay chart.[2] "Nobody to Blame" won Song of the Year at the ACM Awards.[3]
  - Indian Science Congress Association The first meeting of the congress was held from 15–17 January 1914 at the premises of the Asiatic Society, Calcutta. Honorable justice Sir Ashutosh Mukherjee, the then Vice Chancellor of the University of Calcutta presided over the Congress. One hundred and five scientists from different parts of India and abroad attended it. Altogether 35 papers under 6 different sections, namely Botany, Chemistry, Ethnography, Geology, Physics and Zoology were presented.
  - New Soul "New Soul" is a song by the French-Israeli R&B/soul singer Yael Naïm, from her self-titled second album. The song gained popularity in the United States following its use by Apple in an advertisement for their MacBook Air laptop. In the song Naïm sings of being a new soul who has come into the world to learn "a bit 'bout how to give and take." However, she finds that things are harder than they seem. The song, also featured in the films The House Bunny and Wild Target, features a prominent "la la la la" section as its hook. It remains Naïm's biggest hit single in the U.S. to date, and her only one to reach the Top 40 of the Billboard Hot 100.
- source_sentence: who wrote get over it by the eagles
  sentences:
  - Get Over It (Eagles song) "Get Over It" is a song by the Eagles released as a single after a fourteen-year breakup. It was also the first song written by bandmates Don Henley and Glenn Frey when the band reunited. "Get Over It" was played live for the first time during their Hell Freezes Over tour in 1994. It returned the band to the U.S. Top 40 after a fourteen-year absence, peaking at No. 31 on the Billboard Hot 100 chart. It also hit No. 4 on the Billboard Mainstream Rock Tracks chart. The song was not played live by the Eagles after the "Hell Freezes Over" tour in 1994. It remains the group's last Top 40 hit in the U.S.
  - Pokhran-II In 1980, the general elections marked the return of Indira Gandhi and the nuclear program began to gain momentum under Ramanna in 1981. Requests for additional nuclear tests were continued to be denied by the government when Prime Minister Indira Gandhi saw Pakistan began exercising the brinkmanship, though the nuclear program continued to advance.[7] Initiation towards hydrogen bomb began as well as the launch of the missile programme began under Late president Dr. Abdul Kalam, who was then an aerospace engineer.[7]
  - R. Budd Dwyer Robert Budd Dwyer (November 21, 1939 – January 22, 1987) was the 30th State Treasurer of the Commonwealth of Pennsylvania. He served from 1971 to 1981 as a Republican member of the Pennsylvania State Senate representing the state's 50th district. He then served as the 30th Treasurer of Pennsylvania from January 20, 1981, until his death. On January 22, 1987, Dwyer called a news conference in the Pennsylvania state capital of Harrisburg where he killed himself in front of the gathered reporters, by shooting himself in the mouth with a .357 Magnum revolver.[4] Dwyer's suicide was broadcast later that day to a wide television audience across Pennsylvania.
- source_sentence: who is cornelius in the book of acts
  sentences:
  - Wonderful Tonight "Wonderful Tonight" is a ballad written by Eric Clapton. It was included on Clapton's 1977 album Slowhand. Clapton wrote the song about Pattie Boyd.[1] The female vocal harmonies on the song are provided by Marcella Detroit (then Marcy Levy) and Yvonne Elliman.
  - Joe Ranft Ranft reunited with Lasseter when he was hired by Pixar in 1991 as their head of story.[1] There he worked on all of their films produced up to 2006; this included Toy Story (for which he received an Academy Award nomination) and A Bug's Life, as the co-story writer and others as story supervisor. His final film was Cars. He also voiced characters in many of the films, including Heimlich the caterpillar in A Bug's Life, Wheezy the penguin in Toy Story 2, and Jacques the shrimp in Finding Nemo.[1]
  - 'Cornelius the Centurion Cornelius (Greek: Κορνήλιος) was a Roman centurion who is considered by Christians to be one of the first Gentiles to convert to the faith, as related in Acts of the Apostles.'
datasets:
- sentence-transformers/natural-questions
pipeline_tag: feature-extraction
library_name: sentence-transformers
metrics:
- dot_accuracy@1
- dot_accuracy@3
- dot_accuracy@5
- dot_accuracy@10
- dot_precision@1
- dot_precision@3
- dot_precision@5
- dot_precision@10
- dot_recall@1
- dot_recall@3
- dot_recall@5
- dot_recall@10
- dot_ndcg@10
- dot_mrr@10
- dot_map@100
- row_non_zero_mean_query
- row_sparsity_mean_query
- row_non_zero_mean_corpus
- row_sparsity_mean_corpus
co2_eq_emissions:
  emissions: 53.0254354591015
  energy_consumed: 0.1364166777096632
  source: codecarbon
  training_type: fine-tuning
  on_cloud: false
  cpu_model: 13th Gen Intel(R) Core(TM) i7-13700K
  ram_total_size: 31.777088165283203
  hours_used: 0.398
  hardware_used: 1 x NVIDIA GeForce RTX 3090
model-index:
- name: Sparse CSR model trained on Natural Questions
  results:
  - task:
      type: sparse-information-retrieval
      name: Sparse Information Retrieval
    dataset:
      name: NanoMSMARCO 128
      type: NanoMSMARCO_128
    metrics:
    - type: dot_accuracy@1
      value: 0.36
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.6
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.66
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.78
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.36
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.2
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.132
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.07800000000000001
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.36
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.6
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.66
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.78
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.5700548121129412
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.5031904761904762
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.514501390584724
      name: Dot Map@100
    - type: row_non_zero_mean_query
      value: 128.0
      name: Row Non Zero Mean Query
    - type: row_sparsity_mean_query
      value: 0.96875
      name: Row Sparsity Mean Query
    - type: row_non_zero_mean_corpus
      value: 128.0
      name: Row Non Zero Mean Corpus
    - type: row_sparsity_mean_corpus
      value: 0.96875
      name: Row Sparsity Mean Corpus
  - task:
      type: sparse-information-retrieval
      name: Sparse Information Retrieval
    dataset:
      name: NanoNFCorpus 128
      type: NanoNFCorpus_128
    metrics:
    - type: dot_accuracy@1
      value: 0.3
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.58
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.64
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.66
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.3
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.32
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.28400000000000003
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.22399999999999998
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.020619614054435857
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.07638129396550794
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.09086567610708625
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.10949508245462748
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.2705576989448532
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.43883333333333324
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.11570301194076318
      name: Dot Map@100
    - type: row_non_zero_mean_query
      value: 128.0
      name: Row Non Zero Mean Query
    - type: row_sparsity_mean_query
      value: 0.96875
      name: Row Sparsity Mean Query
    - type: row_non_zero_mean_corpus
      value: 128.0
      name: Row Non Zero Mean Corpus
    - type: row_sparsity_mean_corpus
      value: 0.96875
      name: Row Sparsity Mean Corpus
  - task:
      type: sparse-information-retrieval
      name: Sparse Information Retrieval
    dataset:
      name: NanoNQ 128
      type: NanoNQ_128
    metrics:
    - type: dot_accuracy@1
      value: 0.44
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.58
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.66
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.76
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.44
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.19333333333333333
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.132
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.08199999999999999
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.43
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.54
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.6
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.73
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.5760476804950475
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.5402222222222222
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.5348788301685897
      name: Dot Map@100
    - type: row_non_zero_mean_query
      value: 128.0
      name: Row Non Zero Mean Query
    - type: row_sparsity_mean_query
      value: 0.96875
      name: Row Sparsity Mean Query
    - type: row_non_zero_mean_corpus
      value: 128.0
      name: Row Non Zero Mean Corpus
    - type: row_sparsity_mean_corpus
      value: 0.96875
      name: Row Sparsity Mean Corpus
  - task:
      type: sparse-nano-beir
      name: Sparse Nano BEIR
    dataset:
      name: NanoBEIR mean 128
      type: NanoBEIR_mean_128
    metrics:
    - type: dot_accuracy@1
      value: 0.36666666666666664
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.5866666666666666
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.6533333333333333
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.7333333333333334
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.36666666666666664
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.23777777777777778
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.18266666666666667
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.128
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.27020653801814526
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.405460431321836
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.4502885587023621
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.5398316941515425
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.4722200638509473
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.4940820105820105
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.38836107756469235
      name: Dot Map@100
    - type: row_non_zero_mean_query
      value: 128.0
      name: Row Non Zero Mean Query
    - type: row_sparsity_mean_query
      value: 0.96875
      name: Row Sparsity Mean Query
    - type: row_non_zero_mean_corpus
      value: 128.0
      name: Row Non Zero Mean Corpus
    - type: row_sparsity_mean_corpus
      value: 0.96875
      name: Row Sparsity Mean Corpus
  - task:
      type: sparse-information-retrieval
      name: Sparse Information Retrieval
    dataset:
      name: NanoMSMARCO 256
      type: NanoMSMARCO_256
    metrics:
    - type: dot_accuracy@1
      value: 0.36
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.64
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.76
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.84
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.36
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.21333333333333332
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.15200000000000002
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.08399999999999999
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.36
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.64
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.76
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.84
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.6020044872439759
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.5252142857142856
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.5321764898130005
      name: Dot Map@100
    - type: row_non_zero_mean_query
      value: 256.0
      name: Row Non Zero Mean Query
    - type: row_sparsity_mean_query
      value: 0.9375
      name: Row Sparsity Mean Query
    - type: row_non_zero_mean_corpus
      value: 256.0
      name: Row Non Zero Mean Corpus
    - type: row_sparsity_mean_corpus
      value: 0.9375
      name: Row Sparsity Mean Corpus
  - task:
      type: sparse-information-retrieval
      name: Sparse Information Retrieval
    dataset:
      name: NanoNFCorpus 256
      type: NanoNFCorpus_256
    metrics:
    - type: dot_accuracy@1
      value: 0.4
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.56
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.64
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.76
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.4
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.34666666666666657
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.316
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.27
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.023916206387792894
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.060605496737713836
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.08375989700258081
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.14574397353137197
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.3186443185167164
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.5101904761904763
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.1354214218643388
      name: Dot Map@100
    - type: row_non_zero_mean_query
      value: 256.0
      name: Row Non Zero Mean Query
    - type: row_sparsity_mean_query
      value: 0.9375
      name: Row Sparsity Mean Query
    - type: row_non_zero_mean_corpus
      value: 256.0
      name: Row Non Zero Mean Corpus
    - type: row_sparsity_mean_corpus
      value: 0.9375
      name: Row Sparsity Mean Corpus
  - task:
      type: sparse-information-retrieval
      name: Sparse Information Retrieval
    dataset:
      name: NanoNQ 256
      type: NanoNQ_256
    metrics:
    - type: dot_accuracy@1
      value: 0.44
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.7
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.76
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.82
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.44
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.23333333333333336
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.15600000000000003
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.088
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.42
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.66
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.71
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.79
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.6113177400510434
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.5685238095238094
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.5538446726220486
      name: Dot Map@100
    - type: row_non_zero_mean_query
      value: 256.0
      name: Row Non Zero Mean Query
    - type: row_sparsity_mean_query
      value: 0.9375
      name: Row Sparsity Mean Query
    - type: row_non_zero_mean_corpus
      value: 256.0
      name: Row Non Zero Mean Corpus
    - type: row_sparsity_mean_corpus
      value: 0.9375
      name: Row Sparsity Mean Corpus
  - task:
      type: sparse-nano-beir
      name: Sparse Nano BEIR
    dataset:
      name: NanoBEIR mean 256
      type: NanoBEIR_mean_256
    metrics:
    - type: dot_accuracy@1
      value: 0.39999999999999997
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.6333333333333334
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.7200000000000001
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.8066666666666666
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.39999999999999997
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.2644444444444444
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.20800000000000005
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.14733333333333332
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.267972068795931
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.45353516557923795
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.517919965667527
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.5919146578437907
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.5106555152705786
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.5346428571428571
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.407147528099796
      name: Dot Map@100
    - type: row_non_zero_mean_query
      value: 256.0
      name: Row Non Zero Mean Query
    - type: row_sparsity_mean_query
      value: 0.9375
      name: Row Sparsity Mean Query
    - type: row_non_zero_mean_corpus
      value: 256.0
      name: Row Non Zero Mean Corpus
    - type: row_sparsity_mean_corpus
      value: 0.9375
      name: Row Sparsity Mean Corpus
  - task:
      type: sparse-information-retrieval
      name: Sparse Information Retrieval
    dataset:
      name: NanoClimateFEVER
      type: NanoClimateFEVER
    metrics:
    - type: dot_accuracy@1
      value: 0.32
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.44
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.52
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.66
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.32
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.16
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.128
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.1
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.16
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.205
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.255
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.3833333333333333
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.31822361752418216
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.41229365079365077
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.2533758500528694
      name: Dot Map@100
    - type: row_non_zero_mean_query
      value: 256.0
      name: Row Non Zero Mean Query
    - type: row_sparsity_mean_query
      value: 0.9375
      name: Row Sparsity Mean Query
    - type: row_non_zero_mean_corpus
      value: 256.0
      name: Row Non Zero Mean Corpus
    - type: row_sparsity_mean_corpus
      value: 0.9375
      name: Row Sparsity Mean Corpus
  - task:
      type: sparse-information-retrieval
      name: Sparse Information Retrieval
    dataset:
      name: NanoDBPedia
      type: NanoDBPedia
    metrics:
    - type: dot_accuracy@1
      value: 0.66
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.88
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.94
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.94
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.66
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.6466666666666666
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.6040000000000001
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.49
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.06909677601128397
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.17837135105230828
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.26249987826636084
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.35073086886185734
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.6034573399856589
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.7786666666666667
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.44336900502358395
      name: Dot Map@100
    - type: row_non_zero_mean_query
      value: 256.0
      name: Row Non Zero Mean Query
    - type: row_sparsity_mean_query
      value: 0.9375
      name: Row Sparsity Mean Query
    - type: row_non_zero_mean_corpus
      value: 256.0
      name: Row Non Zero Mean Corpus
    - type: row_sparsity_mean_corpus
      value: 0.9375
      name: Row Sparsity Mean Corpus
  - task:
      type: sparse-information-retrieval
      name: Sparse Information Retrieval
    dataset:
      name: NanoFEVER
      type: NanoFEVER
    metrics:
    - type: dot_accuracy@1
      value: 0.78
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.9
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.94
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.96
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.78
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.3066666666666667
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.19599999999999995
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.09999999999999998
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.7266666666666666
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.8566666666666666
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.9066666666666667
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.9266666666666667
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.8474860667472335
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.8490000000000001
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.8124727372162975
      name: Dot Map@100
    - type: row_non_zero_mean_query
      value: 256.0
      name: Row Non Zero Mean Query
    - type: row_sparsity_mean_query
      value: 0.9375
      name: Row Sparsity Mean Query
    - type: row_non_zero_mean_corpus
      value: 256.0
      name: Row Non Zero Mean Corpus
    - type: row_sparsity_mean_corpus
      value: 0.9375
      name: Row Sparsity Mean Corpus
  - task:
      type: sparse-information-retrieval
      name: Sparse Information Retrieval
    dataset:
      name: NanoFiQA2018
      type: NanoFiQA2018
    metrics:
    - type: dot_accuracy@1
      value: 0.4
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.58
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.62
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.76
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.4
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.28
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.2
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.12399999999999999
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.1779126984126984
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.3990714285714285
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.45465079365079364
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.5628412698412698
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.44217413756349744
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.503095238095238
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.3726712950424665
      name: Dot Map@100
    - type: row_non_zero_mean_query
      value: 256.0
      name: Row Non Zero Mean Query
    - type: row_sparsity_mean_query
      value: 0.9375
      name: Row Sparsity Mean Query
    - type: row_non_zero_mean_corpus
      value: 256.0
      name: Row Non Zero Mean Corpus
    - type: row_sparsity_mean_corpus
      value: 0.9375
      name: Row Sparsity Mean Corpus
  - task:
      type: sparse-information-retrieval
      name: Sparse Information Retrieval
    dataset:
      name: NanoHotpotQA
      type: NanoHotpotQA
    metrics:
    - type: dot_accuracy@1
      value: 0.78
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.9
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.94
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 1.0
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.78
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.5
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.33599999999999997
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.17599999999999993
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.39
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.75
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.84
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.88
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.8076193908022954
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.8510000000000001
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.7532702446589332
      name: Dot Map@100
    - type: row_non_zero_mean_query
      value: 256.0
      name: Row Non Zero Mean Query
    - type: row_sparsity_mean_query
      value: 0.9375
      name: Row Sparsity Mean Query
    - type: row_non_zero_mean_corpus
      value: 256.0
      name: Row Non Zero Mean Corpus
    - type: row_sparsity_mean_corpus
      value: 0.9375
      name: Row Sparsity Mean Corpus
  - task:
      type: sparse-information-retrieval
      name: Sparse Information Retrieval
    dataset:
      name: NanoMSMARCO
      type: NanoMSMARCO
    metrics:
    - type: dot_accuracy@1
      value: 0.38
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.7
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.74
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.82
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.38
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.2333333333333333
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.14800000000000002
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.08199999999999999
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.38
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.7
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.74
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.82
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.6105756359135982
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.5418571428571429
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.5510242257742258
      name: Dot Map@100
    - type: row_non_zero_mean_query
      value: 256.0
      name: Row Non Zero Mean Query
    - type: row_sparsity_mean_query
      value: 0.9375
      name: Row Sparsity Mean Query
    - type: row_non_zero_mean_corpus
      value: 256.0
      name: Row Non Zero Mean Corpus
    - type: row_sparsity_mean_corpus
      value: 0.9375
      name: Row Sparsity Mean Corpus
  - task:
      type: sparse-information-retrieval
      name: Sparse Information Retrieval
    dataset:
      name: NanoNFCorpus
      type: NanoNFCorpus
    metrics:
    - type: dot_accuracy@1
      value: 0.4
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.58
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.64
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.72
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.4
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.36
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.32
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.258
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.04221121382565747
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.07831185452988602
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.09530060099380368
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.12471139152233171
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.31877595732776315
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.49883333333333324
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.14727865014045124
      name: Dot Map@100
    - type: row_non_zero_mean_query
      value: 256.0
      name: Row Non Zero Mean Query
    - type: row_sparsity_mean_query
      value: 0.9375
      name: Row Sparsity Mean Query
    - type: row_non_zero_mean_corpus
      value: 256.0
      name: Row Non Zero Mean Corpus
    - type: row_sparsity_mean_corpus
      value: 0.9375
      name: Row Sparsity Mean Corpus
  - task:
      type: sparse-information-retrieval
      name: Sparse Information Retrieval
    dataset:
      name: NanoNQ
      type: NanoNQ
    metrics:
    - type: dot_accuracy@1
      value: 0.52
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.72
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.76
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.8
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.52
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.24
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.15200000000000002
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.08599999999999998
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.51
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.67
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.7
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.77
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.6459385405932947
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.6175238095238095
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.6086016240895907
      name: Dot Map@100
    - type: row_non_zero_mean_query
      value: 256.0
      name: Row Non Zero Mean Query
    - type: row_sparsity_mean_query
      value: 0.9375
      name: Row Sparsity Mean Query
    - type: row_non_zero_mean_corpus
      value: 256.0
      name: Row Non Zero Mean Corpus
    - type: row_sparsity_mean_corpus
      value: 0.9375
      name: Row Sparsity Mean Corpus
  - task:
      type: sparse-information-retrieval
      name: Sparse Information Retrieval
    dataset:
      name: NanoQuoraRetrieval
      type: NanoQuoraRetrieval
    metrics:
    - type: dot_accuracy@1
      value: 0.9
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.98
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.98
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 1.0
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.9
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.4
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.25999999999999995
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.13999999999999999
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.7906666666666666
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.9353333333333333
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.966
      name: Dot Recall@5
    - type: dot_recall@10
      value: 1.0
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.9507875725473174
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.9395238095238095
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.9286047619047619
      name: Dot Map@100
    - type: row_non_zero_mean_query
      value: 256.0
      name: Row Non Zero Mean Query
    - type: row_sparsity_mean_query
      value: 0.9375
      name: Row Sparsity Mean Query
    - type: row_non_zero_mean_corpus
      value: 256.0
      name: Row Non Zero Mean Corpus
    - type: row_sparsity_mean_corpus
      value: 0.9375
      name: Row Sparsity Mean Corpus
  - task:
      type: sparse-information-retrieval
      name: Sparse Information Retrieval
    dataset:
      name: NanoSCIDOCS
      type: NanoSCIDOCS
    metrics:
    - type: dot_accuracy@1
      value: 0.44
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.68
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.78
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.84
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.44
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.3466666666666666
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.28800000000000003
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.20800000000000002
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.09466666666666669
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.21566666666666667
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.29766666666666663
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.4266666666666667
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.4003633964698161
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.5814126984126983
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.3235984936558747
      name: Dot Map@100
    - type: row_non_zero_mean_query
      value: 256.0
      name: Row Non Zero Mean Query
    - type: row_sparsity_mean_query
      value: 0.9375
      name: Row Sparsity Mean Query
    - type: row_non_zero_mean_corpus
      value: 256.0
      name: Row Non Zero Mean Corpus
    - type: row_sparsity_mean_corpus
      value: 0.9375
      name: Row Sparsity Mean Corpus
  - task:
      type: sparse-information-retrieval
      name: Sparse Information Retrieval
    dataset:
      name: NanoArguAna
      type: NanoArguAna
    metrics:
    - type: dot_accuracy@1
      value: 0.32
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.8
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.86
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.94
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.32
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.26666666666666666
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.17199999999999996
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.09399999999999999
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.32
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.8
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.86
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.94
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.6564774175565204
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.562579365079365
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.56506105006105
      name: Dot Map@100
    - type: row_non_zero_mean_query
      value: 256.0
      name: Row Non Zero Mean Query
    - type: row_sparsity_mean_query
      value: 0.9375
      name: Row Sparsity Mean Query
    - type: row_non_zero_mean_corpus
      value: 256.0
      name: Row Non Zero Mean Corpus
    - type: row_sparsity_mean_corpus
      value: 0.9375
      name: Row Sparsity Mean Corpus
  - task:
      type: sparse-information-retrieval
      name: Sparse Information Retrieval
    dataset:
      name: NanoSciFact
      type: NanoSciFact
    metrics:
    - type: dot_accuracy@1
      value: 0.64
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.72
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.8
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.84
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.64
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.24666666666666665
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.17199999999999996
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.09399999999999999
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.615
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.69
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.775
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.83
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.7238166818627989
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.696888888888889
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.6903857890475537
      name: Dot Map@100
    - type: row_non_zero_mean_query
      value: 256.0
      name: Row Non Zero Mean Query
    - type: row_sparsity_mean_query
      value: 0.9375
      name: Row Sparsity Mean Query
    - type: row_non_zero_mean_corpus
      value: 256.0
      name: Row Non Zero Mean Corpus
    - type: row_sparsity_mean_corpus
      value: 0.9375
      name: Row Sparsity Mean Corpus
  - task:
      type: sparse-information-retrieval
      name: Sparse Information Retrieval
    dataset:
      name: NanoTouche2020
      type: NanoTouche2020
    metrics:
    - type: dot_accuracy@1
      value: 0.5714285714285714
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.8367346938775511
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.8979591836734694
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 1.0
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.5714285714285714
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.5374149659863945
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.5306122448979592
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.43061224489795913
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.03877084212205675
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.10977308661269546
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.1862486001524683
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.2846992525980098
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.4824099438070076
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.7269517330741821
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.34839943189604633
      name: Dot Map@100
    - type: row_non_zero_mean_query
      value: 256.0
      name: Row Non Zero Mean Query
    - type: row_sparsity_mean_query
      value: 0.9375
      name: Row Sparsity Mean Query
    - type: row_non_zero_mean_corpus
      value: 256.0
      name: Row Non Zero Mean Corpus
    - type: row_sparsity_mean_corpus
      value: 0.9375
      name: Row Sparsity Mean Corpus
  - task:
      type: sparse-nano-beir
      name: Sparse Nano BEIR
    dataset:
      name: NanoBEIR mean
      type: NanoBEIR_mean
    metrics:
    - type: dot_accuracy@1
      value: 0.5470329670329671
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.7474411302982732
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.8013814756671901
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.8676923076923077
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.5470329670329671
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.34800627943485085
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.2697394034536892
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.1832778649921507
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.33192242541320743
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.506784183648691
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.5645410158766739
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.6384345730377028
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.600623515284691
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.6584327950960606
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.5229317814279774
      name: Dot Map@100
    - type: row_non_zero_mean_query
      value: 256.0
      name: Row Non Zero Mean Query
    - type: row_sparsity_mean_query
      value: 0.9375
      name: Row Sparsity Mean Query
    - type: row_non_zero_mean_corpus
      value: 256.0
      name: Row Non Zero Mean Corpus
    - type: row_sparsity_mean_corpus
      value: 0.9375
      name: Row Sparsity Mean Corpus
---

# Sparse CSR model trained on Natural Questions

This is a [CSR Sparse Encoder](https://www.sbert.net/docs/sparse_encoder/usage/usage.html) model finetuned from [mixedbread-ai/mxbai-embed-large-v1](https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1) on the [natural-questions](https://huggingface.co/datasets/sentence-transformers/natural-questions) dataset using the [sentence-transformers](https://www.SBERT.net) library. It maps sentences & paragraphs to a 4096-dimensional sparse vector space and can be used for semantic search and sparse retrieval.

## Model Details

### Model Description
- **Model Type:** CSR Sparse Encoder
- **Base model:** [mixedbread-ai/mxbai-embed-large-v1](https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1) <!-- at revision db9d1fe0f31addb4978201b2bf3e577f3f8900d2 -->
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 4096 dimensions
- **Similarity Function:** Dot Product
- **Training Dataset:**
    - [natural-questions](https://huggingface.co/datasets/sentence-transformers/natural-questions)
- **Language:** en
- **License:** apache-2.0

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Documentation:** [Sparse Encoder Documentation](https://www.sbert.net/docs/sparse_encoder/usage/usage.html)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sparse Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=sparse-encoder)

### Full Model Architecture

+ ```
1593
+ SparseEncoder(
1594
+ (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
1595
+ (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
1596
+ (2): CSRSparsity({'input_dim': 1024, 'hidden_dim': 4096, 'k': 256, 'k_aux': 512, 'normalize': False, 'dead_threshold': 30})
1597
+ )
1598
+ ```
1599
+
1600
+ ## Usage
1601
+
1602
+ ### Direct Usage (Sentence Transformers)
1603
+
1604
+ First install the Sentence Transformers library:
1605
+
1606
+ ```bash
1607
+ pip install -U sentence-transformers
1608
+ ```
1609
+
1610
+ Then you can load this model and run inference.
1611
+ ```python
1612
+ from sentence_transformers import SparseEncoder
1613
+
1614
+ # Download from the 🤗 Hub
1615
+ model = SparseEncoder("tomaarsen/csr-mxbai-embed-large-v1-nq")
1616
+ # Run inference
1617
+ sentences = [
1618
+ 'who is cornelius in the book of acts',
1619
+ 'Cornelius the Centurion Cornelius (Greek: Κορνήλιος) was a Roman centurion who is considered by Christians to be one of the first Gentiles to convert to the faith, as related in Acts of the Apostles.',
1620
+ "Joe Ranft Ranft reunited with Lasseter when he was hired by Pixar in 1991 as their head of story.[1] There he worked on all of their films produced up to 2006; this included Toy Story (for which he received an Academy Award nomination) and A Bug's Life, as the co-story writer and others as story supervisor. His final film was Cars. He also voiced characters in many of the films, including Heimlich the caterpillar in A Bug's Life, Wheezy the penguin in Toy Story 2, and Jacques the shrimp in Finding Nemo.[1]",
1621
+ ]
1622
+ embeddings = model.encode(sentences)
1623
+ print(embeddings.shape)
1624
+ # [3, 4096]
1625
+
1626
+ # Get the similarity scores for the embeddings
1627
+ similarities = model.similarity(embeddings, embeddings)
1628
+ print(similarities.shape)
1629
+ # [3, 3]
1630
+ ```
1631
+
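+ The sparsity statistics reported under "Evaluation" below (`row_non_zero_mean` = 256, `row_sparsity` = 0.9375) follow directly from `k = 256` active dimensions out of 4096, since 1 - 256/4096 = 0.9375. A quick sanity check, assuming `encode` returned the embeddings as a torch tensor in the snippet above:
+
+ ```python
+ import torch
+
+ # Count the active dimensions per row; densify first in case the embeddings
+ # came back as a sparse tensor.
+ dense = embeddings.to_dense() if embeddings.is_sparse else embeddings
+ non_zero_per_row = (dense != 0).sum(dim=-1).float()
+ print(non_zero_per_row.mean())             # ~256 active dims per row (k = 256)
+ print(1 - non_zero_per_row.mean() / 4096)  # ~0.9375 = 1 - 256/4096
+ ```
+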
1632
+ <!--
1633
+ ### Direct Usage (Transformers)
1634
+
1635
+ <details><summary>Click to see the direct usage in Transformers</summary>
1636
+
1637
+ </details>
1638
+ -->
1639
+
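+ For retrieval, the `config_sentence_transformers.json` in this repository defines a `query` prompt ("Represent this sentence for searching relevant passages: ") and an empty `passage` prompt, so queries and documents should be encoded differently. A sketch using the standard `prompt_name` argument:
+
+ ```python
+ from sentence_transformers import SparseEncoder
+
+ model = SparseEncoder("tomaarsen/csr-mxbai-embed-large-v1-nq")
+
+ # Queries are prefixed with the "query" prompt from the model configuration;
+ # documents (passages) are encoded without a prefix.
+ queries = ["who is cornelius in the book of acts"]
+ documents = [
+     "Cornelius the Centurion Cornelius was a Roman centurion who is considered "
+     "by Christians to be one of the first Gentiles to convert to the faith.",
+ ]
+ query_embeddings = model.encode(queries, prompt_name="query")
+ document_embeddings = model.encode(documents)
+
+ # Rank documents by dot product, the model's similarity function.
+ print(model.similarity(query_embeddings, document_embeddings))
+ ```
+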
1640
+ <!--
1641
+ ### Downstream Usage (Sentence Transformers)
1642
+
1643
+ You can finetune this model on your own dataset.
1644
+
1645
+ <details><summary>Click to expand</summary>
1646
+
1647
+ </details>
1648
+ -->
1649
+
1650
+ <!--
1651
+ ### Out-of-Scope Use
1652
+
1653
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
1654
+ -->
1655
+
1656
+ ## Evaluation
1657
+
1658
+ ### Metrics
1659
+
1660
+ #### Sparse Information Retrieval
1661
+
1662
+ * Datasets: `NanoMSMARCO_128`, `NanoNFCorpus_128` and `NanoNQ_128`
1663
+ * Evaluated with [<code>SparseInformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sparse_encoder/evaluation.html#sentence_transformers.sparse_encoder.evaluation.SparseInformationRetrievalEvaluator) with these parameters:
1664
+ ```json
1665
+ {
1666
+ "max_active_dims": 128
1667
+ }
1668
+ ```
1669
+
1670
+ | Metric | NanoMSMARCO_128 | NanoNFCorpus_128 | NanoNQ_128 |
1671
+ |:-------------------------|:----------------|:-----------------|:-----------|
1672
+ | dot_accuracy@1 | 0.36 | 0.3 | 0.44 |
1673
+ | dot_accuracy@3 | 0.6 | 0.58 | 0.58 |
1674
+ | dot_accuracy@5 | 0.66 | 0.64 | 0.66 |
1675
+ | dot_accuracy@10 | 0.78 | 0.66 | 0.76 |
1676
+ | dot_precision@1 | 0.36 | 0.3 | 0.44 |
1677
+ | dot_precision@3 | 0.2 | 0.32 | 0.1933 |
1678
+ | dot_precision@5 | 0.132 | 0.284 | 0.132 |
1679
+ | dot_precision@10 | 0.078 | 0.224 | 0.082 |
1680
+ | dot_recall@1 | 0.36 | 0.0206 | 0.43 |
1681
+ | dot_recall@3 | 0.6 | 0.0764 | 0.54 |
1682
+ | dot_recall@5 | 0.66 | 0.0909 | 0.6 |
1683
+ | dot_recall@10 | 0.78 | 0.1095 | 0.73 |
1684
+ | **dot_ndcg@10** | **0.5701** | **0.2706** | **0.576** |
1685
+ | dot_mrr@10 | 0.5032 | 0.4388 | 0.5402 |
1686
+ | dot_map@100 | 0.5145 | 0.1157 | 0.5349 |
1687
+ | row_non_zero_mean_query | 128.0 | 128.0 | 128.0 |
1688
+ | row_sparsity_mean_query | 0.9688 | 0.9688 | 0.9688 |
1689
+ | row_non_zero_mean_corpus | 128.0 | 128.0 | 128.0 |
1690
+ | row_sparsity_mean_corpus | 0.9688 | 0.9688 | 0.9688 |
1691
+
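+ The `max_active_dims=128` setting evaluates the embeddings when restricted to their 128 strongest dimensions, i.e. half of the trained `k = 256`. A hypothetical post-hoc truncation in that spirit (the helper name and the keep-by-magnitude rule are assumptions, not the evaluator's actual implementation):
+
+ ```python
+ import numpy as np
+
+ def truncate_active_dims(emb: np.ndarray, max_active_dims: int) -> np.ndarray:
+     """Keep only the highest-magnitude dimensions per row, zeroing the rest."""
+     emb = emb.copy()
+     drop = np.argsort(np.abs(emb), axis=-1)[:, :-max_active_dims]
+     np.put_along_axis(emb, drop, 0.0, axis=-1)
+     return emb
+
+ emb = np.random.randn(3, 4096)
+ print((truncate_active_dims(emb, 128) != 0).sum(axis=-1))  # [128 128 128]
+ ```
+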
1692
+ #### Sparse Nano BEIR
1693
+
1694
+ * Dataset: `NanoBEIR_mean_128`
1695
+ * Evaluated with [<code>SparseNanoBEIREvaluator</code>](https://sbert.net/docs/package_reference/sparse_encoder/evaluation.html#sentence_transformers.sparse_encoder.evaluation.SparseNanoBEIREvaluator) with these parameters:
1696
+ ```json
1697
+ {
1698
+ "dataset_names": [
1699
+ "msmarco",
1700
+ "nfcorpus",
1701
+ "nq"
1702
+ ],
1703
+ "max_active_dims": 128
1704
+ }
1705
+ ```
1706
+
1707
+ | Metric | Value |
1708
+ |:-------------------------|:-----------|
1709
+ | dot_accuracy@1 | 0.3667 |
1710
+ | dot_accuracy@3 | 0.5867 |
1711
+ | dot_accuracy@5 | 0.6533 |
1712
+ | dot_accuracy@10 | 0.7333 |
1713
+ | dot_precision@1 | 0.3667 |
1714
+ | dot_precision@3 | 0.2378 |
1715
+ | dot_precision@5 | 0.1827 |
1716
+ | dot_precision@10 | 0.128 |
1717
+ | dot_recall@1 | 0.2702 |
1718
+ | dot_recall@3 | 0.4055 |
1719
+ | dot_recall@5 | 0.4503 |
1720
+ | dot_recall@10 | 0.5398 |
1721
+ | **dot_ndcg@10** | **0.4722** |
1722
+ | dot_mrr@10 | 0.4941 |
1723
+ | dot_map@100 | 0.3884 |
1724
+ | row_non_zero_mean_query | 128.0 |
1725
+ | row_sparsity_mean_query | 0.9688 |
1726
+ | row_non_zero_mean_corpus | 128.0 |
1727
+ | row_sparsity_mean_corpus | 0.9688 |
1728
+
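+ These numbers should be reproducible with the evaluator linked above; a sketch, with the import path taken from that link and the result key name being an assumption:
+
+ ```python
+ from sentence_transformers import SparseEncoder
+ from sentence_transformers.sparse_encoder.evaluation import SparseNanoBEIREvaluator
+
+ model = SparseEncoder("tomaarsen/csr-mxbai-embed-large-v1-nq")
+
+ # Evaluate on the three Nano BEIR subsets with embeddings capped at 128 active dims.
+ evaluator = SparseNanoBEIREvaluator(
+     dataset_names=["msmarco", "nfcorpus", "nq"],
+     max_active_dims=128,
+ )
+ results = evaluator(model)
+ print(results["NanoBEIR_mean_128_dot_ndcg@10"])  # key name is an assumption
+ ```
+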
1729
+ #### Sparse Information Retrieval
1730
+
1731
+ * Datasets: `NanoMSMARCO_256`, `NanoNFCorpus_256` and `NanoNQ_256`
1732
+ * Evaluated with [<code>SparseInformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sparse_encoder/evaluation.html#sentence_transformers.sparse_encoder.evaluation.SparseInformationRetrievalEvaluator) with these parameters:
1733
+ ```json
1734
+ {
1735
+ "max_active_dims": 256
1736
+ }
1737
+ ```
1738
+
1739
+ | Metric | NanoMSMARCO_256 | NanoNFCorpus_256 | NanoNQ_256 |
1740
+ |:-------------------------|:----------------|:-----------------|:-----------|
1741
+ | dot_accuracy@1 | 0.36 | 0.4 | 0.44 |
1742
+ | dot_accuracy@3 | 0.64 | 0.56 | 0.7 |
1743
+ | dot_accuracy@5 | 0.76 | 0.64 | 0.76 |
1744
+ | dot_accuracy@10 | 0.84 | 0.76 | 0.82 |
1745
+ | dot_precision@1 | 0.36 | 0.4 | 0.44 |
1746
+ | dot_precision@3 | 0.2133 | 0.3467 | 0.2333 |
1747
+ | dot_precision@5 | 0.152 | 0.316 | 0.156 |
1748
+ | dot_precision@10 | 0.084 | 0.27 | 0.088 |
1749
+ | dot_recall@1 | 0.36 | 0.0239 | 0.42 |
1750
+ | dot_recall@3 | 0.64 | 0.0606 | 0.66 |
1751
+ | dot_recall@5 | 0.76 | 0.0838 | 0.71 |
1752
+ | dot_recall@10 | 0.84 | 0.1457 | 0.79 |
1753
+ | **dot_ndcg@10** | **0.602** | **0.3186** | **0.6113** |
1754
+ | dot_mrr@10 | 0.5252 | 0.5102 | 0.5685 |
1755
+ | dot_map@100 | 0.5322 | 0.1354 | 0.5538 |
1756
+ | row_non_zero_mean_query | 256.0 | 256.0 | 256.0 |
1757
+ | row_sparsity_mean_query | 0.9375 | 0.9375 | 0.9375 |
1758
+ | row_non_zero_mean_corpus | 256.0 | 256.0 | 256.0 |
1759
+ | row_sparsity_mean_corpus | 0.9375 | 0.9375 | 0.9375 |
1760
+
1761
+ #### Sparse Nano BEIR
1762
+
1763
+ * Dataset: `NanoBEIR_mean_256`
1764
+ * Evaluated with [<code>SparseNanoBEIREvaluator</code>](https://sbert.net/docs/package_reference/sparse_encoder/evaluation.html#sentence_transformers.sparse_encoder.evaluation.SparseNanoBEIREvaluator) with these parameters:
1765
+ ```json
1766
+ {
1767
+ "dataset_names": [
1768
+ "msmarco",
1769
+ "nfcorpus",
1770
+ "nq"
1771
+ ],
1772
+ "max_active_dims": 256
1773
+ }
1774
+ ```
1775
+
1776
+ | Metric | Value |
1777
+ |:-------------------------|:-----------|
1778
+ | dot_accuracy@1 | 0.4 |
1779
+ | dot_accuracy@3 | 0.6333 |
1780
+ | dot_accuracy@5 | 0.72 |
1781
+ | dot_accuracy@10 | 0.8067 |
1782
+ | dot_precision@1 | 0.4 |
1783
+ | dot_precision@3 | 0.2644 |
1784
+ | dot_precision@5 | 0.208 |
1785
+ | dot_precision@10 | 0.1473 |
1786
+ | dot_recall@1 | 0.268 |
1787
+ | dot_recall@3 | 0.4535 |
1788
+ | dot_recall@5 | 0.5179 |
1789
+ | dot_recall@10 | 0.5919 |
1790
+ | **dot_ndcg@10** | **0.5107** |
1791
+ | dot_mrr@10 | 0.5346 |
1792
+ | dot_map@100 | 0.4071 |
1793
+ | row_non_zero_mean_query | 256.0 |
1794
+ | row_sparsity_mean_query | 0.9375 |
1795
+ | row_non_zero_mean_corpus | 256.0 |
1796
+ | row_sparsity_mean_corpus | 0.9375 |
1797
+
1798
+ #### Sparse Information Retrieval
1799
+
1800
+ * Datasets: `NanoClimateFEVER`, `NanoDBPedia`, `NanoFEVER`, `NanoFiQA2018`, `NanoHotpotQA`, `NanoMSMARCO`, `NanoNFCorpus`, `NanoNQ`, `NanoQuoraRetrieval`, `NanoSCIDOCS`, `NanoArguAna`, `NanoSciFact` and `NanoTouche2020`
1801
+ * Evaluated with [<code>SparseInformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sparse_encoder/evaluation.html#sentence_transformers.sparse_encoder.evaluation.SparseInformationRetrievalEvaluator)
1802
+
1803
+ | Metric | NanoClimateFEVER | NanoDBPedia | NanoFEVER | NanoFiQA2018 | NanoHotpotQA | NanoMSMARCO | NanoNFCorpus | NanoNQ | NanoQuoraRetrieval | NanoSCIDOCS | NanoArguAna | NanoSciFact | NanoTouche2020 |
1804
+ |:-------------------------|:-----------------|:------------|:-----------|:-------------|:-------------|:------------|:-------------|:-----------|:-------------------|:------------|:------------|:------------|:---------------|
1805
+ | dot_accuracy@1 | 0.32 | 0.66 | 0.78 | 0.4 | 0.78 | 0.38 | 0.4 | 0.52 | 0.9 | 0.44 | 0.32 | 0.64 | 0.5714 |
1806
+ | dot_accuracy@3 | 0.44 | 0.88 | 0.9 | 0.58 | 0.9 | 0.7 | 0.58 | 0.72 | 0.98 | 0.68 | 0.8 | 0.72 | 0.8367 |
1807
+ | dot_accuracy@5 | 0.52 | 0.94 | 0.94 | 0.62 | 0.94 | 0.74 | 0.64 | 0.76 | 0.98 | 0.78 | 0.86 | 0.8 | 0.898 |
1808
+ | dot_accuracy@10 | 0.66 | 0.94 | 0.96 | 0.76 | 1.0 | 0.82 | 0.72 | 0.8 | 1.0 | 0.84 | 0.94 | 0.84 | 1.0 |
1809
+ | dot_precision@1 | 0.32 | 0.66 | 0.78 | 0.4 | 0.78 | 0.38 | 0.4 | 0.52 | 0.9 | 0.44 | 0.32 | 0.64 | 0.5714 |
1810
+ | dot_precision@3 | 0.16 | 0.6467 | 0.3067 | 0.28 | 0.5 | 0.2333 | 0.36 | 0.24 | 0.4 | 0.3467 | 0.2667 | 0.2467 | 0.5374 |
1811
+ | dot_precision@5 | 0.128 | 0.604 | 0.196 | 0.2 | 0.336 | 0.148 | 0.32 | 0.152 | 0.26 | 0.288 | 0.172 | 0.172 | 0.5306 |
1812
+ | dot_precision@10 | 0.1 | 0.49 | 0.1 | 0.124 | 0.176 | 0.082 | 0.258 | 0.086 | 0.14 | 0.208 | 0.094 | 0.094 | 0.4306 |
1813
+ | dot_recall@1 | 0.16 | 0.0691 | 0.7267 | 0.1779 | 0.39 | 0.38 | 0.0422 | 0.51 | 0.7907 | 0.0947 | 0.32 | 0.615 | 0.0388 |
1814
+ | dot_recall@3 | 0.205 | 0.1784 | 0.8567 | 0.3991 | 0.75 | 0.7 | 0.0783 | 0.67 | 0.9353 | 0.2157 | 0.8 | 0.69 | 0.1098 |
1815
+ | dot_recall@5 | 0.255 | 0.2625 | 0.9067 | 0.4547 | 0.84 | 0.74 | 0.0953 | 0.7 | 0.966 | 0.2977 | 0.86 | 0.775 | 0.1862 |
1816
+ | dot_recall@10 | 0.3833 | 0.3507 | 0.9267 | 0.5628 | 0.88 | 0.82 | 0.1247 | 0.77 | 1.0 | 0.4267 | 0.94 | 0.83 | 0.2847 |
1817
+ | **dot_ndcg@10** | **0.3182** | **0.6035** | **0.8475** | **0.4422** | **0.8076** | **0.6106** | **0.3188** | **0.6459** | **0.9508** | **0.4004** | **0.6565** | **0.7238** | **0.4824** |
1818
+ | dot_mrr@10 | 0.4123 | 0.7787 | 0.849 | 0.5031 | 0.851 | 0.5419 | 0.4988 | 0.6175 | 0.9395 | 0.5814 | 0.5626 | 0.6969 | 0.727 |
1819
+ | dot_map@100 | 0.2534 | 0.4434 | 0.8125 | 0.3727 | 0.7533 | 0.551 | 0.1473 | 0.6086 | 0.9286 | 0.3236 | 0.5651 | 0.6904 | 0.3484 |
1820
+ | row_non_zero_mean_query | 256.0 | 256.0 | 256.0 | 256.0 | 256.0 | 256.0 | 256.0 | 256.0 | 256.0 | 256.0 | 256.0 | 256.0 | 256.0 |
1821
+ | row_sparsity_mean_query | 0.9375 | 0.9375 | 0.9375 | 0.9375 | 0.9375 | 0.9375 | 0.9375 | 0.9375 | 0.9375 | 0.9375 | 0.9375 | 0.9375 | 0.9375 |
1822
+ | row_non_zero_mean_corpus | 256.0 | 256.0 | 256.0 | 256.0 | 256.0 | 256.0 | 256.0 | 256.0 | 256.0 | 256.0 | 256.0 | 256.0 | 256.0 |
1823
+ | row_sparsity_mean_corpus | 0.9375 | 0.9375 | 0.9375 | 0.9375 | 0.9375 | 0.9375 | 0.9375 | 0.9375 | 0.9375 | 0.9375 | 0.9375 | 0.9375 | 0.9375 |
1824
+
1825
+ #### Sparse Nano BEIR
1826
+
1827
+ * Dataset: `NanoBEIR_mean`
1828
+ * Evaluated with [<code>SparseNanoBEIREvaluator</code>](https://sbert.net/docs/package_reference/sparse_encoder/evaluation.html#sentence_transformers.sparse_encoder.evaluation.SparseNanoBEIREvaluator) with these parameters:
1829
+ ```json
1830
+ {
1831
+ "dataset_names": [
1832
+ "climatefever",
1833
+ "dbpedia",
1834
+ "fever",
1835
+ "fiqa2018",
1836
+ "hotpotqa",
1837
+ "msmarco",
1838
+ "nfcorpus",
1839
+ "nq",
1840
+ "quoraretrieval",
1841
+ "scidocs",
1842
+ "arguana",
1843
+ "scifact",
1844
+ "touche2020"
1845
+ ]
1846
+ }
1847
+ ```
1848
+
1849
+ | Metric | Value |
1850
+ |:-------------------------|:-----------|
1851
+ | dot_accuracy@1 | 0.547 |
1852
+ | dot_accuracy@3 | 0.7474 |
1853
+ | dot_accuracy@5 | 0.8014 |
1854
+ | dot_accuracy@10 | 0.8677 |
1855
+ | dot_precision@1 | 0.547 |
1856
+ | dot_precision@3 | 0.348 |
1857
+ | dot_precision@5 | 0.2697 |
1858
+ | dot_precision@10 | 0.1833 |
1859
+ | dot_recall@1 | 0.3319 |
1860
+ | dot_recall@3 | 0.5068 |
1861
+ | dot_recall@5 | 0.5645 |
1862
+ | dot_recall@10 | 0.6384 |
1863
+ | **dot_ndcg@10** | **0.6006** |
1864
+ | dot_mrr@10 | 0.6584 |
1865
+ | dot_map@100 | 0.5229 |
1866
+ | row_non_zero_mean_query | 256.0 |
1867
+ | row_sparsity_mean_query | 0.9375 |
1868
+ | row_non_zero_mean_corpus | 256.0 |
1869
+ | row_sparsity_mean_corpus | 0.9375 |
1870
+
1871
+ <!--
1872
+ ## Bias, Risks and Limitations
1873
+
1874
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
1875
+ -->
1876
+
1877
+ <!--
1878
+ ### Recommendations
1879
+
1880
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
1881
+ -->
1882
+
1883
+ ## Training Details
1884
+
1885
+ ### Training Dataset
1886
+
1887
+ #### natural-questions
1888
+
1889
+ * Dataset: [natural-questions](https://huggingface.co/datasets/sentence-transformers/natural-questions) at [f9e894e](https://huggingface.co/datasets/sentence-transformers/natural-questions/tree/f9e894e1081e206e577b4eaa9ee6de2b06ae6f17)
1890
+ * Size: 99,000 training samples
1891
+ * Columns: <code>query</code> and <code>answer</code>
1892
+ * Approximate statistics based on the first 1000 samples:
1893
+ | | query | answer |
1894
+ |:--------|:-----------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|
1895
+ | type | string | string |
1896
+ | details | <ul><li>min: 10 tokens</li><li>mean: 11.71 tokens</li><li>max: 26 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 131.81 tokens</li><li>max: 450 tokens</li></ul> |
1897
+ * Samples:
1898
+ | query | answer |
1899
+ |:--------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
1900
+ | <code>who played the father in papa don't preach</code> | <code>Alex McArthur Alex McArthur (born March 6, 1957) is an American actor.</code> |
1901
+ | <code>where was the location of the battle of hastings</code> | <code>Battle of Hastings The Battle of Hastings[a] was fought on 14 October 1066 between the Norman-French army of William, the Duke of Normandy, and an English army under the Anglo-Saxon King Harold Godwinson, beginning the Norman conquest of England. It took place approximately 7 miles (11 kilometres) northwest of Hastings, close to the present-day town of Battle, East Sussex, and was a decisive Norman victory.</code> |
1902
+ | <code>how many puppies can a dog give birth to</code> | <code>Canine reproduction The largest litter size to date was set by a Neapolitan Mastiff in Manea, Cambridgeshire, UK on November 29, 2004; the litter was 24 puppies.[22]</code> |
1903
+ * Loss: [<code>CSRLoss</code>](https://sbert.net/docs/package_reference/sparse_encoder/losses.html#csrloss) with these parameters:
1904
+ ```json
1905
+ {
1906
+ "beta": 0.1,
1907
+ "gamma": 1.0,
1908
+ "loss": "SparseMultipleNegativesRankingLoss(scale=1.0, similarity_fct='dot_score')"
1909
+ }
1910
+ ```
1911
+
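+ Read against the CSR paper cited below, `beta` and `gamma` weight the components of the combined objective: a sparse-autoencoder reconstruction term, an auxiliary term that revives dead latents (scaled by `beta`), and the ranking loss listed in the config (scaled by `gamma`). Schematically, as a hedged reading rather than a verbatim formula from the docs:
+
+ ```latex
+ \mathcal{L}_{\text{CSR}}
+   = \mathcal{L}_{\text{recon}}
+   + \beta \, \mathcal{L}_{\text{aux}}
+   + \gamma \, \mathcal{L}_{\text{rank}}
+ ```
+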
1912
+ ### Evaluation Dataset
1913
+
1914
+ #### natural-questions
1915
+
1916
+ * Dataset: [natural-questions](https://huggingface.co/datasets/sentence-transformers/natural-questions) at [f9e894e](https://huggingface.co/datasets/sentence-transformers/natural-questions/tree/f9e894e1081e206e577b4eaa9ee6de2b06ae6f17)
1917
+ * Size: 1,000 evaluation samples
1918
+ * Columns: <code>query</code> and <code>answer</code>
1919
+ * Approximate statistics based on the first 1000 samples:
1920
+ | | query | answer |
1921
+ |:--------|:-----------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|
1922
+ | type | string | string |
1923
+ | details | <ul><li>min: 10 tokens</li><li>mean: 11.69 tokens</li><li>max: 23 tokens</li></ul> | <ul><li>min: 15 tokens</li><li>mean: 134.01 tokens</li><li>max: 512 tokens</li></ul> |
1924
+ * Samples:
1925
+ | query | answer |
1926
+ |:-------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
1927
+ | <code>where is the tiber river located in italy</code> | <code>Tiber The Tiber (/ˈtaɪbər/, Latin: Tiberis,[1] Italian: Tevere [ˈteːvere])[2] is the third-longest river in Italy, rising in the Apennine Mountains in Emilia-Romagna and flowing 406 kilometres (252 mi) through Tuscany, Umbria and Lazio, where it is joined by the river Aniene, to the Tyrrhenian Sea, between Ostia and Fiumicino.[3] It drains a basin estimated at 17,375 square kilometres (6,709 sq mi). The river has achieved lasting fame as the main watercourse of the city of Rome, founded on its eastern banks.</code> |
1928
+ | <code>what kind of car does jay gatsby drive</code> | <code>Jay Gatsby At the Buchanan home, Jordan Baker, Nick, Jay, and the Buchanans decide to visit New York City. Tom borrows Gatsby's yellow Rolls Royce to drive up to the city. On the way to New York City, Tom makes a detour at a gas station in "the Valley of Ashes", a run-down part of Long Island. The owner, George Wilson, shares his concern that his wife, Myrtle, may be having an affair. This unnerves Tom, who has been having an affair with Myrtle, and he leaves in a hurry.</code> |
1929
+ | <code>who sings if i can dream about you</code> | <code>I Can Dream About You "I Can Dream About You" is a song performed by American singer Dan Hartman on the soundtrack album of the film Streets of Fire. Released in 1984 as a single from the soundtrack, and included on Hartman's album I Can Dream About You, it reached number 6 on the Billboard Hot 100.[1]</code> |
1930
+ * Loss: [<code>CSRLoss</code>](https://sbert.net/docs/package_reference/sparse_encoder/losses.html#csrloss) with these parameters:
1931
+ ```json
1932
+ {
1933
+ "beta": 0.1,
1934
+ "gamma": 1.0,
1935
+ "loss": "SparseMultipleNegativesRankingLoss(scale=1.0, similarity_fct='dot_score')"
1936
+ }
1937
+ ```
1938
+
1939
+ ### Training Hyperparameters
1940
+ #### Non-Default Hyperparameters
1941
+
1942
+ - `eval_strategy`: steps
1943
+ - `per_device_train_batch_size`: 64
1944
+ - `per_device_eval_batch_size`: 64
1945
+ - `learning_rate`: 4e-05
1946
+ - `num_train_epochs`: 1
1947
+ - `bf16`: True
1948
+ - `load_best_model_at_end`: True
1949
+ - `batch_sampler`: no_duplicates
1950
+
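+ As a sketch, these non-default values map onto training arguments as follows, assuming the `SparseEncoderTrainingArguments` API from the Sentence Transformers documentation (the output directory is illustrative):
+
+ ```python
+ from sentence_transformers.sparse_encoder import SparseEncoderTrainingArguments
+ from sentence_transformers.training_args import BatchSamplers
+
+ args = SparseEncoderTrainingArguments(
+     output_dir="models/csr-mxbai-embed-large-v1-nq",  # illustrative path
+     eval_strategy="steps",
+     per_device_train_batch_size=64,
+     per_device_eval_batch_size=64,
+     learning_rate=4e-5,
+     num_train_epochs=1,
+     bf16=True,
+     load_best_model_at_end=True,
+     batch_sampler=BatchSamplers.NO_DUPLICATES,
+ )
+ ```
+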
1951
+ #### All Hyperparameters
1952
+ <details><summary>Click to expand</summary>
1953
+
1954
+ - `overwrite_output_dir`: False
1955
+ - `do_predict`: False
1956
+ - `eval_strategy`: steps
1957
+ - `prediction_loss_only`: True
1958
+ - `per_device_train_batch_size`: 64
1959
+ - `per_device_eval_batch_size`: 64
1960
+ - `per_gpu_train_batch_size`: None
1961
+ - `per_gpu_eval_batch_size`: None
1962
+ - `gradient_accumulation_steps`: 1
1963
+ - `eval_accumulation_steps`: None
1964
+ - `torch_empty_cache_steps`: None
1965
+ - `learning_rate`: 4e-05
1966
+ - `weight_decay`: 0.0
1967
+ - `adam_beta1`: 0.9
1968
+ - `adam_beta2`: 0.999
1969
+ - `adam_epsilon`: 1e-08
1970
+ - `max_grad_norm`: 1.0
1971
+ - `num_train_epochs`: 1
1972
+ - `max_steps`: -1
1973
+ - `lr_scheduler_type`: linear
1974
+ - `lr_scheduler_kwargs`: {}
1975
+ - `warmup_ratio`: 0.0
1976
+ - `warmup_steps`: 0
1977
+ - `log_level`: passive
1978
+ - `log_level_replica`: warning
1979
+ - `log_on_each_node`: True
1980
+ - `logging_nan_inf_filter`: True
1981
+ - `save_safetensors`: True
1982
+ - `save_on_each_node`: False
1983
+ - `save_only_model`: False
1984
+ - `restore_callback_states_from_checkpoint`: False
1985
+ - `no_cuda`: False
1986
+ - `use_cpu`: False
1987
+ - `use_mps_device`: False
1988
+ - `seed`: 42
1989
+ - `data_seed`: None
1990
+ - `jit_mode_eval`: False
1991
+ - `use_ipex`: False
1992
+ - `bf16`: True
1993
+ - `fp16`: False
1994
+ - `fp16_opt_level`: O1
1995
+ - `half_precision_backend`: auto
1996
+ - `bf16_full_eval`: False
1997
+ - `fp16_full_eval`: False
1998
+ - `tf32`: None
1999
+ - `local_rank`: 0
2000
+ - `ddp_backend`: None
2001
+ - `tpu_num_cores`: None
2002
+ - `tpu_metrics_debug`: False
2003
+ - `debug`: []
2004
+ - `dataloader_drop_last`: False
2005
+ - `dataloader_num_workers`: 0
2006
+ - `dataloader_prefetch_factor`: None
2007
+ - `past_index`: -1
2008
+ - `disable_tqdm`: False
2009
+ - `remove_unused_columns`: True
2010
+ - `label_names`: None
2011
+ - `load_best_model_at_end`: True
2012
+ - `ignore_data_skip`: False
2013
+ - `fsdp`: []
2014
+ - `fsdp_min_num_params`: 0
2015
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
2016
+ - `fsdp_transformer_layer_cls_to_wrap`: None
2017
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
2018
+ - `deepspeed`: None
2019
+ - `label_smoothing_factor`: 0.0
2020
+ - `optim`: adamw_torch
2021
+ - `optim_args`: None
2022
+ - `adafactor`: False
2023
+ - `group_by_length`: False
2024
+ - `length_column_name`: length
2025
+ - `ddp_find_unused_parameters`: None
2026
+ - `ddp_bucket_cap_mb`: None
2027
+ - `ddp_broadcast_buffers`: False
2028
+ - `dataloader_pin_memory`: True
2029
+ - `dataloader_persistent_workers`: False
2030
+ - `skip_memory_metrics`: True
2031
+ - `use_legacy_prediction_loop`: False
2032
+ - `push_to_hub`: False
2033
+ - `resume_from_checkpoint`: None
2034
+ - `hub_model_id`: None
2035
+ - `hub_strategy`: every_save
2036
+ - `hub_private_repo`: None
2037
+ - `hub_always_push`: False
2038
+ - `gradient_checkpointing`: False
2039
+ - `gradient_checkpointing_kwargs`: None
2040
+ - `include_inputs_for_metrics`: False
2041
+ - `include_for_metrics`: []
2042
+ - `eval_do_concat_batches`: True
2043
+ - `fp16_backend`: auto
2044
+ - `push_to_hub_model_id`: None
2045
+ - `push_to_hub_organization`: None
2046
+ - `mp_parameters`:
2047
+ - `auto_find_batch_size`: False
2048
+ - `full_determinism`: False
2049
+ - `torchdynamo`: None
2050
+ - `ray_scope`: last
2051
+ - `ddp_timeout`: 1800
2052
+ - `torch_compile`: False
2053
+ - `torch_compile_backend`: None
2054
+ - `torch_compile_mode`: None
2055
+ - `dispatch_batches`: None
2056
+ - `split_batches`: None
2057
+ - `include_tokens_per_second`: False
2058
+ - `include_num_input_tokens_seen`: False
2059
+ - `neftune_noise_alpha`: None
2060
+ - `optim_target_modules`: None
2061
+ - `batch_eval_metrics`: False
2062
+ - `eval_on_start`: False
2063
+ - `use_liger_kernel`: False
2064
+ - `eval_use_gather_object`: False
2065
+ - `average_tokens_across_devices`: False
2066
+ - `prompts`: None
2067
+ - `batch_sampler`: no_duplicates
2068
+ - `multi_dataset_batch_sampler`: proportional
2069
+
2070
+ </details>
2071
+
2072
+ ### Training Logs
2073
+ | Epoch | Step | Training Loss | Validation Loss | NanoMSMARCO_128_dot_ndcg@10 | NanoNFCorpus_128_dot_ndcg@10 | NanoNQ_128_dot_ndcg@10 | NanoBEIR_mean_128_dot_ndcg@10 | NanoMSMARCO_256_dot_ndcg@10 | NanoNFCorpus_256_dot_ndcg@10 | NanoNQ_256_dot_ndcg@10 | NanoBEIR_mean_256_dot_ndcg@10 | NanoClimateFEVER_dot_ndcg@10 | NanoDBPedia_dot_ndcg@10 | NanoFEVER_dot_ndcg@10 | NanoFiQA2018_dot_ndcg@10 | NanoHotpotQA_dot_ndcg@10 | NanoMSMARCO_dot_ndcg@10 | NanoNFCorpus_dot_ndcg@10 | NanoNQ_dot_ndcg@10 | NanoQuoraRetrieval_dot_ndcg@10 | NanoSCIDOCS_dot_ndcg@10 | NanoArguAna_dot_ndcg@10 | NanoSciFact_dot_ndcg@10 | NanoTouche2020_dot_ndcg@10 | NanoBEIR_mean_dot_ndcg@10 |
2074
+ |:----------:|:-------:|:-------------:|:---------------:|:---------------------------:|:----------------------------:|:----------------------:|:-----------------------------:|:---------------------------:|:----------------------------:|:----------------------:|:-----------------------------:|:----------------------------:|:-----------------------:|:---------------------:|:------------------------:|:------------------------:|:-----------------------:|:------------------------:|:------------------:|:------------------------------:|:-----------------------:|:-----------------------:|:-----------------------:|:--------------------------:|:-------------------------:|
2075
+ | -1 | -1 | - | - | 0.5920 | 0.2869 | 0.6003 | 0.4930 | 0.5785 | 0.3370 | 0.6392 | 0.5183 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
2076
+ | 0.0646 | 100 | 0.3598 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
2077
+ | 0.1293 | 200 | 0.3648 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
2078
+ | 0.1939 | 300 | 0.3272 | 0.3362 | 0.5728 | 0.2771 | 0.5552 | 0.4684 | 0.5932 | 0.3225 | 0.6162 | 0.5107 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
2079
+ | 0.2586 | 400 | 0.3534 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
2080
+ | 0.3232 | 500 | 0.3423 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
2081
+ | **0.3878** | **600** | **0.3601** | **0.3204** | **0.5672** | **0.2679** | **0.5813** | **0.4721** | **0.611** | **0.3195** | **0.6453** | **0.5253** | **-** | **-** | **-** | **-** | **-** | **-** | **-** | **-** | **-** | **-** | **-** | **-** | **-** | **-** |
2082
+ | 0.4525 | 700 | 0.3279 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
2083
+ | 0.5171 | 800 | 0.3235 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
2084
+ | 0.5818 | 900 | 0.3359 | 0.3098 | 0.5840 | 0.2496 | 0.5808 | 0.4715 | 0.6014 | 0.3208 | 0.6265 | 0.5162 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
2085
+ | 0.6464 | 1000 | 0.3215 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
2086
+ | 0.7111 | 1100 | 0.325 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
2087
+ | 0.7757 | 1200 | 0.3394 | 0.3065 | 0.5838 | 0.2449 | 0.5739 | 0.4676 | 0.6022 | 0.3227 | 0.6069 | 0.5106 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
2088
+ | 0.8403 | 1300 | 0.331 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
2089
+ | 0.9050 | 1400 | 0.3188 | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
2090
+ | 0.9696 | 1500 | 0.3225 | 0.3034 | 0.5701 | 0.2706 | 0.5760 | 0.4722 | 0.6020 | 0.3186 | 0.6113 | 0.5107 | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
2091
+ | -1 | -1 | - | - | - | - | - | - | - | - | - | - | 0.3182 | 0.6035 | 0.8475 | 0.4422 | 0.8076 | 0.6106 | 0.3188 | 0.6459 | 0.9508 | 0.4004 | 0.6565 | 0.7238 | 0.4824 | 0.6006 |
2092
+
2093
+ * The bold row denotes the saved checkpoint.
2094
+
2095
+ ### Environmental Impact
2096
+ Carbon emissions were measured using [CodeCarbon](https://github.com/mlco2/codecarbon).
2097
+ - **Energy Consumed**: 0.136 kWh
2098
+ - **Carbon Emitted**: 0.053 kg of CO2
2099
+ - **Hours Used**: 0.398 hours
2100
+
2101
+ ### Training Hardware
2102
+ - **On Cloud**: No
2103
+ - **GPU Model**: 1 x NVIDIA GeForce RTX 3090
2104
+ - **CPU Model**: 13th Gen Intel(R) Core(TM) i7-13700K
2105
+ - **RAM Size**: 31.78 GB
2106
+
2107
+ ### Framework Versions
2108
+ - Python: 3.11.6
2109
+ - Sentence Transformers: 4.2.0.dev0
2110
+ - Transformers: 4.49.0
2111
+ - PyTorch: 2.6.0+cu124
2112
+ - Accelerate: 1.5.1
2113
+ - Datasets: 2.21.0
2114
+ - Tokenizers: 0.21.1
2115
+
2116
+ ## Citation
2117
+
2118
+ ### BibTeX
2119
+
2120
+ #### Sentence Transformers
2121
+ ```bibtex
2122
+ @inproceedings{reimers-2019-sentence-bert,
2123
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
2124
+ author = "Reimers, Nils and Gurevych, Iryna",
2125
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
2126
+ month = "11",
2127
+ year = "2019",
2128
+ publisher = "Association for Computational Linguistics",
2129
+ url = "https://arxiv.org/abs/1908.10084",
2130
+ }
2131
+ ```
2132
+
2133
+ #### CSRLoss
2134
+ ```bibtex
2135
+ @misc{wen2025matryoshkarevisitingsparsecoding,
2136
+ title={Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation},
2137
+ author={Tiansheng Wen and Yifei Wang and Zequn Zeng and Zhong Peng and Yudi Su and Xinyang Liu and Bo Chen and Hongwei Liu and Stefanie Jegelka and Chenyu You},
2138
+ year={2025},
2139
+ eprint={2503.01776},
2140
+ archivePrefix={arXiv},
2141
+ primaryClass={cs.LG},
2142
+ url={https://arxiv.org/abs/2503.01776},
2143
+ }
2144
+ ```
2145
+
2146
+ #### SparseMultipleNegativesRankingLoss
2147
+ ```bibtex
2148
+ @misc{henderson2017efficient,
2149
+ title={Efficient Natural Language Response Suggestion for Smart Reply},
2150
+ author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
2151
+ year={2017},
2152
+ eprint={1705.00652},
2153
+ archivePrefix={arXiv},
2154
+ primaryClass={cs.CL}
2155
+ }
2156
+ ```
2157
+
2158
+ <!--
2159
+ ## Glossary
2160
+
2161
+ *Clearly define terms in order to be accessible across audiences.*
2162
+ -->
2163
+
2164
+ <!--
2165
+ ## Model Card Authors
2166
+
2167
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
2168
+ -->
2169
+
2170
+ <!--
2171
+ ## Model Card Contact
2172
+
2173
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
2174
+ -->
config.json ADDED
@@ -0,0 +1,26 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "_name_or_path": "mixedbread-ai/mxbai-embed-large-v1",
3
+ "architectures": [
4
+ "BertModel"
5
+ ],
6
+ "attention_probs_dropout_prob": 0.1,
7
+ "classifier_dropout": null,
8
+ "gradient_checkpointing": false,
9
+ "hidden_act": "gelu",
10
+ "hidden_dropout_prob": 0.1,
11
+ "hidden_size": 1024,
12
+ "initializer_range": 0.02,
13
+ "intermediate_size": 4096,
14
+ "layer_norm_eps": 1e-12,
15
+ "max_position_embeddings": 512,
16
+ "model_type": "bert",
17
+ "num_attention_heads": 16,
18
+ "num_hidden_layers": 24,
19
+ "pad_token_id": 0,
20
+ "position_embedding_type": "absolute",
21
+ "torch_dtype": "float32",
22
+ "transformers_version": "4.49.0",
23
+ "type_vocab_size": 2,
24
+ "use_cache": false,
25
+ "vocab_size": 30522
26
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,14 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "__version__": {
3
+ "sentence_transformers": "4.2.0.dev0",
4
+ "transformers": "4.49.0",
5
+ "pytorch": "2.6.0+cu124"
6
+ },
7
+ "prompts": {
8
+ "query": "Represent this sentence for searching relevant passages: ",
9
+ "passage": ""
10
+ },
11
+ "default_prompt_name": null,
12
+ "model_type": "SparseEncoder",
13
+ "similarity_fn_name": "dot"
14
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e86b2a89f7f8933cf7bd90586cdf69d0012140e412818234b234f807e51ee574
3
+ size 1340612432
modules.json ADDED
@@ -0,0 +1,20 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ [
2
+ {
3
+ "idx": 0,
4
+ "name": "0",
5
+ "path": "",
6
+ "type": "sentence_transformers.models.Transformer"
7
+ },
8
+ {
9
+ "idx": 1,
10
+ "name": "1",
11
+ "path": "1_Pooling",
12
+ "type": "sentence_transformers.models.Pooling"
13
+ },
14
+ {
15
+ "idx": 2,
16
+ "name": "2",
17
+ "path": "2_CSRSparsity",
18
+ "type": "sentence_transformers.sparse_encoder.models.CSRSparsity"
19
+ }
20
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
 
 
 
 
 
1
+ {
2
+ "max_seq_length": 512,
3
+ "do_lower_case": false
4
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "cls_token": {
3
+ "content": "[CLS]",
4
+ "lstrip": false,
5
+ "normalized": false,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "mask_token": {
10
+ "content": "[MASK]",
11
+ "lstrip": false,
12
+ "normalized": false,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "pad_token": {
17
+ "content": "[PAD]",
18
+ "lstrip": false,
19
+ "normalized": false,
20
+ "rstrip": false,
21
+ "single_word": false
22
+ },
23
+ "sep_token": {
24
+ "content": "[SEP]",
25
+ "lstrip": false,
26
+ "normalized": false,
27
+ "rstrip": false,
28
+ "single_word": false
29
+ },
30
+ "unk_token": {
31
+ "content": "[UNK]",
32
+ "lstrip": false,
33
+ "normalized": false,
34
+ "rstrip": false,
35
+ "single_word": false
36
+ }
37
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "[PAD]",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "100": {
12
+ "content": "[UNK]",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "101": {
20
+ "content": "[CLS]",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "102": {
28
+ "content": "[SEP]",
29
+ "lstrip": false,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ },
35
+ "103": {
36
+ "content": "[MASK]",
37
+ "lstrip": false,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": true
42
+ }
43
+ },
44
+ "clean_up_tokenization_spaces": true,
45
+ "cls_token": "[CLS]",
46
+ "do_basic_tokenize": true,
47
+ "do_lower_case": true,
48
+ "extra_special_tokens": {},
49
+ "mask_token": "[MASK]",
50
+ "model_max_length": 512,
51
+ "never_split": null,
52
+ "pad_token": "[PAD]",
53
+ "sep_token": "[SEP]",
54
+ "strip_accents": null,
55
+ "tokenize_chinese_chars": true,
56
+ "tokenizer_class": "BertTokenizer",
57
+ "unk_token": "[UNK]"
58
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff