---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:713
- loss:TripletLoss
base_model: Alibaba-NLP/gte-base-en-v1.5
widget:
- source_sentence: Canon
sentences:
- Canon
- model
- manufacturer
- Canon GRAY 12.1 MP POWERSHOT - CNDA1100ISGRY
- source_sentence: $1,129.99
sentences:
- model
- Hello Kitty 5.1 Megapixel Digital Camera - 87009
- $219.99
- price
- source_sentence: Nikon
sentences:
- manufacturer
- Fujifilm
- Kodak M530 Green 12MP 3 X Optical - 1570217
- model
- source_sentence: Kodak 9.2 Megapixel Digital Camera - Red - M320
sentences:
- manufacturer
- Panasonic Lumix 12.1 Megapixel Compact Digital Camera - DMC-FH1P
- Kodak
- model
- source_sentence: Kodak
sentences:
- Kodak EasyShare 10 MegaPixel Compact Camera - Red - C142
- manufacturer
- model
- Kodak
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy
- silhouette_cosine
- silhouette_euclidean
model-index:
- name: SentenceTransformer based on Alibaba-NLP/gte-base-en-v1.5
results:
- task:
type: triplet
name: Triplet
dataset:
name: Unknown
type: unknown
metrics:
- type: cosine_accuracy
value: 1.0
name: Cosine Accuracy
- type: cosine_accuracy
value: 1.0
name: Cosine Accuracy
- task:
type: silhouette
name: Silhouette
dataset:
name: Unknown
type: unknown
metrics:
- type: silhouette_cosine
value: 0.9776584506034851
name: Silhouette Cosine
- type: silhouette_euclidean
value: 0.8698909282684326
name: Silhouette Euclidean
- type: silhouette_cosine
value: 0.9748208522796631
name: Silhouette Cosine
- type: silhouette_euclidean
value: 0.8665136098861694
name: Silhouette Euclidean
---
# SentenceTransformer based on Alibaba-NLP/gte-base-en-v1.5
This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [Alibaba-NLP/gte-base-en-v1.5](https://huggingface.co/Alibaba-NLP/gte-base-en-v1.5). It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details
### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [Alibaba-NLP/gte-base-en-v1.5](https://huggingface.co/Alibaba-NLP/gte-base-en-v1.5)
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
### Model Sources
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture
```
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: NewModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
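The pooling configuration above enables only `pooling_mode_cls_token`, so the sentence embedding is taken from the first token rather than averaged over all tokens. The sketch below illustrates the difference on plain Python lists; `cls_pool` and `mean_pool` are hypothetical helper names for illustration, not functions from the library.

```python
from typing import List, Sequence

Vector = List[float]


def cls_pool(token_embeddings: Sequence[Vector]) -> Vector:
    """Use the first ([CLS]) token embedding as the sentence vector."""
    if not token_embeddings:
        raise ValueError("expected at least one token embedding")
    return list(token_embeddings[0])


def mean_pool(token_embeddings: Sequence[Vector], attention_mask: Sequence[int]) -> Vector:
    """Average the embeddings of non-padding tokens (shown only for contrast)."""
    kept = [vec for vec, keep in zip(token_embeddings, attention_mask) if keep]
    if not kept:
        raise ValueError("attention mask removed every token")
    dim = len(kept[0])
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]


# Toy input: three tokens with 2-dimensional embeddings, the last one is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
mask = [1, 1, 0]
cls_vector = cls_pool(tokens)
mean_vector = mean_pool(tokens, mask)
```

In the real model each token vector has 768 dimensions, and the Pooling module operates on batched tensors rather than lists.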
## Usage
### Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
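# The base model ships custom modeling code (NewModel), so
# trust_remote_code=True is likely required when loading.
model = SentenceTransformer(
    "albertus-sussex/veriscrape-sbert-camera-reference_1_to_verify_9-fold-7",
    trust_remote_code=True,
)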
# Run inference
sentences = [
'Kodak',
'Kodak',
'Kodak EasyShare 10 MegaPixel Compact Camera - Red - C142',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
```
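A common follow-up to the snippet above is ranking candidate strings against a query by cosine similarity, similar to the widget examples in this card. The following is a minimal sketch: `cosine`, `rank_candidates`, and `demo` are hypothetical helper names, and the `trust_remote_code=True` argument inside `demo` is an assumption based on the base model using custom modeling code.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(
    query_vec: Sequence[float], candidate_vecs: Sequence[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (candidate index, similarity) pairs, most similar first."""
    scores = [(i, cosine(query_vec, c)) for i, c in enumerate(candidate_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def demo() -> None:
    """Rank candidate strings with the model; downloads weights when called."""
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(
        "albertus-sussex/veriscrape-sbert-camera-reference_1_to_verify_9-fold-7",
        trust_remote_code=True,  # assumption: needed for the custom NewModel code
    )
    query = "Kodak"
    candidates = ["Fujifilm", "$219.99", "Kodak M530 Green 12MP 3 X Optical - 1570217"]
    vectors = model.encode([query] + candidates)
    for index, score in rank_candidates(vectors[0], vectors[1:]):
        print(f"{score:.3f}  {candidates[index]}")
```

Calling `demo()` prints the candidates ordered by similarity to the query. The built-in `model.similarity` computes the same cosine scores in one call when a full similarity matrix is wanted.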
## Evaluation
### Metrics
#### Triplet
* Evaluated with [TripletEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.TripletEvaluator)
| Metric | Value |
|:--------------------|:--------|
| **cosine_accuracy** | **1.0** |
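As a rough illustration of what `cosine_accuracy` measures: it is the fraction of (anchor, positive, negative) triplets in which the anchor is more similar to the positive than to the negative. The sketch below assumes a zero margin and uses a hypothetical helper, `triplet_cosine_accuracy`; it is not the TripletEvaluator implementation.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def triplet_cosine_accuracy(
    anchors: Sequence[Sequence[float]],
    positives: Sequence[Sequence[float]],
    negatives: Sequence[Sequence[float]],
) -> float:
    """Share of triplets where cos(anchor, positive) > cos(anchor, negative)."""
    if not anchors:
        raise ValueError("expected at least one triplet")
    correct = sum(
        1
        for a, p, n in zip(anchors, positives, negatives)
        if cosine(a, p) > cosine(a, n)
    )
    return correct / len(anchors)


# Toy embeddings: the first triplet is ranked correctly, the second is not.
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[0.9, 0.1], [1.0, 0.0]]
negatives = [[0.0, 1.0], [0.1, 0.9]]
accuracy = triplet_cosine_accuracy(anchors, positives, negatives)
```

A reported value of 1.0 means every evaluation triplet satisfied this ordering.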
#### Silhouette
* Evaluated with veriscrape.training.SilhouetteEvaluator
| Metric | Value |
|:----------------------|:-----------|
| **silhouette_cosine** | **0.9777** |
| silhouette_euclidean | 0.8699 |
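The `veriscrape.training.SilhouetteEvaluator` code is not shown in this card, so the following is only a sketch of the standard silhouette coefficient under two distance functions. It assumes the embeddings are grouped by a label such as the attribute name (an assumption, not confirmed here); all function names are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def euclidean(u: Vector, v: Vector) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_distance(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - (dot / norm if norm else 0.0)


def silhouette_score(
    points: Sequence[Vector],
    labels: Sequence[object],
    distance: Callable[[Vector, Vector], float],
) -> float:
    """Mean silhouette coefficient s(i) = (b - a) / max(a, b) over all points."""
    clusters: Dict[object, List[int]] = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for i, point in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue  # singleton clusters contribute s(i) = 0
        # a: mean distance to the other members of the same cluster
        a = sum(distance(point, points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(distance(point, points[j]) for j in members) / len(members)
            for label, members in clusters.items()
            if label != labels[i]
        )
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(points)


# Two tight, well-separated toy clusters give a score close to 1.
toy_points = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
toy_labels = ["manufacturer", "manufacturer", "price", "price"]
toy_score = silhouette_score(toy_points, toy_labels, euclidean)
```

Scores near 1 indicate compact clusters that are far from each other, which is how the values in the table above can be read. For larger datasets, `sklearn.metrics.silhouette_score` with `metric="cosine"` or `metric="euclidean"` is a more practical choice.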
#### Triplet
* Evaluated with [TripletEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.TripletEvaluator)
| Metric | Value |
|:--------------------|:--------|
| **cosine_accuracy** | **1.0** |
#### Silhouette
* Evaluated with veriscrape.training.SilhouetteEvaluator
| Metric | Value |
|:----------------------|:-----------|
| **silhouette_cosine** | **0.9748** |
| silhouette_euclidean | 0.8665 |
## Training Details
### Training Dataset
#### Unnamed Dataset
* Size: 713 training samples
* Columns: anchor, positive, negative, pos_attr_name, and neg_attr_name
* Approximate statistics based on the first 713 samples:
| | anchor | positive | negative | pos_attr_name | neg_attr_name |
|:--------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:-------------------------------------------------------------------------------|:-------------------------------------------------------------------------------|
| type | string | string | string | string | string |
* Samples:
| anchor | positive | negative | pos_attr_name | neg_attr_name |
|:-------|:---------|:---------|:--------------|:--------------|
| $177.99 | $244.99 | Sony | price | manufacturer |
| Samsung | VistaQuest | $310.99 | manufacturer | price |
| Polaroid 6 Megapixel Digital Camera - Black - I633 | Sony Cyber-shot DSC-W350 Point & Shoot Digital Camera - 14.1 Megapixel - 2.7" Active Matrix TFT Color LCD - Silver - DSCW350 | $396.99 | model | price |
* Loss: [TripletLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#tripletloss) with these parameters:
```json
{
"distance_metric": "TripletDistanceMetric.EUCLIDEAN",
"triplet_margin": 5
}
```
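With a Euclidean distance metric and a margin of 5, the triplet objective penalizes a triplet whenever the positive is not at least 5 units closer to the anchor than the negative. The sketch below shows the per-triplet formula `max(d(a, p) - d(a, n) + margin, 0)` and its batch mean; the function names are hypothetical and the code works on plain lists rather than tensors.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def euclidean(u: Vector, v: Vector) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector, margin: float = 5.0) -> float:
    """Hinge on the gap between anchor-positive and anchor-negative distances."""
    return max(euclidean(anchor, positive) - euclidean(anchor, negative) + margin, 0.0)


def batch_triplet_loss(
    anchors: Sequence[Vector],
    positives: Sequence[Vector],
    negatives: Sequence[Vector],
    margin: float = 5.0,
) -> float:
    """Mean triplet loss over a batch."""
    losses = [
        triplet_loss(a, p, n, margin)
        for a, p, n in zip(anchors, positives, negatives)
    ]
    return sum(losses) / len(losses)


# First triplet violates the margin (5 - 1 + 5 = 9); the second satisfies it (1 - 10 + 5 < 0).
violating = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])
satisfied = triplet_loss([0.0, 0.0], [0.0, 1.0], [6.0, 8.0])
```

Because the distance is Euclidean and unnormalized, the margin of 5 is measured in raw embedding units rather than on a cosine scale.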
### Evaluation Dataset
#### Unnamed Dataset
* Size: 80 evaluation samples
* Columns: anchor, positive, negative, pos_attr_name, and neg_attr_name
* Approximate statistics based on the first 80 samples:
| | anchor | positive | negative | pos_attr_name | neg_attr_name |
|:--------|:---------------------------------------------------------------------------------|:---------------------------------------------------------------------------------|:---------------------------------------------------------------------------------|:-------------------------------------------------------------------------------|:-------------------------------------------------------------------------------|
| type | string | string | string | string | string |
* Samples:
| anchor | positive | negative | pos_attr_name | neg_attr_name |
|:-------|:---------|:---------|:--------------|:--------------|
| Kodak EasyShare 10 MegaPixel Compact Camera - Red - C142 | Olympus FE-5010 12MP Digital Camera with 5x Optical Dual Ima - FE-5010_B | $138.99 | model | price |
| $364.99 | $53.99 | Olympus Stylus Tough 3000 Point & Shoot 12 Megapixel Digital Camera - 227615 | price | model |
| RELAUNCH AGGREGATOR OLYMPUS & PANASONIC TTL DIGITAL - SFD926O | Olympus Stylus 5010 14 Megapixel Compact Digital Camera - 227560 | $396.99 | model | price |
* Loss: [TripletLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#tripletloss) with these parameters:
```json
{
"distance_metric": "TripletDistanceMetric.EUCLIDEAN",
"triplet_margin": 5
}
```
### Training Hyperparameters
#### Non-Default Hyperparameters
- `eval_strategy`: epoch
- `per_device_train_batch_size`: 128
- `per_device_eval_batch_size`: 128
- `num_train_epochs`: 5
- `warmup_ratio`: 0.1
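To give a sense of scale, the settings above imply a very short training run. The arithmetic below is a sketch that assumes a single device, no gradient accumulation, and that the final partial batch is kept; `schedule_summary` is a hypothetical helper, not part of any library.

```python
import math
from typing import Dict


def schedule_summary(
    num_samples: int, batch_size: int, num_epochs: int, warmup_ratio: float
) -> Dict[str, int]:
    """Derive optimizer step counts from dataset size and batch settings."""
    steps_per_epoch = math.ceil(num_samples / batch_size)  # last partial batch kept
    total_steps = steps_per_epoch * num_epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return {
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }


# 713 training samples, batch size 128, 5 epochs, warmup_ratio 0.1
summary = schedule_summary(num_samples=713, batch_size=128, num_epochs=5, warmup_ratio=0.1)
```

Under these assumptions the run has 6 steps per epoch, 30 optimizer steps in total, and 3 warmup steps, with evaluation after each of the 5 epochs.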
#### All Hyperparameters