Cohere embed-english-light-v3.0

This repository contains the tokenizer for the Cohere embed-english-light-v3.0 model. See our blog post Cohere Embed V3 for more details on this model.
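
As a rough illustration of what the tokenizer can be used for, the sketch below counts tokens per text before sending it to the model. It assumes the repository ships a tokenizer.json that the Hugging Face `tokenizers` library can load. The helper names `load_tokenizer` and `count_tokens` are hypothetical and not part of any Cohere SDK.

```python
def load_tokenizer(repo_id="Cohere/Cohere-embed-english-light-v3.0"):
    """Load the tokenizer from the Hugging Face Hub.

    Assumption: the repository contains a tokenizer.json that the
    `tokenizers` library (pip install tokenizers) can read.
    """
    from tokenizers import Tokenizer
    return Tokenizer.from_pretrained(repo_id)


def count_tokens(tokenizer, texts):
    """Return the token count for each text.

    Works with any object whose encode(text) result has an `ids` list.
    The count includes any special tokens the tokenizer adds.
    """
    return [len(tokenizer.encode(text).ids) for text in texts]


# Usage (downloads the tokenizer on first call):
# tokenizer = load_tokenizer()
# print(count_tokens(tokenizer, ["What is Pytorch"]))
```

Counting tokens locally can help spot inputs that are likely to be truncated, without calling the API.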

You can use the embedding model via the Cohere API, AWS SageMaker, or in your own private deployment.

Usage Cohere API

The following code snippet shows how to use the Cohere API. Install the Cohere SDK via:

pip install -U cohere

Get your free API key from: www.cohere.com

# This snippet shows an example of how to use the Cohere Embed V3 models for semantic search.
# Make sure you have the Cohere SDK installed in at least v4.30: pip install -U cohere
# Get your API key from: www.cohere.com
import cohere
import numpy as np

cohere_key = "{YOUR_COHERE_API_KEY}"   #Get your API key from www.cohere.com
co = cohere.Client(cohere_key)

docs = ["The capital of France is Paris",
        "PyTorch is a machine learning framework based on the Torch library.",
        "The average cat lifespan is between 13-17 years"]


#Encode your documents with input type 'search_document'
doc_emb = co.embed(texts=docs, input_type="search_document", model="embed-english-light-v3.0").embeddings
doc_emb = np.asarray(doc_emb)


#Encode your query with input type 'search_query'
query = "What is Pytorch"
query_emb = co.embed(texts=[query], input_type="search_query", model="embed-english-light-v3.0").embeddings
query_emb = np.asarray(query_emb)
print(query_emb.shape)  # (1, embedding_dim)

#Compute the dot product between query embedding and document embedding
scores = np.dot(query_emb, doc_emb.T)[0]

#Find the highest scores
max_idx = np.argsort(-scores)

print(f"Query: {query}")
for idx in max_idx:
  print(f"Score: {scores[idx]:.2f}")
  print(docs[idx])
  print("--------")
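
The ranking step above can also be wrapped in a small reusable helper. The following is a minimal sketch, not part of the Cohere SDK: the function name `rank_documents` and its `top_k` and `normalize` parameters are illustrative. With `normalize=True` the vectors are scaled to unit length so the dot product equals cosine similarity. Whether the model's embeddings are already unit length is not stated here, so the option is left explicit. Toy 2-d vectors stand in for real embeddings.

```python
import numpy as np


def rank_documents(query_emb, doc_emb, top_k=3, normalize=True):
    """Return (index, score) pairs for the top_k documents, best first."""
    query_emb = np.asarray(query_emb, dtype=np.float64).reshape(-1)
    doc_emb = np.asarray(doc_emb, dtype=np.float64)
    if normalize:
        # Scale to unit length so the dot product equals cosine similarity.
        query_emb = query_emb / np.linalg.norm(query_emb)
        doc_emb = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)
    scores = doc_emb @ query_emb          # one score per document
    order = np.argsort(-scores)[:top_k]   # indices sorted by descending score
    return [(int(i), float(scores[i])) for i in order]


# Toy vectors standing in for real embeddings (hypothetical data)
toy_docs = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]])
toy_query = np.array([[0.0, 1.0]])
ranked = rank_documents(toy_query, toy_docs, top_k=2)
for idx, score in ranked:
    print(f"doc {idx}: {score:.2f}")
```

With real data, pass `query_emb` and `doc_emb` from the snippet above in place of the toy arrays.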

Usage AWS SageMaker

The embedding model can be deployed privately in your AWS cloud through our AWS SageMaker Marketplace offering. It runs in your VPC, with query-encoding latencies as low as 5 ms.

Usage AWS Bedrock

The model will soon also be available via AWS Bedrock. Stay tuned.

Private Deployment

Want to run the model on your own hardware? Contact Sales to learn more.

Supported Languages

This model was trained on nearly 1B English training pairs.

Evaluation results can be found in the Embed V3.0 Benchmark Results spreadsheet.


Evaluation results

All scores are self-reported on the test set of each MTEB task.

MTEB AmazonCounterfactualClassification (en)
  accuracy: 78.627
  ap: 43.501
  f1: 73.124

MTEB AmazonPolarityClassification
  accuracy: 94.795
  ap: 92.142
  f1: 94.793

MTEB AmazonReviewsClassification (en)
  accuracy: 51.016
  f1: 48.927

MTEB ArguAna
  ndcg_at_10: 50.806

MTEB ArxivClusteringP2P
  v_measure: 46.193
