Semantic similarity
#29 · opened by ZijieAsus
I am trying to use this model for multilingual semantic search.
```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

model = SentenceTransformer('intfloat/multilingual-e5-base')
prefix = "query: "

en_emb = model.encode(prefix + "how do i change my google profile photo?", normalize_embeddings=True)
zh_emb = model.encode(prefix + "我如何更改我的Google個人照片?", normalize_embeddings=True)  # same question in Chinese
print(cos_sim(en_emb, zh_emb))  # 0.9223
```
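An absolute score like 0.9223 may be easier to interpret next to the score for an unrelated pair, since each model has its own typical score range. A minimal sanity-check sketch, reusing `model`, `prefix`, and `cos_sim` from above (the unrelated sentence is an invented example):

```python
# Compare the translation pair's score against an unrelated pair.
# An unrelated pair should land noticeably lower, which gives a feel
# for the score range this model actually produces.
q_emb = model.encode(prefix + "how do i change my google profile photo?",
                     normalize_embeddings=True)
other_emb = model.encode(prefix + "what is the capital of France?",
                         normalize_embeddings=True)
print(cos_sim(q_emb, other_emb))  # expected: clearly below 0.9223
```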
The effect seems even more pronounced when the input is a single word:

```python
en_emb = model.encode(prefix + "Apple", normalize_embeddings=True)
jp_emb = model.encode(prefix + "リンゴ", normalize_embeddings=True)  # "apple" in Japanese
print(cos_sim(en_emb, jp_emb))  # 0.7541
```
In the first case, I expected the cosine similarity to be very close to 1.0 (for example, 0.98 or 0.99), but the result was 0.9223. Is this within expectations, or is there a reason for the gap?

Thanks!
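For reference, the model card distinguishes prefixes for retrieval: queries get "query: " and documents get "passage: ", while symmetric tasks such as the comparisons above use "query: " on both sides. A minimal retrieval-style sketch reusing the `model` defined above (the passage texts are invented examples):

```python
# Rank candidate passages against a query, following the e5 prefix convention.
query_emb = model.encode("query: how do i change my google profile photo?",
                         normalize_embeddings=True)
passage_embs = model.encode(
    [
        "passage: Open your Google Account, select Personal info, then tap your photo to update it.",
        "passage: Paris is the capital and largest city of France.",
    ],
    normalize_embeddings=True,
)
print(cos_sim(query_emb, passage_embs))  # the on-topic passage should score higher
```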