I got way better results now! I just needed to use the recommended version of transformers.
I'll edit the main post when I'm ready with the graphs.
Thanks once again.
Hey Tom,
First, I appreciate your work! Thanks for everything you're doing.
I did pass the prompt dict for "intfloat/multilingual-e5-large" to SentenceTransformer, like: prompts = {"query": "query: ", "passage": "passage: "}.
For "google/embeddinggemma-300m", I kept the default: model = SentenceTransformer("google/embeddinggemma-300m") and then evaluated with MTEB library, assuming that "MTEB will automatically detect and use these prompts if they are defined in your model's configuration," as written here https://sbert.net/docs/sentence_transformer/usage/mteb_evaluation.html
So in short, I did not add prompts for EmbeddingGemma, but I did add them for multilingual-e5-large, per their instructions (I didn't have time to check their model config, but I think the prompts aren't added by default).
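For reference, this is roughly the setup I described (a minimal sketch; the task name is just a placeholder, not the actual benchmark set I ran):

```python
import mteb
from sentence_transformers import SentenceTransformer

# multilingual-e5-large: prompts passed explicitly, per the model card
e5 = SentenceTransformer(
    "intfloat/multilingual-e5-large",
    prompts={"query": "query: ", "passage": "passage: "},
)

# EmbeddingGemma: defaults only, relying on prompts from the model config
gemma = SentenceTransformer("google/embeddinggemma-300m")

# MTEB should pick up prompts automatically when the model defines them
tasks = mteb.get_tasks(tasks=["STS22"])  # placeholder task
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(gemma, output_folder="results")
```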
BUT, I ran with transformers==4.55.4, so I may need to re-run with the recommended version...
sentence-transformers==5.1.0, which should be fine.
Thanks!
pip install git+https://github.com/huggingface/transformers@v4.56.0-Embedding-Gemma-preview
pip install "sentence-transformers>=5.0.0"
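After installing, it's worth sanity-checking the versions before re-running:

```python
import transformers
import sentence_transformers

print(transformers.__version__)           # should report the 4.56.0 preview build
print(sentence_transformers.__version__)  # should be >= 5.0.0
```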
Try reducing gpu_memory_utilization to a lower value.
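For example (a minimal sketch, assuming you're loading the model through vLLM's LLM class, where gpu_memory_utilization is a constructor argument; the model name is a hypothetical placeholder):

```python
from vllm import LLM

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # hypothetical placeholder model
    gpu_memory_utilization=0.7,  # default is 0.9; lower it to leave more GPU headroom
)
```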
Thank you.
I'm also a big fan of Qwen models. However, in this case, I don't think they are appropriate because I'm not entirely confident in their multilingual capabilities. That's why I chose Llama.
Overall, I agree that the Qwen series is excellent for most tasks.
Yeah, the issues are with the tables.
For office formats, it's mostly fine. Have you tried using PDFs or images?
I will work on improving this.