Issue with classification

#2
by woland2k - opened

I used the app you created for Amazon category classification to classify texts to categories. The model seems to struggle with some requests:

  1. smartphone charging cable USB-C -> Electronics > Audio Headphones > Over-Ear Headphones (instead of Cell Phones & Accessories > Cell Phone Accessories > Cell Phone Chargers & Power Adapters > Cell Phone Wireless Chargers).

  2. yoga mat non-slip exercise equipment -> Electronics > Audio Headphones > Over-Ear Headphones (instead of Sports & Outdoors > Sports & Fitness > Exercise & Fitness Equipment > Yoga Equipment > Yoga Mats).

On the other hand, a simple text transformer (sentence-transformers/all-MiniLM-L6-v2) was able to classify these correctly.

In the first example, USB-C seems to be the culprit; removing it gives better results, but only for the marqo-ecommerce-embeddings-B model.

Any insights on what is going on?

Marqo org

Thanks @woland2k - how are you doing the classification exactly? Using the text encoder on its own will have varied performance, as it was not trained for that purpose. Instead, I would recommend using the images along with the text. You can also get good classification if you also have a taxonomy to add additional constraints. See below:
https://huggingface.co/spaces/Marqo/e-commerce-taxonomy-mapping
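The idea in that Space, roughly: classify one taxonomy level at a time so the predicted path is always a valid one in the hierarchy. A minimal sketch of that constrained walk (the taxonomy dict and embed helper are placeholders for illustration, not the actual app code):

import numpy as np

# taxonomy: maps a parent path (tuple of category names) to its child names
# embed: returns an L2-normalized embedding (np.ndarray) for a label string
def classify_path(product_embedding, taxonomy, embed, max_depth=5):
    path = ()
    for _ in range(max_depth):
        children = taxonomy.get(path)
        if not children:  # reached a leaf of the hierarchy
            break
        # score each candidate child by cosine similarity (all normalized)
        child_embeddings = np.stack([embed(c) for c in children])
        scores = child_embeddings @ product_embedding
        path += (children[int(scores.argmax())],)
    return path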

@Jesse-marqo What is the recommended way to do product classification? I'm looking for a way to utilize both text and image information with my own fashion taxonomy. Marqo-FashionCLIP (and the SigLIP version) seem to accept images only. Am I wrong?

Marqo org

@pySilver The models can take in both text and images. See the example here -> https://huggingface.co/Marqo/marqo-fashionSigLIP
The model is trained so that the text and image can be combined into a single embedding through a weighted average.
The best way we found for the mapping is what is described here: https://huggingface.co/spaces/Marqo/e-commerce-taxonomy-mapping/blob/main/app.py
It uses the hierarchy to enforce a consistent path. That implementation uses images only, but if you have a title as well, you would just need to update this section:
https://huggingface.co/spaces/Marqo/e-commerce-taxonomy-mapping/blob/main/app.py#L72
to also do inference on the text component and then create a combined embedding.
Something like the below (you will still need to pass the text through, etc.):

import torch

# combine the two normalized embeddings with a weighted average (0.9 image / 0.1 text)
text_tensor = tokenizer([text])
with torch.no_grad(), torch.cuda.amp.autocast():
    image_embedding = model.encode_image(image_tensor, normalize=True)
    text_embedding = model.encode_text(text_tensor, normalize=True)
    base_embedding = 0.9 * image_embedding + 0.1 * text_embedding
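One caveat (a suggestion, not necessarily what the Space already handles): the weighted average of two unit vectors is generally not unit norm itself, so if the downstream code scores categories with a plain dot product you may want to re-normalize the combined embedding:

base_embedding = base_embedding / base_embedding.norm(dim=-1, keepdim=True)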

@Jesse-marqo Thank you for such a detailed explanation. I’m going to try it right away.

As I pointed out in another thread, there is something wrong with either open_clip or torch: the sample scripts provided on the model cards simply stopped working on Apple silicon some time ago. I'll test it on a Linux machine with CUDA.

Btw, what's the big difference between Marqo/marqo-ecommerce-embeddings-L and Marqo/marqo-fashionSigLIP? (That is probably a silly question.)

Marqo org

Thanks. We don't use them on MPS, CUDA only at this stage. The differences were 1) the training data and 2) the training method. FashionSigLIP used narrower data (fashion only) and had a lot more data per sample compared to the e-commerce models. The FashionSigLIP training is described here -> https://www.marqo.ai/blog/search-model-for-fashion

@Jesse-marqo I've managed to make it all work on MPS. I cloned the original app and then:

  • modified the taxonomy file to my simplified list of categories
  • added marqo-fashionSigLIP support
  • regenerated the vector caches (including with the new model)
  • added support for an optional product title
  • changed the cosine similarity score to a softmax with some adjustments
  • added negative categories to catch out-of-distribution (OOD) products (see the sketch below)
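For the last two points, roughly what I ended up with, as a simplified sketch (the temperature value and the negative labels here are illustrative, not the exact ones I use):

import numpy as np

TEMPERATURE = 0.07  # illustrative value; tune per model
NEGATIVE_LABELS = {"random unrelated object", "miscellaneous item"}  # OOD catch-alls

def classify_with_ood(product_emb, labels, label_embs):
    # labels / label_embs already include the negative categories;
    # all embeddings are assumed to be L2-normalized
    logits = (label_embs @ product_emb) / TEMPERATURE
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    probs /= probs.sum()
    best = int(probs.argmax())
    if labels[best] in NEGATIVE_LABELS:
        return None, float(probs[best])  # best match is a negative label -> OOD
    return labels[best], float(probs[best])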

It works great most of the time! Thank you for your help!
