What is the LLM used, and how do I download it?
Hello, I've read the paper, and to my understanding you add a LoRA adapter + a projection layer on top of an LLM. You've mentioned a few LLMs that the image model has been trained with. However, it's not clear where we can get these LLMs from.
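For context, here is a minimal sketch of my understanding of that setup: a frozen LLM adapted with a LoRA-style low-rank update, plus a projection layer mapping its hidden states into the image model's embedding space. All dimensions, names, and the rank are illustrative assumptions, not values from the paper (e.g. Llama 3-8B's actual hidden size is 4096).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only (assumptions, not from the paper).
hidden_dim = 64   # stand-in for the LLM hidden size
embed_dim = 32    # assumed target embedding size for the image model
rank = 4          # assumed LoRA rank

# LoRA: the frozen base weight W gets a trainable low-rank delta A @ B.
W_frozen = rng.normal(0, 0.02, size=(hidden_dim, hidden_dim))
lora_A = rng.normal(0, 0.02, size=(hidden_dim, rank))
lora_B = np.zeros((rank, hidden_dim))  # zero-init so the delta starts at 0
W_effective = W_frozen + lora_A @ lora_B

# Projection layer: maps adapted LLM hidden states into the embedding space.
W_proj = rng.normal(0, 0.02, size=(hidden_dim, embed_dim))

def forward(hidden_states: np.ndarray) -> np.ndarray:
    """(seq_len, hidden_dim) -> (seq_len, embed_dim)."""
    adapted = hidden_states @ W_effective
    return adapted @ W_proj

tokens = rng.normal(size=(16, hidden_dim))  # stand-in for LLM output states
emb = forward(tokens)
print(emb.shape)  # (16, 32)
```

This is just a sketch of the general LoRA + projection pattern so the question is concrete; the released checkpoints would contain the actual trained `lora_A`/`lora_B` and projection weights.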
Also, while I have your attention: is it safe to say that the numbers quoted in Table 3 are with the Mistral-12B model (and not, say, with one of the smaller models mentioned in Table 4, like Jina BERT)?
Hello, the numbers in Table 3 are based on Llama 3-8B. By default, unless otherwise specified, our paper uses Llama 3-8B. We expect to release all the parameters of the text model, adapter, and related components today. We previously experienced some delays due to precision issues during the Hugging Face conversion process, but we have resolved them and will soon upload all the parameters you might need. We welcome your suggestions and requests, and we will strive to update the releases to meet your requirements, making it more convenient for everyone to conduct research.