AdaptLLM committed
Commit 078f44c · verified · 1 Parent(s): 10494bb

Update README.md

Files changed (1)
  1. README.md +100 -3
README.md CHANGED
@@ -1,3 +1,100 @@
- ---
- license: llama3.2
- ---
+ ---
+ license: llama3.2
+ datasets:
+ - AdaptLLM/remote-sensing-visual-instructions
+ language:
+ - en
+ base_model:
+ - meta-llama/Llama-3.2-11B-Vision-Instruct
+ tags:
+ - remote-sensing
+ ---
+ # Adapting Multimodal Large Language Models to Domains via Post-Training
+
+ This repo contains the **remote-sensing MLLM developed from Llama-3.2-11B-Vision-Instruct** in our paper: [On Domain-Specific Post-Training for Multimodal Large Language Models](https://huggingface.co/papers/2411.19930). The corresponding training dataset is in [remote-sensing-visual-instructions](https://huggingface.co/datasets/AdaptLLM/remote-sensing-visual-instructions).
+
+ The main project page is: [Adapt-MLLM-to-Domains](https://huggingface.co/AdaptLLM/Adapt-MLLM-to-Domains)
+
+ ## 1. To Chat with AdaMLLM
+
+ Our model architecture aligns with the base model: Llama-3.2-Vision-Instruct. We provide a usage example below; you may refer to the official [Llama-3.2-Vision-Instruct Repository](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct) for more advanced usage instructions.
+
+ **Note:** For AdaMLLM, always place the image at the beginning of the input instruction in the messages.
+
+ <details>
+ <summary> Click to expand </summary>
+
+ Starting with transformers >= 4.45.0, you can run inference using conversational messages that may include an image you can query about.
+
+ Make sure to update your transformers installation via `pip install --upgrade transformers`.
+
+ ```python
+ import requests
+ import torch
+ from PIL import Image
+ from transformers import MllamaForConditionalGeneration, AutoProcessor
+
+ model_id = "AdaptLLM/remote-sensing-Llama-3.2-11B-Vision-Instruct"
+
+ model = MllamaForConditionalGeneration.from_pretrained(
+     model_id,
+     torch_dtype=torch.bfloat16,
+     device_map="auto",
+ )
+ processor = AutoProcessor.from_pretrained(model_id)
+
+ url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg"
+ image = Image.open(requests.get(url, stream=True).raw)
+
+ # NOTE: For AdaMLLM, always place the image at the beginning of the input instruction in the messages.
+ messages = [
+     {"role": "user", "content": [
+         {"type": "image"},
+         {"type": "text", "text": "If I had to write a haiku for this one, it would be: "}
+     ]}
+ ]
+ input_text = processor.apply_chat_template(messages, add_generation_prompt=True)
+ inputs = processor(
+     image,
+     input_text,
+     add_special_tokens=False,
+     return_tensors="pt"
+ ).to(model.device)
+
+ output = model.generate(**inputs, max_new_tokens=30)
+ print(processor.decode(output[0]))
+ ```
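+
+ If you prefer to print only the model's continuation (without the echoed prompt and special tokens), a small follow-up snippet like the one below should work with the same variables as above.
+
+ ```python
+ # Decode only the newly generated tokens, dropping the prompt and special tokens.
+ generated = output[0][inputs["input_ids"].shape[-1]:]
+ print(processor.decode(generated, skip_special_tokens=True))
+ ```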
+ </details>
+
+ ## 2. To Evaluate Any MLLM on Domain-Specific Benchmarks
+
+ See [remote-sensing-VQA-benchmark](https://huggingface.co/datasets/AdaptLLM/remote-sensing-VQA-benchmark) to reproduce our results and evaluate more MLLMs on the domain-specific benchmarks.
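+
+ As a quick start, the sketch below shows one way to pull the benchmark data with the `datasets` library. It is a minimal sketch only: it assumes the benchmark loads directly via `load_dataset` and does not assume any particular config or split names, so check the benchmark's dataset card for the actual task configs, splits, and evaluation scripts.
+
+ ```python
+ # Minimal sketch (assumes the benchmark is loadable with datasets.load_dataset);
+ # see the benchmark's dataset card for the actual configs, splits, and evaluation code.
+ from datasets import get_dataset_config_names, load_dataset
+
+ repo_id = "AdaptLLM/remote-sensing-VQA-benchmark"
+
+ # Discover the available task configs instead of hard-coding their names.
+ configs = get_dataset_config_names(repo_id)
+ print(configs)
+
+ # Load the first task config and inspect its splits and examples.
+ benchmark = load_dataset(repo_id, configs[0])
+ print(benchmark)
+ ```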
+
+ ## 3. To Reproduce this Domain-Adapted MLLM
+
+ See [Post-Train Guide](https://github.com/bigai-ai/QA-Synthesizer/blob/main/docs/Post_Train.md) to adapt MLLMs to domains.
+
+ ## Citation
+ If you find our work helpful, please cite us.
+
+ [AdaMLLM](https://huggingface.co/papers/2411.19930)
+ ```bibtex
+ @article{adamllm,
+   title={On Domain-Specific Post-Training for Multimodal Large Language Models},
+   author={Cheng, Daixuan and Huang, Shaohan and Zhu, Ziyu and Zhang, Xintong and Zhao, Wayne Xin and Luan, Zhongzhi and Dai, Bo and Zhang, Zhenliang},
+   journal={arXiv preprint arXiv:2411.19930},
+   year={2024}
+ }
+ ```
+
+ [Adapt LLM to Domains](https://huggingface.co/papers/2309.09530) (ICLR 2024)
+ ```bibtex
+ @inproceedings{cheng2024adapting,
+   title={Adapting Large Language Models via Reading Comprehension},
+   author={Daixuan Cheng and Shaohan Huang and Furu Wei},
+   booktitle={The Twelfth International Conference on Learning Representations},
+   year={2024},
+   url={https://openreview.net/forum?id=y886UXPEZ0}
+ }
+ ```