Update README.md
README.md CHANGED
````diff
@@ -14,45 +14,16 @@ pipeline_tag: image-to-text
 
 This is a fine-tuned version of Qwen2.5-VL for enhanced reasoning capabilities, specifically optimized for multimodal reasoning tasks.
 
-## Model Details
-
-- **Base Model**: qwen2.5-vl
-- **Model Type**: Vision-Language Model
-- **Task**: Multimodal reasoning and visual question answering
-- **Fine-tuning**: Custom training on reasoning datasets
-
-## Model Files
-
-This repository contains only the essential files for inference:
-
-### Core Model Files
-- `config.json`: Model configuration
-- `generation_config.json`: Text generation configuration
-- `model-*.safetensors`: Model weights in SafeTensors format
-- `model.safetensors.index.json`: Model weights index
-
-### Tokenizer Files
-- `tokenizer.json`: Tokenizer configuration
-- `tokenizer_config.json`: Tokenizer settings
-- `vocab.json`: Vocabulary file
-- `merges.txt`: BPE merge rules
-- `added_tokens.json`: Additional tokens
-- `special_tokens_map.json`: Special token mappings
-
-### Vision Processing
-- `preprocessor_config.json`: Image preprocessing configuration
-- `chat_template.json`: Chat template for conversations
-
 ## Usage
 
 ```python
-from transformers import
+from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
 import torch
 
 model_id = "ChaoHuangCS/DRIFT-VL-7B"
 
 # Load model and processor
-model =
+model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
     model_id,
     torch_dtype=torch.float16,
     device_map="auto",
@@ -88,6 +59,7 @@ print(response)
 
 This model was fine-tuned using:
 - **Base Model**: Qwen2.5-VL
+- **Merged Model**: DeepSeek-R1
 - **Training Method**: Custom reasoning-focused fine-tuning
 - **Dataset**: Multimodal reasoning datasets
 - **Architecture**: Preserves original Qwen2.5-VL architecture
@@ -102,7 +74,7 @@ The model has been optimized for:
 
 ## Citation
 
-If you use this model, please cite
+If you use this model, please cite our paper.
 
 ## License
 
````
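The updated Usage snippet is cut off by the diff context right after `device_map="auto",`, so the rest of the code is not visible in this commit. Below is a minimal self-contained sketch of how a Qwen2.5-VL-style checkpoint is typically driven end to end; everything past the truncation point (the closing parenthesis, the processor, the `qwen_vl_utils` helper, the example image URL, and the generation settings) follows the standard Qwen2.5-VL recipe and is an assumption, not this repository's exact code.

```python
# Hedged sketch: completes the truncated Usage snippet with the standard
# Qwen2.5-VL inference recipe. Everything after `device_map="auto"` is an
# assumption; the image URL and prompt are hypothetical placeholders.
import torch
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info  # pip install qwen-vl-utils

model_id = "ChaoHuangCS/DRIFT-VL-7B"

# Load model and processor (as in the diff, up to the truncation point)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)

# One image plus a reasoning-style question (hypothetical inputs)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "https://example.com/demo.jpg"},
            {"type": "text", "text": "Describe what is happening and explain your reasoning."},
        ],
    }
]

# Render the chat template, pack pixels and tokens, then generate
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=256)
# Strip the prompt tokens so only the newly generated answer is decoded
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, output_ids)]
response = processor.batch_decode(trimmed, skip_special_tokens=True)[0]
print(response)
```

The prompt-stripping step mirrors the official Qwen2.5-VL model card; with `skip_special_tokens=True` the decoded string is just the model's answer, which is presumably what the `print(response)` line referenced in the second hunk prints.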