Xenova HF Staff committed on
Commit d9e73f9 · verified · 1 Parent(s): f54c780

Update README.md

Files changed (1)
  1. README.md +88 -1
README.md CHANGED
@@ -4,4 +4,91 @@ license: apple-amlr
4
  pipeline_tag: image-text-to-text
5
  tags:
6
  - fastvlm
7
- ---
7
+ ---
8
+
9
+
10
+ ## Usage
11
+
12
+ ### Transformers.js
13
+
14
+ If you haven't already, you can install the [Transformers.js](https://huggingface.co/docs/transformers.js) JavaScript library from [NPM](https://www.npmjs.com/package/@huggingface/transformers) using:
15
+ ```bash
16
+ npm i @huggingface/transformers
17
+ ```
18
+
19
+ You can then caption images as follows:
20
+
21
+ ```js
22
+ import {
23
+   AutoProcessor,
24
+   AutoModelForImageTextToText,
25
+   load_image,
26
+   TextStreamer,
27
+ } from "@huggingface/transformers";
28
+
29
+ // Load processor and model
30
+ const model_id = "onnx-community/FastVLM-0.5B-ONNX";
31
+ const processor = await AutoProcessor.from_pretrained(model_id);
32
+ const model = await AutoModelForImageTextToText.from_pretrained(model_id, {
33
+   dtype: {
34
+     embed_tokens: "fp16",
35
+     vision_encoder: "q4",
36
+     decoder_model_merged: "q4",
37
+   },
38
+ });
39
+
40
+ // Prepare prompt
41
+ const messages = [
42
+   {
43
+     role: "user",
44
+     content: "<image>Describe this image in detail.",
45
+   },
46
+ ];
47
+ const prompt = processor.apply_chat_template(messages, {
48
+   add_generation_prompt: true,
49
+ });
50
+
51
+ // Prepare inputs
52
+ const url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg";
53
+ const image = await load_image(url);
54
+ const inputs = await processor(image, prompt, {
55
+   add_special_tokens: false,
56
+ });
57
+
58
+ // Generate output
59
+ const outputs = await model.generate({
60
+   ...inputs,
61
+   max_new_tokens: 512,
62
+   do_sample: false,
63
+   streamer: new TextStreamer(processor.tokenizer, {
64
+     skip_prompt: true,
65
+     skip_special_tokens: false,
66
+     // callback_function: (text) => { /* Do something with the streamed output */ },
67
+   }),
68
+ });
69
+
70
+ // Decode output
71
+ const decoded = processor.batch_decode(
72
+   outputs.slice(null, [inputs.input_ids.dims.at(-1), null]),
73
+   { skip_special_tokens: true },
74
+ );
75
+ console.log(decoded[0]);
76
+ ```
77
+
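The decode step above slices `outputs` so that only newly generated tokens are decoded: every batch row is kept, and the columns covering the prompt (its length is `inputs.input_ids.dims.at(-1)`) are dropped. The sketch below illustrates the same idea on plain nested arrays rather than on Transformers.js tensors. `stripPrompt` and the token ids are hypothetical and exist only for this illustration.

```js
// Plain-array illustration of the slicing done in the decode step.
// `stripPrompt` is a hypothetical helper, not part of Transformers.js.
function stripPrompt(sequences, promptLength) {
  // Keep every row (batch item) and drop the first `promptLength` token ids.
  return sequences.map((ids) => ids.slice(promptLength));
}

// Made-up token ids: 3 prompt tokens followed by 2 generated tokens.
const generated = [[101, 102, 103, 7, 8]];
const newTokens = stripPrompt(generated, 3);
console.log(newTokens); // only the generated ids, 7 and 8, remain
```

Without this step, the decoded string would begin with the prompt text, since the generated sequences include the input tokens.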
78
+ <details>
79
+
80
+ <summary>See here for example output</summary>
81
+
82
+ ```
83
+ The image depicts a vibrant and colorful scene featuring a variety of flowers and plants. The main focus is on a striking pink flower with a dark center, which appears to be a type of petunia. The petals are a rich, deep pink, and the flower has a classic, slightly ruffled appearance. The dark center of the flower is a contrasting color, likely a deep purple or black, which adds to the flower's visual appeal.
84
+
85
+ In the background, there are several other flowers and plants, each with their unique colors and shapes. To the left, there is a red flower with a bright, vivid hue, which stands out against the pink flower. The red flower has a more rounded shape and a lighter center, with petals that are a lighter shade of red compared to the pink flower.
86
+
87
+ To the right of the pink flower, there is a plant with red flowers, which are smaller and more densely packed. The red flowers are a deep, rich red color, and they have a more compact shape compared to the pink flower.
88
+
89
+ In the foreground, there is a green plant with a few leaves and a few small flowers. The leaves are a bright green color, and the flowers are a lighter shade of green, with a few petals that are slightly open.
90
+
91
+ Overall, the image is a beautiful representation of a garden or natural setting, with a variety of flowers and plants that are in full bloom. The colors are vibrant and the composition is well-balanced, with the pink flower in the center drawing the viewer's attention.
92
+ ```
93
+
94
+ </details>
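
The `TextStreamer` in the example has a commented-out `callback_function` option. One possible use, sketched below, is to accumulate the streamed text in memory instead of only printing it. `createTextCollector` is a hypothetical helper, and the sketch assumes the callback receives each decoded chunk as a string. The stream is simulated here so the snippet runs without loading the model.

```js
// Hypothetical helper: builds a callback that accumulates streamed text chunks.
function createTextCollector() {
  const chunks = [];
  return {
    // Meant to be passed as `callback_function`
    // (assumption: it is called with one string chunk at a time).
    callback: (text) => {
      chunks.push(text);
    },
    // Returns everything received so far as a single string.
    text: () => chunks.join(""),
  };
}

// Simulated stream; in practice the streamer would invoke `collector.callback`.
const collector = createTextCollector();
for (const chunk of ["The image ", "depicts ", "a flower."]) {
  collector.callback(chunk);
}
console.log(collector.text());
```

Under the same assumption, the collector could be wired into the example by passing `callback_function: collector.callback` in the `TextStreamer` options and reading `collector.text()` once `model.generate` resolves.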