Part of the Android Models collection: LiteRT models that can run on Android.
This model provides the HuggingFaceTB/SmolVLM-256M-Instruct model in TFLite format.
You can run it with a custom C++ pipeline or with a Python pipeline (see the Colab example below, and the Python sketch after the C++ command).
Please note that, at the moment, AI Edge Torch VLMs are not supported by the MediaPipe LLM Inference API. This includes, for example, the qwen_vl model, which was used as the reference for writing the SmolVLM-256M-Instruct conversion scripts.
To run the model with the custom C++ pipeline:
mkdir cache
bazel run --verbose_failures -c opt //ai_edge_torch/generative/examples/cpp_image:text_generator_main -- \
--tflite_model="/home/dragynir/ai_vlm/ai-edge-torch-smalvlm/ai_edge_torch/generative/examples/smalvlm/models/SmolVLM-256M-Instruct-tflite-single/smalvlm-256m-instruct_q8_ekv2048.tflite" \
--sentencepiece_model="/home/dragynir/ai_vlm/ai-edge-torch-smalvlm/ai_edge_torch/generative/examples/smalvlm/models/SmolVLM-256M-Instruct-tflite/tokenizer.model" \
--start_token="<|im_start|>" --stop_token="<end_of_utterance>" --num_threads=16 \
--prompt="User:<image>What is in the image?<end_of_utterance>\nAssistant:" --weight_cache_path="/home/dragynir/llm/ai-edge-torch/ai_edge_torch/generative/examples/cpp/cache/model.xnnpack_cache" \
--use_single_image=true --image_path="/home/dragynir/ai_vlm/car.jpg" --max_generated_tokens=64
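In the command above, --weight_cache_path points at the cache directory created by mkdir (an XNNPACK weight cache, judging by the .xnnpack_cache file name), and --prompt follows the SmolVLM chat format, with <image> marking where the image is inserted and <end_of_utterance> serving as the stop token.

For the Python side, a quick first step is to inspect which signatures were exported into the .tflite file. A minimal sketch, assuming TensorFlow is installed and using the model file from the command above (the signature names, e.g. prefill/decode, depend on the conversion scripts):

# Minimal sketch: list the signatures exported into the TFLite model.
import tensorflow as tf

interpreter = tf.lite.Interpreter(
    model_path="smalvlm-256m-instruct_q8_ekv2048.tflite")
print(interpreter.get_signature_list())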
To fine-tune SmolVLM on a specific task, you can follow the fine-tuning tutorial.
Then you can convert the model to TFLite using the custom smalvlm scripts (see the repository README).
You can also check the official ai-edge-torch generative documentation.
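As a rough illustration of the fine-tuning step, here is a minimal LoRA setup sketch (the adapter hyperparameters and target_modules below are assumptions, not the tutorial's exact recipe):

# Minimal LoRA fine-tuning setup sketch; see the official SmolVLM
# fine-tuning tutorial for the full training loop and data collation.
import torch
from transformers import AutoProcessor, AutoModelForVision2Seq
from peft import LoraConfig, get_peft_model

model_id = "HuggingFaceTB/SmolVLM-256M-Instruct"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Assumed LoRA targets: the attention projection layers.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# Train with transformers.Trainer on your task data, then merge the adapters
# (model.merge_and_unload()) before running the TFLite conversion below.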
The model was converted with the following parameters:
python convert_to_tflite.py --quantize="dynamic_int8" \
--checkpoint_path='./models/SmolVLM-256M-Instruct' --output_path="./models/SmolVLM-256M-Instruct-tflite" \
--mask_as_input=True --prefill_seq_lens=256 --kv_cache_max_len=2048
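Reading the flags together with the output file name: --quantize="dynamic_int8" applies dynamic int8 quantization (the q8 in the exported file name), --prefill_seq_lens=256 sets the prefill signature length, and --kv_cache_max_len=2048 bounds the KV cache (the ekv2048 suffix).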
Base model: HuggingFaceTB/SmolLM2-135M