AI & ML interests
Google ❤️ Open Source AI
Collection of Gemma 3 variants trained for performance on medical text and image comprehension, to accelerate building healthcare-based AI applications.
Collection of open models to accelerate the development of therapeutics.
-
50
Compare Siglip1 Siglip2
🚀Compare SigLIP1 and SigLIP2 on zero shot classification
-
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features
Paper • 2502.14786 • Published • 151 -
google/siglip2-base-patch16-224
Zero-Shot Image Classification • 0.4B • Updated • 585k • 71 -
google/siglip2-base-patch16-256
Zero-Shot Image Classification • 0.4B • Updated • 58.4k • 6
Vision-Language Models available in 3B, 10B, and 28B variants.
-
PaliGemma 2: A Family of Versatile VLMs for Transfer
Paper • 2412.03555 • Published • 133 -
google/paligemma2-3b-pt-224
Image-Text-to-Text • 3B • Updated • 1.25M • 158 -
google/paligemma2-3b-pt-448
Image-Text-to-Text • 3B • Updated • 3.51k • 46 -
google/paligemma2-3b-pt-896
Image-Text-to-Text • 3B • Updated • 2.33k • 22
A collection of MetricX-24 models (https://aclanthology.org/2024.wmt-1.35/)
Groups the Gemma models released by the Google team.
A comprehensive, open suite of sparse autoencoders for Gemma 2 2B and 9B.
The ALBERT release was done in two steps, over 4 checkpoints of different sizes each time. The first version is noted as "v1", the second as "v2".
The Flan-T5 release covers 4 checkpoints of different sizes. It also includes upgraded versions trained using Universal sampling.
The MT5 release follows the T5 family, but is pretrained on multilingual data. The updated UMT5 models are pretrained on an updated corpus.
This release includes various MoE (Mixture of Experts) models based on the T5 architecture. The base models use from 8 to 256 experts.
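As a rough illustration of what "using experts" means in a Mixture of Experts layer, the sketch below routes a single token to one expert chosen by a learned router. It assumes top-1 routing, which the collection description does not state, and every name in it (`route_top1`, `moe_layer`, the toy experts and random router weights) is hypothetical rather than taken from the released checkpoints.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_top1(token, router_weights):
    """Return (expert_index, gate_probability) for one token vector."""
    # One router logit per expert: dot product of the token with that expert's row.
    logits = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]


def moe_layer(token, router_weights, experts):
    """Run only the selected expert and scale its output by the gate probability."""
    idx, gate = route_top1(token, router_weights)
    out = experts[idx](token)
    return [gate * v for v in out], idx


# Hypothetical toy setup: 8 experts (the smallest count mentioned above),
# each "expert" simply scales its input by a different constant.
NUM_EXPERTS = 8
DIM = 4
rng = random.Random(0)
router_weights = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]
experts = [(lambda v, k=k: [(k + 1) * x for x in v]) for k in range(NUM_EXPERTS)]

token = [0.5, -1.0, 0.25, 2.0]
output, chosen = moe_layer(token, router_weights, experts)
print(f"token routed to expert {chosen} of {NUM_EXPERTS}")
```

Because only the chosen expert runs for each token, adding experts grows the parameter count without growing the per-token compute in the same proportion.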
Datasets released in "IndicGenBench: A Multilingual Benchmark to Evaluate Generation Capabilities of LLMs" (https://arxiv.org/abs/2404.16816)
A series of pioneering open models that help ground LLMs in real-world data through Data Commons.
TimesFM (Time Series Foundation Model) is a pretrained time-series foundation model developed by Google Research for time-series forecasting.
-
google/timesfm-1.0-200m
Time Series Forecasting • Updated • 357 • 771 -
google/timesfm-1.0-200m-pytorch
Time Series Forecasting • Updated • 2.96k • 29 -
google/timesfm-2.0-500m-jax
Time Series Forecasting • Updated • 205 • 16 -
google/timesfm-2.0-500m-pytorch
Time Series Forecasting • 0.5B • Updated • 9.73k • 226
Collection of concept apps built with MedGemma models to inspire the community.
VideoPrism is a foundational video encoder that enables state-of-the-art performance on a large variety of video understanding tasks.
-
VideoPrism: A Foundational Visual Encoder for Video Understanding
Paper • 2402.13217 • Published • 37 -
google/videoprism-base-f16r288
Video Classification • Updated • 82.6k • 88 -
google/videoprism-large-f8r288
Video Classification • Updated • 246 • 17 -
google/videoprism-lvt-base-f16r288
Video Classification • Updated • 30.3k • 9
Collection of concept apps built around HAI-DEF open models/libraries to inspire the community. Learn more at http://goo.gle/hai-def
-
33
Path Foundation Demo
🔬Browse pathology images for analysis
-
20
CXR Foundation Demo
🩻Demo usage of the CXR Foundation model embeddings
-
201
MedGemma - Radiology Explainer Demo
🩺Radiology Image & Report Explainer Demo. Built with MedGemma
-
136
Appoint Ready - MedGemma Demo
📋Simulated Pre-visit Intake Demo built using MedGemma
Quantization-Aware Trained (QAT) Gemma 3 checkpoints. The models preserve quality similar to half precision while using 3x less memory.
-
google/gemma-3-4b-it-qat-q4_0-gguf
Image-Text-to-Text • 4B • Updated • 3.23k • 202 -
google/gemma-3-4b-pt-qat-q4_0-gguf
Image-Text-to-Text • 4B • Updated • 100 • 23 -
google/gemma-3-1b-it-qat-q4_0-gguf
Text Generation • 1.0B • Updated • 1.58k • 89 -
google/gemma-3-1b-pt-qat-q4_0-gguf
Text Generation • 1.0B • Updated • 69 • 12
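The "3x less memory" figure can be sanity-checked with a back-of-the-envelope estimate. The sketch below compares 16-bit weights with the GGUF `q4_0` layout, which stores weights in blocks of 32 four-bit values plus one 16-bit scale (18 bytes per block, or 4.5 bits per weight). It is an estimate under stated assumptions: the parameter counts are the nominal 1.0B and 4B sizes from the listing, every tensor is assumed to be quantized, and activations and the KV cache are ignored. The helper `weight_gib` is hypothetical.

```python
BITS_HALF_PRECISION = 16

# q4_0 block layout: 32 weights -> 16 bytes of 4-bit values + 2 bytes of scale.
Q4_0_BLOCK_WEIGHTS = 32
Q4_0_BLOCK_BYTES = 18
BITS_Q4_0 = Q4_0_BLOCK_BYTES * 8 / Q4_0_BLOCK_WEIGHTS


def weight_gib(num_params, bits_per_weight):
    """Approximate weight storage in GiB for a given parameter count."""
    return num_params * bits_per_weight / 8 / 2**30


# Nominal sizes taken from the listing; real checkpoints differ slightly.
for name, params in [("gemma-3-1b", 1.0e9), ("gemma-3-4b", 4.0e9)]:
    half = weight_gib(params, BITS_HALF_PRECISION)
    quant = weight_gib(params, BITS_Q4_0)
    print(f"{name}: {half:.2f} GiB at 16-bit vs {quant:.2f} GiB at q4_0")

ratio = BITS_HALF_PRECISION / BITS_Q4_0
print(f"weights-only reduction: {ratio:.2f}x")
```

The weights-only ratio comes out somewhat above 3x; tensors kept at higher precision in a real file would pull the overall saving down toward the quoted figure.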
ShieldGemma is a family of models for text and image content moderation.
A collection of MetricX-23 models (https://aclanthology.org/2023.wmt-1.63/)
Groups models released for use in health AI by Google. Read more about HAI-DEF at https://developers.google.com/health-ai-developer-foundations
Pretrained and mix checkpoints for PaliGemma
The 2.6B parameter version of Gemma 2.
A series of safety classifiers, trained on top of Gemma 2, for developers to filter inputs and outputs of their applications.
Groups the original BERT models released by the Google team. Unless marked otherwise, the checkpoints support English.
This collection groups the ELECTRA models released by the Google team.
The original T5 transformer release was done in two steps: the original T5 checkpoints and the improved T5v1
-
google-t5/t5-base
Translation • 0.2B • Updated • 1.42M • 754 -
google-t5/t5-small
Translation • 60.5M • Updated • 3.63M • 498 -
google-t5/t5-large
Translation • 0.7B • Updated • 246k • 223 -
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Paper • 1910.10683 • Published • 14
The SEAHORSE metrics (as described in https://arxiv.org/abs/2305.13194).
Contrastive (sigmoid) image-text models from https://arxiv.org/abs/2303.15343
-
google/siglip-so400m-patch14-384
Zero-Shot Image Classification • 0.9B • Updated • 2.01M • 611 -
google/siglip-so400m-patch14-224
Zero-Shot Image Classification • 0.9B • Updated • 45.9k • 54 -
google/siglip-so400m-patch16-256-i18n
Zero-Shot Image Classification • 1B • Updated • 738 • 30 -
google/siglip-base-patch16-256-multilingual
Zero-Shot Image Classification • 0.4B • Updated • 14.4k • 50
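To make "contrastive (sigmoid)" concrete, here is a minimal pure-Python sketch of a pairwise sigmoid loss in the style of the linked paper: every image-text pair in a batch is scored independently, with label +1 for matching pairs and -1 for all others, instead of normalizing with a softmax across the batch. The function names, the toy embeddings, and the `temperature` and `bias` values are illustrative assumptions, not the released training code.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def sigmoid_loss(image_embs, text_embs, temperature=10.0, bias=-10.0):
    """Pairwise sigmoid loss over a batch; pair (i, i) is the positive."""
    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(v) for v in text_embs]
    total = 0.0
    for i, img in enumerate(imgs):
        for j, txt in enumerate(txts):
            logit = temperature * sum(a * b for a, b in zip(img, txt)) + bias
            label = 1.0 if i == j else -1.0
            total += log_sigmoid(label * logit)
    return -total / len(imgs)


# Toy batch: two orthogonal image embeddings with matching or swapped texts.
images = [[1.0, 0.0], [0.0, 1.0]]
aligned_texts = [[1.0, 0.0], [0.0, 1.0]]
swapped_texts = [[0.0, 1.0], [1.0, 0.0]]

print("aligned loss:", sigmoid_loss(images, aligned_texts))
print("swapped loss:", sigmoid_loss(images, swapped_texts))
```

Since each pair is an independent binary decision, the same scoring applies at inference time: a zero-shot classifier can apply a sigmoid to each label's logit separately, so the scores need not sum to one.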
arXiv: https://arxiv.org/abs/2405.02793
Gemma models for text-to-propositions segmentation. The models are distilled from a fine-tuned Gemini Pro model applied to multi-domain synthetic data.
A Gemma 2 2B model fine-tuned on Japanese text. It supports the Japanese language at the same level of performance as English-only queries on Gemma 2.