Microsoft
company
Verified
AI & ML interests
None defined yet.
Recent Activity
NextCoder family of code-editing LMs developed with Selective Knowledge Transfer, along with its training data.
Phi-3 family of small language and multi-modal models. Language models are available in short- and long-context lengths.
-
microsoft/Phi-3.5-mini-instruct
Text Generation • 4B • Updated • 219k • 885 -
microsoft/Phi-3.5-MoE-instruct
Text Generation • 42B • Updated • 36.5k • 558 -
microsoft/Phi-3.5-vision-instruct
Image-Text-to-Text • 4B • Updated • 985k • 694 -
microsoft/Phi-3-mini-4k-instruct
Text Generation • 4B • Updated • 384k • 1.23k
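The Phi-3.5 instruct models follow the standard `transformers` chat interface; below is a minimal sketch of text generation with `microsoft/Phi-3.5-mini-instruct` (the prompt and generation settings are illustrative, not from the model card).

```python
import torch
from transformers import pipeline

# Load Phi-3.5-mini-instruct as a chat-style text-generation pipeline.
pipe = pipeline(
    "text-generation",
    model="microsoft/Phi-3.5-mini-instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Summarize what a small language model is in one sentence."}
]
out = pipe(messages, max_new_tokens=64)
print(out[0]["generated_text"])
```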
Artifacts for the paper "Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements" (https://arxiv.org/abs/2410.08968)
MAI-DS-R1 is a DeepSeek-R1 reasoning model that has been post-trained by the Microsoft AI team.
The SpeechT5 framework consists of a shared sequence-to-sequence network and six modality-specific (speech/text) pre- and post-nets, enabling a range of audio tasks such as speech recognition, text-to-speech, and voice conversion.
The Table Transformer (TATR) is a series of object detection models useful for table extraction from PDF images.
-
microsoft/table-transformer-detection
Object Detection • 0.0B • Updated • 3.89M • 367 -
microsoft/table-transformer-structure-recognition
Object Detection • 0.0B • Updated • 1.28M • 195 -
microsoft/table-transformer-structure-recognition-v1.1-all
Object Detection • 0.0B • Updated • 1.01M • 74 -
microsoft/table-transformer-structure-recognition-v1.1-fin
Object Detection • 0.0B • Updated • 296 • 1
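A minimal detection sketch with the `table-transformer-detection` checkpoint; a blank image stands in for a rendered PDF page, and the 0.9 threshold is illustrative.

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, TableTransformerForObjectDetection

processor = AutoImageProcessor.from_pretrained("microsoft/table-transformer-detection")
model = TableTransformerForObjectDetection.from_pretrained(
    "microsoft/table-transformer-detection"
)

image = Image.new("RGB", (800, 600), "white")  # stand-in for a rendered PDF page
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Convert raw logits/boxes to thresholded detections in image coordinates.
target_sizes = torch.tensor([image.size[::-1]])
results = processor.post_process_object_detection(
    outputs, threshold=0.9, target_sizes=target_sizes
)[0]
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    print(model.config.id2label[label.item()], round(score.item(), 3), box.tolist())
```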
Models for biomedical research applications, such as radiology report generation and biomedical language understanding.
UDOP is a general multimodal model for document AI.
-
Unifying Vision, Text, and Layout for Universal Document Processing
Paper • 2212.02623 • Published • 11 -
microsoft/udop-large
Image-Text-to-Text • 0.7B • Updated • 2.76k • 116 -
microsoft/udop-large-512
Image-Text-to-Text • 0.7B • Updated • 188 • 5 -
microsoft/udop-large-512-300k
Image-Text-to-Text • 0.7B • Updated • 77 • 32
-
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
Paper • 2311.06242 • Published • 94 -
microsoft/Florence-2-large
Image-Text-to-Text • Updated • 956k • 1.59k -
microsoft/Florence-2-base
Image-Text-to-Text • Updated • 606k • 280 -
microsoft/Florence-2-large-ft
Image-Text-to-Text • Updated • 45.5k • 358
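Florence-2 is prompted with task tokens such as `<CAPTION>` or `<OD>`; the checkpoints ship custom modeling code, so `trust_remote_code=True` is required. A captioning sketch, with a blank stand-in image:

```python
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Florence-2-base", trust_remote_code=True
)
processor = AutoProcessor.from_pretrained(
    "microsoft/Florence-2-base", trust_remote_code=True
)

image = Image.new("RGB", (640, 480), "white")  # stand-in image
task = "<CAPTION>"
inputs = processor(text=task, images=image, return_tensors="pt")
generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=64,
)
raw = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
# Post-processing maps the raw decode to a task-keyed result dict.
parsed = processor.post_process_generation(raw, task=task, image_size=image.size)
print(parsed)
```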
Locomotion policies for hundreds of simulated humanoid motion clips, plus demonstration data for training them.
A collection of SMLs based on Phi3.5-mini-instruct adapted to clinical natural language processing tasks: https://arxiv.org/abs/2505.10717
Phi-4 family of small language, multi-modal and reasoning models.
-
microsoft/Phi-4-mini-flash-reasoning
Text Generation • 4B • Updated • 5 • 21 -
microsoft/Phi-4-mini-reasoning
Text Generation • 4B • Updated • 29.4k • 185 -
microsoft/Phi-4-reasoning
Text Generation • 15B • Updated • 10.8k • 193 -
microsoft/Phi-4-reasoning-plus
Text Generation • 15B • Updated • 13.7k • 303
Phi-1 family of small language models.
🔥BitNet family of large language models (1-bit LLMs).
-
microsoft/bitnet-b1.58-2B-4T
Text Generation • 0.8B • Updated • 9.92k • 1.12k -
microsoft/bitnet-b1.58-2B-4T-bf16
Text Generation • 2B • Updated • 8.14k • 32 -
microsoft/bitnet-b1.58-2B-4T-gguf
Text Generation • 2B • Updated • 11.1k • 186 -
BitNet b1.58 2B4T Technical Report
Paper • 2504.12285 • Published • 74
LLM2CLIP uses large language models to further improve the performance of state-of-the-art pretrained CLIP models.
-
microsoft/LLM2CLIP-EVA02-L-14-336
Zero-Shot Image Classification • Updated • 543 • 59 -
microsoft/LLM2CLIP-Openai-L-14-336
Zero-Shot Classification • 0.6B • Updated • 1.32k • 43 -
microsoft/LLM2CLIP-EVA02-B-16
Updated • 14 • 10 -
microsoft/LLM2CLIP-Openai-B-16
Zero-Shot Classification • 0.4B • Updated • 495 • 18
TAPEX is a family of state-of-the-art table pre-training models for table-based question answering and table-based fact verification.
-
TAPEX: Table Pre-training via Learning a Neural SQL Executor
Paper • 2107.07653 • Published • 2 -
microsoft/tapex-large-finetuned-wtq
Table Question Answering • 0.4B • Updated • 2.74k • 76 -
microsoft/tapex-base-finetuned-wikisql
Table Question Answering • Updated • 1.54k • 18 -
microsoft/tapex-large-sql-execution
Table Question Answering • 0.4B • Updated • 195 • 17
The LayoutLM series consists of Transformer encoders useful for document AI tasks such as invoice parsing, document image classification, and DocVQA.
The Orca family of LMs developed by Microsoft.
GIT (Generative Image-to-text Transformer) is a model useful for vision-language tasks such as image/video captioning and question answering.
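A captioning sketch with a COCO-fine-tuned GIT checkpoint; `microsoft/git-base-coco` is one such checkpoint, and the blank image is a stand-in for a real photo.

```python
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

processor = AutoProcessor.from_pretrained("microsoft/git-base-coco")
model = AutoModelForCausalLM.from_pretrained("microsoft/git-base-coco")

image = Image.new("RGB", (224, 224), "white")  # stand-in image
pixel_values = processor(images=image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values=pixel_values, max_length=20)
caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(caption)
```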
Industrial Foundation Models