---
license: cc-by-4.0
datasets:
- timbrooks/instructpix2pix-clip-filtered
language:
- en
---

# Unleashing In-context Learning of Autoregressive Models for Few-shot Image Manipulation

### CVPR 2025

[Project Page](https://bolinlai.github.io/projects/InstaManip/) | [Paper](https://arxiv.org/pdf/2412.01027) | [Code](https://github.com/BolinLai/InstaManip)

[Bolin Lai](https://bolinlai.github.io/), [Felix Juefei-Xu](https://xujuefei.com/), [Miao Liu](https://aptx4869lm.github.io/), [Xiaoliang Dai](https://sites.google.com/view/xiaoliangdai/), [Nikhil Mehta](https://hockeybro12.github.io/), [Chenguang Zhu](https://cs.stanford.edu/~cgzhu/), [Zeyi Huang](https://oodbag.github.io/), [James M. Rehg](https://rehg.org/), [Sangmin Lee](https://sites.google.com/view/sangmin-lee), [Ning Zhang](https://n-zhang.github.io/), [Tong Xiao](http://xiaotong.me/)

This repo contains the model weights for our paper "Unleashing In-context Learning of Autoregressive Models for Few-shot Image Manipulation". Four models are released in this repo:

- InstaManip-17B-1shot: trained specifically for 1-shot image manipulation.
- InstaManip-17B-2shot: trained specifically for 2-shot image manipulation.
- InstaManip-17B-3shot: trained specifically for 3-shot image manipulation.
- InstaManip-17B-dynamic: trained for an arbitrary number of exemplar image pairs.

Please refer to the code on [GitHub](https://github.com/BolinLai/InstaManip) for detailed instructions on how to use them.

If you find our paper helpful to your work, please cite it with this BibTeX entry:

```BibTex
@article{lai2024unleashing,
  title={Unleashing In-context Learning of Autoregressive Models for Few-shot Image Manipulation},
  author={Lai, Bolin and Juefei-Xu, Felix and Liu, Miao and Dai, Xiaoliang and Mehta, Nikhil and Zhu, Chenguang and Huang, Zeyi and Rehg, James M and Lee, Sangmin and Zhang, Ning and others},
  journal={arXiv preprint arXiv:2412.01027},
  year={2024}
}
```
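The four variants above map directly onto the few-shot setting: a dedicated checkpoint for 1, 2, or 3 exemplar pairs, and the dynamic checkpoint for any other count. A minimal, purely illustrative helper for picking a checkpoint name (the function itself is not part of the released code):

```python
def pick_checkpoint(num_exemplars: int) -> str:
    """Return the released checkpoint name matching a number of exemplar
    image pairs. The 1/2/3-shot models were trained for those exact
    settings; the dynamic model handles an arbitrary number of pairs."""
    if num_exemplars < 1:
        raise ValueError("at least one exemplar image pair is required")
    if num_exemplars in (1, 2, 3):
        return f"InstaManip-17B-{num_exemplars}shot"
    return "InstaManip-17B-dynamic"
```

Note that the dynamic model also accepts 1-3 exemplar pairs; this sketch simply prefers the checkpoint trained specifically for that setting when one exists.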