jacobitterman committed
Commit 6208d68 · verified · 1 Parent(s): 9cafd79

Update README.md

Files changed (1): README.md (+3, -73)
README.md CHANGED
@@ -6,7 +6,6 @@ pinned: true
 language:
 - en
 license: other
-pipeline_tag: text-to-video
 library_name: diffusers
 ---
 
@@ -14,7 +13,6 @@ library_name: diffusers
 This model card focuses on the model associated with the LTX-Video codebase, available [here](https://github.com/Lightricks/LTX-Video).

 LTX-Video is the first DiT-based video generation model capable of generating high-quality videos in real time. It produces 30 FPS videos at 1216×704 resolution faster than they can be watched. Trained on a large-scale dataset of diverse videos, the model generates high-resolution videos with realistic and varied content.
-We provide a model for both text-to-video and image+text-to-video use cases.

 <img src="./media/trailer.gif" alt="trailer" width="512">
 
@@ -40,7 +38,7 @@ We provide a model for both text-to-video as well as image+text-to-video usecase

 ## Model Details
 - **Developed by:** Lightricks
-- **Model type:** Diffusion-based text-to-video and image-to-video generation model
+- **Model type:** Diffusion-based image-to-video generation model
 - **Language(s):** English
 
@@ -73,7 +71,7 @@ The model is accessible right away via the following links:
 - [LTX-Studio image-to-video (13B distilled)](https://app.ltx.studio/motion-workspace?videoModel=ltxv)
 - [Fal.ai image-to-video (13B full)](https://fal.ai/models/fal-ai/ltx-video-13b-dev/image-to-video)
 - [Fal.ai image-to-video (13B distilled)](https://fal.ai/models/fal-ai/ltx-video-13b-distilled/image-to-video)
-- [Replicate text-to-video and image-to-video](https://replicate.com/lightricks/ltx-video)
+- [Replicate image-to-video](https://replicate.com/lightricks/ltx-video)

 ### ComfyUI
 To use our model with ComfyUI, please follow the instructions in the dedicated [ComfyUI repo](https://github.com/Lightricks/ComfyUI-LTXVideo/).

@@ -116,7 +114,7 @@ python inference.py --prompt "PROMPT" --conditioning_media_paths IMAGE_OR_VIDEO_

 ### Diffusers 🧨

-LTX Video is compatible with the [Diffusers Python library](https://huggingface.co/docs/diffusers/main/en/index). It supports both text-to-video and image-to-video generation.
+LTX Video is compatible with the [Diffusers Python library](https://huggingface.co/docs/diffusers/main/en/index) for image-to-video generation.

 Make sure you install `diffusers` before trying out the examples below.
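
Since the card now points to Diffusers for image-to-video only, a minimal sketch of that path may be helpful. This assumes a `diffusers` release that ships `LTXImageToVideoPipeline`; the checkpoint id, input path, and generation parameters below are illustrative, not taken from this commit:

```py
# Minimal image-to-video sketch (illustrative, not the card's full example).
import torch
from diffusers import LTXImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

# Load the pipeline in bfloat16 to reduce memory use.
pipe = LTXImageToVideoPipeline.from_pretrained(
    "Lightricks/LTX-Video",  # illustrative checkpoint id
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

# The conditioning image becomes the first frame of the generated clip.
image = load_image("path/to/first_frame.png")  # hypothetical input path
prompt = "A detailed description of the scene and the desired motion."
negative_prompt = "worst quality, inconsistent motion, blurry, jittery, distorted"

video = pipe(
    image=image,
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=704,
    height=480,
    num_frames=161,          # about 6.7 seconds at 24 FPS
    num_inference_steps=50,
).frames[0]

export_to_video(video, "output.mp4", fps=24)
```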

@@ -202,74 +200,6 @@ video = [frame.resize((expected_width, expected_height)) for frame in video]
 export_to_video(video, "output.mp4", fps=24)
 ```

-### text-to-video:
-```py
-import torch
-from diffusers import LTXConditionPipeline, LTXLatentUpsamplePipeline
-from diffusers.pipelines.ltx.pipeline_ltx_condition import LTXVideoCondition
-from diffusers.utils import export_to_video
-
-pipe = LTXConditionPipeline.from_pretrained("Lightricks/LTX-Video-0.9.7-dev", torch_dtype=torch.bfloat16)
-pipe_upsample = LTXLatentUpsamplePipeline.from_pretrained("Lightricks/ltxv-spatial-upscaler-0.9.7", vae=pipe.vae, torch_dtype=torch.bfloat16)
-pipe.to("cuda")
-pipe_upsample.to("cuda")
-pipe.vae.enable_tiling()
-
-def round_to_nearest_resolution_acceptable_by_vae(height, width):
-    height = height - (height % pipe.vae_spatial_compression_ratio)
-    width = width - (width % pipe.vae_spatial_compression_ratio)
-    return height, width
-
-prompt = "The video depicts a winding mountain road covered in snow, with a single vehicle traveling along it. The road is flanked by steep, rocky cliffs and sparse vegetation. The landscape is characterized by rugged terrain and a river visible in the distance. The scene captures the solitude and beauty of a winter drive through a mountainous region."
-negative_prompt = "worst quality, inconsistent motion, blurry, jittery, distorted"
-expected_height, expected_width = 512, 704
-downscale_factor = 2 / 3
-num_frames = 121
-
-# Part 1. Generate video at smaller resolution
-downscaled_height, downscaled_width = int(expected_height * downscale_factor), int(expected_width * downscale_factor)
-downscaled_height, downscaled_width = round_to_nearest_resolution_acceptable_by_vae(downscaled_height, downscaled_width)
-latents = pipe(
-    conditions=None,
-    prompt=prompt,
-    negative_prompt=negative_prompt,
-    width=downscaled_width,
-    height=downscaled_height,
-    num_frames=num_frames,
-    num_inference_steps=30,
-    generator=torch.Generator().manual_seed(0),
-    output_type="latent",
-).frames
-
-# Part 2. Upscale generated video using latent upsampler with fewer inference steps
-# The available latent upsampler upscales the height/width by 2x
-upscaled_height, upscaled_width = downscaled_height * 2, downscaled_width * 2
-upscaled_latents = pipe_upsample(
-    latents=latents,
-    output_type="latent",
-).frames
-
-# Part 3. Denoise the upscaled video with few steps to improve texture (optional, but recommended)
-video = pipe(
-    prompt=prompt,
-    negative_prompt=negative_prompt,
-    width=upscaled_width,
-    height=upscaled_height,
-    num_frames=num_frames,
-    denoise_strength=0.4,  # Effectively, 4 inference steps out of 10
-    num_inference_steps=10,
-    latents=upscaled_latents,
-    decode_timestep=0.05,
-    image_cond_noise_scale=0.025,
-    generator=torch.Generator().manual_seed(0),
-    output_type="pil",
-).frames[0]
-
-# Part 4. Downscale the video to the expected resolution
-video = [frame.resize((expected_width, expected_height)) for frame in video]
-
-export_to_video(video, "output.mp4", fps=24)
-```
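
The removed example is worth a note on its design: rather than sampling at full resolution in one pass, it generates at roughly 2/3 of the target size (rounded down to a multiple of the VAE's spatial compression ratio), upscales the latents 2x with the spatial upsampler, briefly re-denoises them (`denoise_strength=0.4` over 10 steps, i.e. about 4 effective steps) to restore texture, and only then decodes and resizes the frames to the expected resolution. The surviving context lines above suggest the card's remaining image-to-video example ends with the same resize-and-export steps, so the multi-scale recipe itself appears to carry over.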

 ### For video-to-video:
 