Lightricks/LTX-Video

@@ -6,7 +6,6 @@ pinned: true
 language:
 - en
 license: other
-pipeline_tag: text-to-video
 library_name: diffusers
 ---
@@ -14,7 +13,6 @@ library_name: diffusers
 This model card focuses on the model associated with the LTX-Video model, codebase available [here](https://github.com/Lightricks/LTX-Video).
 LTX-Video is the first DiT-based video generation model capable of generating high-quality videos in real-time. It produces 30 FPS videos at a 1216×704 resolution faster than they can be watched. Trained on a large-scale dataset of diverse videos, the model generates high-resolution videos with realistic and varied content.
-We provide a model for both text-to-video as well as image+text-to-video usecases
 <img src="./media/trailer.gif" alt="trailer" width="512">
@@ -40,7 +38,7 @@ We provide a model for both text-to-video as well as image+text-to-video usecase
 ## Model Details
 - **Developed by:** Lightricks
-- **Model type:** Diffusion-based text-to-video and image-to-video generation model
 - **Language(s):** English
@@ -73,7 +71,7 @@ The model is accessible right away via the following links:
 - [LTX-Studio image-to-video (13B distilled)](https://app.ltx.studio/motion-workspace?videoModel=ltxv)
 - [Fal.ai image-to-video (13B full)](https://fal.ai/models/fal-ai/ltx-video-13b-dev/image-to-video)
 - [Fal.ai image-to-video (13B distilled)](https://fal.ai/models/fal-ai/ltx-video-13b-distilled/image-to-video)
-- [Replicate text-to-video and image-to-video](https://replicate.com/lightricks/ltx-video)
 ### ComfyUI
 To use our model with ComfyUI, please follow the instructions at a dedicated [ComfyUI repo](https://github.com/Lightricks/ComfyUI-LTXVideo/).
@@ -116,7 +114,7 @@ python inference.py --prompt "PROMPT" --conditioning_media_paths IMAGE_OR_VIDEO_
 ### Diffusers 🧨
-LTX Video is compatible with the [Diffusers Python library](https://huggingface.co/docs/diffusers/main/en/index). It supports both text-to-video and image-to-video generation.
 Make sure you install `diffusers` before trying out the examples below.
@@ -202,74 +200,6 @@ video = [frame.resize((expected_width, expected_height)) for frame in video]
 export_to_video(video, "output.mp4", fps=24)
 ```
-### text-to-video:
-```py
-import torch
-from diffusers import LTXConditionPipeline, LTXLatentUpsamplePipeline
-from diffusers.pipelines.ltx.pipeline_ltx_condition import LTXVideoCondition
-from diffusers.utils import export_to_video
-pipe = LTXConditionPipeline.from_pretrained("Lightricks/LTX-Video-0.9.7-dev", torch_dtype=torch.bfloat16)
-pipe_upsample = LTXLatentUpsamplePipeline.from_pretrained("Lightricks/ltxv-spatial-upscaler-0.9.7", vae=pipe.vae, torch_dtype=torch.bfloat16)
-pipe.to("cuda")
-pipe_upsample.to("cuda")
-pipe.vae.enable_tiling()
-def round_to_nearest_resolution_acceptable_by_vae(height, width):
-    height = height - (height % pipe.vae_spatial_compression_ratio)
-    width = width - (width % pipe.vae_spatial_compression_ratio)
-    return height, width
-prompt = "The video depicts a winding mountain road covered in snow, with a single vehicle traveling along it. The road is flanked by steep, rocky cliffs and sparse vegetation. The landscape is characterized by rugged terrain and a river visible in the distance. The scene captures the solitude and beauty of a winter drive through a mountainous region."
-negative_prompt = "worst quality, inconsistent motion, blurry, jittery, distorted"
-expected_height, expected_width = 512, 704
-downscale_factor = 2 / 3
-num_frames = 121
-# Part 1. Generate video at smaller resolution
-downscaled_height, downscaled_width = int(expected_height * downscale_factor), int(expected_width * downscale_factor)
-downscaled_height, downscaled_width = round_to_nearest_resolution_acceptable_by_vae(downscaled_height, downscaled_width)
-latents = pipe(
-    conditions=None,
-    prompt=prompt,
-    negative_prompt=negative_prompt,
-    width=downscaled_width,
-    height=downscaled_height,
-    num_frames=num_frames,
-    num_inference_steps=30,
-    generator=torch.Generator().manual_seed(0),
-    output_type="latent",
-).frames
-# Part 2. Upscale generated video using latent upsampler with fewer inference steps
-# The available latent upsampler upscales the height/width by 2x
-upscaled_height, upscaled_width = downscaled_height * 2, downscaled_width * 2
-upscaled_latents = pipe_upsample(
-    latents=latents,
-    output_type="latent"
-).frames
-# Part 3. Denoise the upscaled video with few steps to improve texture (optional, but recommended)
-video = pipe(
-    prompt=prompt,
-    negative_prompt=negative_prompt,
-    width=upscaled_width,
-    height=upscaled_height,
-    num_frames=num_frames,
-    denoise_strength=0.4,  # Effectively, 4 inference steps out of 10
-    num_inference_steps=10,
-    latents=upscaled_latents,
-    decode_timestep=0.05,
-    image_cond_noise_scale=0.025,
-    generator=torch.Generator().manual_seed(0),
-    output_type="pil",
-).frames[0]
-# Part 4. Downscale the video to the expected resolution
-video = [frame.resize((expected_width, expected_height)) for frame in video]
-export_to_video(video, "output.mp4", fps=24)
-```
 ### For video-to-video:

 language:
 - en
 license: other
 library_name: diffusers
 ---
 This model card focuses on the model associated with the LTX-Video model, codebase available [here](https://github.com/Lightricks/LTX-Video).
 LTX-Video is the first DiT-based video generation model capable of generating high-quality videos in real-time. It produces 30 FPS videos at a 1216×704 resolution faster than they can be watched. Trained on a large-scale dataset of diverse videos, the model generates high-resolution videos with realistic and varied content.
 <img src="./media/trailer.gif" alt="trailer" width="512">
 ## Model Details
 - **Developed by:** Lightricks
+- **Model type:** Diffusion-based image-to-video generation model
 - **Language(s):** English
 - [LTX-Studio image-to-video (13B distilled)](https://app.ltx.studio/motion-workspace?videoModel=ltxv)
 - [Fal.ai image-to-video (13B full)](https://fal.ai/models/fal-ai/ltx-video-13b-dev/image-to-video)
 - [Fal.ai image-to-video (13B distilled)](https://fal.ai/models/fal-ai/ltx-video-13b-distilled/image-to-video)
+- [Replicate image-to-video](https://replicate.com/lightricks/ltx-video)
 ### ComfyUI
 To use our model with ComfyUI, please follow the instructions at a dedicated [ComfyUI repo](https://github.com/Lightricks/ComfyUI-LTXVideo/).
 ### Diffusers 🧨
+LTX Video is compatible with the [Diffusers Python library](https://huggingface.co/docs/diffusers/main/en/index) for image-to-video generation.
 Make sure you install `diffusers` before trying out the examples below.
 export_to_video(video, "output.mp4", fps=24)
 ```
 ### For video-to-video: