Trying out LTX-2.3

Environment: LTX-2.3, ComfyUI 0.16.4

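I tried out LTX-2.3, the video generation model that Israel-based Lightricks released recently. Compared with the previous-generation LTX-2, it brings better prompt understanding, better consistency when generating video from an image, and better audio quality.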

Official name	LTX-2.3
Release date	March 6, 2026
Developer	Lightricks
License	Apache License 2.0 (commercial use allowed)
Model size	22 billion parameters (22B)

The GPU used here is an RTX PRO 5000 Blackwell (48 GB VRAM).

Image to Video workflow
Examples
Impressions

Image to Video workflow

I only tried Image to Video. The workflow is the one from the ComfyUI templates.

	File and size	Comment
Checkpoints	ltx-2.3-22b-dev-fp8.safetensors (27.1 GB)	The diffusion model itself, the part that draws the images. FP8 precision.
LoRA	ltx-2.3-22b-distilled-lora-384.safetensors (7.08 GB)	An accelerator that lets the model generate quickly in fewer steps.
Text Encoder	gemma_3_12B_it_fp4_mixed.safetensors (8.79 GB)	The LLM (language model) that reads and interprets the prompt. Google's Gemma 3.
Latent Upscale Model	ltx-2.3-spatial-upscaler-x2-1.0.safetensors (949 MB)	A model that upscales while staying in latent space.

LTX-2 (the standard version) was 19B, so this one is a little larger.
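As a rough sanity check, the file sizes in the table above can be added up and compared with the 48 GB card. This is only a sketch that sums the on-disk sizes as listed. It is not a measurement of actual VRAM use, which depends on how ComfyUI loads and offloads each model. The names `MODEL_FILES_GB` and `VRAM_GB` are made up for this example.

```python
# File sizes in GB, copied from the table above (949 MB written as 0.949 GB).
MODEL_FILES_GB = {
    "ltx-2.3-22b-dev-fp8.safetensors": 27.1,
    "ltx-2.3-22b-distilled-lora-384.safetensors": 7.08,
    "gemma_3_12B_it_fp4_mixed.safetensors": 8.79,
    "ltx-2.3-spatial-upscaler-x2-1.0.safetensors": 0.949,
}

# VRAM of the RTX PRO 5000 Blackwell used in this post.
VRAM_GB = 48

# Sum of on-disk sizes; runtime memory use can differ from this figure.
total_gb = sum(MODEL_FILES_GB.values())

print(f"total on disk: {total_gb:.2f} GB")
print(f"relative to {VRAM_GB} GB VRAM: {total_gb / VRAM_GB:.0%}")
```

The four files come to a little under 44 GB on disk, so expect a sizeable download before the first run.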

Examples

Roller coaster

Let's start with the usual cats.

Prompt

**Scene**: Two tabby cats sit in a blue and yellow roller coaster car against a bright blue sky with wispy clouds, their expressions calm as the track curves upward.  
**Action**: The coaster speeds forward rapidly along the orange track, with motion blur on wheels and rails, then the camera pans to follow the car as it passes by.  
**Camera**: Starts with a dynamic side-tracking shot as the car passes, then smoothly transitions to a low-angle follow shot from behind, keeping the cats centered.

This prompt was written for WAN, but LTX-2.3 seems to understand it properly.
Related post: Building a prompt-assistant app for video generation
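The Scene / Action / Camera layout of the prompt above is easy to assemble in code. The helper below is a minimal sketch of that idea. `build_prompt` is a hypothetical function written for this illustration, not the prompt-assistant app mentioned above and not part of ComfyUI or LTX-2.3.

```python
def build_prompt(scene: str, action: str, camera: str) -> str:
    """Join three descriptions into the Scene/Action/Camera layout used in this post."""
    sections = (("Scene", scene), ("Action", action), ("Camera", camera))
    for label, text in sections:
        # An empty section usually means a field was forgotten, so fail early.
        if not text.strip():
            raise ValueError(f"{label} section is empty")
    # One section per line, with the label in bold markdown as in the examples.
    return "\n".join(f"**{label}**: {text.strip()}" for label, text in sections)


example = build_prompt(
    scene="Two tabby cats sit in a roller coaster car under a bright blue sky.",
    action="The coaster speeds forward along the orange track.",
    camera="Side-tracking shot that transitions to a low-angle follow shot.",
)
print(example)
```

Keeping the three parts as separate arguments makes it easy to change only the camera description between runs.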

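1280×720, 241 frames, 24 fps, 10 seconds
Generation took 79.29 seconds, which is much shorter than WAN2.2. WAN2.2 takes about 200 seconds for a 5-second 720p video, so per second of output LTX-2.3 feels roughly 5 times faster.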
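The "roughly 5 times" figure can be checked by normalizing both timings to generation seconds per second of finished video. This sketch uses only the numbers quoted above. The WAN2.2 value is the approximate 200 seconds mentioned in the text, not a fresh measurement, and the function name is made up for this example.

```python
def gen_seconds_per_video_second(gen_time_s: float, clip_length_s: float) -> float:
    """Generation time needed for one second of output video."""
    return gen_time_s / clip_length_s


# LTX-2.3: 79.29 s to generate a 10-second clip (this post).
ltx = gen_seconds_per_video_second(79.29, 10.0)

# WAN2.2: about 200 s for a 5-second 720p clip (approximate figure from this post).
wan = gen_seconds_per_video_second(200.0, 5.0)

# How many times faster LTX-2.3 is per second of output.
speedup = wan / ltx

print(f"LTX-2.3: {ltx:.2f} s per video second")
print(f"WAN2.2:  {wan:.2f} s per video second")
print(f"speedup: {speedup:.1f}x")
```

With these inputs LTX-2.3 needs about 8 seconds per second of video against 40 for WAN2.2, which matches the estimate of about 5 times.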

Miniature city

A miniature neon district generated with Z-Image Turbo.

Prompt

**Scene**: A hyper-realistic miniature diorama of a futuristic Japanese night city, with glowing neon reflections and tilt-shift lens effect, as a train approaches the camera on an elevated track.  
**Action**: The train moves forward toward the camera, headlights illuminating the scene, while miniature cars and pedestrians remain static below.  
**Camera**: The camera tracks alongside the train with a smooth dolly movement, maintaining focus on the front as it advances.

1280×720, 241 frames, 24 fps

Generation took 89.39 seconds. The camera movement looks smoother than with WAN.

Bicycle race

Another image generated with Z-Image Turbo, this time of a bicycle race.

Prompt

**Scene**: First-person POV on a professional race track, with a tabby cat calmly lying in the front basket, while the rider overtakes a rival cyclist navigating a sharp hairpin turn.  
**Action**: The rider accelerates through the curve, the rival’s bike and cat blur slightly behind, emphasizing speed and overtaking motion.  
**Camera**: Tracking shot, smoothly following the rider’s movement through the turn, with slight camera shake to simulate real motion, shallow depth of field keeping focus on the cat and handlebars.

1280×720, 361 frames, 24 fps

Generation took 223.18 seconds.
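To compare the three runs on equal terms, the frame counts can be converted to clip length (frames divided by fps) and the generation times to seconds per frame. This is a small sketch using only the figures reported in this post. The names `RUNS`, `clip_seconds`, and `summarize` are invented for the example.

```python
FPS = 24

# (example name, frame count, generation time in seconds), as reported in this post.
RUNS = [
    ("Roller coaster", 241, 79.29),
    ("Miniature city", 241, 89.39),
    ("Bicycle race", 361, 223.18),
]


def clip_seconds(frames: int, fps: int = FPS) -> float:
    """Playback length of a clip with the given number of frames."""
    return frames / fps


def summarize(runs):
    """Return clip length and generation cost per frame for each run."""
    return {
        name: {
            "clip_s": clip_seconds(frames),
            "gen_s_per_frame": gen_time / frames,
        }
        for name, frames, gen_time in runs
    }


summary = summarize(RUNS)
for name, row in summary.items():
    print(f"{name}: {row['clip_s']:.2f} s clip, {row['gen_s_per_frame']:.3f} s per frame")
```

In these three runs the 361-frame clip costs about 0.62 seconds per frame, against 0.33 to 0.37 for the 241-frame clips, so generation time grew faster than clip length. Three samples are too few to say why.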

For some reason there is a burbling sound in the audio (lol).
If a bicycle that makes this sound existed, I would want one.

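Impressions

It feels smoother and cleaner than the previous generation (LTX-2). LTX-2 was what got me started with video generation AI, but I got hooked on WAN2.2 right afterwards and have kept using it ever since.

The improved image consistency is real. With the previous model, an illustration-style image would often drift toward realism as the clip went on, but this time the original art style was mostly preserved. As with the previous generation, though, the camera moves on its own, and I don't know how to control it. That is fine for action shots, but it is a problem for static scenery.

With WAN, quality went up once I studied and optimized my prompts, so LTX probably needs the same kind of study. Processing is fast and the quality is good, so I'm torn about whether to switch.

By the way, I wonder whether WAN will get any more updates.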