Mochi 1 - AI Video Models Tool

Overview

Mochi 1 is an open-source, state-of-the-art text-to-video diffusion model from Genmo that emphasizes photorealistic motion, strong prompt adherence, and hackability. The released checkpoint implements a 10-billion-parameter Asymmetric Diffusion Transformer (AsymmDiT) and is published as a research preview under the permissive Apache 2.0 license, so researchers and developers can run it locally or through hosted APIs. ([github.com](https://github.com/genmoai/mochi?utm_source=openai))

The system pairs AsymmDiT with a custom video VAE, AsymmVAE, which causally compresses video by roughly 128× (8×8 spatial and 6× temporal compression into a 12-channel latent space). The smaller latent reduces the number of tokens the transformer must process, which makes attention over longer temporal context affordable. Published defaults generate short, photorealistic clips at 480p using either multiple GPUs or a single H100; a single-GPU run of the reference implementation needs on the order of 60 GB of VRAM. ([internal.replicate.com](https://internal.replicate.com/genmoai/mochi-1?utm_source=openai))

Genmo and the Replicate-hosted deployment provide an API playground for experimentation; Replicate's listing reports typical prediction times under a few minutes and an approximate hosted cost per run on H100 hardware. The project explicitly documents its limitations: quality is optimized for photorealistic styles rather than animation, the initial release is limited to 480p, and extreme motion can produce minor warping artifacts. The codebase also includes LoRA training utilities and an inference harness for parallel and context-efficient runs. ([internal.replicate.com](https://internal.replicate.com/genmoai/mochi-1?utm_source=openai))
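As a rough illustration of why the AsymmVAE compression matters for attention cost, the sketch below computes the latent shape and transformer token count for a clip. Beyond the 8×8 / 6× / 12-channel figures stated above, it assumes that causal temporal compression maps T frames to (T - 1) // 6 + 1 latent frames and that the transformer groups latents into 2×2 spatial patches; both are assumptions to check against Genmo's code, and the helper names are hypothetical.

```python
# Sketch only: estimates Mochi 1 latent and token sizes under stated assumptions.
SPATIAL_FACTOR = 8    # 8x8 spatial compression (from the description above)
TEMPORAL_FACTOR = 6   # 6x temporal compression (from the description above)
LATENT_CHANNELS = 12  # 12-channel latent (from the description above)
PATCH_SIZE = 2        # ASSUMPTION: 2x2 spatial patchify in the transformer


def latent_shape(num_frames: int, height: int, width: int) -> tuple[int, int, int, int]:
    """Return (channels, latent_frames, latent_height, latent_width).

    ASSUMPTION: causal temporal compression keeps the first frame and
    compresses the remaining frames by TEMPORAL_FACTOR.
    """
    latent_frames = (num_frames - 1) // TEMPORAL_FACTOR + 1
    return (
        LATENT_CHANNELS,
        latent_frames,
        height // SPATIAL_FACTOR,
        width // SPATIAL_FACTOR,
    )


def video_token_count(num_frames: int, height: int, width: int) -> int:
    """Number of video tokens the transformer attends over (hypothetical helper)."""
    _, t, h, w = latent_shape(num_frames, height, width)
    return t * (h // PATCH_SIZE) * (w // PATCH_SIZE)


if __name__ == "__main__":
    for frames in (31, 163):
        print(frames, latent_shape(frames, 480, 848), video_token_count(frames, 480, 848))
```

Under these assumptions a 31-frame 480×848 clip becomes a (12, 6, 60, 106) latent and 9,540 video tokens. Because self-attention cost grows quadratically with token count, shrinking the latent pays off disproportionately for longer clips.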

Key Features

  • 10-billion-parameter AsymmDiT diffusion backbone with multi-modal attention over text and video tokens.
  • AsymmVAE codec compresses video ~128× into a 12-channel latent, enabling efficient temporal attention.
  • Reference inference harness for single-GPU (~60 GB VRAM suggested) and multi-GPU runs.
  • LoRA fine-tuning utilities to adapt the model on user video datasets.
  • Available via Replicate API playground and downloadable open-source weights.
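The LoRA utilities rely on low-rank adaptation: rather than updating a frozen weight matrix W, training learns two small matrices A (r × d_in) and B (d_out × r) and uses W + (alpha / r)·B·A in the forward pass. The dependency-free sketch below shows that forward pass and the parameter savings for a single linear layer. It is a generic illustration of the technique, not Genmo's trainer, and the layer size and rank in the example are hypothetical.

```python
# Generic LoRA illustration (not Genmo's implementation); pure Python, no deps.

def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    """Multiply matrix m (rows x cols) by vector v (cols)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(w, a, b, x, alpha: float, rank: int) -> list[float]:
    """y = W x + (alpha / rank) * B (A x); W stays frozen, only A and B train."""
    base = matvec(w, x)
    delta = matvec(b, matvec(a, x))
    scale = alpha / rank
    return [y0 + scale * d for y0, d in zip(base, delta)]


def lora_param_counts(d_out: int, d_in: int, rank: int) -> tuple[int, int]:
    """Return (full fine-tune params, LoRA params) for one linear layer."""
    return d_out * d_in, rank * (d_in + d_out)


if __name__ == "__main__":
    w = [[1.0, 0.0], [0.0, 1.0]]  # frozen 2x2 weight (identity)
    a = [[1.0, 1.0]]              # A: rank 1 x d_in 2
    b = [[0.5], [0.0]]            # B: d_out 2 x rank 1
    print(lora_forward(w, a, b, [2.0, 3.0], alpha=1.0, rank=1))
    print(lora_param_counts(3072, 3072, 16))  # hypothetical layer size and rank
```

For a hypothetical 3072×3072 layer at rank 16, LoRA trains 98,304 parameters instead of 9,437,184. Standard LoRA setups initialize B to zero so that training starts from the frozen model's behavior.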

Example Usage

Example (python):

import replicate

# Call the Replicate-hosted Mochi 1 model (requires REPLICATE_API_TOKEN in env).
# replicate.run() blocks until the prediction finishes and returns its output.
# If the model cannot be resolved by name alone, pin a version instead:
# "genmoai/mochi-1:<version-id>" (copy the ID from the model's page).
output = replicate.run(
    "genmoai/mochi-1",
    input={
        # Parameter names mirror Genmo's reference README; confirm them
        # against the model's API schema on Replicate before relying on them.
        "prompt": "A photorealistic river at sunset, cinematic lighting, gentle camera dolly",
        "height": 480,
        "width": 848,
        "num_frames": 31,
        "num_inference_steps": 64,
        "cfg_scale": 6.0,
        "seed": 12345,
    },
)

print("Generated outputs:", output)

# Note: for local/self-hosted runs, see Genmo's GitHub repo for the pipeline and weights (inference harness, LoRA trainer, etc.).

Pricing

Replicate lists a hosted cost of about $0.42 per run on H100 hardware (the figure varies with inputs) and reports typical prediction times under a few minutes. Mochi 1 is also released as open source, so you can self-host it, which avoids per-run hosting fees but incurs hardware costs. Third-party UIs and services sell credits or subscriptions for Mochi-based workflows at varying prices; verify those vendors individually. Sources: Replicate listing and Genmo GitHub. ([internal.replicate.com](https://internal.replicate.com/genmoai/mochi-1?utm_source=openai))
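A quick way to compare hosted and self-hosted spend is a per-run cost comparison. The sketch below uses Replicate's approximate $0.42 per run as its default (the real figure varies with inputs); the GPU hourly rate and runs-per-hour throughput in the example are hypothetical placeholders to replace with your own quotes and measurements.

```python
# Back-of-envelope cost comparison. Only the Replicate figure comes from the
# listing above; every other input is HYPOTHETICAL and must be replaced.
REPLICATE_COST_PER_RUN = 0.42  # USD, approximate Replicate listing (varies with inputs)


def hosted_cost(runs: int, cost_per_run: float = REPLICATE_COST_PER_RUN) -> float:
    """Total hosted spend in USD for a number of runs."""
    return runs * cost_per_run


def self_hosted_cost_per_run(gpu_hourly_usd: float, runs_per_hour: float) -> float:
    """Effective per-run cost of a self-hosted GPU that is kept fully busy."""
    return gpu_hourly_usd / runs_per_hour


if __name__ == "__main__":
    print(f"1,000 hosted runs: ${hosted_cost(1000):.2f}")
    # HYPOTHETICAL: a $4.00/hour GPU producing 10 clips per hour.
    print(f"Self-hosted per run: ${self_hosted_cost_per_run(4.0, 10):.2f}")
```

The self-hosted figure assumes full utilization and ignores idle time, setup, and storage, so it understates the real cost of running your own hardware.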

Benchmarks

Model size (parameters): ≈10 billion parameters (Source: https://github.com/genmoai/mochi)

AsymmVAE compression: ≈128× (8×8 spatial and 6× temporal compression into a 12-channel latent) (Source: https://replicate.com/genmoai/mochi-1)

Recommended single-GPU VRAM: ≈60 GB VRAM (single-GPU reference); H100 recommended (Source: https://replicate.com/genmoai/mochi-1)

Hosted cost (Replicate example): ≈$0.42 per run (Replicate-hosted, varies with inputs) (Source: https://replicate.com/genmoai/mochi-1)

Default released resolution: 480p (initial release) (Source: https://replicate.com/genmoai/mochi-1)
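The ≈60 GB single-GPU figure can be put in context with simple arithmetic on the parameter count. The sketch below estimates only the memory needed to store the 10B transformer weights at a given precision; it assumes bfloat16 (2 bytes per parameter) as the likely serving precision, which the sources above do not state, and it deliberately ignores activations and any other models loaded alongside the transformer.

```python
# Weight-memory estimate for a 10B-parameter model (weights only).
# ASSUMPTION: precision choices below are illustrative; the sources do not
# state which dtype the reference implementation uses.
PARAMS = 10_000_000_000
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2}


def weight_memory_gb(params: int, dtype: str) -> float:
    """Decimal gigabytes (1e9 bytes) needed just to store the weights."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for dtype in ("fp32", "bf16"):
        print(dtype, weight_memory_gb(PARAMS, dtype), "GB")
```

At bf16 the transformer weights alone come to about 20 GB, which under this assumption leaves roughly 40 GB of the reference figure for activations and the remaining pipeline components.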

Last Refreshed: 2026-01-09

Key Information

  • Category: Video Models
  • Type: AI Video Models Tool