Out-of-Distribution Test

These examples are generated from synthetic motion specifications and are shown without ground-truth videos.

A drone view where the drone flies forward rapidly while banking left, photorealistic, 4k.

IMU: forward_bank_left

Forward surge with sustained left roll.

DiT Backbone

Finetuning With No Physics Supervision

Aero-World (Ours)

In this example, the commanded motion requires a strong forward trajectory together with an aggressive left bank. The model trained without physics supervision moves roughly forward, but fails to maintain the intended banking behavior and introduces visible artifacts as the turn develops. In contrast, Aero-World better preserves scene structure while more faithfully executing the commanded leftward banking motion.

A drone view where the drone flies forward rapidly while banking right, photorealistic, 4k.

IMU: forward_bank_right

Forward surge with sustained right roll.

DiT Backbone

Finetuning With No Physics Supervision

Aero-World (Ours)

Here, the desired trajectory corresponds to fast forward motion with a sustained right bank. Without physics supervision, the model begins to bank in the correct direction but quickly distorts the environment, indicating weak motion consistency under control. Aero-World follows the commanded right-banking trajectory more faithfully while preserving the geometry of the scene.

A drone view where the drone pitches upward and then pitches downward, photorealistic, 4k.

IMU: pitch_up_then_down

Negative wy (pitch up) first, then positive wy (pitch down).

DiT Backbone

Finetuning With No Physics Supervision

Aero-World (Ours)

This example requires a temporally structured maneuver: the drone should first pitch upward and then pitch downward. The model without physics supervision mostly continues drifting forward and only weakly reflects the upward pitch command. Aero-World captures the initial upward motion much more clearly, showing improved response to the commanded trajectory, although both models still struggle to fully realize the subsequent downward phase of the maneuver.
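As a rough illustration of what a temporally structured specification such as pitch_up_then_down might look like, the sketch below builds a two-phase gyroscope profile. Only the sign convention (negative wy first, then positive wy) comes from the description above. The function name, sample count, rate magnitude, and the (wx, wy, wz) tuple layout are assumptions made for illustration, not the format used by the authors.

```python
def pitch_up_then_down(num_steps=40, rate=0.5):
    """Return a list of (wx, wy, wz) gyroscope samples in rad/s.

    Assumption: pitch-up corresponds to negative wy, following the
    description of this example. num_steps and rate are made-up values.
    """
    half = num_steps // 2
    samples = []
    for step in range(num_steps):
        # First half: pitch up (negative wy). Second half: pitch down.
        wy = -rate if step < half else rate
        samples.append((0.0, wy, 0.0))
    return samples


profile = pitch_up_then_down()

# With equal-length phases and equal magnitudes, the summed pitch rate
# cancels, so the camera would end at roughly its starting pitch.
net_pitch_rate = sum(wy for _, wy, _ in profile)
```

A profile like this makes the two phases explicit, which is useful when checking whether a generated video reflects the second (downward) phase at all.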

A drone view where the drone flies forward and turns left smoothly, photorealistic, 4k.

IMU: forward_left_turn

Forward motion with left yaw and slight coordinated left roll.

DiT Backbone

Finetuning With No Physics Supervision

Aero-World (Ours)

The target motion in this case is a smooth forward left turn. The model without physics supervision fails to follow this command and instead begins rotating in the opposite direction, revealing poor alignment between the conditioning signal and the generated camera motion. Aero-World correctly executes the leftward turn and maintains a more coherent scene evolution throughout the maneuver.

A drone view where the drone flies forward and turns right smoothly, photorealistic, 4k.

IMU: forward_right_turn

Forward motion with right yaw and slight coordinated right roll.

DiT Backbone

Finetuning With No Physics Supervision

Aero-World (Ours)

This sequence requires the drone to move forward while smoothly turning right. Without physics supervision, the model does not realize the intended rightward turn and instead produces noticeable scene morphing and unstable camera behavior. Aero-World follows the commanded right-turn trajectory much more reliably, yielding both better control alignment and more stable visual dynamics.

A drone view where the drone flies forward while ascending, photorealistic, 4k.

IMU: ascend_forward

Forward surge with upward motion.

DiT Backbone

Finetuning With No Physics Supervision

Aero-World (Ours)

In this case, the motion command specifies simultaneous forward motion and upward ascent. The model without physics supervision largely fails to gain altitude and does not clearly reflect the ascending component of the trajectory. Aero-World responds much more directly to the control signal, producing a visibly upward flight path that better matches the commanded motion.

A drone view where the drone flies forward while descending, photorealistic, 4k.

IMU: descend_forward

Forward surge with downward motion.

DiT Backbone

Finetuning With No Physics Supervision

Aero-World (Ours)

This example requires the drone to move forward while descending. The model trained without physics supervision does not follow the intended downward trajectory and instead exhibits a drifting, swirling motion that deviates from the control input. Aero-World produces a much clearer descending path, demonstrating stronger adherence to the commanded motion.

A drone view where the drone flies forward while swaying left and right in a slalom motion, photorealistic, 4k.

IMU: slalom_left_right

Forward motion with lateral oscillation and matching roll.

DiT Backbone

Finetuning With No Physics Supervision

Aero-World (Ours)

This slalom example is particularly challenging because it requires repeated fine-grained left-right lateral oscillations rather than a single coarse maneuver. Both the model without physics supervision and Aero-World struggle to fully capture the alternating side-to-side motion, suggesting that this type of rapidly varying control remains difficult. Nevertheless, Aero-World produces somewhat more structured motion, while the baseline responses remain less faithful to the intended slalom pattern.
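To make concrete why the slalom command varies faster than the single-maneuver cases, here is a minimal sketch of a slalom-style signal: a sinusoidal lateral acceleration with a roll rate derived from an in-phase roll angle. All names and numbers (slalom_left_right, the ay and wx channels, amplitudes, step count, two cycles) are hypothetical, and treating "matching roll" as an in-phase roll angle is an assumption rather than the authors' definition.

```python
import math


def slalom_left_right(num_steps=48, dt=0.1, cycles=2.0,
                      accel_amp=2.0, roll_amp=0.3):
    """Return per-step dicts with lateral acceleration ay and roll rate wx.

    Assumption: the roll angle is roll_amp * sin(omega * t), in phase with
    the lateral acceleration, so wx is its time derivative.
    """
    duration = num_steps * dt
    omega = 2.0 * math.pi * cycles / duration
    samples = []
    for i in range(num_steps):
        phase = omega * i * dt
        samples.append({
            "ay": accel_amp * math.sin(phase),
            "wx": roll_amp * omega * math.cos(phase),
        })
    return samples


def count_direction_changes(values, eps=1e-9):
    """Count sign flips, ignoring samples that are numerically zero."""
    signs = [1 if v > 0 else -1 for v in values if abs(v) > eps]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


profile = slalom_left_right()
reversals = count_direction_changes([s["ay"] for s in profile])
```

Under these assumed settings the lateral command reverses direction several times within one clip, whereas the banking and turning examples hold a single direction throughout.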

Validation Test

These examples use four-way comparisons, with ground-truth videos included for reference. They are drawn from our AeroBench validation set, which is held out from training but remains in-distribution.

Validation Example 1

A drone view where the drone flies forward normally and sways to the right mildly and moves downward rapidly and turns right gently, then the drone flies forward normally and sways to the right mildly and moves downward very rapidly and turns right gently, then the drone flies forward normally and sways to the right mildly and moves downward very rapidly, then the drone flies forward rapidly and sways to the right mildly and moves downward rapidly, then the drone flies forward rapidly and sways to the right mildly and moves upward rapidly, then the drone flies forward rapidly and sways to the right mildly and moves upward very rapidly, then the drone flies forward very rapidly and sways to the right mildly and moves upward very rapidly, then the drone flies forward very rapidly and moves upward very rapidly, photorealistic, 4k.

Ground Truth / Reference

DiT Backbone

Finetuning With No Physics Supervision

Aero-World (Ours)

Validation Example 2

A drone view where the drone flies forward very rapidly and sways to the right very aggressively and moves upward very rapidly and turns right very aggressively, then the drone flies forward rapidly and sways to the right very aggressively and moves upward very rapidly and turns right very aggressively, then the drone flies forward rapidly and sways to the right very aggressively and moves upward slowly and turns right very aggressively, then the drone flies forward rapidly and sways to the right very aggressively and moves downward slowly and turns right very aggressively, then the drone flies forward normally and sways to the right very aggressively and moves downward rapidly and turns right very aggressively, then the drone sways to the right very aggressively and moves downward rapidly and turns right very aggressively, then the drone sways to the right very aggressively and moves downward slowly and turns right very aggressively, photorealistic, 4k.

Ground Truth / Reference

DiT Backbone

Finetuning With No Physics Supervision

Aero-World (Ours)

Validation Example 3

A drone view where the drone sways to the right mildly and moves downward rapidly, then the drone sways to the right mildly and moves upward slowly, then the drone sways to the right mildly and moves upward very rapidly, then the drone flies forward normally and sways to the right mildly and moves upward very rapidly, then the drone flies forward normally and moves upward slowly, then the drone flies forward normally and moves downward slowly, then the drone flies forward normally and moves downward rapidly, then the drone flies forward normally and moves downward slowly, then the drone flies forward normally and turns right gently, photorealistic, 4k.

Ground Truth / Reference

DiT Backbone

Finetuning With No Physics Supervision

Aero-World (Ours)

Validation Example 4

A drone view where the drone flies forward rapidly and sways to the left mildly and moves downward very rapidly and turns right gently, then the drone flies forward rapidly and moves downward very rapidly and turns right gently, then the drone flies forward rapidly and moves downward very rapidly, then the drone flies forward very rapidly and moves downward very rapidly, then the drone flies forward very rapidly and moves downward rapidly, then the drone flies forward very rapidly and moves downward slowly, then the drone flies forward very rapidly and moves upward slowly, then the drone flies forward very rapidly and moves upward rapidly, then the drone flies forward very rapidly and moves upward very rapidly, photorealistic, 4k.

Ground Truth / Reference

DiT Backbone

Finetuning With No Physics Supervision

Aero-World (Ours)
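The validation prompts above follow a regular template: each segment lists forward, sway, vertical, and turn components in that order, and segments are joined with "then". The sketch below reproduces that surface pattern from per-segment labels. It is inferred only from the prompt text shown here; the function names and the dictionary layout are hypothetical, and it is not claimed to be how the AeroBench captions were actually produced.

```python
def segment_phrase(forward=None, sway=None, vertical=None, turn=None):
    """Build one 'the drone ...' clause from optional motion components.

    forward is an adverb such as 'rapidly'; sway, vertical, and turn are
    (direction, adverb) pairs such as ('right', 'mildly').
    """
    parts = []
    if forward:
        parts.append(f"flies forward {forward}")
    if sway:
        parts.append(f"sways to the {sway[0]} {sway[1]}")
    if vertical:
        parts.append(f"moves {vertical[0]} {vertical[1]}")
    if turn:
        parts.append(f"turns {turn[0]} {turn[1]}")
    return "the drone " + " and ".join(parts)


def build_prompt(segments):
    """Join segment clauses with ', then ' and add the fixed prefix and suffix."""
    body = ", then ".join(segment_phrase(**seg) for seg in segments)
    return f"A drone view where {body}, photorealistic, 4k."


# The last two segments of Validation Example 3.
prompt = build_prompt([
    {"forward": "normally", "vertical": ("downward", "slowly")},
    {"forward": "normally", "turn": ("right", "gently")},
])
```

Writing the template out this way shows that a validation prompt encodes a sequence of discrete motion segments, in contrast to the single-sentence prompts used in the out-of-distribution examples.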

BibTeX

@article{radi2026aero,
  title={Aero-World: Action-Conditioned Aerial Video Generation from Inertial Controls},
  author={Radi, Abdul Mohaimen Al and Li, Kunyang and Shang, Yuzhang and Shah, Mubarak and Tian, Yu},
  journal={arXiv preprint arXiv:2605.19728},
  year={2026}
}