Studio Brilly

Creating Spatial Video for Apple Vision Pro on Linux

March 13, 2025

Apple Vision Pro offers an incredible immersive experience, especially with spatial videos. While Apple provides tools for macOS users, Linux enthusiasts might wonder how to create compatible spatial videos using open-source tools. This guide walks through the process of creating Vision Pro-compatible spatial video on Linux - but this is just the beginning of our journey into the world of spatial video creation.

The input video for our process comes from our custom VR upscaler and enhancer technology, which significantly improves the quality of VR content. In future articles, we'll describe our complete workflow for video stabilization and enhancement using custom tools we've developed specifically for spatial video. Our pipeline processes stereoscopic 3D content to maximize clarity, depth perception, and overall immersion before encoding it for spatial playback. If you're interested in learning more about our enhancement technology or need help with your VR content, please contact us.

Note: This guide builds upon the excellent work described in SpatialGen's article on encoding MV-HEVC with FFmpeg, uses techniques inspired by Mike Swanson's Spatial Video Tool, and incorporates findings from a detailed SuperUser discussion about creating Vision Pro-compatible MV-HEVC files. We highly recommend reading these resources for a deeper understanding of spatial video formats.

Prerequisites

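Before starting, ensure you have the following tools installed on your Linux system: x265 (a build that accepts the --multiview-config option used in Step 1), MP4Box from GPAC, FFmpeg, and the Bento4 utilities mp4extract, mp4edit, and mp4dump.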
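The commands in this guide call x265, MP4Box, ffmpeg, mp4extract, mp4edit, and mp4dump. A quick way to confirm they are all on your PATH is a small check like the sketch below. The function name missing_tools is our own choice for illustration and is not part of any of these packages.

```shell
#!/bin/sh
# Print each command-line tool that is NOT found on PATH.
# Tool names are passed as arguments, so the list is easy to adjust.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tools called in this guide: x265, MP4Box (GPAC), ffmpeg, and
# mp4extract / mp4edit / mp4dump (Bento4).
missing=$(missing_tools x265 MP4Box ffmpeg mp4extract mp4edit mp4dump)
if [ -n "$missing" ]; then
    echo "Missing tools:" $missing
else
    echo "All required tools found."
fi
```

Note that this only checks that the binaries exist. It does not verify that your x265 build was compiled with multiview support.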

Step 1: Create HEVC Video with x265

First, we need to encode our video using x265 with specific parameters that match Apple Vision Pro requirements. In our workflow, x265 receives its input directly through a pipe from our custom VR enhancer, which avoids intermediate files, saves storage space, and keeps the processing pipeline seamless. The following command creates an HEVC file with the appropriate settings:

x265 --y4m --multiview-config multiview.conf --fps 90 --input-res 3840x3840 --input-csp i420 --input-depth 8 --profile main --colorprim bt709 --transfer bt709 --colormatrix bt709 --crf 20 --vbv-maxrate 160000 --vbv-bufsize 240000 --keyint 180 --min-keyint 180 --no-open-gop --repeat-headers --output video_avp.hevc --preset slow

This command uses a multiview configuration file (multiview.conf) to handle the stereoscopic 3D aspect, sets the framerate to 90fps, and uses a resolution of 3840x3840. One significant limitation to note is that FFmpeg doesn't yet fully support 10-bit MV-HEVC encoding, which is why we use 8-bit depth (--input-depth 8 with --profile main) in our workflow. This is one of the key differences between our Linux-based approach and what's possible with Apple's native tools.
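To make the rate-control and keyframe settings in the command above more concrete, here is a small worked calculation. It assumes the usual x265 units (kbit/s for --vbv-maxrate, kbit for --vbv-bufsize); the variable names are ours.

```shell
#!/bin/sh
# Values copied from the x265 command in Step 1.
fps=90
keyint=180           # --keyint and --min-keyint are both 180
vbv_maxrate=160000   # kbit/s
vbv_bufsize=240000   # kbit

# Keyframe interval in milliseconds: 180 frames at 90 fps.
gop_ms=$(( keyint * 1000 / fps ))

# Peak bitrate in Mbit/s.
maxrate_mbps=$(( vbv_maxrate / 1000 ))

# How much video the VBV buffer holds when filled at the peak rate.
buffer_ms=$(( vbv_bufsize * 1000 / vbv_maxrate ))

echo "Keyframe interval: ${gop_ms} ms"
echo "Peak bitrate: ${maxrate_mbps} Mbit/s"
echo "VBV buffer at peak rate: ${buffer_ms} ms"
```

In other words, setting --keyint and --min-keyint to the same value together with --no-open-gop should give a closed GOP with a keyframe every two seconds, and the bitrate is capped at 160 Mbit/s with a 1.5 second buffer.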

Step 2: Create Intermediate MP4 File

Next, we need to create an intermediate MP4 file that will properly handle the [ctts] atom (Composition Time to Sample):

MP4Box -add video_avp.hevc:fps=90 -new video_avp.mp4

This step is crucial as it ensures proper timing information is stored in the MP4 container. MP4Box analyzes the HEVC stream and adds the necessary CTTS atom, which is essential for proper playback timing in spatial videos. Without this step, FFmpeg would return "pts has no value" errors in the next step.
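Before moving on, you may want to confirm that the intermediate file really contains a ctts atom. The sketch below pipes the mp4dump listing (mp4dump is part of Bento4) through a small helper. The helper name has_atom is our own, and it assumes mp4dump's default text output, where each atom is printed as "[type] size=...".

```shell
#!/bin/sh
# Succeeds if an mp4dump listing read from stdin contains the given atom.
# A fixed-string match on "[type]" is enough for the default text output.
has_atom() {
    grep -qF "[$1]"
}

# Only run the check when the file from Step 2 exists.
if [ -f video_avp.mp4 ]; then
    if mp4dump video_avp.mp4 | has_atom ctts; then
        echo "ctts atom present"
    else
        echo "ctts atom missing: re-check the MP4Box step" >&2
    fi
fi
```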

Step 3: Add Metadata with FFmpeg

In this step, we use FFmpeg to add custom metadata and ensure the file uses the proper HEVC tag. Since x265 doesn't write PTS (Presentation Time Stamp) values in the output files, we rely on MP4Box from the previous step to handle the timing information. FFmpeg preserves the CTTS atom that was added by MP4Box while adding our custom metadata:

ffmpeg -i video_avp.mp4 -c:v copy -metadata "videoai=brilly.tv" -tag:v hvc1 -movflags +faststart+use_metadata_tags+write_colr video_avp.mov

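This command copies the video stream without re-encoding, adds our custom metadata, sets the HEVC tag to "hvc1" (required for Apple compatibility), and sets three MOV flags: faststart moves the moov atom to the beginning of the file, use_metadata_tags lets FFmpeg write our custom metadata key into the MOV container, and write_colr writes the colr (color information) atom.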
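A quick sanity check after this step is to read the codec tag back with ffprobe, which normally ships with FFmpeg (we assume it is installed alongside ffmpeg). The function name video_tag is ours; the ffprobe options are standard ones for printing a single stream field.

```shell
#!/bin/sh
# Print the four-character codec tag of the first video stream.
video_tag() {
    ffprobe -v error -select_streams v:0 \
        -show_entries stream=codec_tag_string \
        -of default=noprint_wrappers=1:nokey=1 "$1"
}

# Only run the check when the file from Step 3 exists.
if [ -f video_avp.mov ]; then
    tag=$(video_tag video_avp.mov)
    if [ "$tag" = "hvc1" ]; then
        echo "codec tag is hvc1"
    else
        echo "unexpected codec tag: $tag" >&2
    fi
fi
```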

Important Note: At this point, your spatial video might already be compatible with Apple Vision Pro, depending on your software versions and specific setup. If you're able to transfer the file to your Vision Pro and it plays correctly in 3D, then congratulations - give yourself a high five! 👏 However, if you encounter playback issues or the video isn't recognized as spatial content, you'll need to follow the additional steps below to modify the file's atoms. These steps are often necessary due to variations in software versions and the evolving nature of spatial video standards.

Step 4: Extract Spatial Atoms from Reference File

To make our video compatible with Apple Vision Pro, we need specific spatial metadata atoms. This is a critical step, as the Vision Pro requires these special atoms to properly recognize and display the content as spatial video. We'll extract these from a reference file created with Mike Swanson's Spatial Video Tool:

mp4extract moov/trak[1]/mdia/minf/stbl/stsd/hvc1/vexu spatial.mov vexu.bin
mp4extract moov/trak[1]/mdia/minf/stbl/stsd/hvc1/hfov spatial.mov hfov.bin
mp4extract moov/trak[1]/mdia/minf/stbl/stsd/hvc1/lhvC spatial.mov lhvC.bin
mp4extract moov/trak[1]/tapt spatial.mov tapt.bin

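These commands extract the necessary spatial atoms: vexu (the Video Extended Usage box, which carries the stereo view information), hfov (the horizontal field of view), lhvC (the layered HEVC configuration for the second view), and tapt (the track aperture mode dimensions). The trak index in each path depends on the track layout of the file, so check the mp4dump output of your reference file to see which track holds the hvc1 sample entry.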
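Each extracted .bin file should contain a complete atom: a 4-byte size followed by the 4-byte atom type. This assumes mp4extract was run without its --payload-only option, as in the commands above. The sketch below reads the type field back to catch a wrong path or an empty extraction; the helper name atom_type is our own.

```shell
#!/bin/sh
# Print the four-character type stored in bytes 5-8 of an atom file.
atom_type() {
    dd if="$1" bs=1 skip=4 count=4 2>/dev/null
}

# Check every extracted file that exists in the current directory.
for atom in vexu hfov lhvC tapt; do
    [ -f "$atom.bin" ] || continue
    if [ "$(atom_type "$atom.bin")" = "$atom" ]; then
        echo "$atom.bin starts with a '$atom' atom header"
    else
        echo "$atom.bin does not look like a '$atom' atom" >&2
    fi
done
```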

If you encounter playback issues, a helpful troubleshooting approach is to use mp4dump to compare your file with a reference file created by the Spatial Video Tool:

mp4dump spatial.mov > spatial_dump.txt
mp4dump video_avp.mov > video_dump.txt

Then compare these dumps to identify differences in the atom structure. You can add or remove atoms as needed to match the structure produced by the Spatial Video Tool. This comparison method is particularly useful when dealing with different software versions or when the standard requirements change.
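Full dumps differ in many sizes and field values, which can bury the structural differences. One option, sketched below, is to reduce each dump to its atom outline (indentation plus atom name) and diff only that. The helper name atom_outline and the outline file names are our own, and the sed pattern assumes mp4dump's default text format.

```shell
#!/bin/sh
# Keep only the "[type]" part of each atom line, preserving indentation,
# and drop field lines such as "flags = 1".
atom_outline() {
    sed -n 's/^\( *\[[^]]*]\).*/\1/p'
}

# Only run when both dumps from the commands above exist.
if [ -f spatial_dump.txt ] && [ -f video_dump.txt ]; then
    atom_outline < spatial_dump.txt > spatial_outline.txt
    atom_outline < video_dump.txt > video_outline.txt
    diff -u spatial_outline.txt video_outline.txt \
        || echo "Atom structure differs (see diff above)"
fi
```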

Step 5: Insert Spatial Atoms and Remove Unnecessary Ones

Finally, we'll use mp4edit to insert the spatial atoms into our video and remove any unnecessary atoms:

mp4edit --insert moov/trak[0]/mdia/minf/stbl/stsd/hvc1:lhvC.bin --insert moov/trak[0]/mdia/minf/stbl/stsd/hvc1:vexu.bin --insert moov/trak[0]/mdia/minf/stbl/stsd/hvc1:hfov.bin --insert moov/trak[0]:tapt.bin:1 --remove moov/trak[0]/mdia/minf/stbl/stsd/hvc1/pasp video_avp.mov video_avp_metadata.mov

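This command inserts all our extracted spatial atoms into the correct locations in the MOV file structure and removes the "pasp" (Pixel Aspect Ratio) atom, which isn't needed for spatial video. The trailing :1 on the tapt insert is the optional position argument of --insert, which controls where the atom is placed among the children of the trak atom. As with extraction, the trak index must match the video track of your own file.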
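To confirm the edit worked, you can list which of the four spatial atoms are still absent from the output file. This is a minimal sketch that reads an mp4dump listing from stdin; the function name missing_atoms is our own, and it only checks that the atom names appear somewhere in the dump, not that they sit in the right place.

```shell
#!/bin/sh
# Read an mp4dump listing from stdin and print every atom name given
# as an argument that does NOT appear in it.
missing_atoms() {
    dump=$(cat)
    for atom in "$@"; do
        printf '%s\n' "$dump" | grep -qF "[$atom]" || printf '%s\n' "$atom"
    done
}

# Only run the check when the final file exists.
if [ -f video_avp_metadata.mov ]; then
    missing=$(mp4dump video_avp_metadata.mov | missing_atoms lhvC vexu hfov tapt)
    if [ -n "$missing" ]; then
        echo "Atoms not found:" $missing >&2
    else
        echo "All spatial atoms found."
    fi
fi
```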

The Result: Vision Pro-Ready Spatial Video

The final file, video_avp_metadata.mov, is now ready to be played on an Apple Vision Pro. The video contains all the necessary metadata for the Vision Pro to recognize and display it as a proper spatial video, giving viewers an immersive 3D experience.

Technical Notes

A few important technical considerations:

Just the Beginning of Our Journey

This workflow demonstrates that Linux users can create fully compatible spatial videos for Apple Vision Pro without needing macOS-specific tools. By understanding the underlying container format and required metadata, we can produce professional-quality spatial videos using entirely open-source software.

However, this is just the beginning of our journey into spatial video creation. As the technology evolves, we'll continue to refine our process, develop new tools, and explore innovative ways to create immersive content. The field of spatial video is still in its early days, and we're excited to be part of its evolution.

We're constantly experimenting with new techniques for capturing, processing, and delivering spatial content. Our team is working on advanced methods to optimize the viewing experience on Apple Vision Pro and other spatial-capable devices. Stay tuned for future updates as we continue to push the boundaries of what's possible with spatial video on Linux.

If you're working on spatial video projects or have questions about our process, we'd love to hear from you. Contact us at Studio Brilly to discuss collaborations, custom solutions, or to learn more about our VR enhancement technology.

For full-resolution materials, inquiries and suggestions please contact us at: studio@brilly.tv