
The motion in our videos



Visualization of sparse (left) and dense (right) optical flow. From an article on nanonets.com

Whether you've seen the datamoshing glitch effects…
or seen keywords such as "pixel motion" or "optical flow" in your favorite software…
or you're a power user of AI animation tools like Deforum or warpfusion…
or just got screwed by YouTube, Instagram, TikTok or X (Twitter) compression when your intricate uploaded videos turned all mushy…
There's something to learn and gain from understanding the underlying concepts of motion in digital video.

I'll start from the core fundamentals, but if you're already familiar with the concept, feel free to skip to whichever section feels relevant to you. This post sticks to the theory, while the next post gets hands-on with these techniques and how to implement them in TouchDesigner.

In essence, this entire topic emerges from two key components:

  1. Image data: static imagery, called an "I-frame" or keyframe in video codecs.

  2. Motion data: describes how the image data needs to be pushed and warped frame after frame to construct the appearance of object motion.

From the interplay of these two appears a plethora of ways to store, reconstruct, alter, remix, and create video and animation content.

Video compression - it's all about that motion

Compressed digital video (so pretty much all video, unless you're dealing with production files) is mainly concerned with pushing pixels around. Rather than storing individual video frames as an image sequence, video compression techniques are usually smarter than that: they focus on patches of pixels moving across the screen from one video frame to another. This process, or any way of storing video in a file, is known as encoding.

Way simplified, the video encoding process looks at the source material, picks image data every few frames, then builds the motion data needed to warp that image data across frames, reproducing the source material. Better codecs build on this with additional techniques, achieving lower bitrates with less quality loss.

*Codecs are encoding algorithms (formats) employed in compressing a video. Not to be confused with container formats (like .mp4 or .mov), which can contain a variety of codecs.

*Bitrate is the amount of data the video holds over time, usually expressed per second (kbps). More means larger files and better quality (more detailed motion data and more frequent image data).
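To make the keyframe-plus-motion idea concrete, here's a toy sketch of block-based motion estimation and compensation in plain NumPy. This is not how any real codec is implemented (real encoders use much smarter search, sub-pixel precision, and residual coding); the function names `estimate_motion` and `reconstruct` are mine, and grayscale frames with dimensions divisible by the block size are assumed.

```python
import numpy as np

def estimate_motion(prev, cur, block=16, search=8):
    """Toy block matching: for each block of `cur`, find the best-matching
    block in `prev` within +/- `search` pixels and store that offset."""
    h, w = cur.shape                                   # grayscale frames, divisible by `block`
    vectors = np.zeros((h // block, w // block, 2), dtype=np.int32)
    for by in range(0, h, block):
        for bx in range(0, w, block):
            target = cur[by:by + block, bx:bx + block].astype(np.float32)
            best_err, best = np.inf, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = by + dy, bx + dx
                    if y < 0 or x < 0 or y + block > h or x + block > w:
                        continue
                    cand = prev[y:y + block, x:x + block].astype(np.float32)
                    err = np.abs(cand - target).sum()  # sum of absolute differences
                    if err < best_err:
                        best_err, best = err, (dy, dx)
            vectors[by // block, bx // block] = best
    return vectors

def reconstruct(prev, vectors, block=16):
    """Predict the current frame by copying blocks out of `prev` along the
    vectors: the "motion data" warping the "image data"."""
    out = np.empty_like(prev)
    h, w = prev.shape
    for by in range(0, h, block):
        for bx in range(0, w, block):
            dy, dx = vectors[by // block, bx // block]
            out[by:by + block, bx:bx + block] = prev[by + dy:by + dy + block,
                                                     bx + dx:bx + dx + block]
    return out
```

The `vectors` array is essentially the motion data: together with the previous keyframe, it is enough to approximate the next frame.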

Mushy videos when uploaded online

No motion, or simple uniform motion, is easy to encode as pixels moving across the frame (motion data). Complex, non-uniform motion and changes in lighting, texture, or color are much harder to encode as motion data. If you're uploading something like the latter, the downgrade in quality is most visible, as the platforms need to limit the amount of space your videos take up and the cost of streaming them. However, this is gradually evolving, as advanced codecs take things such as film grain into consideration.

Datamosh

Altering, corrupting, or mixing up the image and motion data in encoded video files is what leads to the characteristic datamoshing glitch effects. Usually, it's one of the following:

  1. The motion data connected to specific image data is swapped, so it moves or "warps" an image frame it was not meant to warp.

  2. The motion data is somehow corrupted, boosted, or left in effect for longer than intended.
    The result looks like something that starts moving, never stops moving, or moves way more than in the original unaltered video (a crude simulation of this is sketched below).

You can play with datamosh glitches yourself in this simple webapp.
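For a feel of the second case, here's a rough sketch that freezes one estimated flow field and keeps re-applying it to a single frame, which produces that "never stops moving" smear. It leans on OpenCV's Farneback optical flow; the input filename and the frame count are placeholders.

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("input.mp4")                     # hypothetical input file
ok, first = cap.read()
ok, second = cap.read()
g0 = cv2.cvtColor(first, cv2.COLOR_BGR2GRAY)
g1 = cv2.cvtColor(second, cv2.COLOR_BGR2GRAY)

# flow from the second frame back to the first, so remapping pushes content
# along the original motion direction
back = cv2.calcOpticalFlowFarneback(g1, g0, None, 0.5, 3, 15, 3, 5, 1.2, 0)

h, w = g0.shape
xs, ys = np.meshgrid(np.arange(w, dtype=np.float32), np.arange(h, dtype=np.float32))
map_x, map_y = xs + back[..., 0], ys + back[..., 1]

frame = first
for i in range(60):                                     # motion that "never stops"
    frame = cv2.remap(frame, map_x, map_y, cv2.INTER_LINEAR)
    cv2.imwrite(f"mosh_{i:03d}.png", frame)
```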

Optical flow (motion vectors)

From "LearnOpenCV" article. Top - sparse optical flow, bottom - dense (per pixel).

Motion vectors describe the exact movement of patches, or of all pixels, in a video, frame after frame; this is also called "optical flow". Essentially, it is the pure form of the motion data I mentioned already. Compared to video compression, where the data is heavily optimized, segmented, and meant only to decrease the video bitrate, motion vectors can be a more universal and practical way to work with video motion for creative purposes and VFX. Optical flow can be either sparse or dense, with the latter describing motion for every single pixel in a frame.
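If you want to see dense optical flow for yourself, OpenCV's Farneback algorithm is a classic starting point. The sketch below follows the standard OpenCV recipe: estimate a per-pixel flow field between consecutive frames and visualize it with hue for direction and brightness for magnitude. The input filename is a placeholder.

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("input.mp4")                      # hypothetical input file
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # dense flow: one 2D displacement vector per pixel, shape (H, W, 2)
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)

    # classic visualization: hue encodes direction, brightness encodes magnitude
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    hsv = np.zeros_like(frame)
    hsv[..., 0] = ang * 180 / np.pi / 2
    hsv[..., 1] = 255
    hsv[..., 2] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX)

    cv2.imshow("dense optical flow", cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR))
    if cv2.waitKey(1) == 27:                             # Esc to quit
        break
    prev_gray = gray
```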

Motion blur

The easiest practical application of motion vectors is fabricating motion blur in post-production. If you know how much and in which direction each pixel moves on screen, you can apply exactly that directionality and amount of blur to simulate objects blurring in front of a camera.
Under the hood, this is how CGI rendering can reproduce believable motion blur, which is especially relevant when you need to match live action footage.
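As a sketch of that idea, the helper below blurs each pixel along its own motion vector by sampling the frame at several positions along the vector and averaging. The function name, the `samples` and `shutter` parameters, and the assumption that `flow` is a per-pixel field (from Farneback, RAFT, or a renderer's vector pass) are all mine.

```python
import cv2
import numpy as np

def flow_motion_blur(frame, flow, samples=8, shutter=0.5):
    """Blur each pixel along its own motion vector. `flow` is a per-pixel
    (H, W, 2) field, e.g. from calcOpticalFlowFarneback or a vector pass."""
    h, w = frame.shape[:2]
    xs, ys = np.meshgrid(np.arange(w, dtype=np.float32),
                         np.arange(h, dtype=np.float32))
    acc = np.zeros(frame.shape, dtype=np.float32)
    for i in range(samples):
        # sample positions spread along the vector, centered on the pixel
        t = (i / (samples - 1) - 0.5) * shutter
        mx = xs + flow[..., 0] * t
        my = ys + flow[..., 1] * t
        acc += cv2.remap(frame, mx, my, cv2.INTER_LINEAR).astype(np.float32)
    return (acc / samples).astype(np.uint8)
```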

Optical flow estimation

When working in 3D CGI, the exact optical flow in the form of motion vectors is inherently available to retrieve and make use of (such as the "vector" render pass in Blender).
For shot footage, however, the motion data baked into encoded video files is not very practical to use. To retrieve optical flow data, advanced tracking algorithms or an AI model such as RAFT can estimate it from the video quite accurately. This can then be used for the aforementioned motion blur, or for optical flow warping.
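For reference, recent torchvision builds ship a RAFT implementation, so a minimal estimation pass can look roughly like this. This is a sketch assuming that torchvision version and its bundled weights; the random tensors stand in for real video frames with dimensions divisible by 8.

```python
import torch
from torchvision.models.optical_flow import raft_large, Raft_Large_Weights

# assumes a torchvision build that bundles the RAFT reference implementation
weights = Raft_Large_Weights.DEFAULT
model = raft_large(weights=weights).eval()
transforms = weights.transforms()

# two video frames as uint8 tensors of shape [N, 3, H, W], H and W divisible by 8
# (random data here stands in for real frames)
img1 = torch.randint(0, 256, (1, 3, 360, 640), dtype=torch.uint8)
img2 = torch.randint(0, 256, (1, 3, 360, 640), dtype=torch.uint8)
img1, img2 = transforms(img1, img2)

with torch.no_grad():
    # RAFT returns a list of iteratively refined flow fields; the last is the final estimate
    flow = model(img1, img2)[-1]        # shape [N, 2, H, W], displacement in pixels
```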

Optical flow warping - the transference of motion

If you've read everything so far, we're coming full circle, back to motion data pushing around and warping image data.

Increasing framerate

Most often, accurate optical flow data used to warp images makes it possible to cleverly interpolate additional in-between frames, a technique commonly used to increase the framerate of videos or games. Particularly in real-time game rendering, this exploits motion much like video compression does, relying on identifiable motion across the frame to skip rendering some of the frames (Nvidia DLSS 3, for example). If you know at what speed point A gets to point B, you can interpolate the position of that point at any time in between.
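A crude version of that idea in code: backward-warp a frame by a fraction of the flow field to synthesize an in-between frame. This is only a sketch; real interpolators (and DLSS-style generators) also blend both neighboring frames and handle occlusions. The function name and the approximation of sampling the flow at the target pixel are mine.

```python
import cv2
import numpy as np

def interpolate(frame1, flow, t=0.5):
    """Very crude in-between frame: backward-warp frame1 by a fraction of the
    frame1->frame2 flow. The flow is approximated by sampling it at the
    target pixel instead of the source pixel."""
    h, w = flow.shape[:2]
    xs, ys = np.meshgrid(np.arange(w, dtype=np.float32),
                         np.arange(h, dtype=np.float32))
    mx = xs - flow[..., 0] * t
    my = ys - flow[..., 1] * t
    return cv2.remap(frame1, mx, my, cv2.INTER_LINEAR)

# inserting interpolate(frame1, flow, 0.5) between every original pair doubles the framerate
```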

Sticking textures onto moving surfaces

Alternatively, optical flow may be used to "stick" textures onto objects in a video, which is the point of software such as "Lockdown" for After Effects.
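A rough sketch of the "sticking" idea (not how Lockdown itself works): accumulate per-frame backward flow into a running warp grid, so a texture painted in the first frame's coordinates keeps following the surface. Filenames are placeholders, the texture is assumed to be the same size as the video frames, and drift and occlusions are ignored.

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("input.mp4")                      # hypothetical input file
texture = cv2.imread("sticker.png")                      # painted in frame-0 coordinates,
ok, prev = cap.read()                                    # assumed same size as the video
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

h, w = prev_gray.shape
xs, ys = np.meshgrid(np.arange(w, dtype=np.float32), np.arange(h, dtype=np.float32))
map_x, map_y = xs.copy(), ys.copy()                      # where each pixel was back in frame 0

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # flow from the current frame back to the previous one
    back = cv2.calcOpticalFlowFarneback(gray, prev_gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)

    # chain it into the accumulated map so the warp stays anchored to frame 0
    map_x = cv2.remap(map_x, xs + back[..., 0], ys + back[..., 1], cv2.INTER_LINEAR)
    map_y = cv2.remap(map_y, xs + back[..., 0], ys + back[..., 1], cv2.INTER_LINEAR)

    stuck = cv2.remap(texture, map_x, map_y, cv2.INTER_LINEAR)
    cv2.imshow("stuck texture", cv2.addWeighted(frame, 0.5, stuck, 0.5, 0))
    if cv2.waitKey(1) == 27:
        break
    prev_gray = gray
```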

The fun stuff in AI frame-by-frame animation

More relevant to this blog, though, are the AI animation tools and implementations that make excellent use of this technique. This is where Warpfusion gets its name, and what all that warp and optical flow jazz stands for in Deforum.

One of the earlier combinations of generative AI and optical flow that I remember, which at the time relied on style transfer GANs.

A potential drawback of generating AI images to be stitched into an animation is flickering and temporal inconsistency. When working with a source video, one can generate only some "keyframe" images rather than every single video frame, which in the case of Deforum is controlled by a setting called "cadence". Then, optical flow warping can be used to warp those keyframes into place, making the generated textures follow object motion in the video (a lot like the video encoding process). Overall, this usually leads to smoother results and textures "sticking" onto objects and subjects in a video. Even with a cadence of 1 (all frames as keyframes), optical flow warping "aligns" the result of each previous frame with the current video source frame, which helps in video stylization tasks. This is also briefly explained here by the author of warpfusion.
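A heavily simplified sketch of that cadence-plus-warp loop (not Deforum's or Warpfusion's actual code): every few source frames a new image is generated, and on the frames in between, the previous output is simply warped along the source video's optical flow. `frame_pairs`, `generate_stylized` and `write_frame` are hypothetical stand-ins.

```python
import cv2
import numpy as np

def warp_by_flow(img, back_flow):
    """Backward-warp `img` along a flow field estimated from the *current*
    source frame back to the *previous* one."""
    h, w = back_flow.shape[:2]
    xs, ys = np.meshgrid(np.arange(w, dtype=np.float32), np.arange(h, dtype=np.float32))
    return cv2.remap(img, xs + back_flow[..., 0], ys + back_flow[..., 1], cv2.INTER_LINEAR)

cadence = 4                                   # generate a fresh "keyframe" every 4 source frames
out = None
for i, (src_prev, src_cur) in enumerate(frame_pairs):        # hypothetical source-frame iterator
    g_prev = cv2.cvtColor(src_prev, cv2.COLOR_BGR2GRAY)
    g_cur = cv2.cvtColor(src_cur, cv2.COLOR_BGR2GRAY)
    back = cv2.calcOpticalFlowFarneback(g_cur, g_prev, None, 0.5, 3, 15, 3, 5, 1.2, 0)

    if out is None or i % cadence == 0:
        out = generate_stylized(src_cur)      # hypothetical call into an image model
    else:
        out = warp_by_flow(out, back)         # in between: push the last output along the motion
    write_frame(out)                          # hypothetical output writer
```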

Deforum, and especially another experimental extension for SD webui called SD-CN-Animation, can go further by "dreaming up" motion solely from the generated frames. Instead of estimating optical flow from the input video, it's possible to ask an optical flow model to estimate flow between adjacent frames of the generated animation. That can then be used to drive further motion and warping. The exact specifics of how this is implemented in such tools, and how it relates to exposed parameters, is sometimes difficult to dissect, but the code is open source! So read up, investigate, and ask the authors if you want one specific optical flow effect or another.
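Sketched very loosely, that "dreamed up" motion is a feedback loop: estimate flow between the last two generated frames and push the newest one along it to seed the next generation step. This reuses the warp_by_flow helper from the previous sketch, and the `generated` list is a hypothetical buffer of already-generated frames.

```python
import cv2

# `generated` is a hypothetical buffer of frames produced so far
g_prev = cv2.cvtColor(generated[-2], cv2.COLOR_BGR2GRAY)
g_cur = cv2.cvtColor(generated[-1], cv2.COLOR_BGR2GRAY)

# flow estimated between two *generated* frames, not the source video
self_flow = cv2.calcOpticalFlowFarneback(g_cur, g_prev, None, 0.5, 3, 15, 3, 5, 1.2, 0)

# push the newest frame along its own estimated motion and feed it back
# as the init image for the next generation step
next_init = warp_by_flow(generated[-1], self_flow)
```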

These warping techniques are not to be confused with generative video AI, or video enhancements like animatediff. While optical flow and warping are sort of "hacks" to repurpose generative image models for animation and video, true video-native AI models or neural-network-level enhancements "think" about motion inside the network itself, but do not expose or take in discrete optical flow data. It's a black box that, in the ideal case, has a more nuanced understanding of motion in video, but as users we just have to trust it and hope we have enough strings to pull to influence its temporal behavior.

Creative use cases

A very deliberate and planned-out use of datamoshing, with each new scene starting from a still of the last scene, being moved by new motion data.

Did you get any ideas so far? Besides the cases mentioned, motion vectors turn up a lot in interactive installation art, enabling interactivity from a live video feed and real-time optical flow estimation. They likely turn up in many more fields and practical applications I wouldn't even think of.

In the next post, I'll be going over how motion vectors can be used in TouchDesigner to warp images, either by providing motion data from 3D rendered scenes, or estimating it with optical flow detection algorithms or AI models.

For example, it can be made into a way to project textures onto 3D geometry, without relying on any texture mapping.

A simple prototype of projection drawing onto 3D renders in TouchDesigner, using motion vector data.

In my case, I've been using it to replicate AI animation toolkits such as Deforum, where sparse "key" frames are continuously warped into place to follow the motion in a video or 3D source.