
Optical flow warp in TouchDesigner - datamosh effects, AI video, and more!



Following the overview of optical flow and related concepts from the previous post, this post is a practical guide. As mentioned there, optical flow can be used to warp images for a variety of use cases, including some frame-by-frame AI animation techniques, which this blog is focused on.

A test of using Blender with optical flow warping in my frame-by-frame AI animation workflow, closely mirroring how it is implemented in Deforum or Warpfusion. In this case, Blender's depth pass was used as depth ControlNet, with a little bit of foggy EEVEE render mixed into the img2img feedback loop.

I'll show you how to set it up in TD using either Blender renders or the Optical Flow TOP on video sources; however, I'll explain everything thoroughly and keep it open-ended, so you should be able to apply it in other scenarios as well!

TouchDesigner project file

I've included the essentials and some examples shown in this post in a .toe file that you can get HERE

Core setup scheme

Regardless of software, performing iterative optical flow warp looks pretty much like this:

The three essential elements are the optical flow data itself, the displacement effect, and a frame-by-frame feedback loop. Let's dissect them:

Getting your dense optical flow data (motion vectors)

*"dense" means it contains motion for every single pixel in a frame, whereas sparse holds only limited set of points, which is insufficient for accurate warping. 

Dense optical flow data should consist of X and Y components for the horizontal and vertical motion of every pixel in the frame, from one video frame to the next. This is either an array of 2D vectors or a Red/Green image, depending on how you look at it. Additionally, it may come as 4D vectors, or an RGBA image, which is simply two sets of X/Y motion data bundled together. That can be backwards X/Y flow in addition to forwards X/Y flow, or it can be data for one more adjacent video frame (as in the case of Blender, explained below).
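To make the format concrete, here's a minimal sketch of what that data looks like in memory, assuming it has been loaded into a float32 NumPy array (names and sizes are just placeholders):

```python
import numpy as np

h, w = 540, 960
flow = np.zeros((h, w, 2), dtype=np.float32)  # one (dx, dy) motion vector per pixel

# Viewed as an image, channel 0 (Red) is horizontal motion, channel 1 (Green) is vertical.
dx = flow[..., 0]
dy = flow[..., 1]

# A 4-channel variant simply packs two X/Y pairs together, for example
# forward and backward flow, or flow relative to two adjacent frames:
flow_rgba = np.zeros((h, w, 4), dtype=np.float32)
first_pair = flow_rgba[..., 0:2]
second_pair = flow_rgba[..., 2:4]
```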

  • From external 3D scenes (such as Blender). 

    If you'll be working with prepared 3D scenes that you made in software like Blender, you can export motion vectors as a separate render pass. In Blender's case, it's called the "Vector" pass, and it is currently only available with the Cycles renderer. After enabling it, export your scene as a multi-layer EXR sequence to have it bundled together in the correct format, or render it as a separate EXR sequence through the compositor if you like.

    Rendering from Blender, this will give you 4D vector data, corresponding to X/Y motion from previous frame to current, and X/Y motion from current frame to next, as explained here.

    This motion data is in pixel space, which will be important to know when setting it up for warping.

  • From a video source - estimated live, powered by the Nvidia GPU in the Optical Flow TOP.

    While not the most accurate, this method easily works in real time, allowing applications with a live camera feed and live VJing. It works best if your input video has little to no noise and no flat-color regions.

    Data from this operator corresponds to motion per second in screen space, normalized to the screen width, which will be important to know when setting it up for warping.

  • From a video source - estimated with a neural network model.

    Another, more accurate option is to use a neural network model, such as RAFT, to estimate motion vectors from a video. This is how optical flow warping is done in Deforum or Warpfusion. I've yet to try this in my own workflow, so I can't provide examples so far. Using such a model requires a bit more setup, either outside of TD or by hooking it up inside TD with Python scripts and imported libraries.

Warping - done by displacement of pixels

Optical flow warping is usually done through an effect in video/pixel 2D workflows called "displacement". It is easiest to visualize by looking into a shattered mirror or uneven reflective surface, such as water!

This operation reads a displacement map and shifts the input image's pixel coordinates according to it. The displacement map is a lot like that uneven, shattered, or warped reflective surface; in practice it is usually another image, with specific channels mapping to horizontal (X) and vertical (Y) displacement. Usually these are simply mapped to the Red and Green components of an RGB input. Each pixel in the displacement map corresponds to a pixel in the input image, with the X component displacing left or right, and the Y component displacing up or down. This is all very similar to the way optical flow data is represented.

In TouchDesigner, you'll want to be using the Displace TOP for this. As seen above, it's easy enough to play with this for various creative effects by making your own arbitrary displacement maps with noise and some gradients; however, for precise optical flow warping, the dense optical flow data itself should be used as the displacement map.

Understanding the Displace TOP

By default, the Displace TOP expects the displacement map to contain values in the range of 0-1, and it displaces pixels in screen space UV coordinates (which are also 0-1*), meaning the maximum value of 1, or the minimum of 0, will offset a pixel's coordinate by a distance equal to the full frame width (for the horizontal X component) or the full frame height (for the vertical Y component). A displacement map pixel value of exactly 0.5 in this case is "neutral", resulting in no displacement. This neutral value corresponds to the Source Midpoint parameter.
*If you use a fixed 8-bit or 16-bit color image for the displacement map, TD will map the values to the 0-1 range for you anyway.

Another way to look at the logic of this operation is that the displacement map is a bit like a lookup table for each pixel, although not for colors, as with color grading LUTs, but for screen space UV coordinates. Each displacement vector tells how far left/right and up/down it should look for a pixel to replace the original. With the source midpoint being 0.5, a horizontal vector of 0.6 on a pixel would "look" for the pixel slightly to the right of the original (0.6 - 0.5 = 0.1, or 10% of the frame width to the right) and copy its color value. ...and so on for every pixel.
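If it helps, here's a rough NumPy sketch of that lookup logic, assuming a map in the 0-1 range and a source midpoint of 0.5 (the nearest-neighbor fetch stands in for the TOP's filtered sampling, and the vertical axis may need flipping depending on convention):

```python
import numpy as np

def displace(image, disp_map, midpoint=0.5):
    """image: (H, W, C) array; disp_map: (H, W, 2) array, X in channel 0, Y in channel 1."""
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]

    # Convert the map to offsets relative to the midpoint, then to pixels.
    offset_x = (disp_map[..., 0] - midpoint) * w
    offset_y = (disp_map[..., 1] - midpoint) * h

    # Each output pixel "looks" at the offset location and copies its color
    # (clipping to the frame roughly matches the "hold" extend behavior).
    src_x = np.clip(xs + offset_x, 0, w - 1).astype(int)
    src_y = np.clip(ys + offset_y, 0, h - 1).astype(int)
    return image[src_y, src_x]
```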

With this in mind, you may need to adjust your optical flow data or calibrate the displacement weights to work correctly when connected to a Displace TOP.

Calibrating weights to fit your optical flow data

If you can wrap your head around how displacement works, and you know the kind of data you have, you can figure out the range or weight calibrations your setup needs. You may also simply trial-and-error your way through brute perseverance until it looks right, which I have to admit to doing sometimes, so that I could eventually thoroughly understand it and write about it, woohoo!

For now you can input anything into the source input of the Displace TOP; we'll return to that.
Here are the specifics for the two cases I have experience with:

Calibrating for Blender's vector pass

Blender's motion vector pass is in float format if you exported it correctly to EXRs. To open an EXR sequence in TD, you'll want to use the Point File In TOP. Despite its name, it can be used to read EXRs as image sequences, just like the Movie File In. (You may also use Point File Select.)

You'll find the vector pass in the RGBA channel selection menus, which I recommend setting up in the order XYZW:

  • Vector.X - X displacement from previous frame to current frame
  • Vector.Y - Y displacement from previous frame to current frame
  • Vector.Z - X displacement from current frame to next frame
  • Vector.W - Y displacement from current frame to next frame 

We're mainly concerned with the first two components. They're in unbounded pixel coordinate space, describing the location offset of each pixel compared to the previous frame. In other words, they correspond to how many pixels each pixel moves between neighboring animation frames. For example, if your frame width is 1000px, then an object moving halfway across the frame from the right side will have an X vector offset of 500 for that animation frame, and moving halfway from the left would give -500. If that sounds confusing, it's because it is. Thankfully, it fits the default Displace TOP behavior really well. The data just needs to be normalized to the 0-1 range to work with the default Displace TOP, or the weights on the Displace TOP itself can be readjusted.

Here's a setup that adjusts the Displace TOP to work correctly:

Horizontal and vertical are set to the Red and Green channels from the Point File In. The source midpoint has been set to 0 to account for the float format being centered around 0. The displace weight is configured through expressions that normalize pixel space coordinates to screen space 0-1 coordinates.
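For reference, the weight expressions boil down to dividing by the frame dimensions. Something like the following, where 'pointfilein1' is a hypothetical operator name and the exact parameter labels on your Displace TOP may differ:

```python
# Horizontal displace weight: convert pixel offsets to 0-1 screen-space UV offsets
1 / op('pointfilein1').width

# Vertical displace weight
1 / op('pointfilein1').height

# Source Midpoint: 0, since the float vectors are centered around zero
```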

Calibrating for Optical Flow TOP

Output from this operator, unlike the vectors from Blender, describes movement per second, not per frame. Values are in the -1 to 1 range, but inverted, and normalized to the frame width. Read the docs if you'd like. Here's a setup that adjusts the Displace TOP to work correctly:

Horizontal and vertical are set to the Red and Green channels. The source midpoint has been set to 0 to account for the float format being centered around 0. The displace weight is configured through expressions that normalize to 0-1 coordinates, divide by the framerate to get per-frame values, and negate the result to match how displacement maps work.
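As a sketch of what those expressions can look like (this is my own shorthand with a hypothetical 'opticalflow1' operator, not the exact parameters from the screenshot):

```python
# Displace weight: the flow is per second, so divide by the timeline rate to get
# per-frame motion, and negate it to match the displacement direction
-1 / me.time.rate

# If the vertical component is also normalized to the frame width, it may need an
# extra aspect-ratio factor on the vertical weight:
-(op('opticalflow1').width / op('opticalflow1').height) / me.time.rate
```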

Make it feed-back!

Needing a feedback loop is unfortunately what rules out some other software for this, such as After Effects or Blender, because natively you can't iterate over the frame buffer like this, frame after frame.

The last piece of the puzzle is to set up our displacement to iteratively displace its own results frame after frame. Following the simplified graph I showed earlier, doing that in TD is easy with the Feedback TOP. Here's the complete basic node setup (in this case using the Optical Flow TOP):

"Target TOP" for the Feedback TOP is set to the Displace TOP. It can also be something else down the chain if you want to add more effects in your feedback loop.

Your global project (or local COMP) framerate should match your video input framerate to work correctly, and it should remain stable. If your TD is dropping frames, it's probably because your GPU is struggling to keep up with the video. Consider lowering your framerate, or, on the Optical Flow TOP, increasing the grid size or lowering the quality.

Also, consider binding easy access to the feedback "reset" parameter so you can refresh the feedback loop and get a clean start. When you'd want to do that depends on your goals.
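In the example below, the Feedback TOP's "reset" parameter simply references a panel Button's state, which as a parameter expression would look something like this (the button name comes from that example; yours may differ):

```python
# On the Feedback TOP's 'reset' parameter: reset while the 'RESET' button is held down
op('RESET').panel.state
```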

To check whether your warping is working alright, composite your warped image over your input video or 3D render source, like this:

An example of a simple warping setup, using a video from a Movie File In TOP and warping some text from a Text TOP. The Feedback TOP "reset" par references the panel Button "RESET", which I click every so often. The Composite TOP is set to the "over" operation.

This is only the beginning of potential use cases and effects! Here are some examples of where you could take it from here:

Painting and stamping on video

By flashing images for one frame and compositing them inside the feedback loop, you can build on top of the existing warped loop. Here, I've created a simple painting panel* that displays a brush (either a circle or a TOP) wherever the user is clicking.
*Covering custom panel creation is beyond the scope of this guide, but seek out information online! This specific example and panel component will be included with the project files, though.

Here, my custom panel displays a brush over a transparent background whenever and wherever it is clicked, which is composited over the existing feedback loop each frame. The background video is taken from the final result on the right (the "over1" TOP).

This setup is included in the project files.

Datamosh-like transitions between videos

You may mix and match various video sources for optical flow and warp on the go, just like in video datamosh techniques. For instance, you may create transitions by crossfading from the last frame of video A being warped by video B, into pure video B, like this:

Here, a pulse button is used to toggle (through the "logic1" CHOP) two TOP switches. One of them handles the video frame fading out, put through the displace feedback loop; the other handles the video fading in. The crossfade itself is linked to a timer's progress, triggered by the pulse button. The feedback is reset, and the optical flow weight is set to 0 on the pulse as well, to allow for a clean transition.

This setup is included in the project files.

Usage in video stylization with generative image AI

Generating hundreds of Stable Diffusion images per second through extensive optimization is all the rage now, but I haven't seen anyone make use of optical flow to enhance their live-AI filter experiments. As mentioned previously, this is a tested and proven method in some frame-by-frame generative AI techniques, as seen in Deforum and Warpfusion, and even in earlier methods that used GAN-based models. It allows you to diffuse only sparse "key" frames rather than every single video frame, which often produces a smoother, less "flickery" result when these key frames are blended across. In Deforum's case, this is expressed as "Cadence".

Here's a simple prototype in action, working on a live camera feed from my phone and transmitting the result back to it from TD. It warps and crossfades generated "key" frames, working as a sort of real-life texture projection. It is also composited over the original video feed, which has a simple "edge" filter, to reveal where the textures have yet to be generated.

The main trick was to start warping a temporary UV map the moment a frame is sent off to be generated by SD. Upon completion, the frame is fitted onto the live video with a Remap TOP, up to date with the current motion flow, using the UV map that has already been warped into place and can continue to be warped from there.
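Here's a conceptual sketch of that UV-map trick, not the actual network; the names, shapes, and the nearest-neighbor fetch are my own simplifications:

```python
import numpy as np

def identity_uv(h, w):
    """A ramp where each pixel stores its own normalized (u, v) coordinate."""
    v, u = np.mgrid[0:h, 0:w]
    return np.stack([u / (w - 1), v / (h - 1)], axis=-1).astype(np.float32)

def remap(image, uv):
    """Roughly what a Remap TOP does: fetch each output pixel from its (u, v) location."""
    src_x = np.clip(uv[..., 0] * (image.shape[1] - 1), 0, image.shape[1] - 1).astype(int)
    src_y = np.clip(uv[..., 1] * (image.shape[0] - 1), 0, image.shape[0] - 1).astype(int)
    return image[src_y, src_x]

# While a "key" frame is being generated, keep warping identity_uv() with the same
# displacement step as the main loop; once the frame arrives, remap() fits it onto
# the present moment, and the UV map simply keeps getting warped from there.
```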

I won't get further into the details of its network for now, because it's quite rough around the edges, depends on some specific workarounds, and isn't even using state-of-the-art pipelines such as StreamDiffusion, but it wouldn't be anything advanced to TD veterans. I put it together using API calls to the A1111 webui, deriving my own custom implementation from Oleg Chomp's component, and making use of LCM LoRA with a canny T2I-Adapter (runs faster than the equivalent ControlNet). All running on an RTX 3060. I can't afford more time on this at the moment, but I encourage anyone to take a "proper" shot at this, especially if Derivative adds more complete functionality to the Optical Flow TOP. Leave a comment if you have questions!

Drawbacks and resulting artifacts

Depending on your setup, your warped image might quickly get blurry, turbulent, or "smeary" and glitchy looking. If you're into experimenting with such aesthetics, hooray! But if you're looking to dig deeper, read on...

For a start, if you're using optical flow detection from video, there will always be imperfections. The default parameters on the Optical Flow TOP should be okay, but experiment with them to see if you can smooth it out, or try blurring it (I, for example, applied a 3x3 median filter through GLSL). You may also experiment with using the "cost output" to ignore or cut out areas of optical flow that the estimation is not confident about. This is also where importing pure motion data from something like Blender shines, because it comes with zero imperfections and no ambiguity.
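Outside of GLSL, the same smoothing idea could look like this, assuming you've pulled the flow into a NumPy array (for example via a TOP's numpyArray() in a Script TOP, or in an offline pipeline):

```python
import numpy as np
from scipy.ndimage import median_filter

def smooth_flow(flow):
    """flow: (H, W, 2) array. Median-filter each motion component with a 3x3 kernel."""
    out = np.empty_like(flow)
    out[..., 0] = median_filter(flow[..., 0], size=3)
    out[..., 1] = median_filter(flow[..., 1], size=3)
    return out
```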

Also, this technique is inherently flawed at dealing with new areas coming in from the sides of the frame, and with occluded regions coming into line of sight. This is how the smearing and ghost trails appear when warping. I'll talk more about this below, but for the frame edges, you can at least choose to "hold" them or have them come out empty ("zero") with the Displace TOP's "Extend" parameter.

Displacing in higher resolution to reduce blurriness

The blurriness builds up from repeatedly moving pixels around. One trick is to run your displacement loop at a temporarily higher resolution, up-scaling it 2x for example, then down-sampling it back to your desired native resolution on output, outside the feedback loop. In my own non-realtime workflow, even simple Lanczos 2x up-sampling has proven to help a lot, with the best results coming from neural network upscaling models.

A cropped comparison of the effect of no temporary upscaling, Lanczos, and RealESRGAN on image crispness after 15 frames of iterative feedback warping. Blender's motion vectors were used for optical flow.
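As a rough sketch of the idea (OpenCV is used here only for resizing; in TD this simply means keeping the whole feedback chain at the higher resolution and downsampling only on the output), where warp_fn stands in for the displacement step and the flows are assumed to be pixel-space, Blender-style:

```python
import cv2

def run_loop(first_frame, flows, warp_fn, scale=2):
    h, w = first_frame.shape[:2]
    # The loop state stays at the higher resolution the whole time.
    state = cv2.resize(first_frame, (w * scale, h * scale),
                       interpolation=cv2.INTER_LANCZOS4)
    for flow in flows:
        big_flow = cv2.resize(flow, (w * scale, h * scale)) * scale  # vectors scale too
        state = warp_fn(state, big_flow)                             # warp at high res
        # Downsample only for output; the next iteration reuses the big buffer.
        yield cv2.resize(state, (w, h), interpolation=cv2.INTER_AREA)
```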

Accounting for occlusions, aka using "consistency masks"

For the most advanced and complex use cases, you may detect occluded regions coming into line of sight and either cut them out completely, replace them with pixels from previous frames or from the background, or even inpaint them, either with simple inpainting algorithms or with Stable Diffusion inpainting.

When working with Nvidia Optical Flow, some of these advanced workflows for dealing with occlusions are in theory possible with their SDK; however, in TD some of that functionality hasn't been exposed yet, such as combined forwards-backwards flow estimation and global flow estimation.

A screengrab from the explanation by the Warpfusion author. On the left - the combined "consistency" mask, showing new areas coming into view between 2 video frames. On the right - a "dry" warp, using only the base image, to showcase the trailing effect.

For those coming from an AI frame-by-frame animation background, you may recognize the term "consistency mask", which is well explained in this short walkthrough by the author of WarpFusion/ComfyWarp. It relates to the methodology for dealing with occlusions and the resulting "trails" when stylizing video footage.

Forwards and backwards flow - detecting occluded regions

But how do you actually detect these occluded regions? As written in a blog post by Nvidia that I linked earlier:

One simple check usually employed is to compare the forward and backward flow. If the Euclidean distance between forward flow and backward flow exceeds a threshold, the flow can be marked as invalid.

While this is not straightforward with the current implementation of Nvidia's optical flow in TD, we can easily do it with Blender's motion vector output, because it holds both forward and backward flow data in its 4D vectors. I worked this out thanks to an absolutely excellent explanation from Mark Stead, who was doing optical flow warping for the purpose of experimental temporal denoising:

I've followed Mark's logic and recreated, inside TD, the vector subtraction idea that he explains. It's included in the project files.
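For the concept itself, a simplified NumPy sketch of that forward/backward check might look like the following. It assumes you already have matching forward and backward flow as (H, W, 2) pixel-space arrays, and the threshold is an arbitrary value worth tuning per scene:

```python
import numpy as np

def consistency_mask(forward, backward, threshold=2.0):
    """Returns True where the flow looks trustworthy."""
    # Where motion is consistent, the backward flow roughly cancels the forward flow,
    # so their sum should be close to zero. (A stricter version samples the backward
    # flow at the forward-displaced position before summing.)
    disagreement = np.linalg.norm(forward + backward, axis=-1)
    return disagreement < threshold
```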

Conclusion and role in Diffusion Pilot

I hope you've found enough examples, tips, and information to start playing and making use of optical flow in your projects! As I already mentioned, I intended this to be not only a tutorial for TD or Blender, but also a manual on working with optical flow in general.

In my case, I've delved deep into this technique to implement it as one of the integral elements for combining 3D with Stable Diffusion in the highly controllable way that I seek. My prototyping and development are ongoing, but I've already been able to make plenty of promising tests with Blender's motion vector pass.

While so far I'm focusing my efforts on non-realtime workflows and production, I can foresee this being a really fun tool for integrating generative image models into installations and interactive experiences, like you saw in my experiment with the Optical Flow TOP (if real-time optical flow estimation gets more accurate).

Finally, I know generative video models and enhancements like AnimateDiff are taking over the spotlight now, but so far their capacity to produce diverse kinds of movement, across a wide range of timings and speeds, seems limited. With optical flow, you can take full, manual control over the motion of your generative AI experiments and have the model "paint" on the surfaces and objects in your scenes.