![](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh-8iAi4aNr7b6-UD5dzgIICmKqVqcepRc73V_PRFe85BntXk_xojkV66iqmMR7tc1ohhc4qEUeDTPUgk7HZu6ECG3Of5AinkAonlEdEVsIziMA0XnwZwc68OIu2gSA75aqW1b_LOaRxlhj8WsbCAypzj8200F0ZME1dnfGrXzEfY1Yn3NLb13TdOkhNslU/s320-rw/00120-4274178265.png)
![Condensed graph showing the overview of generative AI animation techniques and tools (December 2023)](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi3p_F5PZG0rvGwbG7Uqhtw5nPUEexSF-W__XO8P4AvKTqjlTkamH9M-et4yOJx-VIb6e1oM5YtTXmFHc3BCF15SwnD1f_Iyc0t_-0TCbx0XkX2SxriYNvDRrH78Vm3nRWRLoYJVDdxYk86WrC5UumvlemFlLYyqmhqNHjwJB-JuaZiFci-WzKKlw8yaoKA/s1600-rw/overview%20of%20gen%20ai%20anim.png)
In this post I attempt to hierarchically lay out and categorize the current array of techniques involving generative AI that can be used in animation, giving brief descriptions, examples, pros and cons, and links to find associated tools. It's the kind of resource I wish I had a year ago as an animator, when trying to navigate the chaotic network of possibilities and ever growing progress. Video stylization use cases, while somewhat overlapping, are mostly left out here.
It is aimed at anybody curious, but mostly at other animators and creatives who might feel intimidated by the accelerating progress in the field. Hopefully this lets you catch up and keep an eye on the scene at a deeper level than a TikTok feed.
Disclaimers:
- It's my best attempt at the time of writing, based on my possibly subjective analysis as an animator, and some amount of personal opinion. I hope to keep refining it collectively though!
- The list skips older tools, like those based on GAN models, as diffusion-based models have become more established and popular.
- This guide is not a tutorial, but the communities around most tools are teeming with helpful content. To get started, search online using keywords from this guide!
Glossary:
What actually is AI?
AI Model
Refers to neural network models, each trained on a specific kind of data with a specialized intended behavior in mind. An "AI" as used broadly in the media usually refers to an application (tool) that employs one such model, or sometimes several working together. As a user you can rely on these applications, which usually (but not necessarily) conceal the actual model and expose only limited controls and parameters, or use the models directly if they are open source, which also lets you fine-tune them through further training or other customization.
AI Tool
Refers to any code, software, and applications, both online and running locally on your computer, that are wrapped around AI models or somehow rely on them. I won't fight you if you object to calling AI a "tool", but the term only makes sense in this specific context, at least for now.
Diffusion
Diffusion refers to the archetype of generative diffusion-based models that dominate the field at the moment. They generate results by iteratively "revealing" them from noise, step by step, in a process called "denoising".
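As a mental model only, that step-by-step "revealing" can be sketched in a few lines of Python. This toy blends noise toward a known target; a real diffusion model instead uses a neural network to predict the noise to remove at each step, so everything below the loop structure is a stand-in:

```python
import random

def toy_denoise(target, steps=10, seed=0):
    """Toy illustration of iterative denoising: start from pure noise
    and blend a little more of the 'signal' back in at every step.
    Real diffusion models instead predict and subtract noise with a
    neural network, but the step-by-step structure is the same idea."""
    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in target]  # start: pure noise
    for step in range(1, steps + 1):
        alpha = step / steps  # how "revealed" the result is
        x = [(1 - alpha) * xi + alpha * ti for xi, ti in zip(x, target)]
    return x

result = toy_denoise([0.5, -0.2, 0.9], steps=20)
```

The number of steps is the same knob exposed as "sampling steps" in most Stable Diffusion interfaces.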
[Input 2 Output]
A widespread expression indicating the input/output pair used by an AI application or model. The "input" conditions the "output" result. It is usually used very loosely. "Video 2 video", for example, can mean very different things under the hood on different occasions, but it's nevertheless useful for indicating the type of workflow available to an end user.
Notebook
Refers to a Python-based collection of structured code, easily shared and annotated. Most applications control AI models through Python and specialized libraries like PyTorch, which can be run in these notebooks. They are often shared as user-ready tools for people to run either locally or on remote hardware.
Seed
An initial input, often a random vector or value, used to initialize the generation that produces the result. The same seed will generally produce the same result if other variables don't change. Manipulating the seed across many generations can be used creatively to induce various desired or experimental effects.
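The idea is easy to demonstrate with plain Python. Here `generate` is just a stand-in for a generation run, seeding the standard-library RNG the way a diffusion tool seeds its initial noise:

```python
import random

def generate(seed, n=4):
    """Stand-in for a generation run: in diffusion tools the seed
    fixes the initial noise tensor; here it just seeds Python's RNG."""
    rng = random.Random(seed)
    return [round(rng.uniform(0, 1), 3) for _ in range(n)]

same_a = generate(42)
same_b = generate(42)    # identical: same seed, same "result"
different = generate(43)  # one seed away: a different result
```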
-
Generative image
Techniques that rely on generative image AI models, which were trained on static images.
-
Generative image as material and assets
The author of the short film "Planets and Robots" uses digital cutout to animate generated AI images, and also plays with LLMs to generate the voice-over script. The technique: using static images generated by any AI app as assets in a traditional workflow such as 2D cutout, digital manipulation, collage, or even as source material for other AI tools that offer, for example, "image2video". Beyond the origin of the images and material, this technique depends on your usual skillset of cutting and manipulating images.
PROS:
- Easy to transition into for existing animators.
- Can help with backgrounds.
CONS:
- Doesn't feel too "fresh".
- Relies on great synergy between material and animation.
TOOLS

FREE (any generative image model or app):
- Stable Diffusion (on a local machine), or any online app like this
- Craiyon
- Krea AI
- Invokeai (using SD)
- Enfugue (using SD)
- SkyBox AI - generation of VR-ready 360° scenes.
- DALL-E 3 on Microsoft Image Creator
- Leonardo AI - refined app for working with generative image AI. Offers some free daily credits.

Plugins and addons:
- Stable Projectorz - sophisticated 3D texturing using SD
- ComfyUI nodes in Blender
- Generative AI for Krita - streamlined, artist-friendly way to work with Stable Diffusion, powered by a ComfyUI backend.

Additionally, you may find some free demos on Hugging Face spaces.

PAID (any generative image model or app):
- MidJourney
- Runway
- DALL-E 3 on ChatGPT
- Adobe's Firefly
- RenderNet - app for using advanced SD techniques and tricks in a streamlined interface over the cloud.
Animating itself can be done using After Effects, Moho, Blender, etc.
-
Generative image frame-by-frame
Animation likely done with Stable WarpFusion, involving I2I loops and an underlying video input that warps (displaces) the animation. Author - Sagans. This encompasses all techniques that use generative diffusion image models in a rather animation-native spirit, generating sequences of motion frame by frame, like you would draw and shoot traditional animation. The key aspect is that these models have no concept of time or motion when generating each image; it is up to mechanics added on top, and various applications or extensions, to produce some sort of animated imagery in the end, often referred to as having "temporal consistency".
These techniques usually possess the characteristic flicker in the animations. While many users of these tools aim to clean that up as much as possible, animators will tell you that it's called "boiling" and has been a staple of animation art all this time.
Mostly applicable to open-source models such as Stable Diffusion and tools built on them, which can be used with exposed parameters and possibly on local hardware. For comparison, something like MidJourney has its model concealed and its interface streamlined for still pictures, so it can't be used for these techniques.
It usually consists of these techniques mixed and layered together:
-
Standalone (Text 2 Images):
There are several novel techniques to generate animations with only text prompts and parameters this way:
-
Parameter interpolation (morphing)
Prompt editing with gradually changing weights creates a transition; a depth ControlNet was used to keep the overall hand shape consistent. The technique: gradually interpolating parameters on each generated frame to produce a change in the animation. Parameters can be anything to do with the model, such as the text prompt itself, or the underlying seed (a "latent space walk").
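A minimal sketch of what a parameter schedule looks like under the hood (the `schedule` helper is hypothetical, but tools like Deforum keyframe their parameters in a similar spirit):

```python
def schedule(keyframes, n_frames):
    """Linearly interpolate a parameter between keyframes, producing
    one value per animation frame. keyframes: {frame_index: value}."""
    frames = sorted(keyframes)
    values = []
    for f in range(n_frames):
        if f <= frames[0]:
            values.append(keyframes[frames[0]])
        elif f >= frames[-1]:
            values.append(keyframes[frames[-1]])
        else:
            # find the surrounding keyframes and blend between them
            for a, b in zip(frames, frames[1:]):
                if a <= f <= b:
                    t = (f - a) / (b - a)
                    values.append(keyframes[a] * (1 - t) + keyframes[b] * t)
                    break
    return values

# e.g. fade a prompt weight from 0.0 to 1.0 over frames 0..10
weights = schedule({0: 0.0, 10: 1.0}, n_frames=11)
```

The same mechanism works for any scalar: prompt weights, zoom, rotation, denoising strength, and so on.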
-
Image 2 Image (I2I) feedback loops
Using a starting image and a prompt of something different makes it deteriorate into something else frame by frame. The technique: using each generated frame as the input for the following frame through "image 2 image". This makes it possible to produce similar-looking frames in sequence while other parameters change and the seed is not fixed. It is usually controlled through "denoising" strength, or the "strength schedule" in Deforum. The starting frame can also be a pre-existing picture.
It's a core building block of most animation implementations that use Stable Diffusion, on which many of the other techniques listed below rely. It is very delicate to balance, and depends a lot on the sampler (noise scheduler) used.
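The loop structure itself is simple. Here is a sketch with a stubbed `img2img` call standing in for a real Stable Diffusion invocation (both function names are hypothetical, chosen only to show the feedback shape):

```python
def img2img(image, prompt, denoise_strength, seed):
    """Stub standing in for a real Stable Diffusion image2image call
    (e.g. via the diffusers library). Here it just records what each
    frame was conditioned on, so the loop structure can be shown."""
    return {"source": image, "prompt": prompt,
            "denoise": denoise_strength, "seed": seed}

def i2i_loop(start_image, prompt, n_frames, denoise_strength=0.45):
    """Each generated frame becomes the input of the next one."""
    frames, current = [], start_image
    for i in range(n_frames):
        current = img2img(current, prompt, denoise_strength, seed=i)
        frames.append(current)
    return frames

frames = i2i_loop("init.png", "oil painting of a fox", n_frames=3)
```

The `denoise_strength` parameter is the delicate balance mentioned above: too low and the animation freezes, too high and every frame forgets the last.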
-
2D or 3D transformation (on I2I loops)
The endless zoom-in that everybody and your grandma has seen already. It works so well because you can rely on SD continuously dreaming up new details. The technique: gradually transforming each generated frame before it is sent back as input into the I2I loop. 2D transformations correspond to simple translation, rotation, and scale. 3D techniques imagine a virtual camera moving in 3D space, usually by estimating depth in each generated frame and then warping it according to the imagined camera motion.
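For the 2D zoom case, the per-frame transform boils down to cropping each generated frame slightly and scaling it back up before it re-enters the loop. A sketch of the crop arithmetic (the helper is hypothetical, not any tool's actual API):

```python
def zoom_crop(width, height, zoom_per_frame):
    """Return the crop box (left, top, right, bottom) that, once resized
    back to full resolution, produces a zoom-in by `zoom_per_frame`
    (e.g. 1.02 = 2% per frame) before the frame re-enters the I2I loop."""
    crop_w = width / zoom_per_frame
    crop_h = height / zoom_per_frame
    left = (width - crop_w) / 2
    top = (height - crop_h) / 2
    return (round(left), round(top),
            round(left + crop_w), round(top + crop_h))

box = zoom_crop(512, 512, zoom_per_frame=1.02)
```

The denoising pass then invents detail in the newly upscaled pixels, which is what keeps the endless zoom fresh.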
-
Experimental, motion synthesis, hybrid, and other techniques
Made with SD-CN Animation, which has a unique method of hallucinating motion across generated frames. A starting image was used for the init, but nothing else. Motion synthesis tries to "imagine" motion flow between subsequent generated frames, then uses it to warp them frame by frame and instill organic motion in the I2I loop. This usually relies on AI models trained for motion estimation (optical flow) in videos, but instead of looking at subsequent video frames, the model is pointed at subsequent generated frames (through I2I loops), or some sort of hybrid approach.
Other techniques may include advanced use of inpainting together with warping, multiple processing steps, or even taking snapshots of the model's training process. Deforum, for example, is loaded with knobs and settings to tinker with.
-
-
Transformative (Images 2 Images):
Additionally, some sort of source input can be used to drive the generated frames and resulting animation:
-
Blending (stylizing) - mixing with video source or/and conditioning (ControlNets)
Deforum's hybrid mode with some ControlNet conditioning, fed from a source video (seen on the left). Masking and background blur were done separately and are unrelated to this technique. This is a broad category of ways to mix and influence generated sequences with input videos (broken into individual frames), often used to stylize real-life footage. It is currently riding a trend wave of stylized dance videos and performances, often going for the anime look and sexualized physiques. You may use anything as input, though: rough frames of your own animation, or any miscellaneous and abstract footage. There are wide possibilities for imitating "pixilation" and replacement-animation techniques.
Input frames can either be blended directly with the generated images each frame, before being fed back into the I2I loop, or, in more advanced cases, used for additional conditioning such as ControlNets.
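The direct blending variant is just a per-frame weighted average. A toy sketch with frames as flat lists of pixel values (names are illustrative, not any tool's actual API):

```python
def blend_frames(generated, source, source_weight=0.3):
    """Per-pixel weighted average of a generated frame with the matching
    source-video frame (both as flat lists of 0-255 values here), done
    before the result is fed back into the image2image loop."""
    return [round((1 - source_weight) * g + source_weight * s)
            for g, s in zip(generated, source)]

mixed = blend_frames([200, 100, 0], [100, 100, 255], source_weight=0.5)
```

Raising `source_weight` keeps the output glued to the footage at the cost of stylization strength, which is the same trade-off the hybrid modes expose.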
-
Optical flow warping (on I2I loops with video input)
Deforum's hybrid mode enables this technique with a variety of settings. Increased "cadence" was also used for a less flickery result, so the warping shows up better. Masking and background blur were done separately and are unrelated to this technique. "Optical flow" refers to motion estimated in a video, expressed as per-pixel motion vectors in screen space on each frame. When optical flow is estimated for the source video in a transformative workflow, it can be used to warp the generated frames accordingly, making generated textures "stick" to objects as they, or the camera, move across the frame.
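A toy sketch of flow-based warping, pushing each pixel of a tiny 2D frame along its motion vector. This is a simplified "forward" warp; real implementations also handle sub-pixel motion and fill the resulting holes:

```python
def warp_frame(frame, flow):
    """Warp a 2D frame by per-pixel motion vectors (dx, dy): every pixel
    is pushed to where the estimated motion says it moved. Pixels that
    land outside the frame are dropped; uncovered spots keep a fill
    value of 0 (real tools inpaint or blend these holes instead)."""
    h, w = len(frame), len(frame[0])
    out = [[0 for _ in range(w)] for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            nx, ny = x + dx, y + dy
            if 0 <= nx < w and 0 <= ny < h:
                out[ny][nx] = frame[y][x]
    return out

# shift a 2x2 frame one pixel right on the top row only
frame = [[1, 2], [3, 4]]
flow = [[(1, 0), (1, 0)], [(0, 0), (0, 0)]]
warped = warp_frame(frame, flow)
```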
-
3D derived
The conditioning done with transformative workflows may also be tied directly to 3D data, skipping a layer of ambiguity and processing done on video frames. Examples being openpose or depth data supplied from a virtual 3D scene, rather than estimated from a video (or video of a CG render). This allows the most modular and controllable approach that's 3D native, especially powerful if combined with methods that help with temporal consistency such as optical flow warping.
This is probably the most promising overlap between established techniques and AI for VFX, as seen in this video.
One of the most extensive tools for this technique is a project that simplifies and automates the generation of ControlNet-ready character images from Blender. In this example, the hand rig is used to generate openpose, depth, and normal-map images for ControlNet, with the final SD result seen on the right (openpose was discarded in the end, as it proved unusable for hands alone). This blog and "Diffusion Pilot" also focus on this approach, stay tuned!
-
PROS:
- Novel, evolving aesthetics, unique to the medium.
- Conceptually reflects the tradition of animation.
- The most customizable, hands-on, and susceptible to directing.
- Modular, layered approach.
- Can be conditioned with video frames or complex data such as 3D render passes.
CONS:
- Often flickery and somewhat chaotic.
- Dense on a technical level and delicate to balance; advanced results have a steep learning curve.
- Usually inconvenient without good local hardware (Nvidia GPU).
TOOLS

FREE

Tools to use in the A1111 webui (if you have sufficient hardware)*:
- Small scripts for parameter interpolation animations (travels): steps, prompts, seeds.
- Deforum - the best powerhouse for all animated SD needs, incorporating most of the techniques listed above.
- Parseq - popular visual parameter sequencer for Deforum.
- "Deforum timeline helper" - another parameter visualization and scheduling tool.
- Deforumation - GUI for live control of Deforum parameters, allowing reactive adjustment and control.
- TemporalKit - adopts some principles of EBsynth for use together with SD for consistent video stylization.
- SD-CN Animation - somewhat experimental tool allowing some hybrid stylization workflows, plus an interesting optical flow motion synthesis that results in turbulent motion.
- TemporalNet - a ControlNet model meant to be used in other workflows, like Deforum's, aiming to improve temporal consistency.

Python notebooks (to be run on Google Colab or Jupyter)*:
- Stable WarpFusion - experimental code toolkit aimed at advanced video stylization and animation. Overlaps with Deforum a lot.
Plugins and addons:
- Dream Textures for Blender
- AI Render for Blender
- Stability AI's Blender plugin
- Character bones that look like Openpose for Blender - for use with ControlNets outside of Blender.
- Unreal Diffusion for Unreal Engine 5
- After-Diffusion for After effects (highly WIP)
- A1111, ComfyUI API components, and streamdiffusion implementation for TouchDesigner from Oleg Chomp - if you know what you're doing, can be set up for animation or anything you can imagine.
PAID
- Stability AI's Animation API
- Kaiber's "Flipbook" mode - based on Deforum's code, as stated in their credits.
- Deforum studio - official online service version of Deforum.
- AI Animation Generator on gooey.ai - simplified way to run Deforum online, offers some free credits.
- Neural frames - generator service inspired by Deforum.
Plugins and addons:
- Diffusae for After Effects
- A1111, ComfyUI, StreamDiffusion, and other API components for TouchDesigner by DotSimulate - available through his Patreon tiers with regular updates.
There might be many random apps and tools out there, but even if they're paid, they are likely based on the open-source Deforum code and act as simplified cloud versions of the same thing.

* Optimally you have decent enough hardware, namely a GPU, to run these tools locally. Alternatively, you may be able to try them on remote machines, as in Google Colab, but most free plans and trials are very limiting. Anything designed as a notebook for Google Colab can still be run on local hardware, though.

MORE EXAMPLES:
Professionally orchestrated production mixing traditional sets, actors, VFX techniques, and contemporary generative AI tools. The primary painterly aesthetic came from using Stable Diffusion frame by frame through image2image. Clever animation likely made with a fine-tuned model or strong reference conditioning; it makes heavy use of optical flow warping, with the source probably being videos of similar dancers. Deforum animation incorporating advanced optical warp techniques.
Animation done with the SD-CN Animation extension, which employs motion synthesis techniques that provide the turbulent motion. Deforum animation from one of the main current contributors to its code; this one showcases the 3D camera movement technique especially well. Animation from a solo show of LEGIO_X, who used Neural frames for their work. -
-
-
Generative video
Techniques that rely on generative video AI models, which were trained on video footage, or otherwise enhanced with temporal comprehension at the neural network level.
At the moment, a common trait of these models is that they're often limited to clips of very short duration (several seconds), bound by available video memory on the GPU. Where this has been worked around, the clips usually lack meaningful change and action over longer periods of time, and are more akin to animated slideshows.
*Since the initial publication of this article, a big elephant has entered the room, going by the name of Sora. I will discuss and integrate it into this guide only when it's available to the general public.
-
Generative video models
AI-generated video made from only image and text prompts using Runway's Gen-2, by Paul Trillo. This refers to using models that were made and trained from the ground up to work with video footage.
Results today tend to be somewhat wobbly, AI-awkward, and uncanny, the same way most generated AI images were not so long ago. The field is lagging slightly behind and improving rapidly, but my personal take is that the progress we saw with static images won't convert proportionally to video generation, as it is an exponentially harder problem to crack. Generally, the better a generated video clip looks, the less interesting its action and motion is, because drastic movement is usually where these models fall apart into the uncanny.
I suppose the boundary between animation and conventional film is messy here. As long as results don't yet match reality, all of it is, in a way, a weird new genre of animation and video art. For now, I'd encourage you to forget about replicating real film, and use this as a new form of experimental media. Have fun!
-
Standalone (Text 2 video)
One of the animation tests Kyle Wiggers did for his article, using Runway's Gen-2. Using text prompts to generate entirely new video clips.
In theory this is limitless, with possibilities for both a live-action look and anything surreal and stylized, as long as you can describe it, just like with static image generation. In practice, though, gathering diverse and big enough datasets to train video models is much harder, so niche aesthetics are difficult to reach with text conditioning alone.
Runway's presentation of the "Multi Motion Brush" feature in their video generation tools: "Multi Motion Brush allows for more expressive and precise generations. With MMB, you can select up to five areas of independent motion and control across 3 directional axes (x, y, z), as well as ambient noise." With text alone, true creative control is quite weak, but it becomes much more empowering when coupled with image or video conditioning, which you might call "transformative" workflows. Additionally, new forms of motion control and conditioning are emerging, such as MotionCtrl or Runway's Multi Motion Brush.
-
Transformative:
Using text prompts together with further conditioning from existing images or videos.
-
Image 2 Video
The album artwork was used as the starting image for each of the generated clips. Author - Stable Reel. Many generative video tools let you condition the result on an image, either starting exactly with the image you specify, or using it as a rough reference for semantic information, composition, and colors.
Often people generate the starting image as well using traditional static image models before supplying it to a video model.
-
Video 2 Video
With some luck and appropriate prompts, you can use an input video to "inspire" the model to reimagine the motion in the source with a completely different look. Done with Zeroscope in the webui txt2vid extension, using vid2vid mode. Similarly to the image 2 image process in generative image models, it is possible to embed input video information into a video model as it generates (denoises) the output, in addition to the text prompt. I lack the expertise to understand exactly what's happening, but this process appears to match the input clip not only on a frame-by-frame level (as stylization with Stable Diffusion would), but also on a holistic, movement level. It is controlled with a denoising strength, just like image 2 image.
-
PROS:
- The most open-ended set of techniques, which will only improve with time.
- No barrier to entry in terms of professional animation knowledge.
- Way smoother, and usually more coherent, than frame-by-frame techniques.
- Potentially a more straightforward path for transformative workflows than frame-by-frame approaches.
CONS:
- Often awkward and uncanny-looking, more so than static images; most apparent on realistic footage involving people.
- Computationally expensive; less accessible to run on your own hardware than image AI.
- Limited by short duration and context (for now).
TOOLS

FREE (with trials):
- Stable Video (SVD) - open-source video diffusion model from StabilityAI, rapidly being implemented in various host applications and tools.
- MotionCtrl - enhancement allowing object motion and camera trajectory control in various video models.
- CameraCtrl - enhancement focusing on camera trajectory control in various video models.
- Emu video - a preview demo of Meta's generative video model.
- Luma Dream Machine - limited early access to Luma's advanced video model.
- Text 2 Video extension for the A1111 webui, to be used with one of these models (if you have sufficient hardware)*.

Plugins and addons:
- Pallaidium for Blender - a multi-functional toolkit crammed with generative functionality across the image, video, and even audio domains.

Additionally, you may find some free demos on Hugging Face spaces.

PAID:
- Runway's Gen2
- Kaiber's "Motion" mode
- Pika labs

* Optimally you have decent enough hardware, namely a GPU, to run these tools locally. Alternatively, you may be able to try running these models on remote machines, as in Google Colab, but most free plans and trials are very limiting.

MORE EXAMPLES:
Paul Trillo: "Every bit of this animation is generated from Runway Gen-2 using input images and text, but the results are always a bit of a surprise. Still believe traditional animation has a bright future. It will only be amplified by being able to work smarter." Nathan Shipley, in collaboration with Paul Trillo and Hokuto Konishi, combined simple iPhone shots of choreography from Hok and used Runway's Gen-1 to stylize the results. Short film made with the generative video Modelscope model. -
-
Image models enhanced with motion comprehension
Animation done using AnimateDiff in ComfyUI, by animating between several different prompt subjects. With the growing popularity of AnimateDiff, this is an emerging field: enhancing established image diffusion models with video or "motion" comprehension. The results are more similar to native video models (shown above) than to what you get with frame-by-frame techniques. The catch is that you can also utilize everything built for these image models, such as Stable Diffusion, including any community-created checkpoint, LoRA, ControlNet, or other kind of conditioning.
The motion itself in this technique is often quite primitive, only loosely interpolating objects and flow throughout the clip, and often morphing things into other things. It does this with much more temporal consistency, though (less flicker), and it is still in its infancy. The best results come with abstract, less concrete subjects and scenes.
Mickmumpitz presents a workflow that can render any 3D scene in any style with AI, with different prompts for all the elements in the scene allowing for full flexibility. It is an example of a workflow optimized specifically for treating 3D renders in any style with precise control; however, it still struggles to maintain coherency on complex shapes moving and overlapping each other. The community is actively experimenting with this tech (see "MORE EXAMPLES"). The techniques draw both from static image models (such as prompt travel) and from video-native models and their advancements. In some cases, people are trying to squeeze out smoother video or 3D render stylization results than image model frame-by-frame techniques provide.
PROS:
- Benefits from all the development done on existing image diffusion models.
- Can be conditioned with video or complex data such as 3D render passes.
- Very good with abstract, flowing motion.
CONS:
- Does not work well for complex, coherent motion of characters or unusual objects, often leading to morphing instead.
- Computationally expensive, just like video-native models; less accessible to run on your own hardware than image AI.
- Limited by a somewhat short context window (for now), although there are always workarounds that people experiment with.
TOOLS

FREE

Currently, implementations of AnimateDiff (for SD v1.5) are leading the charge here:
- A1111 webui extension for AnimateDiff.
- AnimateDiff implementation in ComfyUI, and a plethora of community-made workflows around it.
- SparseCtrl - method to condition a video model with a sparse set of keyframe data, similarly to ControlNets but in the context of video. Supported by AnimateDiff v3.
- VisionCrafter - a GUI tool for the AnimateDiff implementation and other projects.

For SD XL:

Multi-functional implementations:

MORE EXAMPLES:
- An AnimateDiff showcase shared on the StableDiffusion subreddit.
- "Stormy", an audio-reactive AnimateDiff piece by Lyell (dotsimulate).
- While waiting for access to Sora, Karen X. Cheng and Nathan Shipley ran a test using open-source tools: "You can get a decent level of creative control with AnimateDiff."
- AI VFX experiment by Nathan Shipley, combining a simple logo animation and AnimateDiff with the QR Monster ControlNet into a loop - instant water simulation!
- Boldtron's realism experiments, done in ComfyUI together with Krea AI realtime and an upscaler.
- A video test with ComfyUI and the QR Monster workflow by Antti Karppinen.
-
-
Animated faces with speech synthesis
The author, demonflyingfox, had published a step-by-step tutorial even before releasing the viral Balenciaga videos. I know it, you know it, it's the technique behind a viral meme. Whenever you see a relatively still character (the camera could be moving, too) with an animated talking face, it likely relates to a particular methodology using AI face animation and synthetic speech tools.
It's a combination of several steps and components. The source images are often made with generative image AI, but you may also use any image with a face. The speech gets generated from text, conditioned on a chosen character voice. Then a different tool (or a model within a packaged tool) synthesizes facial animation with appropriate lip sync from the voice, usually only generating motion in the face and head area of the image. Using pre-trained avatars allows for movement in the body as well.
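The steps above chain into a simple pipeline. Both stubs below are hypothetical stand-ins for real services (e.g. ElevenLabs for the voice, SadTalker or Wav2Lip for the face); only the order of operations reflects the actual technique:

```python
def text_to_speech(script, voice):
    """Stub standing in for a TTS service such as ElevenLabs."""
    return {"audio": f"{voice}:{script}"}

def animate_face(portrait, audio):
    """Stub standing in for a face-animation tool such as SadTalker or
    Wav2Lip: takes a still portrait plus a voice track, returns a clip
    with lip-synced motion in the face and head area."""
    return {"clip": (portrait, audio["audio"])}

def talking_head(portrait, script, voice="narrator"):
    """The whole meme pipeline: still image + text -> speech -> clip."""
    audio = text_to_speech(script, voice)
    return animate_face(portrait, audio)

clip = talking_head("wizard.png", "Welcome to the runway.")
```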
PROS:
- Easy memes.
- Mass-producible talking avatars for games, installations, etc.
CONS:
- Less advanced and older tools produce somewhat uncanny results.
- For the most part, reliant on closed-source facial animation tools in paid apps.
- Results are usually stiff and not too dynamic, even when training an avatar on your own footage.
TOOLS

FREE (with trials):
- Wav2Lip - an A1111 webui extension that generates lip-sync animation. Seems to be limited to the mouth area.
- SadTalker - another talking-head generator driven by audio. Available through the A1111 webui and Discord.
- ElevenLabs - constrained usage, but limits seem to refresh monthly. Or search online for "text 2 speech"; there are too many to count, but they are likely inferior to ElevenLabs.

PAID

Face animation (usually with speech synthesis bundled together):
- D-ID
- Heygen
- Synthesia
Or search for "D-ID alternatives". -
Generative 3D character motion
Trailer for Nikita's genius meta AI film, which exposes the AI motion-learning process and channels it into a ridiculously entertaining short. This refers to motion synthesis in the context of 3D characters. It can apply to 3D animated film, video games, or other interactive 3D applications. Just like with images and video, these emerging AI tools allow you to prompt character motion through text. Additionally, some can also build motion from a very limited number of key poses, or produce animation dynamically on-the-fly in interactive settings.
Because this list focuses on generative tools, I am leaving out AI applications that automate certain non-creative tasks, like AI-powered motion tracking, compositing, masking, etc., as seen in Move.ai or Wonder Dynamics.
PROS:
- Fits inside the established 3D animation workflow, reducing tedious tasks and potentially working as a utility for skilled animators.
- Handles physics and weight really well.
- The future of dynamic character animation in video games?
CONS:
- Usually limited to humanoid, bipedal characters.
- Not self-sufficient; only one component of a 3D animation workflow. You need to know where to take it next.
- Training is usually done on human motion-capture data, so these techniques so far only deal with realistic, physics-based motion; nothing stylized or cartoony.
TOOLS

FREE (or limited plans):
- Mootion
- Omni Animation
- Cascadeur - animation assistant that creates smooth, physics-based animation and poses from minimal input. Highly controllable, and looks like a major player in the future.
- ComfyUI MotionDiff - implementation of MDM, MotionDiffuse, and ReMoDiffuse in ComfyUI.

PAID:
Paid plans of the free tools, providing more features and expanded limits.

MORE EXAMPLES:
-
LLM powered tools
In theory, with LLMs (Large Language Models) showing great performance in coding tasks, especially when fine-tuned, you could tell one to program and write scripts inside animation-capable software. The animation would follow the usual workflow, but with AI assisting you throughout. In the extreme case, the AI does everything for you, delegating appropriate tasks across a back-end pipeline.
In practice, you can kind of already try it! Blender, for example, is equipped with a very extensive Python API that allows operating it through code, so there are already a couple of ChatGPT-like assistant tools available. This is an unavoidable trend: everywhere there is code, LLMs will likely show some practical use cases.
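The assistant pattern these tools use can be sketched in a few lines: ask an LLM for a snippet targeting the host app's scripting API, then execute it. Everything here is a stand-in (the canned `ask_llm` reply, the `scene` dict in place of Blender's `bpy`), and a real tool should review or sandbox generated code before running it:

```python
def ask_llm(prompt):
    """Stub for a chat-model call (e.g. the ChatGPT API). A real
    assistant would send the user's request plus API docs as context;
    here we return a canned snippet against a fake 'scene' object."""
    return "scene['cubes'] = [f'Cube.{i:03d}' for i in range(3)]"

def run_assistant(request, scene):
    """Execute the code an LLM wrote against the app's scripting API.
    (Running generated code blindly is unsafe; real tools should
    sandbox it or show it to the user first.)"""
    code = ask_llm(request)
    exec(code, {"scene": scene})
    return scene

scene = run_assistant("add three cubes", {})
```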
PROS:
- The promise: deconstruction of any technical barrier for creatives.
- Useful as a copilot or assistant in creative software, eliminating tedious, repetitive tasks and digging through documentation for you.
CONS:
- If AI creates everything for you, then what's the point of being creative in the first place?
- For now, running capable LLMs is only practical on powerful remote machines, and thus paid per token or via subscriptions.
TOOLS

FREE:
- Genmo chat - promises a step towards "Creative General Intelligence", with a multi-step process all controlled through a chat interface.

PAID:
- Blender Chat Companion - (similar to Blender Copilot) a ChatGPT implementation inside Blender, specialized to handle appropriate tasks. Uses ChatGPT API tokens, which are paid.
- Blender Copilot - (similar to Blender Chat Companion) a ChatGPT implementation inside Blender, specialized to handle appropriate tasks. Uses ChatGPT API tokens, which are paid.
There's also the upcoming ChatUSD - a chatbot for working with and managing USD, a standard initially created by Pixar to unify and simplify 3D data exchange and parallelization in animated film production. I can't tell you much more here, but Nvidia seems to be embracing it as a standard for anything 3D, not just film.
Whew! That was A LOT, but I likely still missed something. Please comment below to suggest entries and tweaks to improve this and keep it up to date. Thank you for reading!