This is part 1 of the introductory posts, which explain the context and personal reasons
for starting "Diffusion Pilot", focusing on theory.
To plunge deeper and learn about the tools and techniques, see the level 2 and 3 posts.
Hi! Online I go by the pseudonym aulerius. My practice spans many styles and techniques of visual art, but most relevant at the time of writing is that I'm on my way to obtaining my MA degree in Animation at EKA (Estonia). It's a fine arts degree aimed at non-commercial auteur filmmaking.
The three intro posts are essays where I present the personal reasons and discoveries that led to "Diffusion Pilot", both as a blog and as a self-developed toolkit/workflow. I also address some common questions and comments I get whenever I pitch my film and MA thesis.
Master's thesis through a blog
This animation MA degree offers almost limitless freedom for creative
expression, which can sometimes incubate artists, and especially animators,
who have more ideas and scope than they have the discipline to execute them on
time :)))
So then, in addition to presenting compelling ideas and findings, I feel that the art Master's thesis paper is the student's chance to justify and ground their lengthy creative process. By writing mine, I aim to connect the dots between theory and practice, and to document the technical research and development of my own animation techniques.
Then I realized that my insatiable thirst for digging into technical rabbit holes could yield findings and techniques that might make a nice contribution to the larger animation and digital arts community. Thus, I decided to turn the thesis into a blog, making it more accessible, open-ended, and broken down into bite-sized chunks, which also helps me stay motivated. After the intro, posts will serve as a development diary for "Diffusion Pilot" as a tool, various making-of material, and knowledge that I compile and share along the way.
The short animated film which fueled the birth of my MA thesis and this blog could be described as an
experimental mix of 3D, digital animation, and generative image AI.
The keyword is AI
If that keyword triggers thoughts in you such as "stealing from artists", "replacement of jobs", or "it's rotten to the core!", then strap in, as we're only beginning to catch up to all the issues and problematic implications of this "revolution". Here, however, I'm mostly taking a naive detour around this enormous ethical and philosophical discourse, instead seeking out interesting animation tools and new forms of human artistic expression in this reckless technology race, enabled by the open source nature of Stable Diffusion in particular.
Artificially intelligent
I don't enjoy the term "AI" too much since it's overused, so consider it more
of a stepping stone in my writing.
It's a buzzword
At the moment, "AI" can mean many things, but more often than not, these "AIs" are still very
narrowly focused and specialized prediction machines, sculpted by big data,
occasionally surprising us with emergent behavior that tempts people to call
them intelligent. The term is simply too volatile, a catalyst for
all sorts of wild scenarios, both speculative and realistic. Every time I mentioned that my chosen
animation technique had something to do with AI, people would jump straight to
the most profound, holistic human-vs-AI dilemmas, because apparently all artists do
in the 21st century is "challenge" the shit out of everything and anything.
I am much more in favor of using the names of the underlying field, "machine learning" (ML), and its sub-field, "deep learning" (DL), which deals with techniques to "teach" a "neural network" through examples, just as correct examples can teach humans (with their fleshy neural-network brains). Today, DL enables most AI-like applications: literally machines (networks of digital neurons) that form their behavior through training and learning, as opposed to behavior programmed manually, step by step.
What really tickles me are the implications and aesthetics that come from how the field of DL takes inspiration from real neurons in organic brains, and how that can be employed in arts.
I particularly enjoy when DL merges with art forms like painting or photography, for example when artists train neural network models on their own work, or when "post-photography" takes on a new meaning of shooting pictures through the lens of a generative image model, also sometimes called "Synthography".
...and before you ask: no, this blog is not written by ChatGPT, and the topics of LLMs (Large Language Models) and AGI ("Artificial General Intelligence") are for the most part passed over in this project, but check part 2 for some examples of that.
Actual keywords:
generative neural aesthetics and ambiguity of perception
what??
-
"generative" stands for a long living practice of utilizing computer and machines autonomously (or semi-autonomously) in art practice. In other words, "generating" art. In this context, it means using deep-learning-enabled generative image models in making of my graduation animated film.
-
"neural" stands for neural networks, which corresponds to both the organic networks of neurons in our brains, and the neural networks used in machine learning to train programs, which is the core principle in DL. I personally find this connection to be very symbolic and conceptually meaningful. Computer scientists are basically using behaviour inspired by real brains (although simplified and more structured) to teach computers things we can all do so effortlessly, such as telling apart cat from a dog.
-
"aesthetics" means I'm mainly concerned with resulting aesthetics of using generative AI - tools built through DL, rather than related philosophical concerns: human vs AI, or AI as a independent creative agent that can create new concepts. In other words, I'm exploring DL-enabled image generation like a sculptor would explore different materials and their qualities, such as stone vs wood. Or as an entirely new medium, such as the advent of photography in relation to photorealistic painting of that time.
-
"ambiguity" refers to the common trait of said generative image AI, where generated objects, shapes and concepts behave kind of soup like, morphing between different representations when interpolated, or looking uncanny, almost right, but not quite right, etc. A loss in translation between actual things and digested outputs of neural networks. On the human side, this refers to things like poetry, illusions and psychedelic aesthetics.
-
"perception" refers to the science of human perception as a whole, especially intriguing to me when concerned with perception of memories and dreams.
Putting it all together: my initial burst of interest is reflected in the second phrase. "Ambiguity of perception" is about how we perceive dreams, remember memories, or misinterpret the real world, namely in terms of visuals. This is one of the core driving forces that propels my research. I was thinking:
When we dream, why can some things be recalled so vividly, while others cannot? Why is it so hard to tell where one scene ends and another begins? Or how is it that some elements in a dream we simply "feel" or "know", without evidence for them? How do perception bias phenomena and disorders, such as pareidolia or Alice in Wonderland syndrome, emerge? And what would be the shape of our memories if we had to "print" them and look into them directly?
...or simply put, what does our imagination look like?
I wanted to work through these questions in my animated film. I realized that digging into all these image AI shenanigans could be exactly the way to explore them aesthetically, hence the "generative neural aesthetics". In fact, there's little original about this terminology, as you can find "neural aesthetics" linked to an entire network of work and mentions.
From the side of sciences, there are research papers experimenting with reading and decoding brain signals into imagery through generative image neural networks such as Stable Diffusion. Coincidence?! I think not!
So, I argue there are many parallels between the questions derived from "ambiguity of perception" and how DL enables computers to learn to create or classify visuals. The faults, artifacts, and glitches of the early versions of these tools expose how organic and "neural" they are, possibly reflecting the artifacts of our own biological, neural minds: struggling to draw hands perfectly, just like I struggled to get a good grasp on drawing hands myself.
Speaking of hands...
The film for which I initially started "Diffusion Pilot" is about hands: the right hand trying to reunite with the left. Unlike the technical aspects of my work, I'll stay a bit more secretive about the story itself, but for what is relevant to this thesis and blog, its concept should go... um, "hand in hand" with the research questions I'm asking and the technique I'm going for. It follows a surreal, dream-like narrative, shown in one continuous scene (with no cuts) from a first-person point of view. You see the film through somebody's eyes, as if you were observing somebody's dream as they're dreaming.
Visual media seen from the first person perspective is easy to associate with video games in the contemporary age, but I think there is something universally profound and potentially extremely dream-like about storytelling through the gaze of eyes.
Moreover, besides the aforementioned theoretical reasons, I figured that generative image aesthetics also lean heavily into something I'd call...
"Digitally organic" - "Organically digital"
Studying at EKA, I got heavily inspired by peers employing expressive techniques full of handmade analog warmth, texture, and organic flow, like the one linked at the beginning of this post. While traditional CGI is not that, I believe generative AI is an exceptional avenue for exploring computer graphics that lead to such characteristics. In my conceived workflow, it may start as rigid 3D, but by the end of the process it should feel nothing like stiff, cold, geometric, or contemporary photo-realistic 3D.
Recently, stylization and an overall "painterly-ness" of digital 3D computer animation have been gaining popularity on the big commercial screens. One of the most prominent examples is Alberto Mielgo, with his art direction on Spider-Man: Into the Spider-Verse and later "The Witness":
Something that either accidentally or on purpose I always want to do in my projects is to break the repetitive and very successful “look” and pipeline that all the big Giants Corps in animation had been smashing in our faces for the last decade, up to a point that is difficult to differentiate who did what.
- Alberto Mielgo
Finally, despite all the artistic woo-woo and AI spaghetti that I've been piling up throughout this text, it is important to note that, fundamentally, I was led here by ideas for a film that in many respects still follows the usual conventions. It's not some revolutionary AI co-authored art movement, or opaque and heavy contemporary video art. I came up with an idea, a loose narrative, (weird) body-part characters, an environment, scenes, a beginning and an end, things leading from one to another, things having symbolic meaning for me and an audience. In the age of AI, I still want to animate and direct the film myself. However, the way I will manifest that into images and video should be worth a blog.
I hope that after reading this post, you're at least partially as convinced as I am.
And a blog is born
At the time of writing, I am actually well into the development and research for my animated film, having recently reached a tipping point: it feels like going downhill after a long uphill push. The fruits of that can be seen in the most recent posts. However, in part 2 of the intro, I'll discuss the conflicting process of making a film by yourself in the first place, and ramble some more about AI in general. Then, in part 3, I'll recall the "uphill push" and how it eventually brought me to something I now call "Diffusion Pilot".