AI Art: Bad With The Good

Posted on: 2022-09-10

At this point, we all know that AI can make some good-looking art. Large media outlets have covered it, and artists have used it in “AI assisted” workflows. The release of Stable Diffusion (SD), which is free and open source, has made AI-generated art more accessible than ever. In fact, you can download the required software right now and take it out for a spin on your own computer (6 GiB VRAM required).

Stable Diffusion and similar models can create images from text, iterate on existing images, and extend images beyond their borders. Now, a question may emerge: is AI art really that good? Is it on the same level as human artists, or even better?

In this post I’ll be using SD. Other choices currently available include OpenAI’s DALL-E 2 and Midjourney.

It’s excellent at…

Style transfer

Prompt: Fantasy landscape, river, elevated land, 4k, 8k, $Coloring_Method

SD is not only capable of creating images that fit the input text, it can also apply different styles/coloring methods to an image. It sometimes gets the prompt wrong, but that’s understandable. It’s a program that you can run as many times as you want, until you get the desired result.
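The `$Coloring_Method` placeholder in the prompt above can be filled in programmatically. Below is a minimal sketch using only Python’s standard library. The style names and the idea of pairing each prompt with a seed for repeated runs are my assumptions for illustration, not something taken from SD itself, and the sketch only builds the prompt strings without generating any images.

```python
from itertools import product
from string import Template

# The prompt from this post, with its $Coloring_Method placeholder.
PROMPT = Template("Fantasy landscape, river, elevated land, 4k, 8k, $Coloring_Method")

# Hypothetical style names, chosen only for illustration.
STYLES = ["watercolor", "oil painting", "pixel art"]


def build_jobs(styles, seeds):
    """Return one (prompt, seed) pair per style/seed combination."""
    return [
        (PROMPT.substitute(Coloring_Method=style), seed)
        for style, seed in product(styles, seeds)
    ]


if __name__ == "__main__":
    # Two seeds per style stand in for "run it as many times as you want".
    for prompt, seed in build_jobs(STYLES, range(2)):
        print(f"seed={seed}: {prompt}")
```

Each pair could then be handed to whatever SD front end you use; re-running with more seeds is the “try again until it looks right” loop in list form.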

Memorization and feature extraction

Prompt: Donald Trump, 4k, 8k, interview

If I ask you to draw the face of a certain person, would you be able to recreate it without any references? Or, to make it easier, how about trying to draw your own face without a mirror?

This is one of the things most people can’t do. You can draw a perfect face, but not a specific one. For humans, it’s difficult to memorize scenes that are super detailed. It’s like recreating what you saw in a dream. Everything looks convincing enough to be what it should be, but not detailed enough to be that exact thing.

This is where AI excels. With a model file of only 4 GiB, SD has managed to fit in more stuff than you could ever imagine. It’s able to recreate almost every celebrity’s face and significant features, as well as other real-life objects. It has learned the difference between photos, drawings and sculptures. It also knows about TV series, cartoons, and even video games. I’m amazed that all this information fits onto a thumb drive, and a small one by 2022 standards.

Intricate patterns

Fractals contain beautiful and detailed patterns driven by mathematical equations. The patterns and structures scale arbitrarily: that’s infinite zoom without losing detail. Credit: @Mathigon on YouTube

Like fractals, AI art can contain very detailed and complex patterns that humans will have a hard time producing.

Most people have probably seen this image by now. In short, someone used another AI model, Midjourney, to create this piece and submitted it to an art contest, where it won first place. I’ll use this award-winning “painting” as an example.

If you show me that image, and tell me that it took someone one whole year to draw, my reactions would be as follows:

Wait, what?
That’s some dedication right there.

If you zoom in, the level of detail is astonishing. But, by doing that, the entire scene quickly falls apart. The village or whatever outside the window, the luxurious decorations on the walls, the items displayed on the table… They are nothing but an illusion that emerged from a bunch of meaningless noise. Even the human figures are just large blobs.

AI has arguably created something unique here, something that even humans can’t do.

Abstract concepts

I love art that leaves you with the space to imagine, to freely wander in the world it depicts. And AI shines by being able to turn abstract concepts into images without any thinking.

Prompt: watercolor, sky, stars, night, wallpaper, fantasy

Without any concrete instructions, AI can still create stunning images. The images have no meaning on their own, but our minds will fill it in.

It’s particularly bad at…

Creative design

It cannot draw stuff that has few to no references on the internet (which is where the training data came from).

I have no idea what it’s trying to do here.

Stereotypes

No, there’s nothing offensive here. Let’s look at some anime.

Prompt: $anything, anime, key visual big heads

Notice anything common in these two images? They both have the big head centered in the background. This is based on luck, but with some prompts, I was able to get the heads to appear in almost all results. It’s understandable if you think about how many key visuals out there use this exact composition. Personally, I would say it’s overused and boring, but the AI doesn’t know that.

Image transformation

Although the AI can do image-to-image transformations, this doesn’t mean that it does a good job of following the reference material. In fact, it’s quite the opposite.

The first image it spits out looks very different from the original image. But a second image, generated from that first output, is very similar to it, unless your prompt has changed significantly.

This might be a result of the AI being unable to represent the input image with its internal structures. In other words, it cannot fully comprehend what’s going on in the image. The generated image, on the other hand, already came out of the same pipeline the AI uses to re-serialize the image, so its traits can be largely preserved. That’s just my guess, though.

Controlled output

Let’s just put it this way: the AI has a mind of its own.

And it’s a beast that’s hard to tame.

I do agree that text-based prompts don’t contain enough information, but even so, it’s still not going to produce art that aligns with your vision. The AI has no way of telling whether an image is visually appealing or not, nor can it deduce whether it’s copying someone else’s work or creating something original.

The creative process still lies in the hands of the user. When you select the best image from a bunch of half-baked results, what you’re ultimately trying to do is nudge the AI onto the right track. Most of the time it won’t get there, and you will end up with discarded works. Other times it will produce something usable, albeit not the exact thing you were expecting. That being said, if it found another way to represent your idea, you can call it job done.

Creativity and aesthetics are something the machine can’t learn. And I do hope it remains that way. That’s what makes us human, after all.

Examples

I spent way too much time generating anime images.

[Image: anime girl, supposedly with animal ears, in the style of Arknights under sunlight, white particles floating, overly thin waist compared to the rest of the body]

Overall nice aesthetic, but the waist area looks unnatural. Fixing with masking made it even more inconsistent.

[Image: handheld energy weapons]

Does it look like a gun? Yes. Does it make sense? Not at all.

[Image: male anime character wearing Japanese samurai clothing, katana hanging on belt, with something blue-ish looking like projectiles of those fire/magic shooting swords in the foreground, unreadable writing in the top right corner]

What was originally an artifact got transformed into something that looks cool.

[Image: same image, but the magic projectile got incorporated into the clothing, becoming a really long cape, colored with a pleasing blue and purple gradient]

And further iterations even incorporated that smudge into the clothing. Truly amazing.

[Image: male anime character with dark fluffy hair facing sideways, wearing futuristic cloak-like clothes, no generation artifacts at all (holy crap)]

I would have no problem using this as a profile picture. Probably the best one.

[Image: dark green trees on a hill with crimson trees in the background. The sky is filled with polka dot stars.]

Tried recreating a scene from a novel. Looks interesting.

Afterword

In its own way, AI-generated art has exceeded my expectations. I don’t plan to use it for making any artworks, but its output can be decent reference material. Being able to generate images on demand just saves so much time. Maybe it’s time to ditch Pinterest and go full stick figure + AI when doing your next art commission?

I shall try that next time.