Lights, Camera, Action Figures

The Internet has been overflowing with people’s ChatGPT-generated action figures of themselves lately. I didn’t have any strong inclination to make one of myself, but the viral moment for this particular AI capability gave me ideas for something else.

Use case

I’m always in need of images to accompany my Chinese Lore Podcast episodes. It can be hit or miss trying to find relevant, high-quality, and copyright-compliant images for the subject matter. I’ve been experimenting with using ChatGPT to help fill this void. As I previously chronicled, however, using generative AI to create the right image can be a bit of a struggle.

I tried this with the first 10 episodes of my Journey to the West podcast series. The ChatGPT-generated images summarizing each episode often made significant mistakes in interpreting and visualizing characters and scenes. For instance, on a couple of occasions, it forgot that Sun Wukong, one of the main characters, was a monkey and illustrated him as a human-like figure. In short, if my goal was to get images that represented the story with reasonable accuracy, ChatGPT illustrations weren’t the way to go.

Upon discovering ChatGPT’s ability to make realistic images of action figures, I spent several days experimenting with realistic renderings of characters from Journey to the West to assess whether this approach could produce usable art for my podcast. Here’s a look at my process, the outcomes, and some takeaways.

Process

A typical starter prompt from my experiments went something like this:

Using the attached image, create a realistic action figure of [character name]. Give him [a list of accessories]. Put him against a plain background. In the back, add the title [character name].
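For anyone making a whole cast of figures, the bracketed slots in the prompt above can be filled in with a few lines of Python. This is only a minimal sketch: the function name `build_figure_prompt` and its parameters are hypothetical, and the code just assembles the prompt text. It does not call ChatGPT or attach the reference image, which still has to be done by hand.

```python
def build_figure_prompt(character: str, accessories: list[str], pronoun: str = "him") -> str:
    """Fill in the starter prompt's bracketed slots for one character.

    Hypothetical helper for illustration; it only returns the prompt text.
    """
    # Join the accessories into the "[a list of accessories]" slot.
    accessory_list = ", ".join(accessories)
    return (
        f"Using the attached image, create a realistic action figure of {character}. "
        f"Give {pronoun} {accessory_list}. "
        f"Put {pronoun} against a plain background. "
        f"In the back, add the title {character}."
    )


# Example: the starter prompt for Sun Wukong and his cudgel.
prompt = build_figure_prompt("Sun Wukong", ["the Golden-Band Cudgel"])
print(prompt)
```

The resulting text can be pasted into the chat along with the reference screenshot, and follow-up prompts such as “Make it taller” still apply as usual.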

For reference images, I generally used screenshots from the 1986 Chinese TV series based on the novel. For billions of Chinese and other Asians of my generation, this was THE canonical TV adaptation of Journey to the West, and its depictions of the characters have indelibly shaped the mainstream conception of what they should look like and how they should act (essentially, think Colin Firth as Mr. Darcy in Pride and Prejudice).

After some experiments with creating standalone figures, I took things to another level. I pasted in scripts of my podcast episodes and asked ChatGPT to create an image of a scene from the episode, using realistic action figures.

Takeaways

Overall, this worked surprisingly well. I created about a dozen characters. The initial output for each character typically bore a fairly strong resemblance to the reference image, though usually there was something not quite right, requiring multiple follow-up prompts to refine.

A 4 by 3 grid collage of 12 images of Journey to the West action figures created with ChatGPT. — Some of the images of Journey to the West action figures that I created with ChatGPT

The initial creations tended to have weird proportions between the upper and lower body. The figures all looked a bit stocky, with short legs and arms that hung down a bit too far. They often looked like the dwarf Gimli from Lord of the Rings (or in some cases, Ewoks). A “Make it taller” follow-up prompt typically fixed this.

A row of 4 images of action figures generated by ChatGPT that had odd proportions. They generally look too short and stocky, with their arms hanging down too far. — Some of the figures came out with odd proportions.

Interestingly, there were certain specific things that ChatGPT just couldn’t seem to grasp. It would do a terrific job creating a face that looked like the reference image, but then have all sorts of trouble making an accessory that seemed fairly straightforward. An example was the spade for the character Sandy, which had a flat shovel on one end and a crescent on the other. ChatGPT rendered the shovel end pretty well, but could never understand what to do with the crescent end, despite my providing numerous reference images. In the end, I had to take a “close enough” output and Photoshop it.

An image of what Sandy's spade should look like — Sandy’s spade

A row of 4 images of the character Sandy from Journey to the West, generated by ChatGPT. All the images are hilariously off-base in their rendering of Sandy's spade. — Sandy’s spade on ChatGPT

At the same time, ChatGPT did seem smarter about creating accessories for “a realistic action figure” than for a more artistic illustration. For instance, when I asked it to create illustrations, it never figured out how to properly illustrate Sun Wukong’s Golden-Band Cudgel (basically just a straight golden rod). But with the action figures, ChatGPT got it right on the first try and on all subsequent attempts.

Left: ChatGPT got action figure Sun Wukong’s Golden-Band Cudgel correct on the first try. Right: One of ChatGPT’s many failed illustration attempts to create the right cudgel (it never could figure that out).

When asked to create scenes from episode scripts using realistic action figures, ChatGPT seemed more grounded than when it rendered illustrations. Its illustrations tended to have more hallucinations, particularly extraneous background characters or elements. When creating scenes with realistic action figures, however, it stuck fairly close to the scene it was depicting. This was partly because it tended to zoom in on the key characters, which limited the number of background elements and probably reduced the chance of hallucinations.

A more "realistic" depiction of a scene between a dragon king and the emperor and a more illustrative scene. — The more “realistic” depiction of a scene between a dragon and an emperor (left) vs. a more illustrative attempt (right). The one on the left is much closer (for starters, there should be no one else in the room).

There were, however, some uncanny valley moments (probably because most of the scenes featured a talking monkey).

“Something’s happening here but you don’t know what it is, do you, Sun Wukong?”

The most annoying part about using ChatGPT to generate images is that you can’t get it to change just one thing, even if you explicitly tell it to change one thing and keep everything else the same. Instead, it tweaks the whole image each time, which seems counterintuitive, since one would think making the one requested change would be easier.

Bottom line

Compared with its illustrations, ChatGPT’s action figures were much more on point and much more likely to be usable for my needs. In fact, I’ve started using some of them on a page on my podcast website. So far, this seems like a promising enough option to explore further. As a bonus, along the way, I get a good laugh out of ChatGPT bloopers, like these attempts to render a demon from Journey to the West:

A screenshot from the 1986 TV show that I fed into ChatGPT as a reference image for the Silver-Horned Demon King

A row of 4 images of hilariously bad attempts by ChatGPT to render the Silver Horned King demon. — From left: 1) Vikings fan who went overboard with the body paint? 2) An ad for Norse mead? 3) Hugo Weaving? 4) Did makeup run out of silver paint on the set? Did the overzealous Vikings fan use it all?

Note: As a writer, designer, and media creator, I fully understand the anxiety, ambivalence, or outright hostility that many creatives feel toward AI-generated content. However, much like the Internet, mobile, and social media before it, AI is coming at us fast whether we like it or not. That’s why I experiment with the technology so I can better understand its capabilities, limits, and pitfalls. There are many legit questions about the tradeoffs, costs, and threats of AI, but as someone who works in a creative field, I also can’t afford to ignore what AI can and will do while society works through those questions.