{"id":650,"date":"2025-04-22T00:51:34","date_gmt":"2025-04-22T04:51:34","guid":{"rendered":"https:\/\/johnzhu.com\/blog\/?p=650"},"modified":"2025-05-28T05:28:09","modified_gmt":"2025-05-28T05:28:09","slug":"lights-camera-action-figures","status":"publish","type":"post","link":"https:\/\/johnzhu.com\/blog\/2025\/04\/22\/lights-camera-action-figures\/","title":{"rendered":"Lights, Camera, Action Figures"},"content":{"rendered":"\n<p>The Internet has been overflowing with people\u2019s ChatGPT-generated action figures of themselves lately. I didn\u2019t have any strong inclination to make one of myself, but the viral moment for this particular AI capability gave me ideas for something else.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Use case<\/h2>\n\n\n\n<p>I\u2019m always in need of images to accompany my <a href=\"https:\/\/chineselore.com\" target=\"_blank\" rel=\"noreferrer noopener\">Chinese Lore Podcast<\/a> episodes. It can be hit or miss trying to find relevant, high-quality, and copyright-compliant images for the subject matter. I\u2019ve been experimenting with using ChatGPT to help fill this void. <a href=\"https:\/\/johnzhu.com\/blog\/2025\/01\/19\/monkeying-around-with-ai-generated-podcast-images\/\" data-type=\"post\" data-id=\"600\" target=\"_blank\" rel=\"noreferrer noopener\">As I previously chronicled<\/a>, however, using generative AI to create the right image can be a bit of a struggle.&nbsp;<\/p>\n\n\n\n<p>I tried this with the first 10 episodes of my <a href=\"https:\/\/chineselore.com\/series\/series-journey-to-the-west\/\" target=\"_blank\" rel=\"noreferrer noopener\">Journey to the West podcast series<\/a>. The ChatGPT-generated images summarizing each episode often made significant mistakes in interpreting and visualizing characters and scenes. For instance, on a couple occasions, it forgot that Sun Wukong, one of the main characters, was a monkey and illustrated him as a human-like figure. 
In short, if my goal was to get images that represented the story with reasonable accuracy, ChatGPT illustrations weren\u2019t the way to go.<\/p>\n\n\n\n<p>Upon discovering ChatGPT\u2019s ability to make realistic images of action figures, I did a lot of experimentation over several days, trying to generate realistic renderings of characters from <em>Journey to the West<\/em> as a way to assess the feasibility of this approach for creating usable art for my podcast. Here\u2019s a look at my process, the outcomes, and some takeaways.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Process<\/h2>\n\n\n\n<p>A typical starter prompt from my experiments went something like this:<\/p>\n\n\n\n<figure class=\"wp-block-pullquote\"><blockquote><p>Using the attached image, create a realistic action figure of [character name]. Give him [a list of accessories]. Put him against a plain background. In the back, add the title [character name].<\/p><\/blockquote><\/figure>\n\n\n\n<p>For reference images, I generally used screenshots from the <a href=\"https:\/\/www.youtube.com\/playlist?list=PLIj4BzSwQ-_sfc7l2xm1wQswAd5jqrrDS\" target=\"_blank\" rel=\"noreferrer noopener\">1986 Chinese TV series<\/a> based on the novel. For billions of Chinese and other Asians of my generation, this was THE canonical TV adaptation of <em>Journey to the West<\/em>, and its depictions of the characters have indelibly shaped the mainstream conception of what they should look like and how they should act (essentially, think Colin Firth as Mr. Darcy in <em>Pride and Prejudice<\/em>).<\/p>\n\n\n\n<p>After some experiments with creating standalone figures, I took the experiment up another level. I pasted in scripts of my podcast episodes and asked ChatGPT to create an image of a scene from the episode, using realistic action figures.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Takeaways<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Overall, this worked surprisingly well. I created about a dozen characters. 
The initial output for each character typically bore a fairly strong resemblance to the reference image, though usually there was something not quite right, requiring multiple follow-up prompts to refine.<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/i0.wp.com\/johnzhu.com\/blog\/wp-content\/uploads\/2025\/04\/chatgpt-action-figures2-scaled.jpg?ssl=1\" target=\"_blank\" rel=\" noreferrer noopener\"><img data-recalc-dims=\"1\" height=\"844\" width=\"750\" decoding=\"async\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/johnzhu.com\/blog\/wp-content\/uploads\/2025\/04\/chatgpt-action-figures2.jpg?resize=750%2C844&#038;ssl=1\" alt=\"A 4 by 3 grid collage of 12 images of Journey to the West action figures created with ChatGPT.\" class=\"wp-image-653\"\/><\/a><figcaption class=\"wp-element-caption\">Some of the images of Journey to the West action figures that I created with ChatGPT<\/figcaption><\/figure>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The initial creations tended to have odd proportions between the upper and lower body. The figures all looked a bit stocky, with short legs and arms that hung down a bit too far. They often looked like the dwarf Gimli from Lord of the Rings (or in some cases, Ewoks). A \u201cMake it taller\u201d follow-up prompt typically fixed this.<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/i0.wp.com\/johnzhu.com\/blog\/wp-content\/uploads\/2025\/04\/chatgpt-action-figures3_wrong-proportions-scaled.jpg?ssl=1\" target=\"_blank\" rel=\" noreferrer noopener\"><img data-recalc-dims=\"1\" height=\"281\" width=\"750\" decoding=\"async\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/johnzhu.com\/blog\/wp-content\/uploads\/2025\/04\/chatgpt-action-figures3_wrong-proportions.jpg?resize=750%2C281&#038;ssl=1\" alt=\"A row of 4 images of action figures generated by ChatGPT that had odd proportions. 
They generally look too short and stocky, with their arms hanging down too far.\" class=\"wp-image-654\"\/><\/a><figcaption class=\"wp-element-caption\">Some of the figures came out with odd proportions.<\/figcaption><\/figure>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Interestingly, there were certain specific things that ChatGPT just couldn&#8217;t seem to grasp. It would do a terrific job creating a face that looked like the reference image, but then have all sorts of trouble making an accessory that seemed fairly straightforward. An example was the spade for the character Sandy, which had a flat shovel on one end and a crescent on the other. ChatGPT rendered the shovel end pretty well, but could never understand what to do with the crescent end, despite my providing numerous reference images. In the end, I had to take a \u201cclose enough\u201d output and Photoshop it.<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/i0.wp.com\/johnzhu.com\/blog\/wp-content\/uploads\/2025\/04\/spade.jpg?ssl=1\" target=\"_blank\" rel=\" noreferrer noopener\"><img data-recalc-dims=\"1\" height=\"462\" width=\"750\" decoding=\"async\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/johnzhu.com\/blog\/wp-content\/uploads\/2025\/04\/spade.jpg?resize=750%2C462&#038;ssl=1\" alt=\"An image of what Sandy's spade should look like\" class=\"wp-image-658\"\/><\/a><figcaption class=\"wp-element-caption\">Sandy&#8217;s spade<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/i0.wp.com\/johnzhu.com\/blog\/wp-content\/uploads\/2025\/04\/chatgpt-action-figures3_sandy-spade-scaled.jpg?ssl=1\" target=\"_blank\" rel=\" noreferrer noopener\"><img data-recalc-dims=\"1\" height=\"281\" width=\"750\" decoding=\"async\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/johnzhu.com\/blog\/wp-content\/uploads\/2025\/04\/chatgpt-action-figures3_sandy-spade-1024x384.jpg?resize=750%2C281\" alt=\"A row of 4 images of the character Sandy 
from Journey to the West, generated by ChatGPT. All the images are hilariously off-base in their rendering of Sandy's spade.\" class=\"wp-image-657\"\/><\/a><figcaption class=\"wp-element-caption\">Sandy&#8217;s spade on ChatGPT<\/figcaption><\/figure>\n\n\n\n<ul class=\"wp-block-list\">\n<li>At the same time, ChatGPT did seem smarter about creating accessories for \u201ca realistic action figure\u201d than for a more artistic illustration. For instance, when I asked it to create illustrations, it never figured out how to properly illustrate Sun Wukong\u2019s Golden-Band Cudgel (basically just a straight golden rod). But with the action figures, ChatGPT got it right on the first try and on all subsequent attempts.<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/i0.wp.com\/johnzhu.com\/blog\/wp-content\/uploads\/2025\/04\/chatgpt-action-figures4_wukongs-cudgel.jpg?ssl=1\" target=\"_blank\" rel=\" noreferrer noopener\"><img data-recalc-dims=\"1\" height=\"563\" width=\"750\" decoding=\"async\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/johnzhu.com\/blog\/wp-content\/uploads\/2025\/04\/chatgpt-action-figures4_wukongs-cudgel.jpg?resize=750%2C563&#038;ssl=1\" alt=\"\" class=\"wp-image-659\"\/><\/a><figcaption class=\"wp-element-caption\">Left: ChatGPT got action figure Sun Wukong&#8217;s Golden-Band Cudgel correct on the first try. Right: One of ChatGPT&#8217;s many failed illustration attempts to create the right cudgel (it never could figure that out).<\/figcaption><\/figure>\n\n\n\n<ul class=\"wp-block-list\">\n<li>When asked to create scenes from episode scripts using realistic action figures, ChatGPT seemed to be more grounded than when it rendered illustrations. Its illustrations tended to have more hallucinations, particularly in extraneous background characters or elements. When creating scenes with realistic action figures, however, it seemed to stick fairly close to the scene it was depicting. 
This was in part because it tended to zoom in on the key characters, thus limiting the number of background elements, which probably helped to reduce the chance for hallucinations. <\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/i0.wp.com\/johnzhu.com\/blog\/wp-content\/uploads\/2025\/04\/chatgpt-action-figures5_dragon-king-scene.jpg?ssl=1\" target=\"_blank\" rel=\" noreferrer noopener\"><img data-recalc-dims=\"1\" height=\"375\" width=\"750\" decoding=\"async\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/johnzhu.com\/blog\/wp-content\/uploads\/2025\/04\/chatgpt-action-figures5_dragon-king-scene.jpg?resize=750%2C375&#038;ssl=1\" alt=\"A more &quot;realistic&quot; depiction of a scene between a dragon king and the emperor and a more illustrative scene.\" class=\"wp-image-661\"\/><\/a><figcaption class=\"wp-element-caption\">The more &#8220;realistic&#8221; depiction of a scene between a dragon and an emperor (left) vs. a more illustrative attempt (right). 
The one on the left is much closer (for starters, there should be no one else in the room).<\/figcaption><\/figure>\n\n\n\n<ul class=\"wp-block-list\">\n<li>There were, however, some uncanny valley moments (probably because most of the scenes featured a talking monkey).<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/i0.wp.com\/johnzhu.com\/blog\/wp-content\/uploads\/2025\/04\/ep013_chatgpt.png?ssl=1\" target=\"_blank\" rel=\" noreferrer noopener\"><img data-recalc-dims=\"1\" height=\"500\" width=\"750\" decoding=\"async\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/johnzhu.com\/blog\/wp-content\/uploads\/2025\/04\/ep013_chatgpt-1024x683.png?resize=750%2C500\" alt=\"\" class=\"wp-image-666\"\/><\/a><figcaption class=\"wp-element-caption\">&#8220;Something&#8217;s happening here but you don&#8217;t know what it is, do you, Sun Wukong?&#8221;<\/figcaption><\/figure>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The most annoying part about using ChatGPT to generate images is that you can\u2019t tell it to just change one thing, even if you literally tell it to just change one thing and keep everything else the same. This seems counterintuitive, since one would think it\u2019s easier to make just the one change that was requested instead of tweaking the whole image each time.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Bottom line<\/h2>\n\n\n\n<p>Unlike its illustrations, ChatGPT\u2019s action figures were much more on point and much more likely to be usable for my needs. In fact, I\u2019ve started using some of them on <a href=\"https:\/\/chineselore.com\/series\/series-journey-to-the-west\/major-characters-journey-to-the-west\/\" target=\"_blank\" rel=\"noreferrer noopener\">a page on my podcast website<\/a>. So far, this seems like a promising enough option to explore further. 
As a bonus, along the way, I get a good laugh out of ChatGPT bloopers, like these attempts to make a rendering of a demon from <em>Journey to the West<\/em>:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><a href=\"https:\/\/i0.wp.com\/johnzhu.com\/blog\/wp-content\/uploads\/2025\/04\/silver-horned-king-reference.jpg?ssl=1\" target=\"_blank\" rel=\" noreferrer noopener\"><img data-recalc-dims=\"1\" decoding=\"async\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/johnzhu.com\/blog\/wp-content\/uploads\/2025\/04\/silver-horned-king-reference.jpg?w=750&#038;ssl=1\" alt=\"\" class=\"wp-image-663\"\/><\/a><figcaption class=\"wp-element-caption\">A screenshot from the 1986 TV show that I fed into ChatGPT as a reference image for the Silver-Horned Demon King<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/i0.wp.com\/johnzhu.com\/blog\/wp-content\/uploads\/2025\/04\/chatgpt-action-figures6-bloopers-scaled.jpg?ssl=1\" target=\"_blank\" rel=\" noreferrer noopener\"><img data-recalc-dims=\"1\" height=\"281\" width=\"750\" decoding=\"async\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/johnzhu.com\/blog\/wp-content\/uploads\/2025\/04\/chatgpt-action-figures6-bloopers.jpg?resize=750%2C281&#038;ssl=1\" alt=\"A row of 4 images of hilariously bad attempts by ChatGPT to render the Silver-Horned Demon King.\" class=\"wp-image-662\"\/><\/a><figcaption class=\"wp-element-caption\">From left: 1) Vikings fan who went overboard with the body paint? 2) An ad for Norse mead? 3) Hugo Weaving? 4) Did makeup run out of silver paint on the set? Did the overzealous Vikings fan use it all?<\/figcaption><\/figure>\n\n\n\n<p><em>Note: As a writer, designer, and media creator, I fully understand the anxiety, ambivalence, or outright hostility that many creatives feel toward AI-generated content. However, much like the Internet, mobile, and social media before it, AI is coming at us fast whether we like it or not. 
That\u2019s why I experiment with the technology so I can better understand its capabilities, limits, and pitfalls. There are many legit questions about the tradeoffs, costs, and threats of AI, but as someone who works in a creative field, I also can\u2019t afford to ignore what AI can and will do while society works through those questions.<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>The good, the bad, and the ugly from my experiments with using ChatGPT to generate realistic images of characters from Journey to the West.<\/p>\n","protected":false},"author":1,"featured_media":763,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"nf_dc_page":"","_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[6,8,11,14],"tags":[],"class_list":["post-650","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai","category-design","category-podcasting","category-tools-and-tips","entry"],"jetpack_featured_media_url":"https:\/\/i0.wp.com\/johnzhu.com\/blog\/wp-content\/uploads\/2025\/04\/chatgpt-action-figures.jpg?fit=2500%2C938&ssl=1","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/johnzhu.com\/blog\/wp-json\/wp\/v2\/posts\/650","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/johnzhu.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/johnzhu.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/johnzhu.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/johnzhu.com\/blog\/wp-json\/wp\/v2\/comments?post=650"}],"version-history":[{"count":1,"href":"https:\/\/johnzhu.com\/blog\/wp-json\/wp\/v2\/posts\/650\/revisions"}],"predecessor-version":[{"id":889,"href":"https:\/\/johnzhu.com\/blog\/wp-json\/wp\/v2\/posts\/650\/revisions\/889"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/johnzhu.com\/blog\/wp-json\/wp\/v2\/me
dia\/763"}],"wp:attachment":[{"href":"https:\/\/johnzhu.com\/blog\/wp-json\/wp\/v2\/media?parent=650"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/johnzhu.com\/blog\/wp-json\/wp\/v2\/categories?post=650"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/johnzhu.com\/blog\/wp-json\/wp\/v2\/tags?post=650"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}