Co-Designing with AI

I’ve been experimenting a lot lately with ChatGPT, particularly its image-generation capabilities. After some initial explorations, I wanted to try a large project that puts ChatGPT through similar tasks numerous times to test the consistency of its output. I also wanted to create something tangible at the end of that experimentation. So I decided to use ChatGPT to help me create a poster and an accompanying webpage featuring all 108 bandit chieftains from the classic Chinese novel The Water Margin.

I chose this topic for several reasons. First, I had previously done a podcast series retelling The Water Margin in English, so the subject matter is familiar enough for me to effectively judge the quality of the output. Second, the novel’s cast of 108 outlaws has fascinated me since I was a child, as they all have their own backstories, skills, and colorful nicknames. Finally, the poster would make a nice product to offer listeners of my podcast who might be interested in purchasing the digital file as a way to support the show.

Generating the Images

The poster I envisioned would feature images of each of the 108 chieftains. There are a couple complete sets of illustrations of those characters on the internet, but I can’t just use those because they’re copyrighted. Also, their resolution is nowhere near high enough for a large printed piece. So I put ChatGPT to work creating a new set of images.

To create the images, I used one of those existing sets of drawings of the characters as reference. I pasted each one into ChatGPT along with this prompt: “Using the attached image, create a realistic action figure of [character name] from the Water Margin. Show the full figure. Leave some white space below the shoes. Make the background transparent.” In some cases, I provided additional reference images to help make the output closer to popular portrayals of certain characters.

Overall, ChatGPT did really well with the output. The majority of the generated images looked fairly close to how one might picture the characters. There were just a few anomalies — the occasional duplicate wine gourd, an extra scabbard poking out of someone’s belt, or the random case of dwarfism (which also popped up in my previous experiments with using ChatGPT to create realistic action figures).

Here are a few examples of the reference images and the ChatGPT output:

Reference image and output for Guan Sheng

Reference image and output for Hu Sanniang

Reference image and output for Jiao Ting

There were, however, a few recurring issues:

ChatGPT tended to put the characters in anachronistic shoes, but that was easily addressed by pasting in a reference image of period-appropriate footwear and telling the tool to update the image accordingly.
ChatGPT often ignored the part of the initial prompt that instructed it to make the background transparent. This was particularly prevalent on subsequent refinement prompts. There were also times when ChatGPT replaced the white background with a checkered pattern like the one you see on images with transparent backgrounds, except the background wasn’t actually transparent; it was just checkered. This got so bad at times that, after multiple attempts to tell ChatGPT to truly remove the background, I had to give up and come back to it later (at which point the tool did what I asked).
LLMs sometimes offer up a lower-probability response by design, and I saw that in action with this task. Every so often, after a few refinement prompts, ChatGPT obviously took a bigger swing and returned responses that completely diverged stylistically from previous iterations.
ChatGPT seems to have Victorian sensibilities about any display of skin. Some of the outlaws’ reference images showed bare upper torsos for men. While ChatGPT might initially generate an image along those lines, on subsequent refinement prompts, it almost always responded that the request violated its content policies. When I asked it to suggest an alternative that did not violate the policies, it invariably put clothes on the character.
ChatGPT can get very stubborn (or dumb) about certain seemingly simple things. The most prominent example was its insistence on cutting off the shoes of certain characters. When ChatGPT gets it into its virtual head to cut off shoes, it will keep doing so despite repeated, specific directions along the lines of “Don’t cut off his shoes” or “Show the full figure.” I eventually seemed to find better success with “Leave some white space below his shoes.”
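
For anyone batch-checking a folder of generated images, the fake-transparency problem can be partly caught with a script. The following is a minimal sketch, not something I used on this project: it reads a PNG's header and reports whether the file even declares an alpha channel. The function name is hypothetical, and the check is only a first pass, since a file can carry an alpha channel and still have every pixel opaque. Confirming real transparency would mean inspecting pixel values with an imaging library.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_has_alpha_channel(data: bytes) -> bool:
    """Return True if the PNG bytes declare an alpha channel or a tRNS chunk.

    This only inspects chunk headers; it does not decode pixels, so it cannot
    tell whether any pixel is actually transparent.
    """
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")

    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk is: 4-byte length, 4-byte type, body, 4-byte CRC.
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if chunk_type == b"IHDR":
            # IHDR body: width (4), height (4), bit depth (1), color type (1), ...
            color_type = body[9]
            if color_type in (4, 6):  # grayscale+alpha or RGB+alpha
                return True
        elif chunk_type == b"tRNS":
            # Palette or single-color transparency for images without alpha.
            return True
        elif chunk_type == b"IDAT":
            # Pixel data has started; tRNS must appear before this point.
            break
        pos += 8 + length + 4
    return False


# Example use on a file (the path is a placeholder):
#   from pathlib import Path
#   png_has_alpha_channel(Path("some_figure.png").read_bytes())
```

An image that fails this check definitely has no transparent background, which would have flagged the checkered fakes that were saved as flat RGB files.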

Designing the Poster

A closeup of some of the images generated with ChatGPT

Once I generated all the images, I created the poster in InDesign. The layout did not involve AI (I had to contribute something to this project, after all), but I did use ChatGPT to clean up the list of the chieftains’ names, nicknames, and rankings in the gang’s hierarchy. It managed to do this successfully, but needed a couple tries.

Initially, I asked ChatGPT to use the Wikipedia page listing the 108 characters as the source and generate a spreadsheet with three columns containing the info mentioned above. However, ChatGPT seemed to hallucinate in a couple significant ways. For certain characters, it seemed to make up nicknames completely unconnected to the characters’ actual nicknames. It also rearranged the order of a major chunk of the list.

So I took a different tack. I pulled a list that I had previously compiled by hand (and thus knew the information was accurate) and simply asked ChatGPT to clean up various formatting issues, such as removing tonal marks on the pinyin names. ChatGPT handled these tasks very well.
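
Removing tonal marks is also the kind of deterministic cleanup that a few lines of code can do without any risk of hallucination. Here is a minimal sketch in Python, offered as an alternative rather than what I actually did. The function name is my own. One assumption to be aware of: this approach also strips the two dots from "ü", so a name like "Lǚ" becomes "Lu", which may or may not be the romanization you want.

```python
import unicodedata


def strip_tone_marks(text: str) -> str:
    """Remove pinyin tone marks (and other combining accents) from text."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Category "Mn" (nonspacing mark) covers the combining tone marks.
    without_marks = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    # Recompose so the result is in the usual normalized form.
    return unicodedata.normalize("NFC", without_marks)


print(strip_tone_marks("Sòng Jiāng"))   # Song Jiang
print(strip_tone_marks("Lǔ Zhìshēn"))   # Lu Zhishen
```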

Finally, I also used ChatGPT to generate an image of a wine jar with a red label sporting the Chinese characters for Mount Liang (the outlaws’ lair). This was a significant timesaver because it gave me pretty much the exact image I wanted in seconds, rather than me having to spend potentially hours scouring the internet for a high-quality, public-domain image of the right type of wine jar and then Photoshopping it to add the label. After ChatGPT generated the image, I ran the file through Enhance.io to enlarge it so that it’s high enough resolution to hold up as the main art in the poster.

Wine jar with a red label and Chinese characters that say "Mount Liang"

Building the Webpage

To create a webpage to accompany the poster, I tried a little vibe coding by feeding ChatGPT several mockups (which I created in Figma). My prompt specified how certain elements are supposed to work and asked ChatGPT to create a page using HTML, CSS, and JavaScript. Given that this was my initial foray into vibe coding, I kept the designs simple so as to make it easier for me to gauge how well the code came out.

A mockup of the webpage I made about the Water Margin

The initial output was maybe 75 percent there. After a few more refinement prompts got it to about 90 percent, I downloaded the code and did some more cleanup manually.

Using AI to write the initial code was a huge timesaver. While I’ve done enough coding that I probably could have written something similar to what ChatGPT generated (or figured it out with some googling and trial and error), it would have taken me far longer than the roughly 30 minutes that it actually took with AI handling the heavy lifting and me doing some light refinement. Also, there was a particular part of the design that I wasn’t sure how to build, and the code from ChatGPT provided a quick way for me to learn.

A screenshot of part of the webpage, showing a grid of the characters. Since I’m not a developer by trade, it would’ve likely taken me some hours to figure out how to write the code for a grid of the characters that’s automatically populated from a spreadsheet.
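
To give a sense of what "populated from a spreadsheet" means, here is a rough sketch of the idea. It is not the code ChatGPT wrote for me, which runs in the browser as JavaScript. This version does the same job as a one-time build step in Python: it reads rows exported as CSV and emits one HTML card per character. The column names (rank, name, nickname, image), the CSS class names, and the image file names are all assumptions made for illustration.

```python
import csv
import html
import io

# Hypothetical spreadsheet export; the image file names are placeholders.
SAMPLE_CSV = """rank,name,nickname,image
1,Song Jiang,Timely Rain,song-jiang.png
2,Lu Junyi,Jade Qilin,lu-junyi.png
"""


def render_grid(csv_text: str) -> str:
    """Turn CSV rows into an HTML grid with one <figure> per character."""
    cards = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Escape every value so stray characters cannot break the markup.
        rank = html.escape(row["rank"])
        name = html.escape(row["name"])
        nickname = html.escape(row["nickname"])
        image = html.escape(row["image"])
        cards.append(
            f'  <figure class="chieftain" data-rank="{rank}">\n'
            f'    <img src="{image}" alt="{name}">\n'
            f"    <figcaption>{name} ({nickname})</figcaption>\n"
            f"  </figure>"
        )
    return '<div class="grid">\n' + "\n".join(cards) + "\n</div>"


print(render_grid(SAMPLE_CSV))
```

The appeal of this pattern, in either language, is that fixing a name or nickname means editing one spreadsheet cell instead of hunting through markup for 108 characters.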

Ironically, the part that seemed like the most obvious and easiest task for ChatGPT turned out to be the most difficult. For each of the 108 chieftains, I wanted to allow users to select their thumbnail image and open up a modal that displays a short bio. It seemed like a no-brainer to ask ChatGPT to compile those bios. I even pointed ChatGPT to a Wikipedia page that linked to individual entries for each chieftain and told it to use that page as reference.

As it turned out, ChatGPT simply could NOT handle this task and failed on multiple levels. Most seriously, it hallucinated on a significant percentage of the bios, often adding in details that weren’t true or were about a different outlaw. Of course, this would not have been caught if the person proofreading the bios wasn’t already intimately familiar with these characters.

Less serious but no less annoying was the generic quality of the output. For instance, roughly 75 percent of the bios ended with something along the lines of “He earned respect among the outlaws for his loyalty and courage.” Well, that was more or less true for almost every one of these characters, and no human writer or editor would ever allow this kind of repetitive, say-nothing tripe to stand.
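
The repetition, at least, is something a script can spot faster than a proofreader. Below is a minimal sketch of one way to do it, assuming the bios are held in a dictionary keyed by character name. It compares the final sentence of each bio against all the others and flags the ones that are near-duplicates. The function names and the 0.8 similarity threshold are my own assumptions, and the threshold would need tuning on real output. Note that this only catches sameness. It does nothing about the far more serious problem of made-up facts.

```python
import re
from difflib import SequenceMatcher


def final_sentence(bio: str) -> str:
    """Return the last sentence of a bio, split on ., ! or ? followed by space."""
    sentences = re.split(r"(?<=[.!?])\s+", bio.strip())
    return sentences[-1]


def flag_generic_endings(bios: dict[str, str], threshold: float = 0.8) -> list[str]:
    """List the names whose closing sentence closely matches another bio's."""
    endings = {name: final_sentence(text).lower() for name, text in bios.items()}
    flagged = []
    for name, ending in endings.items():
        for other, other_ending in endings.items():
            if other == name:
                continue
            # ratio() is 1.0 for identical strings and falls toward 0.0
            # as the two strings share less text.
            if SequenceMatcher(None, ending, other_ending).ratio() >= threshold:
                flagged.append(name)
                break
    return flagged
```

With 108 bios this makes on the order of 108 × 108 comparisons of short strings, which is small enough to run in moments.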

I kept working with ChatGPT to try to get it to improve the bios. However, when you’re dealing with 108 bios, you can only give the AI tool so many tries before you change tactics, because you don’t want to be proofreading 108 bios multiple times to see how each iteration screwed up. I tried a different LLM (Claude), but it fared no better, and in fact exhibited some of the same issues.

In the end, I decided to just write the bios myself, using each character’s Wikipedia entry as reference. This turned out to be the most time-consuming part of the entire project, and it was disappointing how much the AI tools fell short of delivering anything decent enough to save me some time.

Ethical Considerations

When I started this project, I first thought through some ethical issues. One of the main criticisms of generative AI tools is that they “steal” from real artists, and I wanted to be mindful of that pitfall. I wanted to make sure that 1) I wasn’t blatantly stealing a well-known artistic style; and 2) I wasn’t depriving an artist of financial opportunities.

On the first point, I believe my prompt for generating the images steered clear of stealing any well-known style. I did not tell ChatGPT to copy any particular artistic style, and I specifically told it to create a realistic figure so as to keep it from employing more illustrative styles that might incidentally mirror a human artist’s. Also, while I did use existing drawings as reference, the output was significantly different and thus should be considered transformative use when viewed through a copyright/fair-use lens.

Of course, I would never make the claim that what ChatGPT created does not resemble to some degree or another the style of any other work that came before. But then again, the same can be said for any work created by a human illustrator.

As for the financial considerations, here’s how I looked at it: This was not a situation where the use of AI for these images took away any financial opportunities for an artist. For me, it was not a decision between paying a human to draw 108 illustrations or using ChatGPT to make them for free. I do not have the means to pay an artist what it would cost to draw 108 illustrations for a hobby project, and so if I couldn’t generate these images for free or at a relatively low cost, I simply would not have pursued this project.

Beyond those considerations, I know there are other concerns about generative AI, such as its impact on the environment, employment, the line between fact and fiction, and maybe even the very fabric of society. In fact, I get occasional pushback from listeners of my podcast when I use AI-generated images in some of my social media posts to promote new episodes. Here’s what I tell them:

I recognize and understand the concerns, and share some of them.
Yet, AI is becoming a prevalent presence and an increasingly expected skillset, particularly in the professions where I make my living.
Having lived through the development of the internet from the late ’90s through the present, I’ve experienced first-hand the impact that technological tsunamis can have on media and communications. I saw the many (often legitimate) critiques of new technologies, and I’ve seen how many breathless prognostications over the last 30 years didn’t pan out. Yet I also saw how, when taken as a whole, the way we live and work has fundamentally and irrevocably changed. So I operate with the expectation that when a technological wave hits, everyone gets wet.
So it’s important to understand AI’s capabilities and limitations, so that we as creators have some idea of the proper uses of this powerful and potentially problematic tool. And the only way to gain that understanding is by using the technology.
And that’s really the main drive for my experiments with AI tools and my current use of them in my creative ventures.

Takeaways

This was a project that would not have been feasible for me before generative AI became a widespread tool. I simply would not have been able to create or source stylistically consistent, copyright-compliant, and high-quality images of all 108 outlaws. Such images simply don’t exist in the public domain, and I would not have been able to afford what it would cost to hire an artist to draw 108 illustrations. So this was a situation where generative AI enabled new creative endeavors.

As for AI’s value in each step of the process, I would rank them in this order, from most valuable to least:

Image generation (AI image tools made the whole project possible)
Writing code (ChatGPT probably saved me a double-digit number of hours, and resolved issues that I might not have been able to figure out on my own)
Cleaning up lists, formatting, etc. (ChatGPT saved me maybe 30 minutes after I changed tactics due to the initial approach yielding inaccurate results)
Writing bios (AI just wasted my time, as I would’ve had to proofread everything and probably rewrite 90 percent of the generated content)

As I had observed in previous experiments, the tasks that AI seemed to fare best at were the ones where there wasn’t a clear right answer, but rather a broader range of potentially acceptable outputs.

Case in point: While I had reference images for each of the 108 chieftains, these were still fictional characters from a text-based novel. There is no established canon for what most of them look like, beyond a line or two of very high-level descriptions. So for most of the outlaws, I wasn’t necessarily aiming for a very specific look, thus making the output more likely to be acceptable. For the handful of major characters whose appearance has been more concretely established in popular culture (e.g., by TV and movie adaptations), it did take more effort to arrive at what I deemed an acceptable facsimile.

All in all, in this particular experiment, generative AI’s image-creation capabilities proved to be an invaluable asset and opened the door to creative opportunities. However, the project also highlighted significant problems with the accuracy of generative AI tools’ output. In situations where there was such a thing as a clearly right answer, ChatGPT often got it wrong, and it was only due to intervention by a human expert on the subject that those errors were caught.

Buy a print-ready file of the poster