Avatar Me!
Many of my friends have these cartoons, which they call avatars, gracing their bios on LinkedIn, Facebook, and elsewhere. Naturally, I developed a serious case of “Avatar Envy”.
Being the intensely curious person that I am, I reached out to them and asked how they did it. The responses were all the same: “Just upload an image of yourself and ask the AI to create a cartoon of the image.” Choosing which image of myself to use wasn’t an issue: I would use my LinkedIn Learning bio image, which graces this series.
Still feeling chastened by my experience with ChatGPT, I decided to venture into this new realm using Google’s Gemini AI and Adobe Firefly, which my friends recommended.
Gemini has a very basic interface. However, there is a category called “image.” Clearly, this would be a good starting point, and Gemini even mentions it uses something called Imagen. Since I don’t see anything that requires me to upload an image, I assume the “All Knowing Google” knows who I am. But based on my experience with ChatGPT, I hedge my bets and make sure it uses me, the other Tom Green.
The result was quite surprising. Do I really look like I can balance a microphone on one finger? And I wouldn’t be caught dead wearing that T-shirt and shirt combo. Mind you, the hair colouring was appreciated, and the glasses and the rather pronounced chin were somewhat accurate, but adding my name at the bottom and that sticker on my computer really bothered me.
Clearly, I was missing something. That something was a reference image. It turns out that item is hidden in the dropdown menu when you click the + sign. There it is: Upload image. I upload the image, which appears in the chat area, and then add: “Create a cartoon image of the subject of this photo.”
Close but no cigar. I needed a second opinion. I show the image to my beloved and her response is: “Who is that?” This prompt needs some work. Back to Gemini and a new prompt: “Create a cartoon using this reference image.”
This time I am not breaking out the surrender flag. Obviously I need to flesh out the prompt. Not only that, I wanted it in 3D, not 2D. As one of my friends said when I asked how to do it in 3D: “Did you say 3D in the prompt?”
As I learned, prompts require much more context and detail to achieve accuracy. The more detail, the more accurate the result. Not knowing where to start, I turned to the “Oracle of Google” and asked: “How to create an accurate 3D avatar from a prompt in Gemini using a photo portrait.” After instructing me to upload the photo, the Oracle stated that the second step is to craft a detailed prompt that includes style, pose, and any specific features you want to emphasize. The third step is to make sure you reference the photo in the prompt. The Oracle even suggested a prompt, which I thought I would use: “Create a photorealistic 3D avatar of the person in the reference photo with a slightly smiling expression, facing the camera and in a casual pose.”
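For readers who like to see this spelled out, here is a minimal Python sketch of the Oracle's advice: name the style, the expression, the pose, and any features to emphasize, and always mention the reference photo. The function name and its parameters are made up for illustration, and nothing here talks to Gemini. It only assembles the text you would paste in next to the uploaded photo.

```python
def build_avatar_prompt(style, expression, pose, features=()):
    """Assemble one prompt sentence from the parts the Oracle listed.

    All names here are hypothetical; this is plain string building,
    not a call to any Gemini API.
    """
    # Step 3 of the Oracle's advice: always reference the uploaded photo.
    prompt = (
        f"Create a {style} avatar of the person in the reference photo "
        f"with a {expression} expression, {pose}."
    )
    # Optional: list any specific features that should be emphasized.
    if features:
        prompt += " Emphasize: " + ", ".join(features) + "."
    return prompt


prompt = build_avatar_prompt(
    style="photorealistic 3D",
    expression="slightly smiling",
    pose="facing the camera and in a casual pose",
)
print(prompt)
```

With these values the sketch reproduces the Oracle's suggested prompt word for word. Passing something like features=("grey hair", "thick eyeglass frames") would append those details to the end of the sentence.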
As you may notice, I am slightly smiling, facing the camera, and relaxed in the photo. How did the Oracle know?
This is where the Oracle’s 4th point, Iterate and Refine, comes into play. According to the Oracle, “The initial generation may not be perfect (You think!), so you will likely need to iterate by specifying to Gemini what needs to be adjusted.” Good advice.
My first iteration was: “Change the hair colour to grey and have the subject, with crossed arms, leaning on a table.”
Gemini couldn’t do it. Its response was: “I am still learning to generate certain kinds of images, so I might not be able to create exactly what you are looking for yet, or it may go against my guidelines. If you would like to ask me for something else, just let me know!”
It was time to learn how to iterate. I started by asking Gemini to change the hair colour to grey, remove the background, and change the shirt colour to white.
The glasses were missing and the hair needed to be greyer.
The face needed to be thinner and the frames for the eyeglasses needed to be thicker.
The hair should be white.
I felt the face should be thinner and the chin longer.
Conclusion
This was close enough for me. As you can see, there inevitably comes a point where you need to step back and accept what you have. When it comes to creating avatars, it is all too easy to be Goldilocks and waste an inordinate amount of time attempting to get it “just right.” Still, the lesson for me was the Oracle’s 4th Point: “You will likely need to iterate by specifying to Gemini what needs to be adjusted.” This also supports my observation that Generative Art has an issue: you can’t fully describe “intent.” The best you can do is be specific about what you’re trying to create and, eventually, settle for a somewhat acceptable result.
While fumbling around trying to create this avatar, I found myself agreeing with many of the claims Sari Azout makes in her article How I stopped Worrying About AI and Learned to Value my Humanity. In the article, Sari examines the disparity between our expectations of AI and its actual capabilities. One of these gaps is that we expect AI will reduce our workload, giving us more leisure time. Sari points out that the reality is AI expands what’s possible, raises expectations and standards, and creates more work.
Tell me about it.
In the next installment of “The Bumbling Prompter,” I embark on a quest to use Adobe Firefly to see if I can do even better than Gemini to create my avatar.
It is so correct what you are writing .... It needed a lot of experience to create Europe's first Bicycle Avatar ... https://youtu.be/SdUD8RlbX2o
Nice efforts here, Tom. Interesting evolution on each iteration, though you are spot on - using a language model to try and clone your image can return quite varied outputs. There are ways around this. Various platforms out there either allow you to do this yourself, OR provide a service to do it for you. This can be done with image uploads OR actual video footage of you. An example of this with a historical figure (Einstein) is here - you can even have a chat with him: https://trulience.com/avatar/8118392647806109895
It's like an art form if you have a vision.
I feel your pain. I tried to create an avatar in a similar manner, with results that (I hope) don't look too much like my current profile image.