Every day, techniques for algorithmically generating images and video get more sophisticated and harder to detect. The "advanced" part is key, as real elbow grease and know-how go into making convincing facsimiles. We're a long way off from AI that can just conjure a reasonable approximation of whatever we ask it to; case in point, just look at what Cris Valenzuela's online text-to-image generator spits out.
Working off of a paper that proposed an Attentional Generative Adversarial Network (hence the name AttnGAN), Valenzuela wrote a generator that works in real time as you type, then ported it to his own machine learning toolkit, Runway, so that the graphics processing could be offloaded from the browser to the cloud. In other words, this strange demo can be a perfect online time-waster.
When I took it for a spin, I tried out relatively simple phrases in the hopes that the generator would give me something recognizable. What I got in return was, well, definitely not mistakable for a real photo, or even a good Photoshop job.
The AttnGAN is set up to interpret different parts of the input sentence and adjust corresponding regions of the output image based on each word's relevance. Essentially, this text-to-image generator should have a leg up over other methods because it does more interpretive work on the words you feed it. In the paper, the researchers start by training the network on images of birds and achieve pretty impressive results with detailed sentences like "this bird is red with white and has a very short beak." When working off more generalized data and less specific descriptions, the generator churns out the oddball stuff you see above.
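The core idea, at a very high level, is that each region of the image attends over the words in the sentence, so relevant words steer the refinement of their regions. Here is a minimal, purely illustrative sketch of that word-to-region attention step, not the actual AttnGAN code or its real feature dimensions:

```python
import numpy as np

def word_region_attention(word_feats, region_feats):
    """Toy version of the attention idea behind AttnGAN (illustrative only).

    word_feats:   (num_words, d) array, one embedding per word
    region_feats: (num_regions, d) array, one feature vector per image region
    Returns a (num_regions, d) word-context vector for each region.
    """
    # Similarity between every region and every word.
    scores = region_feats @ word_feats.T               # (num_regions, num_words)
    # Softmax over words: each region's attention weights sum to 1.
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    # Each region's context is a relevance-weighted mix of word features,
    # which a generator stage could then use to refine that region.
    return weights @ word_feats                        # (num_regions, d)

# Tiny example: 3 words, 4 image regions, 8-dim features (made-up sizes).
rng = np.random.default_rng(0)
words = rng.normal(size=(3, 8))
regions = rng.normal(size=(4, 8))
context = word_region_attention(words, regions)
print(context.shape)  # one context vector per region: (4, 8)
```

In the real model this attention feeds a multi-stage generator that sharpens the image coarse-to-fine, which is why adding words to the sentence visibly updates specific parts of the picture.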
That said, it's cool to type increasingly complex sentences and watch the image output update with each new phrase you add. I was messing around with sentences about boats for a while before settling on the following image.
Is "abandoned" actually a good adjective to feed this generator? Probably not. Are there actually flower-like objects visible in the image? Not really. Still, the result feels eerily close to what I was imagining in my head. Watching hints of new elements appear as you feed the generator a small story can be super satisfying too.
Valenzuela is a research resident at New York University Tisch ITP (Interactive Telecommunications Program, a.k.a. people making art with tech), and his Twitter is full of other neat machine learning demonstrations. Follow him there if you want to see more stuff like this generator, or like this bizarre Stephen Colbert cloning experiment.