M.C. Escher’s artwork is a gateway into a world of depth-defying optical illusions, featuring “impossible objects” that break the laws of physics with convoluted geometries. What you perceive his illustrations to be depends on your point of view — for example, a person seemingly walking upstairs may be heading down the steps if you tilt your head sideways.
If you rotate an image of a molecular structure, a human can tell the rotated image is still the same molecule, but a machine-learning model might think it is a new data point. In computer science parlance, the molecule is “symmetric,” meaning the fundamental structure of that molecule remains the same if it undergoes certain transformations, like rotation.
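To make the idea of symmetry concrete, here is a minimal sketch in Python. It is an illustration only, not a method described in this article: the three-atom coordinates are rough, made-up values, and the helper names `rotate_z` and `pairwise_distances` are hypothetical. The sketch rotates a small molecule and shows that the raw coordinates a naive model would see change, while the distances between atoms do not.

```python
import math

def rotate_z(points, theta):
    """Rotate 3D points about the z-axis by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]

def pairwise_distances(points):
    """Distances between every pair of atoms, in a fixed order."""
    return [
        math.dist(p, q)
        for i, p in enumerate(points)
        for q in points[i + 1:]
    ]

# Illustrative three-atom geometry (assumed values, not measured data).
molecule = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
rotated = rotate_z(molecule, math.pi / 3)

# The raw coordinates differ after rotation...
raw_changed = any(
    not math.isclose(a, b, abs_tol=1e-9)
    for p, q in zip(molecule, rotated)
    for a, b in zip(p, q)
)

# ...but the interatomic distances are unchanged.
distances_match = all(
    math.isclose(a, b, abs_tol=1e-9)
    for a, b in zip(pairwise_distances(molecule), pairwise_distances(rotated))
)

print("raw coordinates changed:", raw_changed)
print("pairwise distances preserved:", distances_match)
```

A model fed only the raw coordinates would treat the two arrays as different inputs, whereas one fed a rotation-invariant description, such as these distances, would see the same molecule both times.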
AI image generation, which relies on neural networks to create new images from inputs such as text prompts, is projected to become a billion-dollar industry by the end of this decade. Even with today’s technology, a fanciful picture of, say, a friend planting a flag on Mars or heedlessly flying into a black hole could be generated in less than a second. However, before they can perform tasks like that, image generators are commonly trained on massive datasets containing millions of images that are often paired with associated text. Training these generative models can be an arduous chore that takes weeks or months, consuming vast computational resources in the process.