Dall-E2 the AI artist - boon or bane?

Dall-E2

You probably know that artificial intelligence is relieving us of tasks that we consider tedious and repetitive. But now AI has taken a giant leap in the field of art. Have you ever seen an astronaut riding a horse on the lunar surface, or a teddy bear buying fruit in a 90s style? What I am saying may sound vague, but recently a research company named OpenAI developed a programme named Dall-E2, which does something really cool. Let’s sneak in some info about it.

What exactly is Dall-E2?

In simple terms, Dall-E2 is the latest artificial intelligence system capable of generating ultra-realistic images and art from a written natural language description. It was created by OpenAI, an AI research company co-founded by Elon Musk and others, which started as a non-profit and later added a capped-profit arm. The talk of ultra-realistic images is not just hype; this system is really promising. On seeing the images generated by this AI on the OpenAI website, I could not believe my own eyes, as they did not look like they were created by AI at all.

Dall-E2 generates high-resolution images from a description of what you want the image to look like. It can create almost any image you want, even one that has never existed before. It can also edit existing images so seamlessly that you don’t feel they were edited; this process is called in-painting. And it does all of this in about 10 seconds.

Dall-E vs Dall-E2
This is the difference between the images generated by Dall-E and Dall-E2 for the same text prompt.

The name Dall-E2 itself tells you that it is the second iteration of the AI system. Its predecessor, Dall-E, was unveiled in January 2021 and did the same thing as Dall-E2, but at a lower resolution and in a more cartoonish way. Dall-E2 followed in April 2022, with higher resolution, greater comprehension and new capabilities including in-painting.

How is Dall-E2 doing this?

The neural network of this AI system is trained to understand images and text descriptions, and can even recognise the relationship between the written text and the anticipated image output. For example, take a text prompt like "A teddy bear walking on the moon".

Dall-E2 does not look up a stored picture for each word. Instead, it turns the whole prompt into a list of numbers (an embedding) that captures its meaning, in the same space where it has learned to place images. Having seen a very large number of captioned images during training, it knows roughly what a teddy bear, walking, and the moon look like and how they relate to each other, so it can consolidate them into one ultra-realistic image. As the cherry on top, it gives you 10 variations of that image, allowing you to choose your favourite. The part that links text and images is referred to as "CLIP technology."
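To get a feel for how text and images can be compared at all, here is a minimal sketch in Python. It is not OpenAI's code: the three-number vectors, the file names and the function names are all made up for illustration, whereas a real CLIP model learns much longer vectors from its training data. The only idea it shows is that a prompt and an image count as "related" when their vectors point in a similar direction.

```python
import math

# Hypothetical, hand-written "embeddings". A real CLIP model would learn
# these vectors (with hundreds of dimensions) from captioned images.
TEXT_EMBEDDINGS = {
    "a teddy bear walking on the moon": [0.9, 0.8, 0.1],
    "a bowl of fruit on a table": [0.1, 0.2, 0.9],
}
IMAGE_EMBEDDINGS = {
    "teddy_on_moon.png": [0.8, 0.9, 0.2],
    "fruit_bowl.png": [0.2, 0.1, 0.8],
}


def cosine_similarity(a, b):
    """Return how closely two vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_matching_image(prompt):
    """Pick the image whose embedding is closest to the prompt's embedding."""
    text_vec = TEXT_EMBEDDINGS[prompt]
    return max(
        IMAGE_EMBEDDINGS,
        key=lambda name: cosine_similarity(text_vec, IMAGE_EMBEDDINGS[name]),
    )


print(best_matching_image("a teddy bear walking on the moon"))
```

In this toy setup the teddy bear prompt lands closest to the teddy bear image. Dall-E2 goes a step further than picking from existing images: it uses this kind of shared text-image space to guide the creation of a brand new one.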

Dall-E2 also makes use of a process called diffusion, which starts from a random pattern of dots and gradually alters that pattern, step by step, until it becomes an image that matches what the CLIP process understood from the prompt.
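The "random dots that slowly turn into a picture" idea can be sketched in a few lines of Python. This is only a toy under a big assumption: a real diffusion model uses a trained neural network to guess what noise to remove at each step, while the hypothetical `toy_denoise` function below simply cheats and nudges every "pixel" towards a target it already knows.

```python
import random


def toy_denoise(target, steps=50, seed=0):
    """Start from random noise and move step by step towards `target`.

    `target` is a list of pixel values standing in for the finished image.
    """
    rng = random.Random(seed)
    # Begin with pure noise: one random value per pixel.
    image = [rng.gauss(0.0, 1.0) for _ in target]
    for step in range(steps):
        remaining = steps - step
        # Close a fraction of the remaining gap on every step, so the
        # noise fades gradually instead of vanishing in one jump.
        image = [
            pixel + (goal - pixel) / remaining
            for pixel, goal in zip(image, target)
        ]
    return image


target = [0.0, 0.5, 1.0, 0.5]  # a made-up four-pixel "image"
result = toy_denoise(target)
print([round(value, 3) for value in result])
```

The early steps still look like noise and the last steps look almost like the target, which is the same overall shape as the real process, just without any learning involved.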


Just have a look at the image above, generated by Dall-E2. CLIP technology and diffusion explain how it matches the prompt, but matching the prompt alone does not explain why the image looks so pleasing to us.

The reason why this image looks like a wonderful piece of art by a human artist is the artistic styles it incorporates. OpenAI wanted to know how well Dall-E2 captures what we humans find pleasing in art, so they measured it. They call this "automated aesthetic quality evaluation."

OpenAI introduced Dall-E2 on Twitter.

For this, they generated 512 artistic captions with the help of GPT-3, a technology OpenAI had developed previously. These captions were then given to Dall-E2 as prompts, and the resulting images were scored automatically to see how close they came to human-like art.

Dall-E2's limitations:

If the images in the dataset are incorrectly labelled, Dall-E2 may give false outputs. For example, if an elephant image is labelled as "monkey", then Dall-E2 could generate an elephant when it is asked for a monkey.

Dall-E2 might also have some gaps in its training. For example, if Dall-E2 is unfamiliar with the term "howler monkey" because it never learned it during training, it will make its best guess and generate something like a "howling monkey", as the two phrases are somewhat similar.

These are some limitations that OpenAI themselves described in their official video, but Dall-E2 has some other problems as well.

Potential threats:

Dall-E2 threats

The advent of every new technology created by humanity has come with some problems. Though it is useful on the one hand, it troubles us on the other. And Dall-E2 is no exception.

Endangering the future of human artists:

If something can create any image, poster or artwork they want in just 10 seconds, why would people spend money on an artist who needs days to draw the same thing? This could become people's mindset if Dall-E2 were opened up to everyone, ultimately endangering artists' livelihoods. This has become a hot debate in the artists' community and has made many of them furious. How could they find an innovation pleasing if it has the potential to destroy their livelihoods?

Biased image output complication:

The other problem with Dall-E2 is racial bias in its image output. For example, if your text prompt is something like "A person receiving a Nobel Prize," then nearly all of the images generated by Dall-E2 show white people and not others. But if your text prompt has a negative tone, Dall-E2 tends to generate non-white people in its images, reinforcing racial stereotypes. Dall-E2 may even generate explicit images when it is given an obscene text prompt.

Tackling problems:

Dall-E2 was not exposed to explicit content during training, says OpenAI. So the probability of Dall-E2 generating those kinds of images is said to be low, but nothing is certain until we get access. OpenAI has also developed a system that monitors all text prompts to ensure that no one violates their policies. As a result, taking action against someone who violates them is simple.
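OpenAI has not published how its prompt monitoring works, so the following is only a guess at the simplest possible version of the idea: a keyword check written in Python. The blocklist, the `check_prompt` function and its return format are all hypothetical, and a real system would almost certainly use something far smarter than word matching.

```python
# Hypothetical blocklist; the real policy terms are not public in this form.
BLOCKED_TERMS = {"gore", "nude"}


def check_prompt(prompt):
    """Flag a prompt if any of its words appears in the blocklist."""
    # Lowercase each word and strip common punctuation before comparing.
    words = {word.strip(".,!?\"'").lower() for word in prompt.split()}
    hits = sorted(words & BLOCKED_TERMS)
    return {"allowed": not hits, "flagged_terms": hits}


print(check_prompt("A teddy bear walking on the moon"))
print(check_prompt("A nude portrait."))
```

Because every prompt passes through one check like this, it is easy to log which user sent a flagged prompt, which is what makes taking action straightforward.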

Even though numerous precautions have been taken to ensure safety, OpenAI has so far given access only to selected users, on the advice of their technical team. But you can still see Dall-E2-generated images on the official OpenAI website. And if you want to use it, you can join their waiting list.

Many people who got access to Dall-E2 have posted their images on Twitter, so you can have a look there too.

Dall-E2 is one of the biggest breakthroughs so far in the field of AI, and could even be an initial step in the development of general or strong AI. In the future, we may even develop technologies capable of generating animations, but as of now, all I can do is watch their progress.
