What does generative AI have in store for us?

Generative AI isn’t a new concept. Unlike traditional AI technologies, which predict or classify based on input, generative AI creates new content, such as text and images, typically from a text input. This form of AI first became popular in 2014, through the introduction of Generative Adversarial Networks (GANs). In this post, I’ll focus mostly on the image generation aspect of generative AI. Previously, I’ve looked at AI-powered content generation platforms like Copy.ai, which is competing in this space with startups such as Jasper and Copysmith.

A GAN is a deep learning method through which one can create realistic images or transfer the style of one image to another (think of so-called ‘deepfakes’, where images and videos are modified to swap one person’s face for another). Because we’ve seen a number of exciting generative models and applications this year, it feels like generative AI is on a path to becoming more mainstream in its application.

Image Credit: Twitter

Earlier this year we saw the launch of DALL-E 2, a new machine learning model that can create images from a scene written in words (called a “prompt”).

Video Credit: YouTube

Users provide DALL-E 2 with a text description and it generates an image in return. ‘Feature learning’ is at the heart of generative AI: the neural network transforms pixel colours into a set of numbers that represent the features of the image. These features form the ‘input’ and are then mapped to the ‘output’ layer. The image itself is produced through a process called “diffusion”, which starts with a pattern of random dots and gradually changes that pattern into an actual image as the model recognises specific aspects of that image.
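To make the idea of diffusion a bit more tangible, here is a toy sketch in Python. It is not how DALL-E 2 is implemented: a real diffusion model uses a trained neural network to work out which noise to remove, whereas the hypothetical `toy_reverse_diffusion` function below cheats by looking at the target image directly. The function names, the eight-pixel "image" and the number of steps are all my own assumptions, chosen purely for illustration.

```python
import random


def toy_reverse_diffusion(target, steps=50, seed=0):
    """Start from random dots and nudge them toward `target` step by step.

    Assumption: a real diffusion model predicts the noise to remove with a
    trained neural network. This toy 'denoiser' simply peeks at the target.
    """
    rng = random.Random(seed)
    # Step 0: pure noise, one random value per pixel.
    image = [rng.gauss(0.0, 1.0) for _ in target]
    history = [image]
    for step in range(steps):
        # Close a growing fraction of the remaining gap, so the final
        # step lands on the target.
        fraction = 1.0 / (steps - step)
        image = [x + fraction * (t - x) for x, t in zip(image, target)]
        history.append(image)
    return history


def mean_abs_error(image, target):
    """Average per-pixel distance between an image and the target."""
    return sum(abs(x - t) for x, t in zip(image, target)) / len(target)


# A tiny one-dimensional 'image' of eight greyscale pixels.
target = [0.0, 0.25, 0.5, 0.75, 1.0, 0.75, 0.5, 0.25]
history = toy_reverse_diffusion(target)
errors = [mean_abs_error(img, target) for img in history]

# The error shrinks as the random dots gradually turn into the image.
for step in (0, 10, 25, 50):
    print(f"step {step:2d}: mean error {errors[step]:.4f}")
```

The point of the sketch is only the shape of the process: noise first, then many small corrections, with the picture emerging gradually rather than in one go.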

The visual style of the image is set through the description, varying from realistic to fantastic. In the example below, both images are generated from the description “An astronaut riding a horse”; the first description ends with “as a pencil drawing” and the second with “in photorealistic style.”
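As a small illustration of how the style is carried by the prompt itself, the sketch below builds the two descriptions from the same scene. The `build_prompt` helper is hypothetical and of my own making; it only assembles the text and doesn’t call DALL-E 2 or any other image model.

```python
def build_prompt(scene: str, style: str) -> str:
    """Join a scene description and a style phrase into a single prompt."""
    return f"{scene.strip()} {style.strip()}"


scene = "An astronaut riding a horse"

# Same scene, two different style endings.
pencil_prompt = build_prompt(scene, "as a pencil drawing")
photo_prompt = build_prompt(scene, "in photorealistic style")

print(pencil_prompt)
print(photo_prompt)
```

Keeping the scene and the style separate like this makes it easy to try one scene in many styles, which is how I’d expect most people to experiment with these tools.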

Image Credit: Venturebeat

DALL-E 2 can also take an original image, make edits and create different versions based on the original, using natural language descriptions. Changes to the original image are applied whilst taking into account shadows and textures of the original image.

Image Credit: DALL-E 2

DALL-E 2 in turn has been accompanied by a number of other text-to-image projects. Tools like Midjourney and Craiyon (formerly DALL-E mini) are not built on DALL-E 2 itself, but they share its aim of making image creation using generative AI more accessible to the mainstream public.

Image Credit: Ars Technica

Whereas OpenAI charges for the usage of DALL-E 2, projects like Stable Diffusion are open source, which means that people can freely use the code and embed it into their own applications. The downside of these models being open source is that it invites all kinds of less ethical applications. Take Unstable Diffusion, for example, which uses text to generate pornographic images. One can also imagine how the models will be used for expressions of violence, harassment and hatred. Although Stable Diffusion has restrictions built in, these can be circumvented when building on the open source code. The main concerns associated with generative AI are thus twofold: (1) loss of artistic control / ownership and (2) inciting hatred, misinformation and violence at scale.

Image Credit: Twitter

These concerns will no doubt grow stronger as the applications of generative AI grow in popularity and become more production ready. As with the Internet, it’s hard to govern how the different AI models are applied, but given the potential impact of ‘image synthesis’ technology I expect closer scrutiny of the underlying AI models and their usage.

The other aspect to consider here is the cultural bias present in most AI models, with some notable and harmful examples coming out of open source AI communities like Hugging Face and Stable Diffusion.

Image Credit: Twitter

The absence of safeguards to correct cultural biases is one of the reasons why companies like Google haven’t yet released their generative AI models to the wider public. Alex Ratner, Co-founder and CEO at Snorkel AI, points out that these AI models aren’t yet ready to be put into production by enterprises. For now, labelling the training data is a key factor in generative AI models like DALL-E 2 and Imagen being ready to be deployed to and adopted by enterprises.

Main learning point: The promise of automated image generation using machine learning feels immense and game-changing. However, a number of big problems that are inherent to this democratisation of content creation will need to be addressed: building in safeguards to avoid unethical usage, as well as making these AI models ready to be adopted by businesses and the larger public.
