Generative AI Unboxed: The What, How, And Why For Beginners

By Colin Campbell

Head of Content @ Pixis

There are probably only a handful of people in the world who can explain the internal workings of generative AI. Though generative AI is a black box for most of us, we know the contraption is not magic. Still, wouldn’t it be great if this black box were a little more transparent?

In this article, we look to decode generative AI by giving you a glimpse into some of the inner processes of the technology that is sparking conversations all around.

Components of Generative AI

Generative AI leverages sophisticated algorithms to learn patterns and characteristics from existing data and generates novel outputs that resemble the training examples.

Depending on the use case, generative AI produces new content using one technique or a combination of several. Images, for example, are generated by sampling random vectors from the latent space, a space where complex data is compressed into a more compact and meaningful form, and transforming those vectors into images.
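To make the latent space idea a little more concrete, here is a minimal sketch in Python. It assumes a toy "decoder" with random, untrained weights and a tiny 4x4 grayscale image; the names `make_decoder`, `sample_latent` and `decode` are hypothetical and only for illustration. A real image generator learns its decoder from data and is far larger.

```python
import math
import random

LATENT_DIM = 2   # assumed size of the latent vector
IMAGE_SIZE = 4   # assumed 4x4 grayscale "image"

def make_decoder(seed=0):
    """Build a stand-in decoder with random weights (a real one is learned)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(LATENT_DIM)]
            for _ in range(IMAGE_SIZE * IMAGE_SIZE)]

def sample_latent(rng):
    """Draw a random point in the latent space."""
    return [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]

def decode(weights, z):
    """Map a latent vector to a grid of pixel intensities between 0 and 1."""
    pixels = []
    for row in weights:
        activation = sum(w * v for w, v in zip(row, z))
        pixels.append(1.0 / (1.0 + math.exp(-activation)))  # squash to 0..1
    return [pixels[r * IMAGE_SIZE:(r + 1) * IMAGE_SIZE]
            for r in range(IMAGE_SIZE)]
```

Calling `decode(make_decoder(), sample_latent(random.Random(1)))` returns a 4x4 grid of values; a different random vector gives a different "image", which is the core of the idea.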

In text generation, the model starts with a seed sentence and predicts subsequent words based on learned language patterns and the data it has access to. Just like spouses who have spent years together and are capable of coherently finishing each other's sentences, these predictive actions are only possible if the generative model has been well-trained to do so.
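The "finishing each other's sentences" idea can be sketched with a very small word-pair model. This is only an illustration under simple assumptions: it counts which word follows which in a few made-up sentences and always picks the most frequent follower. The function names `train_bigrams` and `generate` are hypothetical, and real text generators use neural networks trained on far more data.

```python
from collections import Counter, defaultdict

def train_bigrams(corpus):
    """Count which word follows which in the training sentences."""
    follows = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows

def generate(follows, seed, max_words=5):
    """Extend the seed by repeatedly picking the most frequent next word."""
    words = seed.lower().split()
    if not words:
        return ""
    for _ in range(max_words):
        candidates = follows.get(words[-1])
        if not candidates:
            break  # the model has never seen this word followed by anything
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)

# Made-up training sentences, for illustration only.
CORPUS = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "the dog sat on the rug",
]
```

With this corpus, `generate(train_bigrams(CORPUS), "the dog", max_words=4)` continues the seed one word at a time, using only what it has seen in training.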

Training Generative AI Models

Generative AI models require large amounts of preprocessed training data to learn the underlying patterns, regardless of how specific or general the use case.

Preprocessing transforms raw data into a clean data set that computers can easily work with. This involves removing redundant data or outliers, merging data from various sources, or converting it into a form suitable for analysis. It is an integral step when feeding training data to generative AI models so that they can provide us with meaningful content on tap. Otherwise, it could turn into a Garbage In, Garbage Out scenario.
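As a rough sketch of the preprocessing steps described above (merging sources, removing redundant records, dropping outliers), consider the following. The record fields `name` and `value`, the function `preprocess`, and the outlier rule (distance from the median, measured in median absolute deviations) are all assumptions chosen for illustration; real pipelines depend heavily on the data involved.

```python
from statistics import median

def preprocess(*sources, max_deviations=5.0):
    """Merge sources, drop blanks and duplicates, then filter outliers."""
    merged = []
    seen = set()
    for source in sources:
        for record in source:
            name = record["name"].strip().lower()  # normalize the text field
            value = record.get("value")
            if not name or value is None or name in seen:
                continue  # skip blank, incomplete, or duplicate records
            seen.add(name)
            merged.append({"name": name, "value": float(value)})
    if not merged:
        return merged
    values = [r["value"] for r in merged]
    mid = median(values)
    # Typical distance from the median; fall back to 1.0 if it is zero.
    spread = median([abs(v - mid) for v in values]) or 1.0
    return [r for r in merged
            if abs(r["value"] - mid) <= max_deviations * spread]
```

Given two small lists of records, the function returns one cleaned list in which each name appears once and extreme values are left out.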

Training is typically an iterative optimization process that aims to minimize the difference between the generated content and the training set. For example, with GANs (generative adversarial networks), the generator and discriminator are trained simultaneously, with the goal of improving the generator's ability to produce realistic content and fool the discriminator. Generators basically have to be excellent liars! To lie convincingly and create realistic content, they need to learn from a source of training data, and sourcing training data comes with its own challenges.
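The tug-of-war between generator and discriminator can be sketched with a deliberately tiny example. The assumptions here are many: the "data" is a single number drawn around 4.0, the generator is a straight line `a * z + b`, the discriminator is a single logistic unit, and the two are updated in alternating steps, which is one common way to approximate training them together. All function names and settings are hypothetical, and toy setups like this can oscillate rather than settle.

```python
import math
import random

def sigmoid(t):
    t = max(-60.0, min(60.0, t))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-t))

def discriminator_step(d, real_x, fake_x, lr):
    """One gradient step pushing D(real) toward 1 and D(fake) toward 0."""
    w, c = d
    p_real = sigmoid(w * real_x + c)
    p_fake = sigmoid(w * fake_x + c)
    # Gradient of -[log D(real) + log(1 - D(fake))] with respect to w and c.
    grad_w = -(1.0 - p_real) * real_x + p_fake * fake_x
    grad_c = -(1.0 - p_real) + p_fake
    return (w - lr * grad_w, c - lr * grad_c)

def generator_step(g, d, z, lr):
    """One gradient step pushing D(fake) toward 1, i.e. trying to fool D."""
    a, b = g
    w, c = d
    fake_x = a * z + b
    p_fake = sigmoid(w * fake_x + c)
    # Gradient of -log D(fake) with respect to the fake sample.
    grad_x = -(1.0 - p_fake) * w
    return (a - lr * grad_x * z, b - lr * grad_x)

def train(steps=2000, lr=0.05, seed=0):
    """Alternate discriminator and generator updates on toy 1-D data."""
    rng = random.Random(seed)
    g = (1.0, 0.0)  # generator: fake = a * z + b
    d = (0.0, 0.0)  # discriminator: D(x) = sigmoid(w * x + c)
    for _ in range(steps):
        real_x = rng.gauss(4.0, 0.5)  # toy "training data"
        z = rng.gauss(0.0, 1.0)       # random noise fed to the generator
        fake_x = g[0] * z + g[1]
        d = discriminator_step(d, real_x, fake_x, lr)
        g = generator_step(g, d, z, lr)
    return g, d
```

Each discriminator step makes it slightly better at telling real from fake, and each generator step nudges the fakes in whatever direction the discriminator currently finds more convincing.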

Just as with us humans, a generative AI model's learning process never really ends: it keeps improving based on the feedback it receives on its output.

Evaluating Generative AI Models’ Output

Evaluating and validating the outputs of trained AI models requires constant adjustments to ensure that the generated content is of high quality and aligns with the desired characteristics. Human feedback tends to give the most accurate results, especially for very general uses. Having one general-use AI judge the output of another would be like a 3-year-old judging another 3-year-old's drawing: the perception of errors is bound to be limited. Likewise, errors can multiply when an AI model evaluates another AI model's output, because both have been trained on similar general data. At the end of the day, a human eye is a necessity in the evaluation process.

In fact, platforms such as Google have us, their users, working for them for free to label certain texts and images. How? CAPTCHA. What started out as a mechanism to discern whether the user was a human or a robot has since evolved into a method to have us help digitize books, annotate images, and build machine learning data sets for Google. Every time we solve a CAPTCHA image or text puzzle, we are effectively helping an AI model learn or corroborate information. When enough people provide the same answer for a CAPTCHA, the computer learns to treat it as the appropriate output. So when enough people correctly identify traffic signals, bicycles, or zebra crossings in a CAPTCHA, the AI learns what each of those objects looks like. So next time you interact with a CAPTCHA, know that you are contributing!

Though efficient, this could have a downside if enough people want to ‘troll’ the AI model by collectively providing incorrect feedback. However, considering the number of users ChatGPT or Google have, getting a majority to cooperate for such a prank is highly unlikely. Additionally, there are other measures in place to prevent this: such platforms have their own internal evaluation teams to ensure high-quality output.
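The "enough people giving the same answer" idea can be sketched as a simple consensus rule. This is an illustration only, not how any particular platform works: the function `consensus_label` and the thresholds `min_votes` and `min_agreement` are assumptions picked for the example.

```python
from collections import Counter

def consensus_label(votes, min_votes=5, min_agreement=0.8):
    """Return the agreed label, or None if agreement is too weak."""
    if len(votes) < min_votes:
        return None  # too few answers to trust any of them yet
    label, count = Counter(votes).most_common(1)[0]
    if count / len(votes) >= min_agreement:
        return label
    return None  # answers are too split, so the label stays unconfirmed
```

With nine "bicycle" votes and one "car", the label "bicycle" is accepted; with an even split, nothing is accepted. A rule like this also shows why a few pranksters have little effect: they would need to outnumber the honest answers to change the result.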

On a smaller scale or for more specific use cases, evaluating generative AI can be more expensive. Companies have to employ people or third-party services for this specific task, to give their AI model accurate feedback on its responses. Such specific use cases require people who understand the purpose of the model and the training data being fed to it, to make sure the generative AI model actually improves.

The cost and accuracy of evaluation are just some of the factors to consider for companies building generative AI. To get the best out of these models, companies also have to think about what the models are actually capable of.

Limitations of Generative AI Models

Generative AI models of today are great at many things. They can help you write professional emails, translate between many different languages, and even write your very own original theme song. However, have you ever asked one to generate a picture of a human hand holding a pencil? You may come across a very distorted image with the hand’s fingers going every which way. The most common problem is that the fingers don’t really look human.

The technology is notably good at generating realistic images of landscapes and people’s faces, but it still has its limitations. Images of hands and fingers have proven to be a challenge for several big AI image generators such as DALL-E and Midjourney. Even though a model has been trained on images of hands and fingers, no one has explicitly told it that human fingers are not meant to bend backwards. Yes, AI has been trained on millions of images of hands, but it has seen hands in millions of different variations, from different views and in different positions as well. So after learning from all these data points, when it’s asked to generate a hand, it may return an output that is not very handy!

At the moment, we need to be very specific with the prompts we provide for generative AI models to return accurate images. Some developers are considering training AI on a very specific data set of hands, or even manually coding the constraints of human hands into the algorithms. There are other limitations to generative AI too. One of them is that the data behind its learning comes from the internet, and we all know what a circus the internet is. Nevertheless, such limitations are not permanent, as we have some of the sharpest minds working on these challenges. Besides, these challenges are far outweighed by the value the technology adds.

And, as AI continues to learn through feedback loops and reinforcement learning, its use cases and advantages will expand as well.

Uses and Benefits of Generative AI Models

The biggest appeal of generative AI is that it can help automate repetitive content work, freeing up time and attention for higher-value tasks. Other uses span many different industries, from chatbots in customer service and designing new products, to writing scripts for movies and helping in pharmaceutical drug development.

Generative AI enables the creation of new content, augments limited datasets, solves complex problems, personalizes experiences, aids in prototyping and simulation, and assists in artistic expression and design. It fosters creativity, improves recommendation systems, optimizes solutions, and accelerates innovation across various domains.

Generative AI - The Future Outlook

Currently, generative AI sparks a lot of debate. It has had notable success in the creative and coding industries, but people still fear it may take over their jobs. It has helped organizations streamline processes and optimize operations, but there is still apprehension about data privacy. As we move ahead with generative AI, we must take into account the areas where there is friction and address them appropriately.

Some research suggests that AI will not take jobs away, but will instead evolve current ones and create new opportunities too. AI service providers will see a need to create ethical generative AI models with an increased focus on data control and security. Mistral AI is on a mission to show it can be done, having announced the development of generative AI that meets EU government standards and addresses market concerns.

Improving the generative AI models in such ways will get more people to trust the technology and have it become a more practical part of their lives. Generative AI represents an exciting frontier in the field of AI, enabling machines to create content that was once exclusive to human effort.