Generative AI: A Primer

ChatGPT. DALL·E 2. Stable Diffusion. With media headlines across tech, business, and popular culture making household names of bleeding-edge research in generative AI, I thought it might be useful to put together a quick technical primer.

Generative AI is a subfield of artificial intelligence (AI) that focuses on developing algorithms that learn patterns from existing data and use them to generate new, original content, such as text, images, or audio, in response to an input like a text prompt. Though relatively nascent, this capability is already impacting a wide range of fields, including art and design, media and communications, marketing, and more. In this post I’ll try to outline the most important technical components of generative AI and discuss some of the leading techniques used to develop and train these systems.

Generative Adversarial Networks

Generative adversarial networks (GANs) are one of the more popular techniques for generative AI. GANs consist of two neural networks: a generator and a discriminator. The generator network produces synthetic content, while the discriminator network attempts to distinguish between real and fake content. The two networks are trained together in competition: the generator learns to produce increasingly realistic content that can fool the discriminator, while the discriminator learns to tell genuine content from fake more accurately.
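To make the adversarial loop concrete, here is a deliberately tiny sketch in plain Python with no deep-learning library. It is an illustration under simplifying assumptions, not how production GANs are built: the "real" data is assumed to be numbers drawn from a normal distribution centred at 4, the generator is a single learnable shift `b` applied to random noise, and the discriminator is a one-feature logistic regression. The names `Generator`, `Discriminator`, and `train_step` are hypothetical. Each step first updates the discriminator to score real samples high and fakes low, then updates the generator so that its fakes score higher (the common "non-saturating" generator loss).

```python
import math
import random


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


class Generator:
    """Toy generator: turns noise z ~ N(0, 1) into a fake sample z + b."""

    def __init__(self):
        self.b = 0.0  # the only learnable parameter

    def __call__(self, z):
        return z + self.b


class Discriminator:
    """Toy discriminator: D(x) = sigmoid(w*x + c), the estimated probability that x is real."""

    def __init__(self):
        self.w, self.c = 0.1, 0.0

    def __call__(self, x):
        return sigmoid(self.w * x + self.c)


def train_step(gen, disc, real_batch, lr=0.05):
    n = len(real_batch)
    noise = [random.gauss(0.0, 1.0) for _ in range(n)]
    fakes = [gen(z) for z in noise]

    # 1) Discriminator step: minimize -log D(real) - log(1 - D(fake)).
    #    The gradient w.r.t. the logit is D - 1 for real samples and D for fakes.
    grad_w = grad_c = 0.0
    for x in real_batch:
        g = disc(x) - 1.0
        grad_w += g * x
        grad_c += g
    for x in fakes:
        g = disc(x)
        grad_w += g * x
        grad_c += g
    disc.w -= lr * grad_w / n
    disc.c -= lr * grad_c / n

    # 2) Generator step: minimize -log D(fake), i.e. try to fool the updated discriminator.
    #    d(-log D(x))/dx = (D(x) - 1) * w, and dx/db = 1.
    grad_b = sum((disc(x) - 1.0) * disc.w for x in fakes)
    gen.b -= lr * grad_b / n


if __name__ == "__main__":
    random.seed(0)
    gen, disc = Generator(), Discriminator()
    for step in range(500):
        real = [random.gauss(4.0, 1.0) for _ in range(64)]
        train_step(gen, disc, real)
    print(f"learned shift b = {gen.b:.2f} (real data mean is 4.0)")
```

Running the loop should pull `b` upward from 0 toward the real mean; toy GANs like this one often oscillate around that value rather than settling on it exactly, a small-scale hint of why GAN training is known to be unstable.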

There are several GAN variants, including Deep Convolutional GANs (DCGANs), which use convolutional layers and were designed for image generation, and VAE-GANs, which combine a variational autoencoder with a GAN discriminator and are also used mainly for images. GANs have been used to generate a wide range of content, including images, audio, and, with more difficulty, text.

One of the big challenges in developing GANs is evaluating the quality of the content they generate. The most effective approach today is to combine quantitative and qualitative measures. Two common quantitative metrics for images are the Inception Score (IS), which uses a pretrained Inception image classifier to reward generated images that are both individually recognizable and collectively diverse, and the Fréchet Inception Distance (FID), which measures how closely the statistics of generated images match those of real images in the classifier's feature space (lower is better). Human evaluators, who assess the realism or coherence of the generated content, remain the most reliable way to measure subjective quality, and their judgments can also serve as feedback for training models.
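The two metrics can be sketched as follows. This is a simplified illustration, not a reference implementation: it assumes you have already run a pretrained Inception-v3 network over your images, giving per-image class probabilities (for IS) and feature vectors (for FID). The real FID uses full covariance matrices and a matrix square root; to stay in plain Python, `frechet_distance_diag` assumes diagonal covariances, in which case the trace term reduces to a per-dimension squared difference of standard deviations. All function names here are hypothetical.

```python
import math


def inception_score(class_probs):
    """IS = exp(mean over images of KL(p(y|x) || p(y))). Higher is better.

    class_probs: one probability vector per generated image (assumed to come
    from a pretrained classifier).
    """
    n, k = len(class_probs), len(class_probs[0])
    # Marginal class distribution p(y), averaged over all generated images.
    marginal = [sum(p[j] for p in class_probs) / n for j in range(k)]
    total_kl = 0.0
    for p in class_probs:
        # KL divergence of this image's prediction from the marginal; 0*log(0) terms are skipped.
        total_kl += sum(pj * math.log(pj / mj) for pj, mj in zip(p, marginal) if pj > 0)
    return math.exp(total_kl / n)


def feature_stats(features):
    """Per-dimension mean and variance of a list of feature vectors."""
    n, d = len(features), len(features[0])
    mu = [sum(f[i] for f in features) / n for i in range(d)]
    var = [sum((f[i] - mu[i]) ** 2 for f in features) / n for i in range(d)]
    return mu, var


def frechet_distance_diag(real_features, fake_features):
    """FID under a diagonal-covariance simplification. Lower is better; 0 means identical statistics.

    Full FID: ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2 (S_r S_g)^(1/2)).
    With diagonal S, the trace term becomes sum((sigma_r - sigma_g)^2).
    """
    mu_r, var_r = feature_stats(real_features)
    mu_g, var_g = feature_stats(fake_features)
    return sum(
        (mr - mg) ** 2 + (math.sqrt(vr) - math.sqrt(vg)) ** 2
        for mr, vr, mg, vg in zip(mu_r, var_r, mu_g, var_g)
    )
```

In practice both scores are usually computed over thousands of generated images, since they are unreliable on small samples.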

GPTs, VAEs and Transformers

In addition to GANs, several other techniques are used for generative AI, including autoregressive models, variational autoencoders, and transformers. Each technique has its strengths and limitations; which to use depends on the specific task and the desired output.

Autoregressive models, such as the widely used Generative Pre-trained Transformer (GPT), excel at generating text and are well suited to tasks such as language translation and text summarization; the explosive popularity of OpenAI’s ChatGPT, which is built on GPT models, shows how capable they have become. These models work by predicting the next word (more precisely, the next token) in a sequence from the words that precede it, using a neural network to estimate the probability of each possible next word given that context. Repeating this step, and appending each chosen word to the context, produces text one word at a time.
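The next-word loop can be illustrated with the simplest possible autoregressive model: a bigram model that conditions only on the single previous word and estimates its probabilities by counting. This is a stand-in for illustration, not how GPT works internally; GPT replaces the counting table with a transformer that conditions on the entire preceding context. The toy corpus and the function names `train_bigram` and `generate` are hypothetical.

```python
import random
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Estimate P(next word | previous word) by counting adjacent word pairs."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        # <s> and </s> mark the start and end of each sentence.
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    model = {}
    for prev, ctr in counts.items():
        total = sum(ctr.values())
        model[prev] = {word: c / total for word, c in ctr.items()}
    return model


def generate(model, max_words=20, rng=None):
    """Autoregressive sampling: repeatedly draw the next word given the previous one."""
    rng = rng or random.Random(0)
    tokens = ["<s>"]
    while len(tokens) <= max_words:
        dist = model[tokens[-1]]
        words, probs = zip(*dist.items())
        nxt = rng.choices(words, weights=probs, k=1)[0]
        if nxt == "</s>":
            break
        tokens.append(nxt)
    return " ".join(tokens[1:])


if __name__ == "__main__":
    corpus = ["the cat sat on the mat", "the dog sat on the rug"]
    model = train_bigram(corpus)
    print(model["the"])  # probabilities of each word that can follow "the"
    print(generate(model, rng=random.Random(42)))
```

Each call to `generate` samples one word from P(next | previous) and feeds that choice back in as context, the same loop a GPT-style model runs with a far richer model of the context. Even this toy can produce sentences absent from its corpus, such as "the cat sat on the rug".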

Variational autoencoders (VAEs) are another type of generative model that can be used for generating images and text. VAEs consist of two neural networks: an encoder and a decoder. The encoder network maps each input to a probability distribution (typically a Gaussian) over a lower-dimensional latent space, while the decoder network maps points in the latent space back to the original data space. Training also pushes these latent distributions toward a simple prior, usually a standard normal distribution, so by sampling from that prior and passing the samples through the decoder, VAEs can generate new, synthetic data that resembles the training data.
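A structural sketch of the encode, sample, and decode steps follows, in plain Python. It is untrained and heavily simplified, with these assumptions: the latent space is one-dimensional, the encoder and decoder are single linear maps with random placeholder weights, and the loss is half the squared reconstruction error (a unit-variance Gaussian decoder, up to an additive constant) plus the closed-form KL divergence between the encoder's Gaussian N(mu, sigma^2) and a standard normal prior. The class `ToyVAE` and its methods are hypothetical. The "reparameterization trick" (z = mu + sigma * eps) is the standard way to sample z so that gradients can flow through mu and sigma during training, which is not shown here.

```python
import math
import random


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, sigma^2) || N(0, 1) ) for a 1-D Gaussian, with log_var = log(sigma^2)."""
    return 0.5 * (math.exp(log_var) + mu ** 2 - 1.0 - log_var)


class ToyVAE:
    """Untrained toy VAE with a 1-D latent space; all weights are hypothetical placeholders."""

    def __init__(self, data_dim, seed=0):
        self.rng = random.Random(seed)
        # Encoder: two linear maps from data to the mean and log-variance of q(z|x).
        self.w_mu = [self.rng.gauss(0.0, 0.1) for _ in range(data_dim)]
        self.w_logvar = [self.rng.gauss(0.0, 0.1) for _ in range(data_dim)]
        # Decoder: a linear map from the latent z back to data space.
        self.w_dec = [self.rng.gauss(0.0, 0.1) for _ in range(data_dim)]

    def encode(self, x):
        mu = sum(w * xi for w, xi in zip(self.w_mu, x))
        log_var = sum(w * xi for w, xi in zip(self.w_logvar, x))
        return mu, log_var

    def reparameterize(self, mu, log_var):
        # z = mu + sigma * eps with eps ~ N(0, 1): the randomness is isolated in eps.
        eps = self.rng.gauss(0.0, 1.0)
        return mu + math.exp(0.5 * log_var) * eps

    def decode(self, z):
        return [w * z for w in self.w_dec]

    def loss(self, x):
        """Negative ELBO up to an additive constant: reconstruction term + KL regularizer."""
        mu, log_var = self.encode(x)
        x_hat = self.decode(self.reparameterize(mu, log_var))
        recon = 0.5 * sum((xi - xh) ** 2 for xi, xh in zip(x, x_hat))
        return recon + kl_to_standard_normal(mu, log_var)

    def sample(self):
        # Generation: draw z from the prior and decode it; the encoder is not needed.
        return self.decode(self.rng.gauss(0.0, 1.0))
```

Note that `sample` never touches the encoder: once training has shaped the latent space, generation only needs the prior and the decoder.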

Finally, transformers are a type of neural network architecture that works particularly well for natural language processing (NLP), including language translation and text summarization; GPT models are themselves built on it. Transformer models use self-attention mechanisms, in which every word in a sequence is scored for its relevance to every other word, to model the dependencies between words, allowing them to capture long-range dependencies effectively.
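Self-attention itself is compact enough to sketch directly. Below is single-head scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V, in plain Python with nested lists as matrices. It omits the multiple heads, positional information, feed-forward layers, and training of a real transformer; the projection matrices passed to the hypothetical `self_attention` function are assumed inputs. The optional causal mask, under which each position attends only to itself and earlier positions, is what GPT-style models use so that predicting a word never draws on the words after it.

```python
import math


def softmax(xs):
    # Subtract the max for numerical stability; exp(-inf) evaluates to 0.0.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def scaled_dot_product_attention(Q, K, V, causal=False):
    d_k = len(K[0])
    out = []
    for i, q in enumerate(Q):
        # Relevance score of every position j to the query at position i.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        if causal:
            # Mask out future positions so position i only sees positions 0..i.
            scores = [s if j <= i else float("-inf") for j, s in enumerate(scores)]
        weights = softmax(scores)
        # Output is a weighted average of the value vectors.
        out.append([sum(w * v[c] for w, v in zip(weights, V)) for c in range(len(V[0]))])
    return out


def self_attention(X, Wq, Wk, Wv, causal=False):
    # Queries, keys, and values are all projections of the same input sequence X.
    return scaled_dot_product_attention(matmul(X, Wq), matmul(X, Wk), matmul(X, Wv), causal)
```

Because every position computes scores against every other position in a single step, information from the start of a long sequence reaches the end directly instead of being passed along word by word, which is where the long-range advantage comes from.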

To wrap up, generative AI is a rapidly advancing field with the potential to revolutionize knowledge work across virtually every industry. While many technical challenges remain, as do a number of ethical concerns, it is clear that the power of generative AI is ushering in a period of radical innovation. 2023 should be an interesting year.