A category of artificial intelligence systems that generate new text, images, audio, video, and code based on patterns learned from training data. Generative AI is reshaping content creation, data augmentation, and data processing workflows across these modalities.
Generative AI models learn the statistical distribution of their training data and generate new samples from that distribution. Large language models (LLMs) generate text autoregressively, predicting one token at a time conditioned on everything generated so far, as the sketch below illustrates. Diffusion models generate images by iteratively denoising random noise. Variational autoencoders (VAEs) encode data into a latent space and generate new samples by decoding points drawn from that space. Each approach balances generation quality, diversity, and controllability.
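To make the autoregressive loop concrete, the sketch below samples one token at a time from a causal language model and feeds each sampled token back as input. This is a minimal sketch, not any particular library's API: `model` stands in for any network that maps a (batch, length) tensor of token ids to (batch, length, vocab) logits, and the function and parameter names are hypothetical.

```python
import torch
import torch.nn.functional as F

def generate(model, prompt_ids, max_new_tokens=50, temperature=1.0):
    """Autoregressive sampling: predict one token, append it, repeat.

    `model` is assumed to map a (1, seq_len) tensor of token ids to
    (1, seq_len, vocab_size) logits -- a stand-in for any causal LM.
    """
    tokens = prompt_ids.clone()                       # (1, seq_len)
    for _ in range(max_new_tokens):
        logits = model(tokens)                        # (1, seq_len, vocab)
        next_logits = logits[:, -1, :] / temperature  # distribution over next token
        probs = F.softmax(next_logits, dim=-1)
        next_token = torch.multinomial(probs, num_samples=1)  # sample, not argmax
        tokens = torch.cat([tokens, next_token], dim=1)       # feed it back in
    return tokens
```

Sampling from the softmax (rather than always taking the argmax) is what gives autoregressive models their diversity; `temperature` trades that diversity against fidelity to the model's learned distribution.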
Key architectures include transformer decoders (GPT-4, Claude) for text, diffusion models (Stable Diffusion, DALL-E 3) for images, and either autoregressive models (Jukebox) or diffusion models (AudioLDM) for audio. Multimodal generative models combine several of these modalities in a single system. Training scales to billions of parameters on internet-scale datasets. Inference relies on techniques such as KV caching, speculative decoding, and quantization for efficiency; the sketch below shows the KV-caching idea.
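The following is a minimal single-head self-attention sketch of KV caching: because keys and values for already-generated tokens never change during decoding, they are computed once, stored, and reused at every subsequent step instead of being recomputed for the whole prefix. The class, attribute, and shape conventions here are illustrative assumptions, with one new token processed per forward call.

```python
import torch
import torch.nn.functional as F

class CachedSelfAttention(torch.nn.Module):
    """Single-head self-attention with a KV cache (illustrative sketch)."""

    def __init__(self, d_model):
        super().__init__()
        self.q = torch.nn.Linear(d_model, d_model)
        self.k = torch.nn.Linear(d_model, d_model)
        self.v = torch.nn.Linear(d_model, d_model)
        self.cache_k = None  # (1, past_len, d_model), grows each step
        self.cache_v = None

    def forward(self, x):
        # x: (1, 1, d_model) -- embedding of the single newest token
        q, k, v = self.q(x), self.k(x), self.v(x)
        if self.cache_k is not None:
            k = torch.cat([self.cache_k, k], dim=1)  # reuse cached keys
            v = torch.cat([self.cache_v, v], dim=1)  # reuse cached values
        self.cache_k, self.cache_v = k, v            # extend the cache
        scores = q @ k.transpose(1, 2) / k.shape[-1] ** 0.5  # (1, 1, past+1)
        return F.softmax(scores, dim=-1) @ v                 # (1, 1, d_model)
```

Since the cache holds only tokens that have already been generated, causal masking falls out for free, and the per-step attention cost drops from quadratic to linear in the sequence length.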