A natural language processing task that assigns one or more category labels to text documents. Text classification powers content routing, tagging, filtering, and organization in multimodal data processing pipelines.
Text classification models encode input text into a representation vector and map it to class probabilities through a classification layer. Transformer-based models fine-tuned on labeled examples achieve state-of-the-art performance. The model learns patterns that distinguish categories, from simple topic assignment to nuanced intent detection and content moderation.
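The encode-then-classify pipeline above can be sketched in plain Python: a hypothetical linear classification head maps a (toy, hand-picked) representation vector to logits, and a softmax turns those into class probabilities. All vectors, weights, and label names here are made up for illustration.

```python
import math

def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(encoding, weights, biases):
    # Linear classification layer: logits[c] = w_c . encoding + b_c,
    # followed by softmax to obtain a probability per class.
    logits = [
        sum(w * x for w, x in zip(w_c, encoding)) + b
        for w_c, b in zip(weights, biases)
    ]
    return softmax(logits)

# Toy 4-dimensional "representation vector" and a 3-class head.
encoding = [0.5, -1.2, 0.3, 0.8]
weights = [
    [0.2, 0.1, -0.3, 0.5],   # class 0, e.g. "sports"
    [-0.4, 0.6, 0.2, -0.1],  # class 1, e.g. "politics"
    [0.1, -0.2, 0.4, 0.3],   # class 2, e.g. "tech"
]
biases = [0.0, 0.1, -0.2]

probs = classify(encoding, weights, biases)
predicted = max(range(len(probs)), key=lambda i: probs[i])
print(predicted, [round(p, 3) for p in probs])
```

In a real model the encoding comes from a learned encoder and the head weights are trained by gradient descent; the mapping from vector to probabilities is otherwise the same.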
Modern approaches fine-tune pretrained language models (BERT, RoBERTa, DeBERTa) by adding a classification head on top of the [CLS] token representation. Multi-label classification replaces the softmax with an independent sigmoid per class, so a document can receive several labels at once. Few-shot classification can be performed with prompt-based approaches using large language models. Evaluation uses accuracy, precision, recall, and F1-score, with macro- or micro-averaging chosen according to class balance: macro averaging weights every class equally, while micro averaging pools counts and is dominated by frequent classes.
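The single-label vs. multi-label distinction comes down to the output activation. A minimal sketch with made-up logits: softmax forces the classes to compete, while per-class sigmoids let several labels fire independently.

```python
import math

def softmax(logits):
    # Probabilities compete and sum to 1: suitable for single-label tasks.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    # Independent probability per class: suitable for multi-label tasks.
    return 1.0 / (1.0 + math.exp(-z))

# Made-up logits for three hypothetical labels: ["news", "finance", "sports"].
logits = [2.0, 1.5, -3.0]

single_label = softmax(logits)          # sums to 1; one winner
multi_label = [sigmoid(z) for z in logits]

# With a 0.5 threshold, the document is tagged both "news" and "finance".
predicted = [i for i, p in enumerate(multi_label) if p > 0.5]
print(predicted)
```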
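The macro/micro distinction matters most under class imbalance. A small self-contained sketch (the label sets and predictions are invented): a classifier that gets the majority class right and the minority class entirely wrong scores a high micro-F1 but a much lower macro-F1.

```python
def f1_scores(y_true, y_pred, labels):
    # Per-class precision/recall/F1, then macro (unweighted mean over
    # classes) and micro (pooled TP/FP/FN counts) averages.
    per_class_f1 = {}
    tp_all = fp_all = fn_all = 0
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        per_class_f1[c] = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        tp_all += tp
        fp_all += fp
        fn_all += fn
    macro = sum(per_class_f1.values()) / len(labels)
    micro_prec = tp_all / (tp_all + fp_all)
    micro_rec = tp_all / (tp_all + fn_all)
    micro = 2 * micro_prec * micro_rec / (micro_prec + micro_rec)
    return macro, micro

# Imbalanced toy data: 8 majority-class and 2 minority-class examples;
# the classifier predicts the majority class every time.
y_true = ["a"] * 8 + ["b"] * 2
y_pred = ["a"] * 10
macro, micro = f1_scores(y_true, y_pred, labels=["a", "b"])
print(round(macro, 3), round(micro, 3))
```

Micro-F1 pools counts (and, for single-label classification, equals accuracy), so the ignored minority class barely dents it; macro-F1 averages per-class scores, so the zero on class "b" cuts it roughly in half.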