A metadata management system that provides a searchable inventory of an organization's data assets, including their descriptions, schemas, ownership, and quality metrics. Data catalogs help teams discover and understand multimodal datasets across the organization.
A data catalog collects metadata about data assets from sources across the organization, such as databases, data lakes, APIs, and file systems. It indexes three kinds of metadata: technical (schemas, types, statistics), business (descriptions, tags, owners), and operational (freshness, quality, usage). Users search and browse the catalog to find relevant datasets, understand their structure, and assess their fitness for use.
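The three kinds of metadata and the search step described above can be illustrated with a small in-memory sketch. This is not how any particular catalog product is implemented: the `CatalogEntry` and `Catalog` classes, their field names, and the 0.0 to 1.0 `quality_score` are all hypothetical, chosen only to show how the pieces relate.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class CatalogEntry:
    """One data asset, carrying the three kinds of metadata."""
    name: str
    # Technical metadata: column name -> declared type, plus a basic statistic.
    schema: dict[str, str]
    row_count: int
    # Business metadata.
    description: str
    owner: str
    tags: set[str] = field(default_factory=set)
    # Operational metadata (both optional in this sketch).
    last_updated: Optional[datetime] = None
    quality_score: Optional[float] = None  # hypothetical 0.0-1.0 score


class Catalog:
    """Toy in-memory catalog; a real one would sit on a search index."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        # Registering the same name again overwrites the earlier entry.
        self._entries[entry.name] = entry

    def search(self, keyword: str = "", *, tag: Optional[str] = None,
               owner: Optional[str] = None) -> list[CatalogEntry]:
        """Substring match on name, description, and column names,
        narrowed by optional exact-match filters on tag and owner."""
        kw = keyword.lower()
        results = []
        for entry in self._entries.values():
            haystack = " ".join(
                [entry.name, entry.description, *entry.schema]
            ).lower()
            if kw and kw not in haystack:
                continue
            if tag is not None and tag not in entry.tags:
                continue
            if owner is not None and entry.owner != owner:
                continue
            results.append(entry)
        return sorted(results, key=lambda e: e.name)


catalog = Catalog()
catalog.register(CatalogEntry(
    name="orders",
    schema={"order_id": "int", "amount": "float"},
    row_count=1000,
    description="Customer orders",
    owner="sales",
    tags={"finance"},
))
catalog.register(CatalogEntry(
    name="clicks",
    schema={"user_id": "int", "url": "str"},
    row_count=50000,
    description="Web click events",
    owner="web",
    tags={"behavioral"},
))

hits = catalog.search("order", tag="finance")
print([entry.name for entry in hits])  # ['orders']
```

Keeping the metadata in one record per asset makes it easy to judge fitness for use from a single search hit, since structure, ownership, and freshness are all in one place.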
Open-source catalogs include DataHub, OpenMetadata, and Amundsen. Commercial options include Alation and Collibra. Catalogs integrate with data sources either through crawlers that pull metadata from each source or through push-based ingestion, in which sources send metadata to the catalog. Common features include automated schema extraction, data profiling, lineage visualization, access control, and collaboration (comments, ratings). Search typically combines keyword queries with filtering on metadata fields such as owner or tag.
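As an illustration of crawler-style integration and automated schema extraction, the sketch below pulls table and column metadata out of a SQLite database using Python's standard `sqlite3` module. SQLite stands in for an arbitrary data source here, and the `crawl_sqlite_schemas` function and its output layout are hypothetical; production catalogs such as those named above ship their own ingestion frameworks.

```python
import sqlite3


def crawl_sqlite_schemas(conn: sqlite3.Connection) -> dict[str, dict]:
    """Pull-based crawl: return {table: {"columns": {...}, "row_count": n}}."""
    metadata = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # Skip SQLite's internal bookkeeping tables.
        if table.startswith("sqlite_"):
            continue
        # Quote the identifier so unusual table names do not break the query.
        quoted = '"' + table.replace('"', '""') + '"'
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk).
        columns = {
            row[1]: row[2]
            for row in conn.execute(f"PRAGMA table_info({quoted})")
        }
        # A row count is the simplest possible profiling statistic.
        (row_count,) = conn.execute(
            f"SELECT COUNT(*) FROM {quoted}"
        ).fetchone()
        metadata[table] = {"columns": columns, "row_count": row_count}
    return metadata


# Demo source: an in-memory database with one small table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL)")
conn.execute("INSERT INTO orders (amount) VALUES (9.5), (20.0)")

schemas = crawl_sqlite_schemas(conn)
print(schemas)
```

A push-based integration would invert the flow: rather than the catalog querying each source on a schedule, the source (or the pipeline writing to it) would send a payload like the one returned here to the catalog whenever its schema changes.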