Comprehensive guide to data catalog systems: purpose, Iceberg catalog implementations (Hive Metastore, AWS Glue, Tabular), using DuckDB as a lightweight multi-source catalog, and comparisons of open-source catalog tools (Amundsen, DataHub, OpenMetadata). Learn selection criteria, setup patterns, and best practices for data discovery, governance, and unified querying.
Without a catalog, you must manage table locations and schemas manually in each engine.
Apache Iceberg requires a catalog to store table metadata (schema, location, snapshots, partition specs). The catalog name → table identifier → storage location mapping.
كتالوجات البيانات: كتالوجات Iceberg (Hive Metastore، AWS Glue، Tabular)، باستخدام DuckDB ككتالوج خفيف الوزن متعدد المصادر، ومقارنات بين Amundsen/DataHub/OpenMetadata، وأنماط الوصول الموحد إلى البيانات. المصدر: legout/data-platform-agent-skills.