
The #1 Open Source Metadata Platform

DataHub is an extensible data catalog that enables data discovery, data observability and federated governance to help tame the complexity of your data ecosystem.

Built with ❤️ by Acryl Data and LinkedIn.

Netflix
Visa
Optum
Pinterest
Airtel
Coursera
Zynga
Chime
Checkout.com
MediaMarkt Saturn
Adevinta
Wolt
Geotab
Hurb
Grofers
Viasat
LinkedIn
Udemy
ThoughtWorks
Expedia Group
Typeform
Peloton
Razer
ClassDojo
Klarna
N26
BankSalad
Uphold
Stash
SumUp
VanMoof
SpotHero
hipages
Showroomprive.com
Wikimedia Foundation
Cabify
Digital Turbine
DFDS
Moloco
Check Out Adoption Stories →

Get Started Now

Run the following commands to get started with DataHub.

python3 -m pip install --upgrade pip wheel setuptools 
python3 -m pip install --upgrade acryl-datahub
datahub docker quickstart
DataHub Quickstart Guide | Deploying With Kubernetes

Metadata 360

Combine technical, operational, and business metadata to provide a 360-degree view of your data entities.

Shift-left

Apply “shift-left” practices to pre-enrich important metadata using ingestion transformers, dbt meta-mapping, and other features.

Active Metadata

Act on changes in metadata in real time by notifying key stakeholders, circuit-breaking business-critical pipelines, propagating metadata across entities, and more.

Open Source

DataHub was originally built at LinkedIn and subsequently open-sourced under the Apache 2.0 License. It now has a thriving community with over a hundred contributors, and is widely used at many companies.

Forward-Looking Architecture

DataHub follows a push-based architecture, which means it's built for continuously changing metadata. The modular design lets it scale with data growth at any organization, from a single database under your desk to multiple data centers spanning the globe.

Massive Ecosystem

DataHub has pre-built integrations with your favorite systems: Kafka, Airflow, MySQL, SQL Server, Postgres, LDAP, Snowflake, Hive, BigQuery, and many others. The community is continuously adding more integrations, so this list keeps getting longer and longer.

The Origins of DataHub

Explore DataHub's journey from a search and data discovery tool at LinkedIn to the #1 open source metadata management platform, through the lens of its founder and some amazing community members.

ADLS, Airflow, Athena, Azure AD, BigQuery, Clickhouse, CouchBase, Databricks, DBT, Deltalake, Druid, Elasticsearch, Feast, Glue, Great Expectations, Hadoop, Hive, Iceberg, Kafka, Kusto, Looker, MariaDB, Metabase, Mode, MongoDB, MSSQL, MySQL, NiFi, Okta, Oracle, Pinot, PostgreSQL, PowerBI, Presto, Protobuf, Pulsar, Redash, Redshift, S3, Salesforce, SageMaker, Snowflake, Spark, SQLAlchemy, Superset, Tableau, Teradata, Trino

A Modern Approach to Metadata Management

Automated Metadata Ingestion

Push-based ingestion can use a prebuilt emitter or can emit custom events using our framework.

Pull-based ingestion crawls a metadata source. We have prebuilt integrations with Kafka, MySQL, MS SQL, Postgres, LDAP, Snowflake, Hive, BigQuery, and more. Ingestion can be automated using our Airflow integration or another scheduler of choice.
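For readers without Airflow, the "another scheduler of choice" option can be as small as a loop that shells out to the `datahub ingest -c` command shown on this page. The sketch below is illustrative only: the function names `build_ingest_command` and `run_periodically`, the interval, and the run count are assumptions made for the example, not part of DataHub.

```python
import subprocess
import time
from typing import Callable, List


def build_ingest_command(recipe_path: str) -> List[str]:
    """Return the argv for one ingestion run of the given recipe file."""
    return ["datahub", "ingest", "-c", recipe_path]


def run_periodically(
    recipe_path: str,
    interval_seconds: float,
    max_runs: int,
    runner: Callable = subprocess.run,
    sleep: Callable = time.sleep,
) -> List[int]:
    """Run the ingestion command max_runs times, pausing between runs.

    Returns the exit code of each run. `runner` and `sleep` are injectable
    so the loop can be exercised without the DataHub CLI installed.
    """
    exit_codes = []
    for run_index in range(max_runs):
        # check=False: a failed run is recorded rather than raised, so one
        # bad crawl does not stop later runs.
        result = runner(build_ingest_command(recipe_path), check=False)
        exit_codes.append(result.returncode)
        if run_index < max_runs - 1:
            sleep(interval_seconds)
    return exit_codes


# Hypothetical usage: three hourly runs of the recipe from this page.
# run_periodically("recipe.yml", interval_seconds=3600, max_runs=3)
```

A real deployment would more likely hand this job to cron, Airflow, or a similar scheduler; the loop only shows that each scheduled run is a single CLI invocation.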

Learn more about metadata ingestion with DataHub in the docs.

recipe.yml
source:
  type: "mysql"
  config:
    username: "datahub"
    password: "datahub"
    host_port: "localhost:3306"
sink:
  type: "datahub-rest"
  config:
    server: 'http://localhost:8080'

datahub ingest -c recipe.yml
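The recipe above hard-codes its credentials. One way to avoid that is to generate the file from environment variables before running the ingest command. This is a minimal sketch using only the standard library. The variable names (`MYSQL_USER`, `MYSQL_PASSWORD`, `MYSQL_HOST_PORT`, `DATAHUB_GMS_URL`) and the helpers `build_recipe` and `to_yaml` are hypothetical, and the tiny YAML writer handles only nested mappings of strings, which is all this recipe needs.

```python
import json
import os


def build_recipe(env=os.environ) -> dict:
    """Build the MySQL-to-DataHub recipe, falling back to the page's defaults."""
    return {
        "source": {
            "type": "mysql",
            "config": {
                "username": env.get("MYSQL_USER", "datahub"),
                "password": env.get("MYSQL_PASSWORD", "datahub"),
                "host_port": env.get("MYSQL_HOST_PORT", "localhost:3306"),
            },
        },
        "sink": {
            "type": "datahub-rest",
            "config": {
                "server": env.get("DATAHUB_GMS_URL", "http://localhost:8080"),
            },
        },
    }


def to_yaml(node: dict, indent: int = 0) -> str:
    """Render nested dicts of strings as YAML with two-space indentation."""
    pad = "  " * indent
    lines = []
    for key, value in node.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.append(to_yaml(value, indent + 1))
        else:
            # json.dumps yields a double-quoted, escaped string, which is
            # also a valid YAML double-quoted scalar.
            lines.append(f"{pad}{key}: {json.dumps(value)}")
    return "\n".join(lines)


# Hypothetical usage: write the file, then run `datahub ingest -c recipe.yml`.
# with open("recipe.yml", "w") as handle:
#     handle.write(to_yaml(build_recipe()) + "\n")
```

With none of the variables set, the output matches the recipe shown above, so the sketch can be dropped in without changing local quickstart behavior.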

Discover Trusted Data

Browse and search over a continuously updated catalog of datasets, dashboards, charts, ML models, and more.

Understand Data in Context

DataHub is the one-stop shop for documentation, schemas, ownership, data lineage, pipelines, data quality, usage information, and more.