The data mesh architecture is built around four fundamental principles, the second of which is “data as a product”, often simply called data products.
Data products are the result of applying product thinking to data sets, so that they have characteristics such as discoverability, addressability, understandability and trustworthiness. In this blog post, we will delve into the different characteristics that make up a successful data product.
According to data mesh theory, well-designed data products should embody the following properties:
Data product characteristics:
1. Data products must be discoverable - Ideally, data products should be published in a catalogue or registry where they can be searched and explored. To help downstream users find what they need, data products should be published together with additional metadata, such as domain, owner, lineage and quality metrics. With this metadata in place, data users can easily investigate and find the data products they need.
2. Data products must be addressable - Addressability relates to standardised naming and formats. Each data product provides a permanent, unique address through which data users can access it programmatically, and that address must follow common standards so that downstream users can access all data products in a consistent way.
3. Data products must be understandable - Once a data product is discovered, the next step is to understand it. Data products should be documented, and their schema (the underlying representation of the data) should be described. Schemas with well-described semantics and syntax allow data users to consume data products on a self-serve basis.
4. Data products must be trustworthy and truthful - While discoverability and understandability close the gap between what users know and don’t know about the data, trust requires more. One way to close the trust gap is for data products to adhere to approved service-level objectives (SLOs), covering: (1) interval of change and timeliness, (2) completeness, (3) freshness, general availability and performance, and (4) lineage.
5. Data products must be interoperable and composable - Can data products be easily combined? Are metadata and data types standardised? Putting global standards in place harmonises data across the different domains and establishes enterprise-wide data interoperability.
6. Data products must be natively accessible - The usability of a data product is closely related to how easy it is for data users to access it with their native tools. This property refers to the possibility of accessing data in a way that matches the skill sets and languages of the teams using it. For example, data analysts typically use SQL to build reports and dashboards, whereas data scientists often expect data in a file-based format to train artificial intelligence models.
7. Data products must be secure - Data products should be inherently safe and secure, which encompasses access control, ownership and robust governance standards. Access can be managed centrally, with domain data products then authorising access at a granular level to accommodate the specific needs of different teams. Key questions include: Who can access the data? In which context? What is the retention period? What is the confidentiality level?
8. Data products must be valuable - Collecting high-quality data and transforming it into data products is one of the best ways to monetise data. However, data products are valuable only when they are consumed to improve business performance. Are data products actually used? Are they self-standing, or do they need to be combined with other products to be valuable?
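To make the first four properties a little more concrete, here is a minimal Python sketch of a data product descriptor published to a catalogue. Everything in it is hypothetical and for illustration only: the `DataProduct` and `Catalog` classes, the `dp://<domain>/<name>/v<version>` address convention and the `completeness`/`meets_slos` helpers are assumptions, not part of any data mesh standard or catalogue product. It simply shows how discovery metadata (domain, owner, tags), a standardised address, a described schema and two SLOs (freshness and completeness) might live together on one object.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class DataProduct:
    """Hypothetical descriptor: the metadata a data product might publish."""
    domain: str                    # owning business domain (discoverability)
    name: str
    version: int
    owner: str                     # accountable team or person
    schema: dict[str, str]         # column name -> type (understandability)
    max_staleness: timedelta       # freshness SLO (trustworthiness)
    min_completeness: float        # required share of complete rows, 0.0 to 1.0
    tags: set[str] = field(default_factory=set)

    @property
    def address(self) -> str:
        # Assumed naming convention: one permanent, unique address per version.
        return f"dp://{self.domain}/{self.name}/v{self.version}"


class Catalog:
    """Hypothetical in-memory registry where data products are published and searched."""

    def __init__(self) -> None:
        self._products: dict[str, DataProduct] = {}

    def publish(self, product: DataProduct) -> str:
        # Addresses must stay unique, so refuse to overwrite an existing one.
        if product.address in self._products:
            raise ValueError(f"address already taken: {product.address}")
        self._products[product.address] = product
        return product.address

    def resolve(self, address: str) -> DataProduct:
        # Look a product up by its address (raises KeyError if unknown).
        return self._products[address]

    def search(self, text: str) -> list[str]:
        # Case-insensitive substring match on name, domain and tags.
        needle = text.lower()
        return sorted(
            p.address
            for p in self._products.values()
            if needle in p.name.lower()
            or needle in p.domain.lower()
            or any(needle in t.lower() for t in p.tags)
        )


def completeness(product: DataProduct, rows: list[dict]) -> float:
    """Share of rows in which every schema column is present and not None."""
    if not rows:
        return 0.0
    complete = sum(all(row.get(col) is not None for col in product.schema) for row in rows)
    return complete / len(rows)


def meets_slos(product: DataProduct, last_updated: datetime, rows: list[dict], now: datetime) -> bool:
    """True only if the data is both fresh enough and complete enough."""
    fresh = now - last_updated <= product.max_staleness
    return fresh and completeness(product, rows) >= product.min_completeness


# Example usage with made-up values.
catalog = Catalog()
orders = DataProduct(
    domain="sales",
    name="orders",
    version=1,
    owner="sales-data-team",
    schema={"order_id": "string", "amount": "decimal"},
    max_staleness=timedelta(hours=24),
    min_completeness=0.9,
    tags={"revenue"},
)
address = catalog.publish(orders)
print(address)                    # dp://sales/orders/v1
print(catalog.search("revenue"))  # ['dp://sales/orders/v1']

now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
rows = [{"order_id": "a1", "amount": 10.0}, {"order_id": "a2", "amount": None}]
print(completeness(orders, rows))                               # 0.5
print(meets_slos(orders, now - timedelta(hours=2), rows, now))  # False
```

Putting the version into the address is one possible design choice: a breaking schema change could be published as `v2` while existing consumers keep reading from the stable `v1` address. In the example, the data is fresh (two hours old against a 24-hour limit) but only half of the rows are complete, so the product fails its SLOs.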