You Can Only Govern Data That You Know

Cedric Berger
4 min read · Jun 22, 2021

We can only govern or manage data that we know. How would you apply the right governance to data you don’t know? What is it to know data?

Big organizations don’t know their data

In big organizations, neither humans nor machines have a full understanding of all data assets that exist and all entities/concepts they contain. This is worsening as the 4 Vs of new incoming data increase by the day.

In big organizations, two other factors make the quest for data knowledge even more difficult:

  1. Regular reorganizations or structural changes
  2. The multiplication of data-handling tools/technologies

Know your data — What does it mean?

Where are you located in the radar diagram displayed as the header of this article? The knowledge you have of your data is represented by the grey area of the diamond. Knowledge about data (commonly called metadata) is a prerequisite for the data governance activities that should be at the heart of any solid data/digital strategy.

The “Meaning” dimension

As a minimum requirement, any data entity must have at least one unique preferred label, one definition and a pointer to the source (source of preferred label + definition).

Progressing along the meaning dimension, an entity can be further characterized by adding alternate labels (synonyms), antonyms, many definitions (for the same preferred label, coming from different sources), hypernyms and hyponyms (if the entity is part of a taxonomy) and any additional element improving/refining the understanding of this entity.

Data Meaning
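
To make the minimum requirement concrete, here is a minimal sketch in Python of how the meaning of an entity could be recorded. The class names, field names and the glossary source are my own assumptions for illustration, not a standard, and uniqueness of the preferred label across a whole glossary is not checked here.

```python
from dataclasses import dataclass, field


@dataclass
class Definition:
    text: str
    source: str  # pointer to where the label and definition come from


@dataclass
class EntityMeaning:
    preferred_label: str
    definitions: list[Definition] = field(default_factory=list)
    alt_labels: list[str] = field(default_factory=list)  # synonyms
    broader: list[str] = field(default_factory=list)     # hypernyms
    narrower: list[str] = field(default_factory=list)    # hyponyms

    def meets_minimum(self) -> bool:
        """True if a label and at least one sourced definition are present."""
        return bool(self.preferred_label.strip()) and any(
            d.text.strip() and d.source.strip() for d in self.definitions
        )


# Hypothetical entries, for illustration only.
patient = EntityMeaning(
    preferred_label="Patient",
    definitions=[Definition("A person receiving health care.", "internal glossary")],
    alt_labels=["Subject of care"],
)
draft = EntityMeaning(preferred_label="Encounter")  # no definition yet

print(patient.meets_minimum())  # True
print(draft.meets_minimum())    # False
```

Everything beyond the label, one definition and its source is optional, which mirrors the idea of progressing along the dimension rather than passing a single gate.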

The “History” dimension

As a minimum requirement, in order to trust your data, you must know where it comes from (its source) and which transformations it has undergone. By source I mean both humans and machines, and by machines I mean at least the application layer, if not the technology layer.

This is where we differentiate provenance from lineage. The distinction is useful when data flows between different (data management) platforms/environments separated by clear technical boundaries and interfaces. As a platform user, you access the data when it flows into your platform via a defined interface featuring a specific ETL or ELT process.

  • provenance = information about a data asset and transformations it has undergone before it enters a specific platform/environment
  • lineage = information about a data asset and transformations it undergoes inside a specific platform/environment

Data History
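
The split between provenance and lineage can be sketched as follows. This is only an illustration under my own assumptions: the `Step` structure, the environment names and the choice to count the inbound ETL step as the first step inside the platform are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    description: str
    environment: str  # platform/environment in which the step ran


def split_history(steps, platform):
    """Split an ordered history at the first step that runs inside `platform`.

    Returns (provenance, lineage): everything before the asset enters the
    platform, and everything from its entry onwards.
    """
    for i, step in enumerate(steps):
        if step.environment == platform:
            return steps[:i], steps[i:]
    return steps, []  # the asset never entered the platform


# Hypothetical history of one data asset, oldest step first.
history = [
    Step("entered by a clinician", "hospital-ehr"),
    Step("exported nightly as CSV", "hospital-ehr"),
    Step("loaded via the ETL interface", "analytics-platform"),
    Step("pseudonymised", "analytics-platform"),
]

provenance, lineage = split_history(history, "analytics-platform")
print([s.description for s in provenance])
print([s.description for s in lineage])
```

The same history yields a different split for a user of another platform, which is exactly why the distinction only makes sense relative to a given platform boundary.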

The “Modelling” dimension

If an entity is part of your (digital) business, it is used, represented and instantiated by humans or machines. Therefore, it should be described as part of your business and/or data architecture. Data architecture includes conceptual, logical and physical models of data.

As a minimum requirement, any data entity must be modelled at the (business) logical level. You reach some level of modelling maturity when you are able to link each logical entity to the system-specific physical entities that instantiate it in the many systems using it. Similarly, your maturity increases when you link (in a one-to-many manner) overarching business conceptual entities to the many business-specific logical entities.
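
As a rough sketch of what these links look like in practice, the two one-to-many relations can be kept as simple mappings and checked for gaps. The entity names and table names below are hypothetical, and a real data architecture tool would hold far more than this.

```python
# Conceptual entity -> business-specific logical entities (one-to-many).
conceptual_to_logical = {
    "Party": ["Patient", "Practitioner"],
}

# Logical entity -> system-specific physical entities (one-to-many).
logical_to_physical = {
    "Patient": ["ehr.patients", "billing.customer"],
    "Practitioner": [],
    "Appointment": ["scheduler.appt"],
}


def modelling_gaps(conceptual_to_logical, logical_to_physical):
    """Return logical entities lacking a conceptual parent or a physical link."""
    with_parent = {
        logical
        for logicals in conceptual_to_logical.values()
        for logical in logicals
    }
    no_parent = sorted(set(logical_to_physical) - with_parent)
    no_physical = sorted(
        logical for logical, physical in logical_to_physical.items() if not physical
    )
    return no_parent, no_physical


no_parent, no_physical = modelling_gaps(conceptual_to_logical, logical_to_physical)
print(no_parent)    # ['Appointment']
print(no_physical)  # ['Practitioner']
```

The shorter these two lists become, the further you have moved along the modelling dimension.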

The “Mastering” dimension

Definitions around Master Data (MD) and MD Management (MDM) are numerous and controversial. Moreover, the way to implement MDM is organization-culture dependent.

Traditional MDM is system-centric: driven by technology vendors, we first managed single-domain MD (many similar MD in many domains) and then multi-domain MD (MD shared across domains). Nowadays, graph-based MDM enables a more data-centric approach by linking MD beyond domains, to the entire business architecture.

As a minimum requirement, data that you identify as master should be transparently published and accessible in an open standard format (structure- and content-wise) that is both human- and machine-readable. One step further is when master data schemas are validated and their fields are linked to the controlled vocabularies and/or terminologies that apply to them.
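
Here is a minimal sketch of that "one step further", assuming master data published as JSON and a hand-written schema in which a field can point to a controlled vocabulary. The schema layout, the record and the `validate` helper are hypothetical; the country codes are a small subset of ISO 3166-1 alpha-2. In a real setting you would more likely rely on an established schema language than on a hand-rolled check like this one.

```python
import json

# Controlled vocabulary (small subset of ISO 3166-1 alpha-2 codes).
VOCABULARIES = {"country": {"CH", "DE", "FR"}}

# Hypothetical schema: field name -> expected type and optional vocabulary.
SCHEMA = {
    "id": {"type": str, "vocabulary": None},
    "name": {"type": str, "vocabulary": None},
    "country": {"type": str, "vocabulary": "country"},
}


def validate(record):
    """Return a list of error messages; an empty list means the record is valid."""
    errors = []
    for name, rule in SCHEMA.items():
        if name not in record:
            errors.append(f"{name}: missing")
            continue
        value = record[name]
        if not isinstance(value, rule["type"]):
            errors.append(f"{name}: expected {rule['type'].__name__}")
        elif rule["vocabulary"] and value not in VOCABULARIES[rule["vocabulary"]]:
            errors.append(f"{name}: '{value}' not in vocabulary '{rule['vocabulary']}'")
    return errors


# A master record as it might be published in an open, machine-readable format.
published = json.loads('{"id": "S-001", "name": "Site Basel", "country": "CH"}')
print(validate(published))  # []
print(validate({"id": "S-002", "name": "Site X", "country": "XX"}))
```

The second call reports that 'XX' is not part of the country vocabulary, which is the kind of content-wise check a structural schema alone would not catch.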

Because the “mastering” aspect of MD resides more in the process by which you handle MD than in the data itself, you move along the mastering dimension when this process and its governance are openly described.

Cedric Berger

Entrepreneur, Healthcare Data Doctor, Digital Transformation Operator