Metadata is data that describes the properties of other data. Metadata has many purposes, such as tracking data’s history, understanding relationships between data, and creating searchable data indexes.
What are the Types of Metadata?
All metadata exists in relation to something else. For example, a restaurant menu is a type of metadata. Each menu item contains a name, description, and price for a meal that the restaurant sells.
In an enterprise context, metadata usually relates to other data. This can be data in a relational database table, or data that exists in flat files or JSON; it can be structured data in a data warehouse or vast quantities of unstructured data in a data lake.
Data is diverse, and so is metadata. Even so, metadata breaks down into three main categories: administrative, structural, and descriptive.
Administrative Metadata
Administrative metadata is essential in any good data management system. It includes information about data lineage, which covers data provenance and records of any changes or updates.
An example of this is a shared document in Word, Google Docs, or a similar application. One person creates the document, titles it, and saves it. Another person may then open the document and add some text. A third person might edit the text, while a fourth might rename the document.
Some applications will track all of this activity and store it as administrative metadata. This allows future users to discover important details like the document creation date, the identity of the author, and the last edit date.
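The shared-document scenario above can be sketched in a few lines of Python. This is only an illustration: the `Event` record, the `summarize` function, and the user names are hypothetical and do not reflect how any particular application stores its activity history.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """One tracked action on a document (hypothetical record layout)."""
    user: str
    action: str  # "create", "edit", or "rename"
    at: datetime


def summarize(events):
    """Derive administrative metadata from a document's activity log."""
    ordered = sorted(events, key=lambda e: e.at)
    created = next(e for e in ordered if e.action == "create")
    return {
        "author": created.user,
        "created_at": created.at,
        "last_modified_at": ordered[-1].at,
        "last_modified_by": ordered[-1].user,
        "contributors": sorted({e.user for e in ordered}),
    }


# Four people touch the same document, as in the example above.
log = [
    Event("ana", "create", datetime(2024, 1, 5, 9, 0)),
    Event("ben", "edit", datetime(2024, 1, 6, 14, 30)),
    Event("cho", "edit", datetime(2024, 1, 8, 11, 15)),
    Event("dev", "rename", datetime(2024, 1, 9, 16, 45)),
]
admin_metadata = summarize(log)
print(admin_metadata["author"], admin_metadata["last_modified_by"])
```

The summary is computed entirely from the log, so the document's text never needs to be read to answer questions about who created it or when it last changed.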
On a larger scale, administrative metadata plays a vital part in the ETL (Extract, Transform, Load) process. This process can move substantial amounts of data from one location to another while applying essential changes to the data values.
Good data governance means keeping a log of all of these events, as well as recording details such as data provenance. This keeps the process of populating a data warehouse transparent and auditable.
Structural Metadata
Structural metadata tells us about the relationships between data. For example, an organization might store its employee records in an HR system. Each employee will have the same type of record within that system, whether they are an intern or the CEO.
To reflect the company hierarchy, the HR team will need some structural metadata that describes relationships between employees. A department manager’s HR record might have structural metadata that looks like this:
Position: Finance manager 1
|-Reports to: Chief Finance Officer 1
|-Reporting to: Finance employee 1, Finance employee 2, Finance employee 3
This metadata doesn’t describe the contents of the individual’s record. However, it does help the HR team keep track of how all of these records relate to each other. Taken together, this structural metadata describes the entire company hierarchy.
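As a rough sketch of how such relationships might be stored and queried, the Python below keeps one "reports to" link per position and derives everything else from it. The `reports_to` mapping and both function names are assumptions made for illustration; a real HR system would have its own schema.

```python
# Hypothetical structural metadata: each position maps to the position it reports to.
reports_to = {
    "Finance manager 1": "Chief Finance Officer 1",
    "Finance employee 1": "Finance manager 1",
    "Finance employee 2": "Finance manager 1",
    "Finance employee 3": "Finance manager 1",
}


def direct_reports(position):
    """Return the positions that report directly to `position`."""
    return sorted(p for p, boss in reports_to.items() if boss == position)


def chain_of_command(position):
    """Walk upward from `position` to the top of the hierarchy."""
    chain = [position]
    while chain[-1] in reports_to:
        chain.append(reports_to[chain[-1]])
    return chain


print(direct_reports("Finance manager 1"))
print(chain_of_command("Finance employee 2"))
```

Storing only the upward link is one possible design choice: the list of direct reports can always be recomputed from it, so the two views of the hierarchy cannot drift apart.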
Structural metadata can exist on any level. You can find it within a document – a book’s table of contents is one example of structural metadata. It can also describe the structure of a relational database or the contents of a repository, such as a data warehouse.
Descriptive Metadata
Descriptive metadata gives information about the underlying entity. For example, in a large and diverse document store, descriptive metadata might include details about individual documents such as:
- Filename or unique identifier
- Document title
- Author or source
- Genre or topic
- File size
- Rights information
- Meta tags
This type of metadata existed in pre-digital libraries, where card indexes represented the books on the shelves. The cards each contained information about one book in the library, with details such as title, author, ISBN, and classification.
We use something similar in most modern operating systems, where we can search for files on a drive according to name, location, size, or format. This kind of search looks at the file’s descriptive metadata and returns every item that matches the search.
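A file search of this kind can be approximated with Python's standard library. The sketch below is illustrative only: `find_files` and its parameters are hypothetical names, and the temporary directory exists just to make the example self-contained.

```python
import tempfile
from pathlib import Path


def find_files(root, suffix=None, min_size=0):
    """Return files under `root` whose metadata matches the given criteria."""
    matches = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        if suffix is not None and path.suffix != suffix:
            continue
        if path.stat().st_size < min_size:
            continue
        matches.append(path)
    return sorted(matches)


# Build a small throwaway directory so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "notes.txt").write_text("short")
    (root / "report.txt").write_text("x" * 2000)
    (root / "photo.jpg").write_bytes(b"\x00" * 5000)

    large_text_files = [p.name for p in find_files(root, suffix=".txt", min_size=1000)]
    print(large_text_files)
```

Note that the search inspects only each file's name and size as reported by the file system. It never opens a file to read its contents.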
How is Metadata Used?
Metadata is one of the key components of a good data governance strategy. Having the right metadata allows enterprises to do things such as:
Data Search
Metadata makes it easier to search for data; the more detailed the metadata, the more accurate the search results. An example of this is a data lake, which is a vast repository of raw, often unstructured data. A data lake is only usable because the system keeps detailed metadata about the contents of the lake.
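One common way to make metadata searchable is an inverted index that maps each metadata value to the objects carrying it. The sketch below assumes a tiny, hypothetical catalog; the paths, field names, and the `build_index` and `search` functions are invented for illustration and are not tied to any specific data lake product.

```python
from collections import defaultdict

# Hypothetical catalog entries: object path -> descriptive metadata.
catalog = {
    "raw/clicks/2024-01.json": {"topic": "web", "format": "json", "source": "site"},
    "raw/orders/2024-01.csv": {"topic": "sales", "format": "csv", "source": "shop"},
    "raw/orders/2024-02.csv": {"topic": "sales", "format": "csv", "source": "shop"},
}


def build_index(entries):
    """Map each (field, value) pair to the set of objects that carry it."""
    index = defaultdict(set)
    for path, metadata in entries.items():
        for field, value in metadata.items():
            index[(field, value)].add(path)
    return index


def search(index, **criteria):
    """Return objects matching every field=value criterion."""
    if not criteria:
        return []
    sets = [index.get((field, value), set()) for field, value in criteria.items()]
    return sorted(set.intersection(*sets))


index = build_index(catalog)
print(search(index, topic="sales", format="csv"))
```

Each extra metadata field gives the search another criterion to narrow on, which is the sense in which more detailed metadata yields more accurate results.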
Data Reuse
Reusing data can help to improve efficiency. For example, if data has already passed through a transformation process, then it doesn’t need to go through that transformation again. Metadata can help keep track of data lineage, including details of any changes or integrations. Users can simply check the metadata to see if the data meets their required standards.
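A lineage check of this kind could look like the following sketch. The `lineage` record layout, the step names, and the `missing_steps` function are all hypothetical; real lineage tools record far more detail.

```python
# Hypothetical lineage record attached to a dataset.
lineage = {
    "dataset": "orders_clean",
    "derived_from": "orders_raw",
    "steps": ["deduplicate", "normalize_currency"],
}


def missing_steps(lineage_record, required):
    """Return the required transformations not yet recorded in the lineage."""
    applied = set(lineage_record["steps"])
    return [step for step in required if step not in applied]


# Only the transformations that have not been applied yet need to run.
todo = missing_steps(lineage, ["deduplicate", "normalize_currency", "mask_emails"])
print(todo)
```

If the returned list is empty, the dataset can be reused as it stands.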
Data Dictionaries
A data dictionary is a template for the business’s data structures. It outlines all of the fields, data types, relationships, and access permissions for each data source. The dictionary is itself a type of metadata, and the business derives it from the metadata it already has. The data dictionary then serves to guide all future data projects.
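To illustrate how a dictionary can be derived from existing metadata, the sketch below reads the schema that SQLite already stores about its own tables. The `employee` table and the `data_dictionary` function are hypothetical, and the result covers only fields and types; relationships and access permissions would need additional sources.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table used only for illustration.
conn.execute(
    "CREATE TABLE employee ("
    "id INTEGER PRIMARY KEY, "
    "name TEXT NOT NULL, "
    "manager_id INTEGER REFERENCES employee(id))"
)


def data_dictionary(connection):
    """Build {table: [field descriptions]} from SQLite's schema metadata."""
    tables = [
        row[0]
        for row in connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    dictionary = {}
    for table in tables:
        # PRAGMA table_info yields: cid, name, type, notnull, dflt_value, pk
        columns = connection.execute(f"PRAGMA table_info({table})").fetchall()
        dictionary[table] = [
            {"field": c[1], "type": c[2], "required": bool(c[3]), "primary_key": bool(c[5])}
            for c in columns
        ]
    return dictionary


dictionary = data_dictionary(conn)
for field in dictionary["employee"]:
    print(field)
```

Because the dictionary is generated from the schema rather than written by hand, it can be rebuilt whenever the underlying tables change.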
System Integration
Metadata is essential in system integrations. For example, consider an ETL process where data flows from a production system to a data warehouse. The ETL process has to ensure that it’s not overwriting the most recent data with an older version. It does this by comparing metadata and establishing which version is the most recent.
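The version comparison described above might be sketched as follows, assuming each record carries a last-updated timestamp. The `warehouse` dictionary, the record keys, and the `load` function are hypothetical stand-ins for a real load step.

```python
from datetime import datetime, timezone

# Hypothetical warehouse state: key -> (value, last-updated timestamp).
warehouse = {
    "cust-1": ("Ana", datetime(2024, 3, 1, tzinfo=timezone.utc)),
    "cust-2": ("Ben", datetime(2024, 3, 5, tzinfo=timezone.utc)),
}


def load(target, key, value, updated_at):
    """Write the record only if it is newer than what the target holds."""
    current = target.get(key)
    if current is not None and current[1] >= updated_at:
        return False  # target already has this version or a newer one
    target[key] = (value, updated_at)
    return True


newer = load(warehouse, "cust-1", "Ana M.", datetime(2024, 3, 9, tzinfo=timezone.utc))
stale = load(warehouse, "cust-2", "Benjamin", datetime(2024, 3, 2, tzinfo=timezone.utc))
print(newer, stale)
```

The decision uses only the timestamps. The data values themselves are never compared, which is why accurate administrative metadata matters for this step.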
Analytics and Business Intelligence
Metadata can act as a rich source of information for data analytics and business intelligence tools. For example, administrative metadata will provide the creation date for a particular data item. An analytics tool could study all of these dates and identify trends over time. Metadata is also important in data exploration, which is an essential first step in analytics.
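As a small illustration of the trend analysis mentioned above, the sketch below counts how many items were created in each month. The `created_dates` list is invented sample data standing in for creation dates read from administrative metadata.

```python
from collections import Counter
from datetime import date

# Hypothetical creation dates taken from administrative metadata.
created_dates = [
    date(2024, 1, 12), date(2024, 1, 30),
    date(2024, 2, 3), date(2024, 2, 14), date(2024, 2, 27),
    date(2024, 3, 8),
]

# Count how many items were created in each month.
per_month = Counter(d.strftime("%Y-%m") for d in created_dates)
for month in sorted(per_month):
    print(month, per_month[month])
```

An analytics tool would work with far larger volumes, but the principle is the same: the trend comes from the metadata alone.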