
A Tight Definition For DAMA Metadata

In a previous post I expressed my delight at discovering the DAMA DMBOK (1) and my disappointment with its chapter on metadata. This post attempts to show the approach to defining metadata that should, in my opinion, be taken in the DMBOK. It ends with a list of all the things that DAMA includes as metadata that would disappear from the chapter if my approach were adopted.

1 Introduction


1.1 Metadata in Daily Life


To obtain information from any data we need context around the data. That context is provided by some more data that describes the data in front of us. Such data describing other data is known as metadata.

We use metadata all the time almost unconsciously. If we are presented with a page of text we want to know where it comes from, who wrote it and for what purpose. In this way we gauge the value and reliability of the text and can place it in a context in order to interpret it.

Not only do we use metadata to interpret the text on a page, we also seek metadata about the page itself. Is it part of a document? If so, what kind of document? An internal company document or a published book? Who published it? When? We instinctively gather all this data about any document that comes into our hands.

We do the same with web pages. A search throws up an interesting piece of text. But if we really want to use the information we glean from the text we evaluate the source. What site is it on? Is it a regular news source or institution that we know and rely on? Or is it an individual blogger that we’ve never heard of before?

1.1.1 Metadata for structured data


We regularly meet metadata for structured data when we are confronted with forms to be completed either online or on paper. There is some metadata describing and defining each field or element on the form. Each element has a label. There may be a link or reference to explanatory notes. Required elements are marked. Related fields are in logical groups. Where data is required in a specific format, such as dates or codes according to a specific coding system, this is indicated.

All these features are ‘data about data’ or metadata.

1.2 Metadata in Data Management


In Data Management, where we are dealing with structured business data, an important type of metadata is defining metadata. It directly explains the data elements by providing their names and definitions, along with aspects of the format of the data, so that the data can be interpreted.

Here we have some data:

20160215ABCA1012XA 12NL00197802010065060000200029781585009701000010

We may make some guesses, but it is not possible to make a reliable interpretation of it. However, with the following specifications of the data elements and data groups we can obtain some information from it.

A table of metadata that maybe matches the example data
Table 1

By applying this metadata we can make the following interpretation of the data.

The data interpreted by means of the metadata
Table 2

We now have some information from the combination of data and metadata. There are five data elements in the Header data group and three in the Line item data group. In the Header data group we can distinguish the following:

  • The date is 15th February 2016.
  • The customer is identified by code ABCA. With the appropriate database available we could look that up to find the name and other details of the customer.
  • A postal code 1012XA, building number 12 and a country code NL are associated with the customer code. With those elements we can search an address database and find the complete address details.

There are two Line Item data groups, each containing an ISBN identifying a book and a quantity.
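
The interpretation above can be sketched in a few lines of Python. This is only an illustration: the element names and widths below are my own assumptions, inferred from the example string rather than copied from Table 1, and the third Line item element (called `line_number` here) is a guess. The point is that the parser knows nothing about the data except what the metadata tells it.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ElementSpec:
    """Defining metadata for one fixed-width data element."""
    name: str
    width: int


# Structuring metadata: which elements form which data group, in which order.
# ASSUMPTION: names and widths are inferred from the example string;
# "line_number" in particular is a hypothetical name for the third element.
HEADER = [
    ElementSpec("date", 8),
    ElementSpec("customer_code", 4),
    ElementSpec("postal_code", 6),
    ElementSpec("building_number", 3),
    ElementSpec("country_code", 2),
]
LINE_ITEM = [
    ElementSpec("line_number", 3),
    ElementSpec("isbn", 13),
    ElementSpec("quantity", 6),
]


def slice_group(record, offset, specs):
    """Cut one data group out of the record, element by element."""
    values = {}
    for spec in specs:
        values[spec.name] = record[offset:offset + spec.width]
        offset += spec.width
    return values, offset


def interpret(record):
    """Apply the metadata: one Header group, then repeated Line item groups."""
    header, offset = slice_group(record, 0, HEADER)
    header["date"] = datetime.strptime(header["date"], "%Y%m%d").date()
    header["building_number"] = header["building_number"].strip()
    items = []
    while offset < len(record):
        item, offset = slice_group(record, offset, LINE_ITEM)
        item["quantity"] = int(item["quantity"])
        items.append(item)
    return header, items


RECORD = "20160215ABCA1012XA 12NL00197802010065060000200029781585009701000010"
header, items = interpret(RECORD)
print(header)
for item in items:
    print(item)
```

Changing the widths in `HEADER` or `LINE_ITEM` changes the interpretation without touching the data, which is exactly the role the metadata plays here.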

We appear to have sufficient defining and structuring metadata to make an interpretation of the elements of the data. We can do some validation to verify the postal code and the ISBN numbers. However, we do not have any broader context for the data. It might be data from a purchase order or a dispatch advice, but without further metadata that remains unknown.
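
As an example of such validation, an ISBN-13 carries its own check digit: the digits are weighted alternately by 1 and 3, and the weighted sum must be a multiple of 10. The sketch below applies that rule to two 13-digit substrings that I have cut out of the example data; where exactly the ISBNs sit in the string is my own reading of it, and the function name is hypothetical.

```python
def is_valid_isbn13(isbn: str) -> bool:
    """Check the ISBN-13 check digit (weights 1 and 3 alternating, sum mod 10 == 0)."""
    if len(isbn) != 13 or not (isbn.isascii() and isbn.isdigit()):
        return False
    total = sum(int(digit) * (1 if index % 2 == 0 else 3)
                for index, digit in enumerate(isbn))
    return total % 10 == 0


# ASSUMPTION: these substrings are where I believe the ISBNs sit in the example data.
for candidate in ("9780201006506", "9781585009701"):
    print(candidate, is_valid_isbn13(candidate))
```

A check like this confirms that a value is well formed. It says nothing about whether the book exists or what kind of document the data came from.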

2 What is Metadata


Metadata is a set of data that describes and gives information about other data. (2)

This definition makes clear that metadata is itself a type of data. The definition is recursive and allows for layer upon layer of data and metadata. The scope needs to be limited. Three specific layers are identified as relevant for Data Management. There is the primary layer of business data that is the main focus of the functions of Data Management. The second layer is metadata describing the data layer. The third layer is meta-metadata describing the metadata layer. But all the metadata is also just data. Its role as metadata depends on which layer is in focus.

Figure 1 depicts these layers and the shift in focus. On the left the focus is on the primary data layer. Above that is the layer of metadata describing and providing information about the data. The third layer, the meta-metadata, is rather remote in this view. But when the focus shifts to the second layer, as shown on the right, that layer is seen to be data with a layer of metadata describing it.

Layers of Data and Metadata
Figure 1

2.1 Types of Metadata


Three types of metadata are of interest in Data Management:
  • defining metadata
  • structuring metadata
  • cataloguing metadata

Defining metadata: data that defines and specifies other data.
Structuring metadata: data that specifies the relationships between data elements and data groups.
Cataloguing metadata: data about the location, identity, history, validity, lineage, source, and destination of other data.

2.1.1 Defining Metadata


The example in the introduction illustrated the relationship between the business data layer and its defining metadata. In Table 3 the defining metadata is formalised a little further, by way of illustration.

The metadata formalised into entries in a data dictionary
Table 3

The data element type specifications from the opening example have been extended with Identifiers and a Definition. An operational data dictionary includes such elements and many more.

Such defining metadata is the basis for determining data quality. Data must conform to the defining metadata. Well-formed defining metadata assists in assuring consistency of data across the enterprise. Defining metadata provides part of the context for interpretation of data in order to form information and it thereby provides help to users of information systems.
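
To make "data must conform to the defining metadata" concrete, here is a minimal sketch of a data dictionary entry that carries its own format rule. The class, the identifier `DE005`, the definition text and the pattern are all hypothetical; a real data dictionary would hold far more than this.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    """A much simplified data dictionary entry (hypothetical structure)."""
    identifier: str
    name: str
    definition: str
    pattern: str  # format rule, expressed here as a regular expression

    def conforms(self, value: str) -> bool:
        """True if the whole value matches the format rule."""
        return re.fullmatch(self.pattern, value) is not None


def nonconforming(element: DataElement, values: list) -> list:
    """A crude data quality check: list the values that break the rule."""
    return [value for value in values if not element.conforms(value)]


# ASSUMPTION: identifier, definition and pattern are invented for illustration.
COUNTRY_CODE = DataElement(
    identifier="DE005",
    name="Country code",
    definition="Two-letter code identifying the country of an address.",
    pattern=r"[A-Z]{2}",
)

print(nonconforming(COUNTRY_CODE, ["NL", "nl", "NLD", "BE"]))
```

The quality rule lives in the metadata, not in the checking code, so the same check works for any data element that has a dictionary entry.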

Defining metadata in the enterprise takes the form of a data dictionary of data elements according to ISO/IEC 11179 or some equivalent approach. The terminology used in naming and defining data elements forms the business glossary.

2.1.1.1 Domains


The complete set of all possible values for an attribute is a domain. (3) Chapter 5 of the DMBOK on Data Development describes domains. A data element specification includes the specification of the domain of the data element.
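
A domain can be sketched as either an enumerated set of values or a range. Both classes below are hypothetical, and the country codes are a small illustrative subset, not a complete code list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnumeratedDomain:
    """A domain given as an explicit set of permitted values."""
    name: str
    values: frozenset

    def contains(self, value) -> bool:
        return value in self.values


@dataclass(frozen=True)
class RangeDomain:
    """A domain given as an inclusive integer range."""
    name: str
    minimum: int
    maximum: int

    def contains(self, value) -> bool:
        return isinstance(value, int) and self.minimum <= value <= self.maximum


# ASSUMPTION: illustrative values only.
COUNTRY_CODES = EnumeratedDomain("Country codes", frozenset({"NL", "BE", "DE"}))
ORDER_QUANTITY = RangeDomain("Order quantity", 1, 999999)

print(COUNTRY_CODES.contains("NL"), ORDER_QUANTITY.contains(0))
```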

2.1.2 Structuring Metadata


Structuring metadata brings relevant data elements together to form data groups: for example, all the data elements that form the name, address and other communication channels of a customer. Structuring metadata also shows which data elements are mandatory and which are optional in the complete set. Dependencies and other relationships are also specified.
Such structuring metadata takes the form of tables and data models.
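
A minimal sketch of structuring metadata, assuming a hypothetical customer address group: the group specification lists its member elements and marks each one as mandatory or optional, and a record can then be checked against it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    """One data element's place in a data group."""
    element: str
    mandatory: bool


@dataclass(frozen=True)
class DataGroup:
    """Structuring metadata: the elements that belong together."""
    name: str
    members: tuple

    def missing_mandatory(self, record: dict) -> list:
        """Mandatory elements that are absent or empty in the record."""
        return [m.element for m in self.members
                if m.mandatory and not record.get(m.element)]

    def unknown_elements(self, record: dict) -> list:
        """Elements in the record that the group does not define."""
        known = {m.element for m in self.members}
        return sorted(set(record) - known)


# ASSUMPTION: group and element names are invented for illustration.
CUSTOMER_ADDRESS = DataGroup("Customer address", (
    Member("postal_code", True),
    Member("building_number", True),
    Member("country_code", True),
    Member("building_suffix", False),
))

record = {"postal_code": "1012XA", "country_code": "NL", "fax": "n/a"}
print(CUSTOMER_ADDRESS.missing_mandatory(record))
print(CUSTOMER_ADDRESS.unknown_elements(record))
```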

2.1.3 Data Models


Data models are an important form of metadata. Conceptual, logical and physical data models are all forms of structuring and defining metadata.

2.1.4 Cataloguing Metadata


Cataloguing metadata enables users to discover the location of data and provides information about the data so that users can determine its usability.

Cataloguing metadata includes data about:
  • date and time of creation, update and deletion,
  • validity status, e.g. in a master data update workflow a change is first drafted, then proposed, then approved and then applied.
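
The two items above can be sketched as a catalogue entry that carries timestamps and a validity status, with the status moving through the workflow in a fixed order. The class and field names, and the dataset and location values, are hypothetical.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone
from enum import Enum


class Status(Enum):
    """Validity status in a master data update workflow."""
    DRAFTED = 1
    PROPOSED = 2
    APPROVED = 3
    APPLIED = 4


@dataclass(frozen=True)
class CatalogueEntry:
    """Cataloguing metadata about one actual dataset (hypothetical fields)."""
    dataset: str
    location: str
    created_at: datetime
    updated_at: datetime
    status: Status = Status.DRAFTED


def advance(entry: CatalogueEntry, now: datetime) -> CatalogueEntry:
    """Move the entry to the next status; no step may be skipped."""
    if entry.status is Status.APPLIED:
        raise ValueError("change has already been applied")
    next_status = Status(entry.status.value + 1)
    return replace(entry, status=next_status, updated_at=now)


created = datetime(2016, 2, 15, 9, 0, tzinfo=timezone.utc)
entry = CatalogueEntry("customer_master", "crm.customers", created, created)
entry = advance(entry, datetime(2016, 2, 16, 9, 0, tzinfo=timezone.utc))
print(entry.status.name, entry.updated_at.isoformat())
```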

2.2 Specifications and Instances


Defining metadata and structuring metadata both form parts of the data specification. They exist independently of the actual data. In the Data Lifecycle they are created before any actual data exists.

Cataloguing metadata, by contrast, is about actual instances of data and datasets. If there is as yet no data, there is no cataloguing metadata.

2.3 The Meta-metadata Layer


Of course, the meta-metadata layer is just data when we look at it. But it is a very specific set of data. It is the data describing the metadata layer. Now the metadata layer contains data models, the data dictionary (containing data element specifications and data group specifications) and the business glossary. So the meta-metadata layer describes each of these and their relationships.

The meta-metadata layer is effectively the metadata architecture.
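
The three layers can be shown side by side in a small sketch. Everything in it is invented for illustration: a piece of business data, two data dictionary entries describing it, and a specification of what a dictionary entry must contain. Note that the same kind of check applies at both levels, because the metadata is just data when it is in focus.

```python
# Layer 1: business data.
order_header = {"customer_code": "ABCA", "country_code": "NL"}

# Layer 2: metadata, here two data dictionary entries describing layer 1.
data_dictionary = [
    {"name": "customer_code", "definition": "Code identifying the customer.", "length": 4},
    {"name": "country_code", "definition": "Code identifying the country.", "length": 2},
]

# Layer 3: meta-metadata, describing what a dictionary entry must contain.
dictionary_entry_spec = {"name": str, "definition": str, "length": int}


def entry_conforms(entry: dict, spec: dict) -> bool:
    """Check a layer 2 entry against the layer 3 specification."""
    return set(entry) == set(spec) and all(
        isinstance(entry[key], expected) for key, expected in spec.items())


def data_conforms(data: dict, entries: list) -> bool:
    """Check layer 1 data against the layer 2 entries."""
    by_name = {entry["name"]: entry for entry in entries}
    return all(name in by_name and len(value) == by_name[name]["length"]
               for name, value in data.items())


print(all(entry_conforms(e, dictionary_entry_spec) for e in data_dictionary))
print(data_conforms(order_header, data_dictionary))
```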

2.4 Overview Of The Layers Of Data And Metadata


The description of the three layers of data and metadata is summarised in Table 4.

Three layers, data, metadata and meta-metadata
Table 4

3 What Is Not Metadata


Section 2 above provides a tight description of metadata and its relationship with the data that is the focus of Data Management. Section 11.2.1, Meta-data Definition, of the current edition of the DAMA-DMBOK Guide (Version 1) includes things in its definition of metadata that fall outside this tight description. The following items from chapter 11 are not metadata:

  • Business analytics: reports, users, usage, performance
  • Business architecture: roles and organisations, goals and objectives
  • Business rules: standard calculations and derivation methods, except those that define relationships between data element values
  • Data governance: policies, standards (except metadata standards), procedures, programs, roles, organisations, stewardship assignments
  • Data integration: sources, targets, ETL workflows, EAI, EII, migration / conversion (except for the specifications of data, data interfaces and data mappings)
  • Data quality: defects, metrics, ratings
  • Document content management: unstructured data, documents, legal discovery, search engine indexes
  • Information technology infrastructure: platforms, networks, configurations, licenses
  • Process models: functions, activities, roles, workflow, business rules, timing, stores
  • Systems portfolio and IT governance: databases, applications, projects and programs, integration roadmap, change management
  • Service oriented architecture (SOA) information: components, services, messages, master data (except for the metadata defining messages and master data)
  • System design requirements: requirements, designs and test plans, impact
  • Systems management: data security, licenses, configuration, reliability, service levels

All of the above are important in information systems management, business management and/or enterprise architecture. But they are not metadata.

If they were removed from DAMA-DMBOK chapter 11, the metadata management process could be described in a neat, clean manner without seeming to involve processes and expertise that lie outside the field of Data Management.



Footnotes


1) The DAMA Guide to the Data Management Body of Knowledge (DAMA-DMBOK Guide), First Edition, 2010, DAMA International
2) Oxford Dictionary of English, Copyright © 2010, 2013 by Oxford University Press. All rights reserved
3) DAMA-DMBOK Guide, Section 5.2.3.3.2


