Knowledge Integration Dynamics

Metadata, not big data, offers the flexibility to rapidly enable usage models


by Mervyn Mooi, Director at Knowledge Integration Dynamics (KID)
Johannesburg, Monday, 08 April 2013

Metadata is the universe in which all data, information, and process objects exist, and it is through metadata that data architectures can enable the flexible data and processing usage models required for using and managing big data. This implies that any data store or data processing technology based on metadata, such as a data warehouse, is also part of the big data domain. There is an argument that big data is about applying new technologies to meet needs unfulfilled by traditional technologies such as data warehouses.

The argument is that the three Vs of data (velocity, variety, and volume) were instituted long before big data and that they don't necessarily hold true in the big data world. Big data need not have large volume, need not have velocity, and need not incorporate variety. In fact, using the three Vs to describe big data is a literal and technical interpretation of the term. Big data does, however, also require flexibility and quick or even real-time response rates. Data warehouses provide rapid responses yet are not typically flexible, although that weakness is not unique to data warehouses.

Data warehouses were traditionally used to feed business intelligence (BI) systems and typically in a rigid manner relying on structured data. Big data goes beyond those fundamental data types and includes data content, systems data such as job or process run-time results and user accesses, which are not considered business data in the bottom line sense. Systems data is actually often touted as being metadata.

Data warehouses have difficulty dealing with some of the basic data processing tasks, but this is more often a failure of data integration tools and architecture than of the warehouses themselves: while they remain technically capable of dealing with the tasks, the way they are deployed and used is the real problem.

It is true that architecture can be a constraining factor when placing data warehouses in the context of big data, yet the best way to retain or gain economy of resources is to take a strongly architected approach that employs, for example, common processing, common models, and integration. It is more about maximising the already deployed tools and technologies than adding layers of complexity with new tools and technologies that are not entirely necessary.

Although architecture implies rigidity, it is not absolutely so. The architecture can be designed to be flexible so that it can accommodate dynamic mappings, self-service information delivery or reporting, drag-and-drop report development, and sandbox, quick-win development strategies. Flexibility designed into the architecture lends agility, the lack of which is one of the most prominently cited constraints of modern data warehouses. Such agility, however, is achieved within the confines of an architectural framework, allowing for rapidly changing or cycling models and their uses. It also retains structure and order which, even if flexible, are necessary for sound data management, a top priority in the data governance domain and one of the challenges in dealing with big data.

Metadata does not fit into the business data and information content but rather into the models, definitions, programs, scripting and specifications of all ICT artefacts and resources. It sits a layer above the artefacts and resources which is why metadata is the universe in which all data, models, information and process objects exist, which in turn includes big data and data warehouses. That is why it is in metadata that we realise flexibility for the new usage models big data requires.
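
As a rough illustration of flexibility living in metadata rather than in the data or the processing code, the minimal Python sketch below drives a generic mapping routine from a mapping specification. All names here (MAPPING_V1, MAPPING_V2, TRANSFORMS, apply_mapping, and the sample fields) are hypothetical and chosen only for this example; they do not describe any particular product or the author's own tooling.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical mapping metadata: target field -> (source field, transform name).
MAPPING_V1: Dict[str, Tuple[str, str]] = {
    "customer_id": ("id", "to_int"),
    "full_name": ("name", "strip"),
}

# A new usage model is enabled by editing metadata only; no code changes.
MAPPING_V2: Dict[str, Tuple[str, str]] = {
    **MAPPING_V1,
    "country": ("country_code", "upper"),
}

# Common, reusable processing steps referenced by name from the metadata.
TRANSFORMS: Dict[str, Callable[[Any], Any]] = {
    "to_int": int,
    "strip": lambda value: str(value).strip(),
    "upper": lambda value: str(value).upper(),
}


def apply_mapping(
    record: Dict[str, Any], mapping: Dict[str, Tuple[str, str]]
) -> Dict[str, Any]:
    """Build a target record by following the mapping metadata."""
    return {
        target: TRANSFORMS[transform](record[source])
        for target, (source, transform) in mapping.items()
    }


source_row = {"id": "42", "name": "  Thandi Nkosi ", "country_code": "za"}
result_v1 = apply_mapping(source_row, MAPPING_V1)
result_v2 = apply_mapping(source_row, MAPPING_V2)
```

The same apply_mapping routine serves both versions of the model; only the metadata differs, which is the sense in which the flexibility sits a layer above the data and the processing.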

The solution to realising the benefits of big data does not reside solely in employing big data technologies and systems, or even other technologies and tools, but rather in the architecture of the data domain, which depends on reliable, consistent, and available metadata to drive flexibility within the confines of good management practices and processes.
