
Data research results futile?


By Mervyn Mooi, Director at Knowledge Integration Dynamics (KID).
[Johannesburg, 15 Feb 2012]

Regardless of research, companies must have plans in place to address data processing issues.

The data industry has emerged from the IT basement onto business's centre stage. This is because operations staff, business people and decision-makers are under pressure to increase productivity and gain market velocity, while ensuring that service and product variety meets customer needs.

Research on big data technology, maturity, trends, database products and data processing best practices is somewhat futile. It is not really important to business that the research found that: a) relational databases still retain the number one spot; b) unstructured data use is on the up; and c) real-time processing, data integration, data quality and reporting capabilities remain the key challenges.

Data management challenges are obvious. Regardless of what research says or the size of an organisation, companies should already have, at the very least, plans in place to address the three most important aspects of data processing – volume, velocity and variety (the 3Vs).

The 3Vs are equally old hat to veteran data pundits, and while big data has certainly amplified them, they are de facto considerations for information system performance and for any data project, because all three are efficiency factors.

Managing large data volumes and improving velocity are achieved these days through technologies such as massively parallel and in-memory processing (MPIP), compression, highly specced blade or networked server farms, expanded server address spaces, and data appliances – all of which are becoming de rigueur for the modern business seeking efficiency, flexibility and scalability.
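
To make the parallel, in-memory style of processing concrete, the following is a minimal Python sketch – purely illustrative, with hypothetical data and function names, and no claim to resemble any vendor's implementation – of splitting a large set of records across worker processes and merging the partial results in memory:

```python
# Illustrative sketch only: large-volume aggregation split across workers
# and merged in memory. Data and names are hypothetical.
from multiprocessing import Pool
from collections import Counter

def count_by_region(chunk):
    """Aggregate one partition of records in memory."""
    counts = Counter()
    for record in chunk:
        counts[record["region"]] += 1
    return counts

def parallel_count(records, workers=4):
    """Split the volume across workers, then merge the partial results."""
    chunk_size = max(1, len(records) // workers)
    chunks = [records[i:i + chunk_size]
              for i in range(0, len(records), chunk_size)]
    with Pool(workers) as pool:
        partials = pool.map(count_by_region, chunks)
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

if __name__ == "__main__":
    data = [{"region": "ZA"}, {"region": "ZA"}, {"region": "UK"}] * 1000
    print(parallel_count(data))
```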

The same applies to the products and tools that manage large volumes of unstructured data and improve life-cycle management for big data, offering hot and cold data access, virtual tape libraries (VTL) with de-duplication options, single-instance file storage, sub-file de-duplication, vertical storage, hash and checksum algorithms, and more.
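
As an illustration of how hash-based, single-instance storage works in principle, here is a rough Python sketch – not modelled on any particular product, with an in-memory chunk store and a chunk size chosen purely for the example – that stores each unique chunk of data only once, keyed by its digest:

```python
# Rough sketch of hash-based de-duplication, as in single-instance storage:
# each fixed-size chunk is stored once, keyed by its SHA-256 digest.
# Chunk size and the in-memory "store" are illustrative assumptions.
import hashlib

CHUNK_SIZE = 4096          # fixed-size chunking; real products vary
chunk_store = {}           # digest -> chunk bytes, stored only once

def store_file(data: bytes):
    """Split data into chunks, store only unseen chunks, return the recipe."""
    recipe = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in chunk_store:      # duplicate chunks are skipped
            chunk_store[digest] = chunk
        recipe.append(digest)
    return recipe

def restore_file(recipe):
    """Rebuild the original bytes from the stored chunks."""
    return b"".join(chunk_store[d] for d in recipe)

payload = b"ABCD" * 5000                   # highly repetitive data
recipe = store_file(payload)
assert restore_file(recipe) == payload
print(f"{len(payload)} bytes stored as {len(chunk_store)} unique chunk(s)")
```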

Spice it up

Variety is often called the spice of life, and in big data it is magnified by the explosion of Web-based computing and data, which has necessitated free-form storage that caters for unstructured data handling. A new generation of technologies catering for it has already arrived, ensconced in futuristic, explosive, harmonious and wafty-sounding products.

These products are non-relational, schema-free, Not Only SQL (NoSQL) based, open source, vector scalable and eventually consistent, and support tuple and open-formatted storage, among other things. In addition, today's hype is to pack everything into the cloud using a combination of these technologies, so that consumers need not be concerned with how and where their data is processed, stored and managed – similar to the bureau-style black-box concept of the old days, but much broader in context.
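
What "schema-free" means in practice can be shown with a toy Python sketch – an illustrative key-value/document store, not any specific NoSQL product – in which records of completely different shapes sit side by side without any upfront table definition:

```python
# Toy key-value/document store illustrating the "schema-free" idea:
# records with different shapes live side by side, with no table defined
# up front. Purely illustrative; keys and documents are hypothetical.
import json

store = {}  # key -> JSON document (an open, self-describing format)

def put(key, document):
    store[key] = json.dumps(document)

def get(key):
    return json.loads(store[key])

# Two "rows" with completely different structures coexist happily.
put("customer:1001", {"name": "Thandi", "segment": "retail"})
put("event:42", {"type": "click", "page": "/pricing", "tags": ["web", "mobile"]})

print(get("customer:1001"))
print(get("event:42"))
```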

People can read about all of these technologies, solutions, products and services in a spectrum of IT and business press, and should have been doing so for some time now. But the one thing the 3Vs have in common is that they are not completely united by a single product. There are products and tools with overlapping functionality that address the 3Vs, but there is no single, conglomerate solution – no fusion of 3V technologies – that will deliver a one-stop defence against the sporadic assaults of traditional and big data issues that impede the modern business.

Tying the knot

There is a huge need to unite MPIP relational database management systems (RDBMS) and NoSQL technologies. The current trend is therefore to marry the 3Vs into an integrated solution – to seamlessly integrate MPIP RDBMS and NoSQL technologies, supplemented by the capabilities mentioned above, to handle any data, be it structured or unstructured.
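
A bare-bones Python sketch of that federated idea follows – a thin, hypothetical query layer over a relational store (SQLite here) and a schema-free document store (a plain dictionary here), merging the two views of a customer by a shared ID. It is a sketch of the concept, not of any vendor's integration layer:

```python
# Sketch of the federated idea: one thin query layer over a relational store
# (sqlite3 here) and a schema-free document store (a plain dict here),
# joined on a shared customer ID. All names and data are illustrative.
import sqlite3

# Structured side: a relational table of customers.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
db.executemany("INSERT INTO customers VALUES (?, ?)",
               [(1, "Thandi"), (2, "Sipho")])

# Unstructured side: free-form activity documents keyed by customer ID.
activity_docs = {
    1: [{"type": "web_visit", "page": "/pricing"}],
    2: [{"type": "support_call", "notes": "billing query"}],
}

def federated_profile(customer_id):
    """Join the structured and unstructured views of one customer."""
    row = db.execute("SELECT id, name FROM customers WHERE id = ?",
                     (customer_id,)).fetchone()
    if row is None:
        return None
    return {"id": row[0], "name": row[1],
            "activity": activity_docs.get(customer_id, [])}

print(federated_profile(1))
print(federated_profile(2))
```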

Additionally, one of the bigger issues companies still face is integrating tools: interestingly, the research reveals that almost two-thirds of participants cite a lack of integration as a problem in analysing big data. While the majority get their reporting and querying done, only a few manage data mining, visualisation and what-if analysis on unstructured data.

So, while most vendors continue to flail unimpressively against the tide of technologies and attempt to herd waves of software tools and hardware into murky, leaky pools, there are certain vendors now creating interoperable solutions – some to a greater extent than others.

The upshot for businesses is that, although a silver bullet is yet to be delivered, a consolidated big data solution is on the horizon, based – initially at least – on a federated approach.