It's all about the data

By | March 31, 2014

I wrote a post 9 months ago entitled Data $5bn in which I mused about the benefits of being able to put a value on the data that an organisation holds. Perhaps that value could then be included as an asset on the balance sheet, in the way that goodwill or brand sometimes is. I thought it a good post but it didn’t get many eyeballs (hint – go and read it and then come back here), so I am returning to the theme here in a broader context: the importance of data management and some of the properties of data that need consideration.

Ok, so now you’ve read that one.

Did you like it? Hope so. Anyway, let’s crack on with this one.

Storage is a means to an end. Why do you implement storage arrays at all? Essentially it is to manage all the data that your stakeholders create and to do so in the most effective way possible: effective from both a cost and a performance perspective. The relationship between storage systems and data management is therefore intrinsic. All of that is obvious.

When the majority of our customers look to purchase a storage system they tend to have similar criteria. The main buckets are:

1) Performance – will it give me the throughput and the latency that my users need in order to get access to the data they want?

2) Reliability – how often will it break down? How often, if at all, will data be unavailable?

3) Scalability – how many disks can I add? How much data can it store?

4) Ease of Use – how complex will it be? How can the data I store on it be tracked, backed up, restored, etc.?

These are complex storage issues which big storage vendors have been addressing for 30 years or more: storage and data management, intrinsically linked as I said before. However, when I think about storage today, I am drawn much more to the latter than the former. Certainly storage hardware vendors have differentiated technologies that provide the bedrock for data management, but it is in the complexities of the data management layer that I believe the true action lies and where differentiation will be observed.

Deriving value out of data is a complex task and one that requires sophisticated enterprise-level data management software. This is apparent right now but will become even more obvious as cloud architectural models become ever more sophisticated and ubiquitous. In the world of hybrid cloud, for example, a lot of attention has been focused on the movement of workloads from one cloud to another. The ability to move an application from one service provider to another, or from a private cloud to a public cloud, is one of the main attractions of a hybrid cloud model. What tends to be overlooked in the discussion, though, is the data that is associated with the workload and how that moves through this ecosystem. My colleague Phil Brotherton has written eloquently about what he calls ‘the value of data control’ in the cloud and why choosing the right partners to deliver a hybrid cloud is essential if data stewardship issues are to be fully addressed.

I see a number of key issues today around data, exacerbated by the new cloud paradigms, that are vexing the minds of IT professionals the world over. Here are just a few to begin with.

Data Sovereignty – Data stored in a country should be subject to the data laws prevalent in that country. This is especially acute for customer data, and many countries have amended their data laws to ensure that customer data created in-country stays in-country. This can be difficult to regulate as workloads and their data are moved to the cloud, especially in a public cloud model. An element of trust in the service provider is required.

Data Gravity – Moving data about from one platform to another is problematic. Stored data is persistent and resides in some physical place, unlike an application that is being processed at the compute layer or data that is in transit over a network. In essence, data has weight and data movement takes time.
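To put a rough number on that weight, here is a back-of-envelope sketch in Python. The function name, the 100 TB data set, the 1 Gbps link and the 80% link efficiency are all illustrative assumptions of mine rather than measurements from any real system.

```python
def transfer_time_hours(data_terabytes: float, link_gbps: float,
                        efficiency: float = 0.8) -> float:
    """Estimate the hours needed to move a data set over a network link.

    efficiency is an assumed fraction of the nominal link speed that is
    actually usable (protocol overhead, contention and so on).
    """
    if data_terabytes < 0 or link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("sizes and speeds must be positive, efficiency in (0, 1]")
    total_bits = data_terabytes * 1e12 * 8            # decimal terabytes to bits
    usable_bits_per_second = link_gbps * 1e9 * efficiency
    return total_bits / usable_bits_per_second / 3600


hours = transfer_time_hours(100, 1)                   # 100 TB over a 1 Gbps link
print(f"{hours:.1f} hours, or about {hours / 24:.1f} days")
```

Under those assumptions, 100 TB takes roughly 278 hours, or about eleven and a half days, which is why moving the workload is usually the easy part.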

Data Replication – Allied to the question of data movement. Data needs to be replicated for a plethora of reasons such as backup and recovery, high availability, compliance obligations, etc. The legality of where copies of data are stored is an interesting question related to the data sovereignty issue noted above.
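One way to picture the link between replication and sovereignty is as a filter on replica targets. The sketch below is purely illustrative: the rule table, country codes and site names are made-up placeholders, not statements about any real law or any real product.

```python
# Hypothetical rule table: countries in which copies of data created in a
# given country may be stored. Placeholder values only.
ALLOWED_REPLICA_COUNTRIES = {
    "DE": {"DE"},
    "UK": {"UK", "IE"},
}


def permitted_targets(origin_country, candidate_sites):
    """Return the names of sites that may hold a replica of the data.

    candidate_sites is a list of (site_name, country_code) pairs.
    An origin country with no rule is treated as "no replicas allowed".
    """
    allowed = ALLOWED_REPLICA_COUNTRIES.get(origin_country, set())
    return [site for site, country in candidate_sites if country in allowed]


sites = [("fra-1", "DE"), ("dub-1", "IE"), ("lon-1", "UK")]
print(permitted_targets("UK", sites))
```

Denying by default when no rule exists is a design choice in this sketch; it errs on the side of keeping data where it was created.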

Data Privacy – This needs little explanation. Data privacy laws are continually being updated (and usually getting tighter). Cloud service providers, whether public, private or hyperscale, need to be just as cognizant of data privacy as enterprises running on-prem data centers. If anything they need to be even more vigilant, given that their systems are often multi-tenanted, storing data from a large number of customers, some of whom may even be competitors.

Data Governance, Data Stewardship and Data Custody – All roughly the same broad topic. Data, especially in the context of an enterprise, needs to be governed properly. Auditable processes need to be established and individuals held responsible for following them.

Data Security – IT security as an overarching topic has been at the top of CIOs’ agendas for the last several years and I doubt it will ever drop off their lists. As we start to employ more cloud-based architectural paradigms, the IT security issue will only intensify. Data protection and anti-data leakage technologies will continue to be essential in protecting the integrity of data, whether held in on-premises data centers or in the cloud.

Data Escrow – What happens to your data when your cloud service provider goes belly-up? Getting it back can be very expensive – read what happened when 2e2 shut its data center last year, or when Nirvanix, a cloud storage vendor, went into administration last year, giving its customers two weeks to retrieve their data (at their own expense). The lesson here is that if you outsource your data processing to a service provider, you do not outsource the ownership of the data nor your responsibility for it. As an old boss of mine used to say, “there’s a fine line between delegation and abrogation of responsibility”. After looking up the word I understood what he meant about crossing that line.

Data Classification – Not all data is created equal. Being able to classify data and apply suitable policies to the treatment of that data is essential. This is actually the higher-order capability, and the basis for really deriving value out of the data, allowing data analysis technologies to do their work.
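As a minimal sketch of what "classify, then apply a policy" might look like, assume each piece of data carries a few descriptive flags. The class names, flags and policy values below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HandlingPolicy:
    encrypt: bool          # encrypt at rest?
    retention_days: int    # how long to keep the data
    replicas: int          # how many copies to maintain


# Hypothetical policy table, one entry per data class.
POLICY_BY_CLASS = {
    "public": HandlingPolicy(encrypt=False, retention_days=30, replicas=1),
    "internal": HandlingPolicy(encrypt=True, retention_days=365, replicas=2),
    "customer": HandlingPolicy(encrypt=True, retention_days=2555, replicas=3),
}


def classify(record: dict) -> str:
    """Assign a class from simple flags; the strictest matching rule wins."""
    if record.get("contains_customer_data"):
        return "customer"
    if record.get("published"):
        return "public"
    return "internal"      # default when nothing else is known


def policy_for(record: dict) -> HandlingPolicy:
    """Look up the handling policy for a record's class."""
    return POLICY_BY_CLASS[classify(record)]


print(policy_for({"contains_customer_data": True, "published": True}))
```

Checking the customer-data flag first means a record that is both published and customer-related still gets the stricter treatment.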

In summary, data management is set to be an extremely critical area of IT over the next few decades. It isn’t just about the vast volumes of data that we are now seeing (and with the Internet of Things and the tsunami of data from connected devices, it can only get more intense); it is also about the abstraction of many storage capabilities from hardware into software and the emergence of the so-called Software Defined Storage platform.

This is why a cloud data operating system like NetApp’s Clustered Data ONTAP is so essential, and its power is now being realized by many thousands of customers.



