“No, no!” said the Queen. “Sentence first – verdict afterwards.”

“Stuff and nonsense!” said Alice loudly. “The idea of having the sentence first!”

The value proposition for data is not in its protection (sentence), but in its use (verdict).
In this series of articles, we’re going to explore an alternative value proposition for data classification and the benefits of thinking of data classification primarily as an enabler for using data rather than protecting it.

In this first article, we’ll consider the fundamental reason that we want to classify data with this mindset.

In the second article, we’ll contemplate how to change the data classification schemes we use to fit our needs.

In the third and final article, we’ll examine the business processes that must change to accommodate our alternative value for data classification.

Now let’s come back and state our value proposition in a business-appropriate manner. The value of data is a function of the value derived by the business from its use of the data, minus the cost of generating, acquiring, handling, and holding the data, while also meeting any custodial requirements. The custodial requirements dictate the lengths to which we must go to protect the data from unsanctioned access or use.
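To make the arithmetic concrete, here is a minimal sketch of that relationship in Python; the function and parameter names are illustrative assumptions, not part of any standard:

    # A minimal sketch of the value relationship above; all names are
    # illustrative. The terms come straight from the prose definition.
    def data_value(benefit_from_use, cost_generate, cost_acquire,
                   cost_handle, cost_hold, cost_custodial):
        """Net value of a data set: what the business derives from using
        it, minus what it spends to generate, acquire, handle, and hold
        it, including its custodial (protection) obligations."""
        return benefit_from_use - (cost_generate + cost_acquire +
                                   cost_handle + cost_hold + cost_custodial)

    # Example: a data set that yields 120,000 in benefit but costs a
    # combined 150,000 to produce and protect has negative net value.
    print(data_value(120_000, 10_000, 0, 25_000, 15_000, 100_000))  # -30000

The point of writing it down is that the custodial term sits on the cost side of the equation: every protection we add subtracts from the net value unless it is genuinely required.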

We generate a lot of data. It’s hard to know what to do with each data set. Why not just treat it all the same? If it’s just a matter of money, write a big check and give everyone (our customers, our shareholders, our employees, and our regulators) total assurance that we’re on it and that all the data is safe. Why not just assume the same custodial requirements for all our data and be done with it?
A safe is a useful analogy. One reason to buy a safe is to protect valuable papers and jewels. Sometimes a bank is hired to do this, and the bank puts its clients’ valuables in its safe. These safes come in different sizes, configurations, and classifications. We don’t give much thought to why, because it’s intuitive. A bank has more valuable objects to protect and needs a bigger safe. A bank also has a higher aggregate total value in its safe, and therefore requires a safe that is more difficult to move and more resistant to cracking. Many banks configure these mega safes as vaults, entire rooms dedicated to protecting valuables. Some banks layer safes within vaults.

We might choose to put all our papers in our in-home safe. Typically, we don’t do that. We might decide to hire a bank to store all of our documents, including old magazines with articles we might want to read again. We don’t typically do that either. We don’t usually have to ask ourselves what we want in the safe or in the vault. We know the intrinsic value of the papers we own, and we understand the harm that could be done if they are lost, damaged, or stolen. We then act accordingly. We know that if we put all of our documents (data) in the safe, we’ll spend a lot of time spinning the dial to lock and unlock the safe, or traveling to and from the bank, rather than using the documents. But we also feel a pull toward putting our grandmother’s diamonds in the bank vault (or at least our own safe), because we don’t want to be the one responsible for losing Nana’s earrings, thereby incurring her wrath. We take our custodial responsibilities seriously.

We know it doesn’t make sense to store a newspaper in the safe, because its content is public knowledge. Assuming we don’t mind ink rubbing off on our fingers, we just want to read the newspaper. If we subscribe to the newspaper and also keep documents or valuables at a bank, we can likely afford to safeguard more than we do. We don’t, because we perform a little calculus for each valuable: how do I use this, and does it warrant special safekeeping?

The same value and cost mechanisms are at work for the data that our organization uses. A crucial piece of the cost equation is the custodial requirement. There are three ways to protect data: we can control access to it, we can obfuscate it, or we can destroy it. Each of these protections comes at a cost, so we need to make sure we ask all of the following questions (and others; the list below is merely representative). Making assumptions about the underlying need for these controls costs time and money. Worse, the controls themselves create barriers, both large and small, to using the data to the organization’s maximum advantage:

  • Do we have to back it up?
  • Do we have to protect it during transmission?
  • Do we have to encrypt it at rest?
  • Do we have to control access?
  • Do we have to monitor access and usage?
  • Do we have to adhere to a policy for data retention and destruction?

Common data classification approaches are 3-tier, 4-tier, and 5-tier schemes that provide increasing levels of granularity for non-public information. A typical 3-tier scheme might include public, internal use only, and sensitive. A 4-tier scheme might include public, internal use only, confidential, and secret. A 5-tier scheme might include public, internal use only, confidential, sensitive, and secret. Let’s assume each tier is numbered, starting with 1 for public and ending at 3, 4, or 5. The higher the number, the greater the need for data handling controls.
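To illustrate how the tier number might drive the handling questions above, here is a hedged sketch, assuming a 4-tier scheme; the thresholds at which each control becomes mandatory are assumptions chosen for illustration, not a prescribed standard:

    # A rough sketch of a 4-tier scheme driving the handling questions
    # above. The tier thresholds are illustrative assumptions only.
    CONTROLS = ("backup", "protect_in_transit", "encrypt_at_rest",
                "access_control", "monitor_usage", "retention_policy")

    # Lowest tier at which each control becomes mandatory.
    MANDATORY_AT_TIER = {
        "backup": 1,              # even public data may warrant backups
        "protect_in_transit": 2,  # internal use only and above
        "access_control": 2,
        "encrypt_at_rest": 3,     # confidential and above
        "retention_policy": 3,
        "monitor_usage": 4,       # secret only
    }

    def required_controls(tier):
        """Return the handling controls a data set at this tier carries."""
        return [c for c in CONTROLS if tier >= MANDATORY_AT_TIER[c]]

    print(required_controls(1))  # ['backup']
    print(required_controls(4))  # all six controls

The particular thresholds don’t matter; what matters is that each step up in tier answers “yes” to more of the questions above, and each “yes” costs money and adds friction.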

In many organizations, data classification is done much like the Queen of Hearts decreed. We classify each data set by its sensitivity, and that classification dictates its data handling requirements. This seems reasonable, but in doing so we often decouple knowledge of the data’s use (how we derive its value) from knowledge of its type or sensitivity (the prerequisite for designing its protection scheme). This happens because two different teams are involved, and for them to coordinate their assessments, each needs the other’s context. When the data classification effort is undertaken months or years after the business decision to acquire or generate the data, a disconnect is assured.

Every time we answer yes to one of the questions listed above, we spend money, which decreases the value of the data to the organization. More importantly, we also decrease the operational value of the data to the business because each of these data handling controls comes with an operational burden. Our teams have to spend time and effort to conform to our usage requirements. Don’t use it while it’s being backed up, encrypt it during transmission, encrypt it when storing it on disk, decrypt it when using it, rotate encryption keys, grant access, manage access, keep access logs, analyze the logs, and so forth. Data gains value from its use, not from being hidden and protected.

It follows that when we fail to classify data accurately, we build inefficiencies into our data protection processes. We force ourselves to create the equivalent of multi-room bank vaults when all we might need is an in-home wall safe. Often, the inefficiencies built into our data protection schemes manifest as logic errors that allow inappropriate access to our data. This inappropriate access can lead to everything from corporate embarrassment to regulatory sanction to data theft, loss, and destruction. This defeats the purpose of the protections we’ve put in place.

Storage space is so inexpensive that we’ve all become data hoarders. But as data ages, the cost to store it holds steady or increases while its value likely decreases. The value could be decreasing simply because the data is less current, and therefore any insights it provides are less useful. It could also be decreasing because the operational burden may be increasing. There is a burden to re-keying, backup schemes, and access controls, and as standards evolve, older data must undergo more ETL (extract, transform, and load) activity to remain usable alongside newer data sets for combined analysis.
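As a small, hypothetical illustration of that ETL drag, suppose an older data set was captured under a retired schema; every combined analysis must first normalize it to the current one. The field names and formats below are invented for illustration:

    from datetime import datetime

    # Hypothetical ETL step: transform a legacy record (retired schema)
    # so it can be analyzed alongside current records. Field names and
    # formats are invented for illustration.
    def transform_legacy(record):
        """Normalize a legacy record to the current schema."""
        first, _, last = record["CUSTNAME"].partition(" ")
        return {
            "customer_first": first,
            "customer_last": last,
            # Legacy files stored US-style dates; the current schema
            # uses ISO 8601.
            "purchase_date": datetime.strptime(
                record["PURCHDT"], "%m/%d/%Y").date().isoformat(),
            "amount_usd": float(record["AMT"]),
        }

    legacy = {"CUSTNAME": "Ada Lovelace", "PURCHDT": "07/14/1998",
              "AMT": "42.50"}
    print(transform_legacy(legacy))
    # {'customer_first': 'Ada', 'customer_last': 'Lovelace',
    #  'purchase_date': '1998-07-14', 'amount_usd': 42.5}

Every such transformation is effort spent just to keep old data usable; as schemas drift further apart, that cost grows while the insight value of the aging data shrinks.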

What is needed is a data classification scheme with a few additional attributes, as well as a mindset shift that allows us to think differently about the data we keep and about which data sets need which protections. We’ll explore these new attributes and this new mindset in a follow-on article in the next issue.

This article was originally published in Cybersecurity Magazine in Summer 2018.