Simplify and Contextualize Your Data Classification Efforts

In the CISO Desk Reference Guide, I noted how critical the concept of “context” is to security programs. The same holds true for our organizations and their respective privacy programs. The foundation for any privacy program is understanding – and critically documenting – the nature and extent of the personal data (PD), personally identifiable information (PII), protected health information (PHI) and other forms of sensitive personal information (SPI) that the organization collects, processes, shares and retains. This is where context is integral to the critical work of data discovery and data classification. Before delving into discovery efforts, it’s important to tackle the challenges with data classification and data retention.

Like data retention, data classification tends to be an organizational hot potato, with no clear agreement on who should classify data and, by extension, who determines the retention period based on that classification.

Declare war on ambiguity
Data classification and data retention practices are often made overly complicated, which impedes both security and privacy programs. In my view, data classification should first and foremost be governed by regulatory and/or contractual contexts. If HIPAA defines PHI, should organizations create their own definition of protected health information? No! If the GDPR defines personal data – as it does in Article 4 – should the organization create a unique definition? Again, no! Let regulations do the heavy lifting of data classification. Of course, there’s nuance here. Definitions of “confidential,” “proprietary” and “sensitive” information should be contextualized to the organization and defined by counsel, the executive leadership team and other identified key stakeholders. Retention periods should follow a similar methodology. If a regulation calls for a specific data retention period, implement it and validate it. Retention periods for data sets that are not subject to regulatory and/or contractual obligations should reflect organizational context and organizational priorities, recognizing that one of the core principles in privacy is the minimization of data collection and data retention.

Unless there’s a valid regulatory or contractual requirement to collect and retain data, the collection of personal data and its retention should be limited and tied to its stated purpose. Far too many organizations have an unfunded liability on their hands because they retain personal data that is no longer tied to its stated purpose. If and when there’s a data breach, the individuals behind these records may be entitled to credit monitoring and other recourse, including a private right of action here in California (based on the California Consumer Privacy Act (CCPA), as amended by the California Privacy Rights Act (CPRA) effective in 2023). Data minimization also reduces the storage and administration costs associated with records that are no longer tied to a particular purpose or governed by a mandatory retention period.
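
The retention rule described above can be sketched in a few lines of Python. This is a minimal illustration, not a compliance tool: the `Classification` record, the `REGISTER` contents, the `should_dispose` helper and especially the retention values are hypothetical placeholders. The real definitions and periods come from the applicable regulation, contract or counsel.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class Classification:
    label: str                     # e.g. "PHI" or "personal data"
    defined_by: str                # regulation or internal owner supplying the definition
    retention_days: Optional[int]  # None means no mandated period applies


# Hypothetical register. The retention values are placeholders, not legal guidance.
REGISTER = {
    "PHI": Classification("PHI", "HIPAA", 1000),
    "personal data": Classification("personal data", "GDPR Article 4", None),
    "confidential": Classification("confidential", "counsel and executive leadership", 365),
}


def should_dispose(label: str, collected_on: date, purpose_active: bool, today: date) -> bool:
    """Return True when a record has no mandated period in force and no active purpose."""
    rule = REGISTER[label]
    if rule.retention_days is not None:
        expires = collected_on + timedelta(days=rule.retention_days)
        if today <= expires:
            return False  # a mandated retention period still applies
    return not purpose_active  # keep only while tied to its stated purpose
```

The design choice worth noting is that each label records who defines it. Where a regulation supplies the definition, the organization points to it rather than inventing its own.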

Organizational context a must
Our privacy and security programs need organizational context to function correctly and to align the practices of both programs with agreed-to risk tolerances and organizational strategy. Too frequently, however, both programs function in a vacuum. This has to change. There are some simple but detailed steps to overcome this disconnect. Fundamentally, we can enrich our respective understandings of organizational context with three tools: business impact analyses (BIAs), data flow diagrams (DFDs) for material systems (defined as those business processes, systems, and/or applications that collect, process, or store data elements that include PD, PII, SPI, PHI, etc.), and privacy impact assessments (PIAs). Security and privacy programs are well served by devoting resources and attention to each of these tools. Let’s highlight why BIAs, DFDs, and PIAs are so powerful.

Business impact analyses are foundational to governance and provide a treasure trove of detail for organizational stakeholders. BIAs should first and foremost capture how the organization derives enterprise value. Effectively, what are the business processes that are most material to the organization? As these are identified, a basic dependency analysis should be performed. Specifically, does the material business process have dependencies upon technology, specific data sets, staffing levels and competencies, vendors, suppliers, locations, or other variables, and how are these dependencies managed? BIAs are foundational to business continuity and disaster recovery planning, as they ideally distill recovery point and recovery time objectives for those processes that are integral to the organization and its operations. BIAs also capture department priorities, department context, and broader organizational priorities when responding to security, environmental and other operational incidents that have a business impact. BIAs, similar to PIAs, should be considered living documents that are updated and revised based on changes to the organization’s environment. Over time, BIAs can become more useful as they add context and detail to identified material processes within the organization. While not necessarily at the same level as a process narrative in the context of Sarbanes-Oxley or a record of processing activities as would be required by Article 30 of the GDPR, BIAs should seek to provide a baseline description of the identified business process. Essentially, the BIA can provide an important snapshot of the organization, its business processes, and the general risk and dependency context associated with the same.
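
As a rough sketch of the basic dependency analysis described above, a BIA entry can be captured as a small record and checked for dependencies shared by more than one material process. The `BusinessProcess` fields and the `shared_dependencies` helper are illustrative assumptions rather than a standard BIA schema.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class BusinessProcess:
    name: str
    rto_hours: float  # recovery time objective
    rpo_hours: float  # recovery point objective
    dependencies: Set[str] = field(default_factory=set)  # systems, vendors, data sets, etc.


def shared_dependencies(processes: List[BusinessProcess]) -> Dict[str, List[str]]:
    """Map each dependency to the processes relying on it, keeping only shared ones."""
    users = defaultdict(list)
    for process in processes:
        for dependency in sorted(process.dependencies):
            users[dependency].append(process.name)
    return {dep: names for dep, names in users.items() if len(names) > 1}


# Hypothetical entries for illustration only.
processes = [
    BusinessProcess("order fulfillment", 4, 1, {"CRM", "payment processor"}),
    BusinessProcess("payroll", 24, 8, {"HRIS", "payment processor"}),
]
print(shared_dependencies(processes))
```

A dependency that appears under several material processes (the payment processor in this made-up example) is a natural place to ask how that dependency is managed.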

Data flow diagrams (DFDs) can be used to highlight how information enters the organization and which internal departments, systems, applications and IT infrastructure process and store that data. DFDs are excellent tools to highlight how data moves into and out of the organization and, from a security perspective, where there are trust boundaries. To be clear, however, data flow diagrams don’t need BIAs as a condition precedent to begin. When I was a research director and security analyst at Gartner, I spoke to down-and-dirty approaches to building DFDs. As I would tell clients, “Grab a blank piece of paper, draw two vertical lines such that you have three columns.” The left column represents the sources of personal data that enter the organization – be it from consumers or employees – the center column outlines how this information is used internally (processes, applications, systems, IT infrastructure, data sinks (storage), etc.), and the right column represents those external entities with whom the organization shares personal data (e.g., data processors). The two lines represent trust boundaries from a security perspective and critical demarcation points from a data and privacy governance perspective. The BIA and DFDs can be complemented by analyzing data flows in the context of the data privacy lifecycle (e.g., from notice, consent, collection, use, sharing, and retention through destruction). Before personal data is ingested, privacy stakeholders should verify that the notice that has been provided is consistent with the contemplated processing activities. The DFD can help identify where consent may be required and whether that consent is explicit or implicit in nature. The beauty of DFDs is that they are approachable to varied audiences, such as technical teams, executive management, and of course, privacy and security leaders.
DFDs do not need to be complicated … again, just start with a blank piece of paper, draw two vertical lines, and start filling in the details.
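
For teams that prefer to keep the three-column sheet in a machine-readable form, the same layout can be sketched as data. The `Flow` record and the `boundary_crossings` helper are hypothetical names used only for illustration, and the example entries are made up.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Flow:
    data_element: str
    source: str          # left column: where the personal data comes from
    internal: str        # center column: process, application, system or data store
    recipient: str = ""  # right column: external party, empty if the data is not shared


def boundary_crossings(flows: List[Flow]) -> List[Tuple[str, str, str]]:
    """List every point where data crosses one of the two trust boundaries."""
    crossings = []
    for flow in flows:
        # First vertical line: data entering the organization.
        crossings.append((flow.data_element, flow.source, flow.internal))
        if flow.recipient:
            # Second vertical line: data leaving for an external entity.
            crossings.append((flow.data_element, flow.internal, flow.recipient))
    return crossings


flows = [
    Flow("email address", "consumer web form", "CRM", "email service provider"),
    Flow("bank details", "employee onboarding", "HRIS"),
]
for crossing in boundary_crossings(flows):
    print(" -> ".join(crossing))
```

Each crossing is a spot to check the notice and consent questions raised above, and each outbound crossing points to an external entity that should be covered by a contract.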

Privacy impact assessments are the third tool that should be used to help derive additional context related to privacy practices as well as data discovery and classification initiatives for the organization. Basic “observation and inquiry” is a great complement to more technical approaches to data discovery. Privacy leaders need to know the enterprise and meet with department leaders, notably in sales, marketing, HR, and IT, to get a better understanding of the nature and extent of personal data collected and used by the organization. PIAs, similar to BIAs, are there to add detail and important context to personal data processing activities. The aforementioned Article 30 of the GDPR offers excellent insights as to what should be captured in these assessments. Most critical is the evaluation of the stated purpose for personal data collection as conveyed in the privacy notice (policy) and whether these contemplated practices are reflective of actual practices. Where there is a disconnect, the privacy leader should raise this risk to other members of the executive leadership team. Section 5 of the Federal Trade Commission Act (enforced by the FTC) prohibits “unfair or deceptive acts or practices.” The FTC assertively pursues enforcement actions against organizations that state one thing and do another. PIAs, like BIAs, should be considered living documents that facilitate ongoing awareness of privacy practices throughout the organization and through the privacy lifecycle. Effectively, the privacy leader should continually ask, “What is it that I don’t know about my organization’s privacy practices that I should know, and how do I determine that status?” BIAs, DFDs, and PIAs are there to help answer that basic question.
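
The comparison between stated and actual practices can be made concrete with a simple set difference. This is a sketch under assumed inputs: the `notice_gaps` helper and both dictionaries are hypothetical, standing in for what a privacy leader would gather from the privacy notice and from interviews with department leaders.

```python
from typing import Dict, List, Set


def notice_gaps(
    stated_purposes: Dict[str, Set[str]],
    observed_uses: Dict[str, Set[str]],
) -> Dict[str, List[str]]:
    """Return, per data element, the observed uses that the privacy notice does not state."""
    gaps = {}
    for element, uses in observed_uses.items():
        undisclosed = set(uses) - set(stated_purposes.get(element, ()))
        if undisclosed:
            gaps[element] = sorted(undisclosed)
    return gaps


# Made-up example inputs.
stated = {"email address": {"order updates"}}
observed = {
    "email address": {"order updates", "marketing"},
    "location": {"analytics"},
}
print(notice_gaps(stated, observed))
```

Any non-empty result is the kind of disconnect that should be raised with the executive leadership team.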

The insights derived from these three tools underpin governance and risk management practices for the organization. Moreover, the core body of knowledge that is incorporated into ISACA’s CDPSE, CRISC, CISA, CGEIT and CISM certifications helps privacy and security leaders understand the critical linkages between governance and technology, and appropriate risk management for their respective organizations.

Originally published on ISACA BLOG NOW on Feb 2, 2022
