CISO DRG Vol 2: Chapter 16 – Recovery and Resuming Operations

Introduction

There is a fine line between incident response and recovery and resuming operations. To some extent, that line is only academically useful. The authors have covered many of the discrete activities in resuming operations in Chapter 14. Nonetheless, there is some discipline that is helpful in the immediate aftermath, both to make sure the incident is really resolved, and to learn and improve for a better response to the next incident.

Bill highlights two discrete activities that can be thought of as specific to resuming operations. First, it is important to realize that outside of the family of ransomware attacks, a major objective of a modern attack is persistence. Verifying that the recovered asset is truly back to an acceptable baseline takes planning and diligence. Second, just as while the incident is underway, communication during the recovery phase is critical. All stakeholders, including customers, suppliers, law enforcement, and employees, need to know what is expected of them.

Matt takes the reader through a hypothetical situation that a healthcare provider, in this case a hospital, might face. He recognizes that many people reading this book might not have been through an incident before and may not have inherited a mature program. He uses that hypothetical to challenge the reader to capture lessons in the moment, with an eye toward building the muscle memory that the organization will need to improve operational resilience.

Gary provides a series of planning guides to help the reader prepare for the inevitable and then walks the reader through the activities. The reader should find it helpful to see how the planning is put to use and benefit from the reminders about critical information to capture in the moment. As Gary has pointed out throughout his essays, the CISO can never stop learning. That learning discipline is what allows the CISO to continue to push their organization to improve.

Some of the questions the authors used to frame their thoughts for this chapter include:

♦ What steps should an organization take to prepare for a data breach?

♦ During a data breach, what operations should the CISO be aware of and possibly manage as a member of the organization’s business continuity effort and leader of the incident response team?

♦ What steps should be followed to resume normal operations and resume data breach management efforts?

Getting Back to Business – Bonney

Now that the incident has been detected, contained, and eradicated, it’s time to recover and resume operations. It’s important to distinguish between recovering the business process and recovering the asset. Certainly, many business processes will be entirely dependent on the availability and integrity of a specific set of critical assets. But keeping the focus on the business process as your key recovery objective will allow you and your organization to make crisper decisions about when to use backups, alternative sites, or other options defined in your recovery plans.

As with other disciplines that we’ve discussed, some of the ground we’re going to cover in this chapter has traditionally been within the CIO’s purview. But as we’ve stated before, in today’s digital business world the most likely cause of downtime requiring recovery operations is a cyber-related event, and that’s going to place the CISO front and center. It’s important that the CISO can take responsibility as needed and is working with the same recovery objectives as the CIO.

Planning and Preparation

Here again, the planning you have done in preparation for recovery is critical. We have already established that incident response does not begin with the incident. It begins in the preparation phase when you are taking inventory of your business processes and systems and creating RTOs, RPOs, and the sequence of eventual recovery activities. Each business process should have a runbook, validated by the business process owner, that details how to recover the business process, including decision criteria for asset recovery versus switching to backup or alternative assets.

It is critically important that the business process owner is intimately involved in both the creation and the execution of the recovery runbook. The business process owner will need to balance internal stakeholder and external customer expectations regarding service delivery and contractual obligations for uptime and service availability. They will do this by using the RPO and RTO referenced in Chapter 14 as guideposts for prioritizing recovery activities and deciding between restoring primary assets and switching to backups.
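As a minimal sketch of how a runbook's decision criteria might be written down, the Python below sequences business processes by RTO and chooses between restoring the primary asset and switching to a backup. Everything in it is an assumption made for illustration: the `ProcessRunbook` fields, the time estimates, and the rule of preferring the primary asset whenever it can be restored inside the remaining RTO. Your business process owners would set the actual criteria.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class ProcessRunbook:
    """Hypothetical runbook entry for one business process."""
    name: str
    rto: timedelta                  # maximum tolerable downtime
    rpo: timedelta                  # maximum tolerable data loss
    est_primary_restore: timedelta  # estimated time to restore the primary asset
    est_failover: timedelta         # estimated time to switch to backup or alternate assets
    backup_age: timedelta           # age of the most recent usable backup


def recovery_decision(rb: ProcessRunbook, downtime_so_far: timedelta) -> str:
    """Return 'restore_primary', 'failover', or 'escalate' for one process."""
    remaining = rb.rto - downtime_so_far
    # Assumed rule: prefer the primary asset if it can be back within the RTO.
    if rb.est_primary_restore <= remaining:
        return "restore_primary"
    # Otherwise fail over, but only if the backup also satisfies the RPO.
    if rb.est_failover <= remaining and rb.backup_age <= rb.rpo:
        return "failover"
    # Neither option meets the objectives: the process owner must decide.
    return "escalate"


def recovery_order(runbooks: list[ProcessRunbook]) -> list[str]:
    """Sequence recovery so the process with the tightest RTO comes first."""
    return [rb.name for rb in sorted(runbooks, key=lambda rb: rb.rto)]
```

The "escalate" outcome is deliberate: when neither option meets the RTO and RPO, the trade-off goes back to the business process owner instead of being decided by the script.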

Another key aspect of your preparation activities is making sure your executive team knows that you are constantly working on incidents. They need to understand that you are continually evaluating log files, investigating outages, and tweaking your monitoring tools. Your executive team should know how incident response works and that it is part of normal activity. You’ll want to present it as a routine activity and a continual process that addresses high-level investigations and specific incidents and outages. Reporting on some amount of the activity on a regular basis will help familiarize them with the work that will be required while recovering from high-profile events.

Having the executive team receive these periodic reports, act on them, and participate in communications and recovery activities will prepare them for the more challenging high-profile events, when you will need their support and when it’ll be vital for them to pitch in by working their human network.

This is important because when we are stressed we rely on habits; quick, easy-to-remember responses work best under pressure. Airlines trust pilots with ever more complex aircraft, flying more passengers over greater distances, only as those pilots gain experience, and the military drills continuously, for the same reason: to form habits that will take over in times of stress. For your executive team to react in a positive and supportive manner and not distract the team with knee-jerk reactions, they need to be part of the routine incident management process.

Recover and Resume

The recovery steps include restoring the assets, validating the assets, determining when to place the assets back in service, monitoring the assets, and communicating the status, both at the business process and incident level. Restoring the assets will be the responsibility of the business teams and the IT team, but the CISO and the Information Security team also play critical roles. As you bring assets back online, InfoSec needs to assist with validation and monitoring.

However, before any of these activities can take place, it is essential that you execute your organization’s process for determining the regulatory or contractual impact of the outage or disruption, and that you catalog and, if necessary, sequester all assets needed for forensic activities and follow-up analysis. This review can be required to assist with regulatory action (for instance, a record request for a high-profile breach or outage) or to help the organization with its defense against any litigation instigated by authorities, customers, or partners. It is more than a matter of convenience. In many cases, the regulatory obligations under which you operate or the contracts that specify the services you provide to key customers spell out the need to preserve records and evidence, and the failure to do so can subject the firm to additional legal jeopardy.

Here again, it is critical to work with your legal team to appropriately handle records and systems, make detailed notes of what, if any, compromise has taken place against sensitive records or systems, and ensure you can complete any subsequent analysis. At a minimum, copy all logs and all records involved in the incident, and preserve the state of any systems involved (for instance, take a snapshot of virtual machines). Care must be taken to handle sensitive records according to the appropriate data handling policy, even (and especially) when systems are technically offline.
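The log-copying step can be sketched as follows. This is an illustration under stated assumptions, not a forensic procedure: the function names, the evidence directory layout, and the `manifest.json` format are all hypothetical, and your legal team and forensic specialists should define the actual chain-of-custody requirements. The sketch copies each file, confirms the copy matches the original by SHA-256 hash, and records the hash and collection time so later analysis can show the evidence was not altered.

```python
import hashlib
import json
import shutil
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large logs do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def preserve_evidence(sources: list[Path], evidence_dir: Path) -> Path:
    """Copy files into evidence_dir and write a manifest of their hashes."""
    evidence_dir.mkdir(parents=True, exist_ok=True)
    entries = []
    for index, src in enumerate(sources):
        # Prefix with an index so same-named logs from different hosts do not collide.
        dest = evidence_dir / f"{index:04d}_{src.name}"
        shutil.copy2(src, dest)  # copy2 also attempts to preserve file timestamps
        src_hash = sha256_of(src)
        if sha256_of(dest) != src_hash:
            raise IOError(f"copy of {src} does not match the original")
        entries.append({
            "source": str(src),
            "copy": dest.name,
            "sha256": src_hash,
            "collected_utc": datetime.now(timezone.utc).isoformat(),
        })
    manifest = evidence_dir / "manifest.json"  # hypothetical manifest format
    manifest.write_text(json.dumps(entries, indent=2))
    return manifest
```

In practice the evidence directory would sit on write-protected or access-controlled storage, and the copies would be subject to the same data handling policy as the originals.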

For example, a simple downtime event can turn into a breach notification event if recovery personnel inadvertently view restricted PHI records while checking record integrity. Certain designated personnel are likely empowered to execute specific pre-approved record integrity validation routines. Make sure this is how the records are validated, so you don’t run afoul of data handling regulations. Remember that when offline, the application safeguards you or the vendor designed into the system may not be functioning. Without these controls, you may inadvertently expose records to inappropriate personnel. Make sure to account for this in your incident response and recovery runbooks to avoid adding to your list of problems.
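One way a pre-approved validation routine might avoid exposing record contents is to compare hashes against a baseline captured before the incident, reporting only record identifiers and a status. The sketch below assumes such a baseline exists and that the identifiers themselves are not sensitive. Both are assumptions to confirm with your privacy and legal teams, and the function name and report layout are hypothetical.

```python
import hashlib


def validate_records(records: dict[str, bytes],
                     baseline: dict[str, str]) -> dict[str, list[str]]:
    """Compare restored records to baseline SHA-256 hashes.

    Only record IDs appear in the report; record contents are hashed
    but never printed, logged, or returned.
    """
    report: dict[str, list[str]] = {
        "ok": [], "modified": [], "missing": [], "unexpected": [],
    }
    for record_id, expected_hash in baseline.items():
        if record_id not in records:
            report["missing"].append(record_id)
            continue
        actual_hash = hashlib.sha256(records[record_id]).hexdigest()
        status = "ok" if actual_hash == expected_hash else "modified"
        report[status].append(record_id)
    # Records present after recovery that were never in the baseline.
    report["unexpected"] = sorted(set(records) - set(baseline))
    return report
```

Anything reported as modified, missing, or unexpected would then go to the designated personnel for review under the normal data handling policy, so that recovery staff do not need to open the records themselves.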