By Richard Bejtlich, Principal Security Strategist, Corelight
Earlier this month during Black Hat I had the good fortune to speak with Gary Reiner, a business leader for whom I have an immense amount of respect. Gary was the chief information officer (CIO) at General Electric (GE) for 20 years, and as such he was the last stop in my management chain before the then-CEO Jeffrey Immelt. In addition to my boss Grady Summers, our chief information security officer (CISO), Gary had the most managerial impact on the success of the GE Computer Incident Response Team (GE-CIRT), the unit I led and which we declared to have reached initial operational capability (IOC) on January 1, 2009.
I thought that Corelight blog readers with a leadership and management mindset might enjoy learning from the three most important decisions Gary made with respect to the success of GE-CIRT. I incorporated themes from these decisions in chapter nine of my fourth book, The Practice of Network Security Monitoring, but I wanted to hear the other side of the story: what was Gary thinking?
Gary is now an operating partner at financial services firm General Atlantic, and sits on the boards of companies like Citigroup and Hewlett Packard Enterprise. I prepared my three questions and took notes on his answers, which form the backbone of this post.
As a brief background, please know the following: in the late 2000s, the time our story takes place, GE was a global conglomerate, sitting in the Fortune Five and consisting of dozens of businesses with over 300,000 employees. Gary was the company CIO reporting to the company CEO, but each business unit had its own CIO reporting to a business CEO. These CIOs had dual-reporting duties, such that they worked for both their business CEO and Gary. GE-CIRT was the company CIRT, but worked with a multitude of business IRTs as well.
At the time of publication, GE is not a Corelight customer.
The One Hour to Containment Directive
Early in GE-CIRT’s existence, and prior to IOC, we had a small number of incident handlers (IHs) performing the collection, analysis, escalation, and resolution of security events. During one of our briefings with Gary, sometime in 2008, he asked my team how long it was taking us to accomplish our mission. We described the problem and told him our current metrics. Gary looked at us and said “one hour.” After mentally collapsing to the floor and standing up again, I asked him to clarify what he meant. Gary said that for high-priority incidents — what we were already calling “advanced persistent threat” campaigns — he wanted the detection-to-containment cycle completed in one hour or less.
In our conversation during Black Hat last week, Gary explained that the one-hour directive was a natural result of GE’s management culture. The company had a bias towards time-based metrics such as the time to take an order and deliver a product, or the time to resolve a customer’s complaint, or the time to deploy a new server. The practice was to constantly set ambitious goals for performing these tasks, without directing how a business unit should accomplish the mission. This allowed businesses to exercise creativity and foster innovation. Gary noted that Amazon has exhibited a similar focus on time-centric metrics, particularly with respect to their delivery goals.
By telling the GE businesses that he expected them, working in concert with GE-CIRT, to detect and respond to APT incidents in an hour or less, he set the goal but not the method. As Gary expected, asset owners responded in a variety of ways to his directive. Some built up their Business Incident Response Teams (BIRTs). Some hired more contractors. Some retired swathes of old computers that were constantly being compromised. Others disconnected systems en masse from the network! After eight or nine months, GE-CIRT was pleased to report that the company was, as a whole, achieving the one-hour goal.
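For readers who want to track a similar time-based metric, here is a minimal sketch of how a detection-to-containment goal might be measured. It is not how GE-CIRT recorded or reported its numbers; the incident records, identifiers, timestamps, and function names are all hypothetical and exist only for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical incident records: (incident id, detected at, contained at).
# These identifiers and timestamps are made up for illustration only.
INCIDENTS = [
    ("INC-1", datetime(2024, 3, 2, 9, 0), datetime(2024, 3, 2, 9, 40)),
    ("INC-2", datetime(2024, 3, 5, 14, 15), datetime(2024, 3, 5, 16, 0)),
    ("INC-3", datetime(2024, 3, 9, 22, 30), datetime(2024, 3, 9, 23, 30)),
]

# The goal described in the post: one hour or less.
CONTAINMENT_GOAL = timedelta(hours=1)


def time_to_containment(detected, contained):
    """Elapsed time from detection to containment."""
    return contained - detected


def within_goal(detected, contained, goal=CONTAINMENT_GOAL):
    """True if the incident was contained within the goal (inclusive)."""
    return time_to_containment(detected, contained) <= goal


def goal_attainment(incidents, goal=CONTAINMENT_GOAL):
    """Fraction of incidents contained within the goal (0.0 if there are none)."""
    if not incidents:
        return 0.0
    met = sum(1 for _, detected, contained in incidents
              if within_goal(detected, contained, goal))
    return met / len(incidents)


if __name__ == "__main__":
    for incident_id, detected, contained in INCIDENTS:
        elapsed = time_to_containment(detected, contained)
        status = "met" if within_goal(detected, contained) else "missed"
        print(f"{incident_id}: {elapsed} ({status})")
    print(f"Goal attainment: {goal_attainment(INCIDENTS):.0%}")
```

The sketch measures only the outcome (elapsed time against the goal) and says nothing about how a team reaches it, which mirrors the "set the goal but not the method" approach described above.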
The Exception Process
The business CIOs did not welcome Gary’s one-hour directive. Some claimed that it was draconian to remove a compromised asset from the network, despite the risk it posed to their own digital assets, let alone assets owned by other business leaders. By “compromised asset,” I mean a computer under active control by a nation-state adversary. The need to differentiate among various sorts of intrusions was the reason I wrote my 2009 post Information Security Incident Rating, and I was referring to “Breach 3” or worse intrusions when encountering business resistance.
After I explained this problem to Gary, and with a great deal of support from Grady (the CISO), Gary established an “exception process” for containment. When notified by GE-CIRT that an asset was compromised, business leaders could request an “exception” to the “remove from the network” directive. However, once a month, business leaders who had exercised the exception process had to explain to Gary why they had taken that action.
This may seem like a trifling matter, but I will try to explain the profound dampening effect this briefing requirement had on exception requests. Gary is a very pleasant person, but you did not want to find yourself in his office explaining why you were exposing your business assets, and those of the rest of the company, to theft, alteration, or other calamity by a nation-state intruder actively operating in your environment.
Gary explained to me how he had invented the exception process. At GE he had championed a process called “lean, then digitize.” No one in information technology was allowed to automate or digitize a process which had not been “leaned” to remove inefficiencies and poor quality. Business leaders had a tendency to want to automate inefficient processes, and business CIOs found themselves under pressure to serve two masters — their business CEO, and Gary, the company CIO.
To ensure that business CIOs would not ignore his “lean, then digitize” directive, Gary told the business CIOs (who had a dual-reporting chain of command to their business CEO and the company CIO) that he would fire anyone who digitized a non-lean process. This directive guaranteed that the business CIO would escalate any real conflict between business and information technology to Gary’s level, where he could make his case to the business CEO.
The exception process for the one-hour directive was born out of this dynamic. By requiring business CIOs to explain to Gary why they had ignored his requirements, he ensured they would not simply ignore him and keep compromised assets online, in accordance with the preferences of some business CEOs. Gary explained to me that he aimed to create a balance between the needs of the business leaders and the IT leaders.
Finally, and most importantly, the exception process gave GE-CIRT the authority to direct business IRTs to remove compromised assets via network containment. This process severely hampered the adversary’s ability to complete his mission. Note that we did not exercise this authority as an ad-hoc “whack a mole” affair. GE-CIRT directed containment when it met the characteristics we had previously defined via playbooks, threat intelligence, and counter-campaign operations.
The Direct Briefings
The third issue I discussed with Gary was his requirement that members of the GE-CIRT, along with our CISO Grady, periodically brief him, without business CIO or management involvement. These were very frank discussions, sometimes in person, that involved a few incident handlers, myself, Grady, and Gary. He was intensely interested in knowing exactly what was happening in the environment, and he asked the very people detecting and responding to intrusions to share their findings and thoughts.
Gary explained that he learned this technique from the famous GE CEO Jack Welch. Jack relied on a network of contacts throughout GE to tell him what was happening on the ground. Gary told me that Jack had learned this management technique from a fellow executive. This CEO had shared a bit of wisdom with Jack: “you’ll be the last guy in the company to hear bad news.” Gary did not want to read sanitized reports from intermediaries. He wanted ground truth.
I recall someone asking Gary how he felt now that GE-CIRT was handling a variety of incidents at GE. The person asked “how can you sleep knowing what is happening?” Gary replied “I couldn’t sleep before, because I didn’t know what was happening. Now that I know GE-CIRT is handling these incidents, I sleep very well!”
Gary told me that gaining access to bad news is a quintessential “big company problem,” and that leaders must find a way to gather and act on bad news as efficiently and effectively as possible. Developing a trusted network of sources across the company is one way to collect this vital business intelligence.
There are many other aspects of the dynamics of my time at GE working with Gary that I would enjoy discussing, but these key thoughts — the one-hour directive, the exception process, and direct briefings — formed the basis for transforming the GE security experience. I hope that sharing them with you has provided a few ideas for how innovative management and leadership techniques can empower your incident detection and response processes.