May 26, 2010

Don’t SPOF me, Bro’!

Posted in Financial Markets, Foolishness, Social Commentary at 4:49 pm by Doug Brockway

A SPOF, or “single point of failure,” is a part of a system which, if it fails, will stop the entire system from working. They are undesirable in any system whose goal is high availability, be it a network, software application or other industrial system.
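To make the definition concrete, here is a minimal sketch in Python. It assumes a deliberately simple model that is not from any real system: a system is a set of stages that must all work, each stage has one or more interchangeable components, and components fail independently of one another. The stage names and the 0.99 availability figures are made up for illustration.

```python
from math import prod

def stage_availability(component_availabilities):
    """Availability of one stage whose components back each other up."""
    # The stage is down only if every one of its components is down at once
    # (assumes independent failures).
    return 1 - prod(1 - a for a in component_availabilities)

def system_availability(stages):
    """Availability of the whole system: every stage must be up."""
    return prod(stage_availability(comps) for comps in stages.values())

def single_points_of_failure(stages):
    """Names of the stages that depend on exactly one component."""
    return [name for name, comps in stages.items() if len(comps) == 1]

# Hypothetical data center: power and network are duplicated, cooling is not.
data_center = {
    "power": [0.99, 0.99],
    "cooling water": [0.99],
    "network": [0.99, 0.99],
}

# The same data center after adding a back-up to the cooling water.
with_backup = {**data_center, "cooling water": [0.99, 0.99]}

if __name__ == "__main__":
    print("SPOFs:", single_points_of_failure(data_center))
    print("Availability as built:", system_availability(data_center))
    print("Availability with back-up:", system_availability(with_backup))
```

In this toy model the lone cooling stage caps the availability of the whole system, however much redundancy the other stages have. Real failures are often correlated, so the independence assumption makes these numbers optimistic.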

Sometimes high availability is desirable because every minute a system is down, money is lost. If an airline’s reservation system is down, they still own the planes but they can’t sell the tickets. Sometimes high availability is desirable because a system failure also brings significant additional costs. The owners of Three Mile Island and BP both understand this all too well.

Some in management understand the dangers and the costs of SPOFs quite well. This brings us to two stories about Edward Crosby “Ned” Johnson 3rd, the CEO of Fidelity Investments. Johnson’s reputation (I’ve never met him) is that of a particularly hands-on, operationally knowledgeable person. In the early 1980s Fidelity had its main data centers in Boston. Ned was told by the manager of a data center there that it was secure, that no one, not even Ned, could get unauthorized access. The story goes that Ned climbed over the computer window desk and into the data center. Once he had made his point, security was upgraded, and over the next few years the data centers were moved out of the city center.

Not too many years later Fidelity established a facility in the Dallas area, including a data center. Mainframe computers of the time were, and for the most part still are, cooled by the circulation of water through pipes inside the machines. On a tour of his new data center Johnson is reputed to have asked to see the back-up to the water supply. Having seen that, he asked to see the back-up to the back-up. There was none. Ned asked how much water was involved. Given a number, he concluded it was about the volume of a reasonable swimming pool, so a pool was constructed on-site, for employee and family use, attached by pipes to the back-up to the water supply.

I don’t have data on the volume of trades or dollars that Fidelity was managing in those days. Suffice it to say that, as the dominant and largest mutual fund company, its trade and dollar volumes were, and remain, quite high. Johnson knew the near-term value of lost transactions and the long-term reputational and revenue risk if his customers came to believe Fidelity was not a reliable partner.

He likely used different words but his message to management was, “Don’t [SPOF] me, Bro’!”

Would that the management of BP had taken the same view in the Gulf of Mexico. As I write they are attempting the “top kill” method to stop the gushing oil leak that has plagued us for over a month. A controller on the blow-out preventer failed and they had no plan to deal with it. They were drilling five miles down, “where no man has gone before,” and had no at-hand, tested solutions for failures that could reasonably be expected. The now-famous Minerals Management Service blithely gave BP authority to go ahead without the plans and reports that are designed to prevent or control disasters. One can only wonder what any of these people were thinking.

One thing is clear: they weren’t thinking “Don’t [SPOF] me, Bro’!”

Causal factor of unavailability
Lack of best practice change control
Lack of best practice monitoring of the relevant components
Lack of best practice requirements and procurement
Lack of best practice operations
Lack of best practice avoidance of network/system failures
Lack of best practice avoidance of internal application failures
Lack of best practice avoidance of external services that fail
Lack of best practice physical environment
Lack of best practice network/system redundancy
Lack of best practice technical solution of backup
Lack of best practice process solution of backup
Lack of best practice physical location
Lack of best practice infrastructure redundancy
Lack of best practice storage architecture redundancy

6 Comments »

  1. […] This post was mentioned on Twitter by Sarah Conroy, Doug Brockway. Doug Brockway said: New blog post: "Don't [SPOF] me Bro'!": Ned Johnson at #Fidelity, and #BP in the Gulf of Mexico: http://wp.me/pTZFi-1p […]

  2. And the gift that keeps on giving continues… This became public the next morning. It’s about taking a less expensive option now and accepting a SPOF, which looks to be a cause of the oil leak at the bottom of the Gulf: http://www.nytimes.com/2010/05/27/us/27rig.html?hp

    WASHINGTON — Several days before the explosion on the Deepwater Horizon oil rig, BP officials chose, partly for financial reasons, to use a type of casing for the well that the company knew was the riskier of two options, according to a BP document.

    The concern with the method BP chose, the document said, was that if the cement around the casing pipe did not seal properly, gases could leak all the way to the wellhead, where only a single seal would serve as a barrier.

    Using a different type of casing would have provided two barriers, according to the document.

  3. Paul Clermont said,

    For Want of a Nail

    For want of a nail the shoe was lost.
    For want of a shoe the horse was lost.
    For want of a horse the rider was lost.
    For want of a rider the battle was lost.
    For want of a battle the kingdom was lost.
    And all for the want of a horseshoe nail

    —medieval, author unknown

    Some things are just too hard to learn, I guess!

  4. Frank said,

    I had the pleasure of supporting Mr. Johnson in the late ’90s. Very personable gentleman, although he had high standards… SPOFs were no-nos; they were not tolerated (nor should they have been). Fidelity was a great place to learn “best practices” for both design and support of sustainable systems…

    • Yes, “Mr. Johnson” is something of a role model. As far as I can see, his questions, occasionally as detailed as the back-up for the back-up for the coolant water in Dallas, are not meant to be personally challenging in any way. His intent appears to be to make sure the organization thinks clearly and deeply. The result is one hell of an asset and a lesson many others should learn from.

      Does anyone ever call him “The Nedster”?… 🙂

  5. Frank said,

    No, I wouldn’t dare, he was Mr Johnson. But his son, Ned IV, was known as Eddie Quatro!

