Twitter Chief Architect Leaves - But Why The Witch Hunt?

I don’t like witch hunts. Apart from anything else, they rarely solve the problems they purport to fix, and they don’t bring people any closer to understanding the truth of a given situation. Today, the knives seem to be out for a guy called Blaine Cook, who, up until a couple of weeks ago, was Chief Software Architect at Twitter. For example, Mike Arrington over at TechCrunch wrote the following in a post entitled “Amateur Hour Over At Twitter?”:

It doesn’t really matter if Twitter’s Chief Architect Blaine Cook was fired or resigned. The important thing is that he’s gone now… Cook was directly responsible for scaling Twitter, and he very much failed in his job.

Now, I don’t know Blaine at all. I don’t know if he’s a talented software architect or not. Neither do I know anything about the ins-and-outs of Twitter’s very public scaling problems. Was Blaine to blame? Was their former infrastructure provider, Joyent, to blame? Is the Twitter software development team to blame? Is the Ruby platform to blame? Is the Rails framework to blame? Is the Twitter operations team to blame? Is the Twitter senior management team fundamentally dysfunctional? I have no idea.

What I’m pretty confident of, though, is that scaling is a strategically important problem for Twitter as a company.   That means more than one person in the company will have been thinking about it, not only the Chief Architect.  And more than one person in the company hasn’t yet figured out how to solve the problems.

In my experience, when you dig into strategically important problems like this to try to find out what the truth of the situation is, there’s one pattern that occurs over and over.  That is - very few people inside the company actually understand what the causes of the problem are, and even fewer (if any) know how to fix it.

Instead, what you tend to find is that different people inside the company have different opinions on what the root causes of the problem are. Some of these opinions are honestly held, but wrong (people who just don’t “get it”); some are not-so-honestly held (political operators who seek to gain advantage from the situation, and/or resist changes which would see them lose advantage).  And of course, there are usually some people who do at least understand what the causes of the problem are, even if they don’t always know how to fix them.  In a company that’s experiencing long-term strategic problems, this last group is often pretty small in size… and often those people don’t have the power to fix the problem.

Now, I’ve said on this blog that I personally find Twitter pretty uninteresting.  Clearly, though, plenty of people love it.   Enough people to mean Twitter still has some time to fix their scaling problems.  Unlike Mike Arrington and others though, I’d be far from confident in thinking that the departure of the Chief Architect will automagically fix the problems.

As a side-note, it’s interesting to reflect on just what challenging times we live in, from the perspective of keeping software running as a service. Ever since the advent of the web, people expect every system they have access to to operate 24/7, 365 days a year. That means no back-up windows. No restore-from-backup windows. No maintenance windows. No unscheduled downtime expected.

It used to be the case that dev and admin teams at least had overnight or a weekend to implement changes to mission-critical systems (and back out the changes if things went wrong), or to take systems off-line to run back-ups or restore from back-ups. Keeping systems running 24/7/365 is no mean feat, because the truth is: all complex software applications have bugs; all operating systems have bugs; and hardware fails. All of which means things go wrong sometimes, especially when financial resources are constrained. Like I say, challenging times!
