Sunday, July 10, 2011

Not All Early Optimization is Premature

Andrew Parker and Jeff Atwood have both had great posts recently about performance as a feature, but I think they've each actually stopped short of a powerful point - improving a product experience by 10 or 20 percent through optimization is great, but there's incredible power when you unlock fundamentally different feature sets through radical optimization. This is an area that the code-first-then-optimize process misses entirely, because incremental improvements on the same basic design will never lead to order-of-magnitude performance improvements.

The power of such performance improvements is one of the most important lessons I learned at Google. A couple examples demonstrate the kind of optimization I'm thinking of:
  • In 2004, when the standard storage for webmail was 2MB, Google was able to launch Gmail with 1GB of storage, because GFS provided a means for managing disk that was orders of magnitude cheaper than what the other providers were using. Rumor has it that Yahoo went out and gave NetApp millions of dollars to buy storage devices in order to come anywhere close to what Google was offering. Underlying this all is an optimization of storage and disk that was far more efficient than what others in the industry were capable of at the time.
  • One of the coolest features of Google Maps is the ability to see a route, then grab it with your mouse and drag it to change the route. That feature is possible because Google developed a radically more efficient route-finding algorithm, years ahead of what anyone else in the market can offer. The difference between computing a route in 1 second and computing it in 10 milliseconds means you can suddenly offer users the ability to compute a hundred times more routes.
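
As a back-of-the-envelope check on the route example, here is a small Python sketch. The 1 second and 10 millisecond figures come from the example above; the function name is made up for illustration, and nothing here reflects how Google Maps actually computes routes.

```python
def routes_per_second(latency_ms: int) -> float:
    """How many back-to-back route computations fit in one second."""
    return 1000 / latency_ms


slow = routes_per_second(1000)  # one route takes 1 second
fast = routes_per_second(10)    # one route takes 10 milliseconds
speedup = fast / slow

print(f"slow: {slow:.0f} route/s, fast: {fast:.0f} routes/s, speedup: {speedup:.0f}x")
```

At one route per second, recomputing while the user drags is out of the question; at a hundred per second, the route can plausibly be recomputed as the mouse moves.
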
It's part of the modern software engineering zeitgeist that "premature optimization is the root of all evil," but as I researched this post, I found out that the full Knuth quote is a lot more illuminating than just that snippet; the full statement attributed to Knuth is actually, "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil" (emphasis mine).

So yeah, we can all agree that "performance is a feature," but that fails to convey the power of high-performance systems. We can talk about caching database results or using a CDN for static content, and everyone should be doing those things, but let's not be afraid to go much, much deeper. Consider the core operations of your service, then imagine that you could speed them up by 100x - what radically new features would be enabled? With those radical new features in mind, start working backwards to figure out how to actually make those 100x improvements.

The big challenge is that these order-of-magnitude optimizations are design optimizations, not the kind of changes you can make after the fact. In design discussions, the engineer arguing for keeping the entire datastore in memory is immediately shouted down with the "premature optimization" line, but I think it's time we start fighting back on behalf of design-time optimization.

Monday, May 23, 2011

My hardest bug

Lately I've been thinking a lot about the question "What is the hardest bug you've ever tackled?" It's an interesting index into the interests and style of different software engineers, and usually provides a jumping off point for a really great conversation.

With that in mind, I figured it was time to write up my hardest bug. This is a pretty old one, but I still enjoy it for just how head-slappingly bad the eventual resolution was.

Early in the year 2000, I was working on a cool cross-platform data warehousing product. One of the coolest things about it was that we had a vendor-independent SQL macro layer, so normal developers could write in one SQL dialect, and it would automatically translate to each of the underlying platforms (e.g., SQL Server, Oracle).

My first project was to port the reporting engine to DB2. It was relatively straightforward: it just required figuring out some of the idiosyncrasies of DB2, auditing the code to match those, and getting the regression tests to pass. It was a fun project and shipped without much trouble.

A couple months later, though, our largest DB2 customer called in with major complaints about reporting performance. This was my responsibility, so I took a copy of their logs and started digging in. The weird thing was, the logs showed that all the database queries were running very quickly, and yet, periodically, there were these weird 30 second gaps in the logs, as though nothing were happening at all.

I instrumented the code a bit more, shipped a patch to the customer, and asked them to send back new versions of the logs. Everywhere I had added log statements continued to show things running smoothly, and yet, there were still those odd gaps where nothing was happening.

In desperation, I asked the customer to send me a full copy of their datamart, which they did, and after taking a day to get the system up locally, I tried out the problem operations, and they ran perfectly fine - no performance issues, no weird gaps in the logs, nothing. Now I was really annoyed.

I returned to instrumenting the log files, adding a log statement after nearly every line of code. When the customer sent me the output of this very detailed logging, I could identify the exact line of source code that corresponded to the gaps, but I looked at it and said “there’s no way this could be taking 30 seconds, all it’s doing is macro replacement from our SQL meta-language to DB2 - it’s a straight string substitution.”

Still, I dug into it a little more, and that’s when my mind was blown. It turned out that the developer who had ported the macro language to DB2 had changed the string-replacement function to do a database query as part of the replacement step. The query had a performance profile that would grow as the square of the size of a certain internal log table, so the longer a system had been in production, the longer it would take to do string substitution.

It hadn’t shown up in the local test I did because I had only replicated the warehouse data, not the operational data, and it didn’t show up in the logs in production because the developer had intentionally covered his tracks to route around our built-in query logging.

Once we hit on this, it was a matter of minutes to refactor the n^2 query and ship a patch to the customer.
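
To make the failure mode concrete, here is a minimal Python sketch. It is not the original product's code: the function names, the `$CONCAT` macro, and the `log_rows` list are all hypothetical, and a nested loop stands in for the real query, whose cost grew as the square of the log table size.

```python
def substitute_macros(sql, macros):
    """Pure string substitution: cost depends only on the SQL text."""
    for name, replacement in macros.items():
        sql = sql.replace(name, replacement)
    return sql


def quadratic_lookup(log_rows):
    """Hypothetical stand-in for the hidden query.

    Visits every pair of rows and returns the number of pairs visited,
    which is len(log_rows) ** 2.
    """
    comparisons = 0
    for _ in log_rows:
        for _ in log_rows:
            comparisons += 1
    return comparisons


def substitute_macros_with_lookup(sql, macros, log_rows):
    """Same substitution, but each call also pays for the hidden lookup."""
    work = quadratic_lookup(log_rows)
    return substitute_macros(sql, macros), work


macros = {"$CONCAT": "||"}  # hypothetical macro, for illustration only
for table_size in (10, 100, 1000):
    _, work = substitute_macros_with_lookup(
        "a $CONCAT b", macros, list(range(table_size))
    )
    print(f"{table_size:>5} log rows -> {work:>8} units of hidden work")
```

The substituted SQL is identical at every table size; only the hidden work grows. That is why a copy of the warehouse data alone could not reproduce the problem.
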

Sunday, April 10, 2011

Making Effective Use of Code Reviews

I’ve been reading chunks of Coders at Work this weekend, and the topic of code reviews has come up a few times. Code reviews are clearly a very useful development technique, but it can be tricky to apply them in a way that improves code quality without slowing productivity.

Poking around the web, I don’t see a lot of great writing about code reviews, so I wanted to share the guidelines we use at Yext for effective code reviews. The guidelines below are captured from an email I sent to the engineering team almost a year ago, and I’m proud to say that our code reviews do a lot to improve code quality while keeping the team operating at peak efficiency.

These guidelines are heavily biased by my experience at Google. There, I saw how code reviews could identify and eliminate many preventable bugs, including many that the original developer never would have found. I also saw innumerable cases of reviewers who lost perspective on the larger goals of the team and the company, and thus acted to prevent progress on important projects as a result of matters of personal preference or sheer obstinance. My goal for Yext is that we capture the best aspects of code reviews while eliminating the worst.

With that in mind, my recommendation for code reviews is that they address the following points:
  • Correctness: Does the code do what it claims to do? Is the code correct in both the nominal case and the boundary cases? As a reviewer, this is your opportunity to point out edge conditions of which the original developer may not have been aware. An important special circumstance is when you may be aware of legacy systems or features that interact with the modified code in some non-obvious way.
  • Complexity: Does the code accomplish its task in a reasonably straightforward way? If you can point out simpler approaches that do not compromise the correctness or performance of the code, you should.
  • Consistency: Does the code achieve its basic goals in a way that is consistent with how similar code in the codebase achieves those goals? Is it re-using the available libraries and utility classes? Where possible, has code been refactored for re-use instead of just copying and pasting?
  • Maintainability: Could the code be extended by another developer on the team with a reasonable amount of effort? More than any item on the list, this is the karma investment you make by doing code reviews – the code you review today may be the code you have to update tomorrow, so taking the time to make sure it’s maintainable by others pays itself back to you.
  • Scalability: Will the code be performant at the expected volumes? It is important that this question always be asked in the context of expected volumes. When building a new product in an untested market, it is fine to write code that works for 100 users but not 10,000; if the product should be that successful, you can profile, optimize, and, when necessary, re-write the critical bits. The corollary is that you should not spend time optimizing code when the market demand is unproven.
  • Style: Does the code match the team style guide? This should rarely be controversial. The obvious assumption here is that your team should have a style guide.
There are some items I believe should only rarely be addressed during a code review:
  • Scope or mission feedback: “I don’t think you should be doing this project” is almost never a useful comment for a code review. If you think the team is embarking on projects that are not worthwhile, that is great feedback to share, but not in the context of a code review. The exception here is if someone is introducing a new way of doing something that is already well-handled in some other way.
  • Design review: A code review is not the time to evaluate the overall design of a project. For example, "I don't think you should be using the DB to store this data" is not useful. It is incumbent upon the developer to have their designs reviewed before implementation, and there will be scenarios in which the fundamental design is questioned during the implementation, but for a project that has been through a design review, let the results of that design stand.
  • Personal preference: “I would rather you do it my way” is an invitation to an unproductive debate. If you have a way that is demonstrably better, you should always argue for it. The hardest part about this point is identifying when a review has deteriorated to matters of personal preference; the hallmark I spot most often is when people are trading hypothetical scenarios in which alternative solutions might be advantageous, with no way of determining the likelihood of said scenarios. In these cases, the default is to use what the developer has already written.
How can you, as the developer, write your code in such a way as to make a code review easier? A few simple practices help.
  • Correctness: Comprehensive unit tests are the best demonstration that code functions as intended.
  • Complexity: Favoring small methods and cleanly-separated functional units makes it easy for your reviewer to see how everything fits together.
  • Consistency: When building new functionality, you can maximize the consistency of your code with existing work by taking the time to research how similar code solves similar problems. If you suspect someone else has solved the same problem before, ask!
  • Maintainability: Thorough commenting and the use of meaningful names throughout your code help ensure that others will be able to easily understand your code.
  • Scalability: My #1 recommendation in demonstrating the performance of new code is to just take 30 minutes and write a little driver to run your code through its paces. This can be total throwaway code, but simply being able to tell your reviewer that you’ve done a performance test makes this topic less debatable.
  • Style: The most important thing you can do to maintain style consistency is to configure your editor to implement your style guide. (As an aside, this also means that your team should adopt a style guide that is simple to automate in the editors used by the team.)
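
For the scalability point above, the throwaway driver can be as small as the following Python sketch. The `run_driver` helper and the choice of `sorted` as the code under test are placeholders of mine, not part of any Yext tooling; swap in your own function and input generator.

```python
import time


def run_driver(func, make_input, sizes, repeats=3):
    """Time func on inputs of each size; return {size: best_seconds}."""
    results = {}
    for size in sizes:
        data = make_input(size)
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            func(data)
            best = min(best, time.perf_counter() - start)
        results[size] = best
    return results


if __name__ == "__main__":
    # Placeholder workload: sort a reversed list at growing sizes.
    timings = run_driver(
        sorted, lambda n: list(range(n, 0, -1)), [1_000, 10_000, 100_000]
    )
    for size, seconds in timings.items():
        print(f"{size:>8} items: {seconds * 1000:.2f} ms")
```

Running it at a few sizes around the expected volume shows roughly how the timing grows, which is usually enough to settle the question in a review.
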
Even when everyone on the team follows these guidelines, there will frequently be strong debate during code reviews, and that’s a great thing - the point of these guidelines is to focus the debate on what matters.

All of this ignores some very tactical questions about code reviews like what code gets reviewed and what tools we use to aid that process. If you’re interested in hearing more about that, leave a comment and I will follow up with another post.

Sunday, March 13, 2011

The Dark Side of Passion

There was a lot of great response to my post about the Passion Gap, but some people misunderstood it to mean that I fall in with the camp of career counselors who tell you to “do what you love and the money will follow”. While I believe that doing something you love is important, I also strongly believe that following your passion must be grounded in reality.

This first became clear to me when I was in high school. Sometime during freshman year, they marched us all into the career center and made us look up the key facts on our ideal jobs. We had to find out good college majors to prepare us for those careers, the demand for people with those jobs, and the average pay we could expect. I looked up the job I’d been dreaming about since age 9, and I was incredibly disappointed at the average salaries. When my father was pushed into early retirement while I was a senior in high school, the cold hard reality of making a living was a key input in my eventual decision to study computer science.

To be completely clear, I’m incredibly happy with what I do - I love going to work every day. But I also love providing for my family.

Moderating your passion with realism is not at all inconsistent with having a deep enduring passion for your startup. Even Dennis Crowley, protagonist of the Passion Gap post, is on the record saying that the one thing he most wants to be good at is … karaoke. But belting out Whitesnake just doesn't pay the bills.

The dark side of passion is blindly following a passion that won’t support you. This post at Study Hacks goes so far as to make the argument that the “follow your passion” culture is responsible for the fact that self-reported job satisfaction rates have fallen every year since measurement started in 1987 - the argument is that because people are so focused on the perfect job, the one they can be most passionate about, they are ultimately disappointed by having simply a great job.

The unrealistic view that it’s sufficient to simply be passionate about something and not match that passion to reality is captured in this op-ed from the Times in 2009, in which the author describes a difficult search for work as an art instructor, despite qualifications including an MFA. The author admits, “In my master’s program, we … tried not to dwell on earthy, unpleasant topics like money, or how to make it.”

As I said in my earlier post, I deeply believe that passion can be a competitive advantage for startups, and I also think it’s crucial to success in most everything else, but that passion must be channeled in a way that is connected to reality.

Wednesday, March 09, 2011

Foursquare, Facebook, Founders, and Passion

Late in the summer of 2009, I was talking to a very successful entrepreneur at a tech industry meet-and-greet when Dennis Crowley, co-founder of Foursquare, came into the room, at which point the person I was talking to commented, “I first met Dennis at 3am on a street corner on the lower east side. I have never met another founder who is a more direct physical embodiment of their startup.”

That comment always comes back to me when I hear someone saying that a particular startup “is just a feature,” or, “has no defensible technology,” or, “could be replicated in a weekend.” There are a lot of startups out there that are subject to this critique, and yet, they continue to do extremely well in the face of competition. And I think, in many cases, it derives from the fact that these founders are the physical embodiment of their startups.

In the case of Foursquare, we should expect by now that Facebook Places would have obliterated the use of Foursquare. And yet, as SAI points out this morning, Foursquare’s user base has doubled since the launch of Facebook Places. That’s pretty stellar growth in the light of competition that should be winning on every dimension. And that’s before the release of the awesome goodness that is Foursquare 3.

Why does Foursquare just keep winning? If I could tell you the specific features or user interactions that are lacking in Facebook Places, I’d be a rich man. I’m in the Facebook mobile app 3 times a day, minimum, let alone being on Facebook over the web at least twice a day - I should be using Places all the time, and yet I just keep coming back to Foursquare. Somehow Foursquare just hums.

This, to me, is the difference between a product built by someone who is deeply invested in the underlying product idea, as compared to a product built by someone who is just trying to check off a set of feature boxes. This is what I think of as the Passion Gap.

If you’ve ever heard Dennis talk about Foursquare, or mobile devices, or cities, you can’t help but see that everything you think of, he’s already thought of, turned over in his head eight times, and reached the conclusion that you would eventually come to if you spent eighteen months in deep thought about the topic. Does Facebook have a Dennis Crowley? Or do they have a product manager who just started thinking about location-based services eight months ago? That PM may have 600 million users and every engineering resource they desire, but they haven’t spent the last ten years thinking about how to get people more engaged with their cities via their mobile devices. The difference between Dennis Crowley and that Facebook PM is the Passion Gap.

The Passion Gap is evident when you see a founder or product manager so deeply engaged in their product that they can’t help but think about it all the time, and, as a result, they see all the fine details that are required to make a product that exactly matches what the market needs. This is true even when the market hasn’t yet realized the need.

Another great demonstration of the impact of the Passion Gap is the difference between StackOverflow and Experts Exchange. StackOverflow is amazing, and we should all be thankful to Jeff Atwood for creating it. But Experts Exchange is basically the same product idea, created years before, with an entrenched user base. And yet, somehow the user experience at Experts Exchange is all friction, whereas StackOverflow just hums. If you’ve read any of Coding Horror, it makes perfect sense - Jeff Atwood has a deep passion for making every software engineer out there better at what they do. Of course someone with that passion was going to be the one to make a tremendously useful knowledge-sharing site for software engineers!

So I think of the Passion Gap whenever someone claims that a successful startup has “no defensible technology”. For some of the most interesting companies out there right now, the key bits of technology are not in some single large algorithmic piece, but rather in dozens of fine-grained product choices that make a total experience. The people who accuse those products of lacking defensible technology are taking a one-dimensional view of the product. It’s like we’re all color-blind to the features that make the product hum, and yet these highly passionate founders see the colors we’re completely unaware of.

Just to enumerate some other examples of where I see the Passion Gap at work:
  • Andrew Mason (Groupon) - Andrew’s startup prior to Groupon was The Point; it was a startup for social causes where one person acting alone could not have an impact. I would have to guess that Andrew Mason's passion isn’t about marketing local businesses, as much as it is about leveraging the power of groups of people acting together. Certainly, we have to assume that his time spent thinking about the dynamics of group action while working on The Point provides a nice competitive advantage to Groupon.
  • Mark Zuckerberg - the Time Person of the Year profile of Zuckerberg does an excellent job of showing just how passionate Zuckerberg is about personal relationships and how to extend them with technology. Remember for a moment that when Facebook first started to get big, many people thought it was just another wave in the Friendster/Myspace ebb and flow of social networking sites. I see Zuckerberg's passion as key to how Facebook achieved and maintained dominance.
  • Steve Jobs - Steve is an interesting case, because his passion, by observation, seems to be more about beauty in technology rather than any particular product application. And yet, Apple consistently puts out products where the beauty of the thing itself is a huge part of the sales appeal. The Passion Gap is particularly well-demonstrated when competitors set out to copy and improve upon Apple’s category killers, for example, the Zune.
The most common way that people talk about the Passion Gap is when they advise you to “start a company that scratches your own itch”. I posit that the underlying logic in that advice is that the best startup you can create is one where you will be constantly engaged in thinking about improving the product, maximizing the user experience, and planning for the future - where you have real passion for making it work.

Anyway, the next time you see a startup with “no defensible technology”, take a look at the founder, analyze their passions, and consider whether their defense lies in the Passion Gap.

Addendum: Please check out the follow-up post to this one, The Dark Side of Passion.

Friday, November 12, 2010

I'm calling it now: The engineering talent bubble

My 6th grade composition teacher would dock me 10 points for throwing my thesis right up into the title, but I just want to go out there and call it now: we are in the middle of a bubble of valuations for individual software engineers.

What do I mean by bubble? For my purposes, a bubble exists when prices of some commodity are increasing at a blistering pace that becomes disconnected from reality. Most of the time we fail to recognize a bubble because we think "this time it's different." This time is NOT different.

What are the signs of the bubble? Two key things I see include:
  1. Google paying outrageous retention bonuses to keep people from leaving for Facebook (see here)
  2. Startups, less than two years old, with very little user traction and no revenue, getting acquired simply for talent, at valuations that give even the front-line engineers a seven-figure outcome. (I won't provide any links here because I don't want to point too many fingers)
Maybe even more importantly, though, the key anecdotal indicator of a bubble is when it becomes "common sense" that prices will always go up or sustain the current market. That is what I am seeing more and more every day now - software engineers saying they're sure they can start a "company", sit in their apartment for a year, and then get acquired by Facebook for $10 million.

The unfortunate thing about a bubble is that it always bursts. You can argue that this bubble won't burst, because the number of software engineers is relatively fixed, and the demand is going up pretty drastically these days. However, a few years ago, people argued for the real estate bubble by saying "they aren't making any more land". That bubble didn't burst because supply expanded, it burst because demand collapsed as the ability to pay fell through the floor.

This is going to end, and badly. There are a lot of engineers who are going to make serious life choices based on current trends, and in a year or two, when this blows up, they will find themselves unemployed and without savings.

Tuesday, September 15, 2009

Please Drive Your Kids to School

The article in Sunday's Times about driving your kids to school has generated a lot of discussion. I just posted a mini-rant in Reader about this, and figured it was worth re-posting here.

As a parent and a computer scientist, to me this comes down to basic probability theory. The idea of the "only 115 abductions a year" stat is to make it sound as though the chances are so low that it's not worth protecting against.

As we all remember from discrete math, though, you don't care about the probability, you care about the expected value. As a parent, the cost of losing a child is, for all practical purposes, infinite. So any "lose my child" event with non-zero probability becomes worth protecting against. The cost of driving a kid to school each day is astonishingly low; there's no point in NOT doing it.

The author is basically appealing to the sentiment of "why aren't things the same as when I was a kid?" To them I say: the world changes, get over yourself.

This is not meant to be an argument for protecting kids against all possible dangers, but driving your kids to school is pretty much a no brainer. It's right up there with "should I buy my kid a car seat or not?"

By the way, most states these days have much stricter car seat and seat belt laws for children than when we were kids. My kids will never know the joy of bouncing around, unrestrained, in the way back of an '85 Oldsmobile wagon. That's life.