NIST has announced what they call a Static Analysis Tool Exposition (SATE). Participating static analysis tools will be used to analyze a small set of programs selected by NIST. Eventually all of the results will be made public. I like the idea a lot, and I'm going to make sure Fortify takes part in both the C and Java tracks.
Running a bunch of tools against the same test cases might seem like a straightforward exercise, but it's not. Finding an even-handed way to examine complicated tools is tough, and the fact that NIST is emphasizing the detection of security vulnerabilities makes it that much tougher. (Tell me again, what exactly constitutes a "vulnerability" in source code?) Just browse through the steps in the SATE protocol to get a feel for the complexity involved.
To make the exercise even more difficult, everyone who makes a tool can't help but worry that they'll come out looking bad. After all, this sounds a little bit like the setup for a product review. We've seen those before. (Network Computing did one in the middle of 2007.) Product reviews generally have winners and losers. NIST says they'll work hard to avoid these kinds of labels, but it's not going to be easy. All of this makes test case selection a touchy subject, and comparative presentation of the results is an absolute minefield.
The problems are many, but there's also a lot to be gained. Here are the three reasons I'm looking past all of the complexity and anxiety and participating in SATE:
1) People who want to run their own tool comparisons will benefit. You might like the way SATE works or you might not, but you'll be able to see what they did and where it took them.
2) Tool builders will benefit. Static analysis for security is a hot topic in both industry and academia. The results will give us a little bit of insight into which problems we've conquered and which ones require more work.
3) The security community will benefit. Being able to compare output from different tools will give us a better idea about how the tools classify code, and my guess is that we'll learn that our nomenclature still needs improvement. Better nomenclature makes it easier to talk to the 99% of humanity that doesn't think about software security every day.






