Security Tools Benchmarking: Rules of the Game

The last couple of months have been very interesting (thanks for all the great feedback and constructive criticism), and I have some good news and several announcements.

First, the good news:

I had several discussions with Simon Bennetts (psiinon), one of the OWASP chapter leaders and the leader or co-leader of several OWASP projects (ZAP, WAVE and OWASP-DEF).

One of the subprojects Simon leads is ZAP-WAVE, the only other web-app scanner evaluation framework that is actively maintained (the last publicly available update of the third framework, "moth", was in mid-2009). He suggested that we merge our efforts so that everyone will benefit.

To make a long story short, Simon contributed the source code of the current ZAP-WAVE test cases, allowing me to adjust them to the WAVSEP format and publish them under GPL 3.0 (they are currently available under ASF 2.0, lawyer comments aside). He even suggested that future test cases implemented by the ZAP team follow the WAVSEP format (structure and documentation).

That's obviously great news for me (and for anyone else using the project or the benchmark results; credits to Simon Bennetts and Axel Neumann), since the ZAP-WAVE project already contains test cases for several exposures that WAVSEP does not cover, and any additional contribution will only enhance my current efforts (I'm currently working on dozens of additional test cases for new exposure categories).

I have already started adjusting these test cases (changes and integration notes will appear in the changelog), and I hope to release them soon.

Now for several announcements that are related to the upcoming benchmark and the future versions of WAVSEP:

After the last benchmark was published, I received a lot of feedback, requests, interesting ideas and suggestions. I read all of it, and some of the requests and suggestions will be implemented in the upcoming benchmark and in future versions of WAVSEP.

The feedback led me to some important realizations:

· The rules of the test must be made public and clear to all vendors in order to ensure that the process is fair. To achieve this goal, certain changes must be made to the testing process.

· To enable vendors to show improvement quickly, and to prevent previous "negative" results from being perceived as a long-term "punishment", the published results must be updated more frequently, even between benchmarks.

As a result, I have constructed the following set of rules, which will govern the testing process in any future benchmark I perform and which also require some changes to the publication cycle of WAVSEP:

· To enable vendors to show improvement, all future benchmarks will be based on the WAVSEP test cases used in the previous benchmark, in addition to any other tests (in other words, the upcoming benchmark will also include tests against all the SQLi and RXSS test cases of WAVSEP 1.0.3).

· Since a benchmark is much more interesting when its content is new, each major benchmark will include different test aspects and/or detection results for test cases in additional exposure categories.

· To make the contest even more interesting (and to prevent a situation in which one vendor prepares for everything while another was not notified, was not aware of the WAVSEP platform, or had insufficient time to improve its tool before the benchmark), the test cases of some of the new exposure categories will only be published after the first major benchmark that includes tests against them. This will add some spice to the results and keep the process fair, while still enabling vendors to improve their previous scores.

· The upcoming benchmark will include tests in several new categories: I'm currently aiming for at least 3 categories in addition to the previous ones (and I hope I'll manage to finish all the development and testing before my next deadline… at least for most tools).

· Vendors that wish to update their score will be given the opportunity to do so, even between major benchmarks, through a presentation method that supports dynamically updated content. The terms for these tests will be published separately as soon as the presentation framework is available (soon). Re-tests of additional versions of the same product will be performed under these terms.

· My final goal is to test all the vendors (almost 100), test additional types of scanning services and products, and eventually test as many features of these tools as I possibly can. My time is therefore a valuable asset, and contacting commercial vendors that don't offer a publicly available evaluation version is very difficult. Although I will try my best to go through official channels and perform all the tests myself (or through members of the WAVSEP project), my experience shows that official channels can be time-consuming, sometimes more than I can afford. Therefore, I encourage vendors to contact me directly, starting November 15, so that I can test them properly and on equal terms.

Summary: future benchmarks will include test cases used in previous benchmarks (to enable vendors to show improvement), new test cases that will only be published after the benchmark (so that the tests are fair and the content more interesting), and finally, more dynamic results, to remove one more barrier to participation.

As I said in my previous posts, I'm planning to continue performing these comparisons for a long time, and I intend to make sure that both the community and the vendors will be able to benefit from this initiative, if they choose to.

Cheers

Sunday, October 23, 2011

Rules of the Game – Scanner Benchmarks
