Security Tools Benchmarking: Commercial Web Application Scanner Benchmark

Monday, August 1, 2011

Commercial Web Application Scanner Benchmark

The Scanning Legion:

Web Application Scanners Accuracy Assessment & Feature Comparison

Commercial & Open Source Scanners

A Comparison of 60 Commercial & Open Source Black Box Web Application Vulnerability Scanners

By Shay Chen

Security Consultant, Researcher & Instructor

http://sectooladdict.blogspot.com/

sectooladdict-$at$-gmail-$dot$-com

August 2011

Assessment Environments: WAVSEP 1.0 / WAVSEP 1.0.3 (http://code.google.com/p/wavsep/)

Disclaimer

The results of this research are only valid for estimating the detection accuracy of SQLi & RXSS exposures, and for counting and comparing the various features of the tested tools.

The author did not evaluate every possible feature of each product, only the categories tested within the research, and thus, does not claim to be able to estimate the ROI from each individual product.

Furthermore, several vendors invested resources in improving their tools according to the recommendations of the WAVSEP platform, which has been publicly available since December 2010. Some of them did so without any relation to the benchmark (and before they were aware of it), and some in preparation for it. Since the special structure of the WAVSEP testing platform requires the vendor to cover more vulnerable test scenarios, that action actually improves the detection ratio of the tool in any application (for the exposures covered by WAVSEP).

It is, however, important to mention that a few vendors were not notified of this benchmark and were not aware of the existence of the WAVSEP platform, and thus could not have enhanced their tools in preparation for it (HP WebInspect, Tenable Nessus, and Janus Security WebCruiser), while other vendors that were tested in the initial research phases released updated versions that were not tested (Portswigger Burp Suite and Cenzic Hailstorm).

That being said, the benchmark does represent the accuracy level of each tool on the date it was tested (the results of the vast majority of the tools are valid for the date this research was released), but future benchmarks will use a different research model in order to ensure that the competition will be fair for all vendors.

Table of Contents

1. Prologue

2. List of Tested Web Application Scanners

3. Benchmark Overview & Assessment Criteria

4. Test I – The More The Merrier – Counting Audit Features

5. Test II – To the Victor Go the Spoils – SQL Injection

6. Test III – I Fight (For) the Users – Reflected XSS

7. Test IV – Knowledge is Power - Feature Comparison

8. What Changed?

9. Initial Conclusions – Open Source vs. Commercial

10. Moral Issues in Commercial Product Benchmarks

11. Verifying The Benchmark Results

12. Notifications and Clarifications

13. List of Tested Scanners

14. Source, License and Technical Details of Tested Scanners

15. Comparison of Active Vulnerability Detection Features

16. Comparison of Complementary Scanning Features

17. Comparison of Usability and Coverage Features

18. Comparison of Connection and Authentication Features

19. Comparison of Advanced Features

20. Detailed Results: Reflected XSS Detection Accuracy

21. Detailed Results: SQL Injection Detection Accuracy

22. Drilldown – Error Based SQL Injection Detection

23. Drilldown – Blind & Time Based SQL Injection Detection

24. Technical Benchmark Conclusions – Vendors & Users

25. So What Now?

26. Recommended Reading List: Scanner Benchmarks

27. Thank-You Note

28. Frequently Asked Questions

29. Appendix A – Assessing Web Application Scanners

30. Appendix B – A List of Tools Not Included In the Test

31. Appendix C – WAVSEP Scan Logs

32. Appendix D – Scanners with Abnormal Behavior

1. Prologue

I've always been curious about it… from the first moment I executed a commercial scanner, almost seven years ago, to the day I started performing this research. Although manual penetration testing has always been the main focus of the test, most of us use automated tools to easily detect "low hanging fruit" exposures, increase the coverage when testing large scale applications in limited timeframes and even to double check locations that were manually tested. The questions always pop up, in every penetration test in which these tools are used…

"Is it any good?", "Is it better than…" and "Can I rely on it to…" are questions that every pen-tester asks himself whenever he hits the scan button.

Well, curiosity is a strange beast… it can drive you to wander and search, consume all your time in a search for obscure solutions.

So recently, out of curiosity, I decided that I wanted to find out for myself, and to invest whatever resources were necessary to solve this mystery once and for all.

Although I can hardly state that all my questions were answered, I can definitely sate your curiosity for the moment, by sharing insights, interesting facts, useful information and even some surprises, all derived from my latest research which is focused on the subject of commercial & open source web application scanners.

This research covers the latest versions of 12 commercial web application scanners and 48 free & open source web application scanners, while comparing the following aspects of these tools:

· Number & Type of Vulnerability Detection Features

· SQL Injection Detection Accuracy

· Reflected Cross Site Scripting Detection Accuracy

· General & Special Scanning Features

Although my previous research included similar information, I regretted one thing after it was published; I did not present the information in a format that was useful to the common reader. In fact, as I found out later, many readers skipped the actual content, and focused on sections of the article that were actually a side effect of the main research.

As a result, the following article will focus on presenting the information in a simple, comprehensible graphical format, while still providing the detailed research information to those interested… and there's a lot of new information to be shared – knowledge that can aid pen-testers in choosing the right tools, managers in budget related decisions, and visionaries in properly reading the map.

But before you read the statistics and insights presented in this report, and reach a conclusion as to which tool is the "best", it is crucial that you read Appendix A - Section 29, which explains the complexity of assessing the overall quality of web application scanners… As you're about to find out, this question cannot be answered so easily… at least not yet.

…

So without any further delay, let's focus on the information you seek, and discuss the insights and conclusions later.

2. List of Tested Web Application Scanners

The following commercial scanners were included in the benchmark:

· IBM Rational AppScan v8.0.03 - iFix Version (IBM)

· WebInspect v9.10.78.0, SecureBase 4.05.99 (HP)

· Hailstorm Professional v6.5-5267 (Cenzic)

· Acunetix WVS v7.0-20110608 (Acunetix)

· NTOSpider v5.4.098 (NT Objectives)

· Netsparker v2.0.0.0 (Mavituna Security)

· Burp Suite v1.3.09 (Portswigger)

· Sandcat v4.2.4.0 (Syhunt)

· ParosPro v1.9.12 (Milescan)

· JSky v3.5.1-905 (NoSec)

· WebCruiser v2.5.0 EE (Janus Security)

· Nessus v4.41-15078 (Tenable Network Security) – Only the Web Application Scanning Features

The following new free & open source scanners were included in the benchmark:

VEGA 1.0 beta (Subgraph), Safe3WVS v9.2 FE (Safe3 Network Center), N-Stalker 2012 Free Edition v7.1.1.106 (N-Stalker), DSSS (Damn Simple SQLi Scanner) v0.1h, SandcatCS v4.2.3.0

The updated versions of the following free & open source scanners were re-tested in the benchmark:

Zed Attack Proxy (ZAP) v1.3.0, sqlmap v0.9-rev4209 (SVN), W3AF 1.1-rev4350 (SVN), Watobo v0.9.7-rev544, Acunetix Free Edition v7.0-20110711, Netsparker Community Edition v1.7.2.13, WebSecurify v0.8, WebCruiser v2.4.2 FE (corrections), arachni v0.2.4 / v0.3, XSSer v1.5-1, Skipfish 2.02b, aidSQL 02062011

The results were compared to those of unmaintained scanners tested in the original benchmark:

Andiparos v1.0.6, ProxyStrike v2.2, Wapiti v2.2.1, Paros Proxy v3.2.13, PowerFuzzer v1.0, Grendel Scan v1.0, Oedipus v1.8.1, Scrawler v1.0, Sandcat Free Edition v4.0.0.1, JSKY Free Edition v1.0.0, N-Stalker 2009 Free Edition v7.0.0.223, UWSS (Uber Web Security Scanner) v0.0.2, Grabber v0.1, WebScarab v20100820, Mini MySqlat0r v0.5, WSTool v0.14001, crawlfish v0.92, Gamja v1.6, iScan v0.1, LoverBoy v1.0, openAcunetix v0.1, ScreamingCSS v1.02, Secubat v0.5, SQID (SQL Injection Digger) v0.3, SQLiX v1.0, VulnDetector v0.0.2, Web Injection Scanner (WIS) v0.4, Xcobra v0.2, XSSploit v0.5, XSSS v0.40, Priamos v1.0

For the full list of commercial & open source tools that were not tested in this benchmark, refer to Appendix B - Section 30.

3. Benchmark Overview & Assessment Criteria

The benchmark focused on testing commercial & open source tools that are able to detect (and not necessarily exploit) security vulnerabilities on a wide range of URLs, and thus, each tool tested was required to support the following features:

· The ability to detect Reflected XSS and/or SQL Injection vulnerabilities.

· The ability to scan multiple URLs at once (using either a crawler/spider feature, URL/Log file parsing feature or a built-in proxy).

· The ability to control and limit the scan to an internal or external host (domain/IP).
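To make the third requirement concrete, here is a minimal sketch of a host-based scope filter. It is not taken from any of the tested scanners; the function names and the sample URLs are hypothetical, and real tools typically offer richer scoping rules (ports, paths, wildcards).

```python
from urllib.parse import urlsplit


def in_scope(url: str, allowed_hosts: set[str]) -> bool:
    """Return True if the URL's host is one of the allowed hosts."""
    # urlsplit lowercases the hostname and returns None when no host is present
    host = urlsplit(url).hostname
    return host is not None and host in allowed_hosts


def filter_scope(urls: list[str], allowed_hosts: set[str]) -> list[str]:
    """Keep only the URLs that point at an allowed host, preserving order."""
    return [u for u in urls if in_scope(u, allowed_hosts)]


if __name__ == "__main__":
    # Hypothetical crawl output; the hosts are placeholders.
    crawled = [
        "http://localhost:8080/wavsep/index.jsp",
        "http://example.com/external",
        "/relative/link",
    ]
    print(filter_scope(crawled, {"localhost"}))
```

Relative links are rejected here on purpose; a crawler would normally resolve them against the page URL before applying the scope check.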

The testing procedure of all the tools included the following phases:

· The scanners were all tested against the latest version of WAVSEP (v1.0.3), a benchmarking platform designed to assess the detection accuracy of web application scanners. The purpose of WAVSEP’s test cases is to provide a scale for understanding which detection barriers each scanning tool can bypass, and which vulnerability variations can be detected by each tool. The various scanners were tested against the following test cases (GET and POST attack vectors):

o 66 test cases that were vulnerable to Reflected Cross Site Scripting attacks.

o 80 test cases that contained Error Disclosing SQL Injection exposures.

o 46 test cases that contained Blind SQL Injection exposures.

o 10 test cases that were vulnerable to Time Based SQL Injection attacks.

o 7 different categories of false positive RXSS vulnerabilities.

o 10 different categories of false positive SQLi vulnerabilities.

· In order to ensure result consistency, the directory of each exposure sub category was individually scanned multiple times using various configurations.

· The features of each scanner were documented and compared, according to documentation, configuration, plugins and information received from the vendor.

· In order to ensure that the detection features of each scanner were truly effective, most of the scanners were tested against an additional benchmarking application that was prone to the same vulnerable test cases as the WAVSEP platform, but had a different design, slightly different behavior and different entry point format (currently nicknamed "bullshit").
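For readers who want to keep track of the denominators used later in the article, the test-case counts listed above can be written down as a small inventory. This is only a bookkeeping sketch; the dictionary keys and function names are my own labels, not identifiers used by WAVSEP.

```python
# Vulnerable test-case counts as listed above for WAVSEP v1.0.3.
VULNERABLE_CASES = {
    "rxss": 66,        # Reflected Cross Site Scripting
    "sqli_error": 80,  # Error Disclosing SQL Injection
    "sqli_blind": 46,  # Blind SQL Injection
    "sqli_time": 10,   # Time Based SQL Injection
}

# Number of false positive categories per exposure type.
FALSE_POSITIVE_CATEGORIES = {"rxss": 7, "sqli": 10}


def total_sqli_cases() -> int:
    """Sum the three SQL Injection sub categories."""
    return sum(n for name, n in VULNERABLE_CASES.items() if name.startswith("sqli_"))


def total_vulnerable_cases() -> int:
    """Sum every vulnerable test case, RXSS and SQLi together."""
    return sum(VULNERABLE_CASES.values())


if __name__ == "__main__":
    print("SQLi test cases:", total_sqli_cases())
    print("All vulnerable test cases:", total_vulnerable_cases())
```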

The results of the main test categories are presented within three graphs (commercial graph, free & open source graph, unified graph), and the detailed information of each test is presented in a dedicated report.

So, now that you've learned about the testing process, it's time for the results…

4. Test I – The More The Merrier – Counting Audit Features

The first assessment criterion was the number of audit features each tool supports.

Reasoning: An automated tool can't detect an exposure that it can't recognize (at least not directly, and not without manual analysis), and therefore, the number of audit features will affect the amount of exposures that the tool will be able to detect (assuming the audit features are implemented properly, that vulnerable entry points will be detected and that the tool will manage to scan the vulnerable input vectors).

For the purpose of the benchmark, an audit feature was defined as a common generic application-level scanning feature, supporting the detection of exposures which could be used to attack the tested web application, gain access to sensitive assets or attack legitimate clients.

The definition of the assessment criterion rules out product specific exposures and infrastructure related vulnerabilities, while unique and extremely rare features were documented and presented in a different section of this research, and were not taken into account when calculating the results. Exposures that were specific to Flash/Applet/Silverlight and Web Services Assessment were treated in the same manner.

The Number of Audit Features in Web Application Scanners – Commercial Tools

The Number of Audit Features in Web Application Scanners - Free & Open Source Tools

The Number of Audit Features in Web Application Scanners – Unified List

So, now that we're done with the quantity, let's get to the quality…

5. Test II – To the Victor Go the Spoils – SQL Injection

The second assessment criterion was the detection accuracy of SQL Injection, one of the most famous exposures and the most commonly implemented attack vector in web application scanners.

Reasoning: a scanner that is not accurate enough will miss many exposures, and classify non-vulnerable entry points as vulnerable. This test aims to assess how good each tool is at detecting SQL Injection exposures in a supported input vector, which is located in a known entry point, without any restrictions that can prevent the tool from operating properly.

The evaluation was performed on an application that uses MySQL 5.5.x as its data repository, and thus, will reflect the detection accuracy of the tool when scanning similar data repositories.

Result Chart Glossary

Note that the BLUE bar represents the vulnerable test case detection accuracy, while the RED bar represents false positive categories detected by the tool (which may result in more instances than the bar actually presents, when compared to the detection accuracy bar).
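As a rough sketch of how the two bars relate to raw counts, the snippet below computes both values as percentages. That the charts plot percentages is an assumption on my part, and the scanner numbers in the example are made up for illustration; they are not benchmark results.

```python
def percentage(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    if not 0 <= part <= whole:
        raise ValueError("part must be between 0 and whole")
    return round(100.0 * part / whole, 2)


def chart_bars(detected: int, total_cases: int,
               fp_flagged: int, fp_categories: int) -> dict[str, float]:
    """Compute the two bars of a result chart.

    blue: share of vulnerable test cases the tool detected.
    red:  share of false positive categories the tool flagged.
    """
    return {
        "blue": percentage(detected, total_cases),
        "red": percentage(fp_flagged, fp_categories),
    }


if __name__ == "__main__":
    # Hypothetical scanner: 102 of 136 SQLi cases, 3 of 10 false positive categories.
    print(chart_bars(detected=102, total_cases=136, fp_flagged=3, fp_categories=10))
```

Note that the red bar counts categories, not instances, so a single flagged category may hide several false positive findings.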

The SQL Injection Detection Accuracy of Web Application Scanners – Commercial Tools

The SQL Injection Detection Accuracy of Web Application Scanners – Open Source & Free Tools

The SQL Injection Detection Accuracy of Web Application Scanners – Unified List

It's obvious that testing one feature is not enough; ideally, the detection accuracy of all audit features should be assessed, but in the meantime, we will settle for one more…

6. Test III – I Fight (For) the Users – Reflected XSS

The third assessment criterion was the detection accuracy of Reflected Cross Site Scripting, a common exposure which is the 2nd most commonly implemented feature in web application scanners.

Result Chart Glossary

The Reflected XSS Detection Accuracy of Web Application Scanners – Commercial Tools

The Reflected XSS Detection Accuracy of Web Application Scanners – Open Source & Free Tools

The Reflected XSS Detection Accuracy of Web Application Scanners – Unified List

7. Test IV – Knowledge is Power - Feature Comparison

The list of tools tested in this benchmark is organized within the following reports:

· List of Tested Scanners

· Source, License and Technical Details of Tested Scanners

Additional information was gathered during the benchmark, including information related to the different features of the various scanners. These details are organized in the following reports, and might prove useful when searching for tools for specific tasks or tests:

· Comparison of Active Vulnerability Detection Features (Audit Features) – 1 of 2

· Comparison of Active Vulnerability Detection Features (Audit Features) – 2 of 2

· Comparison of Complementary Scanning Features - Passive Analysis, CGI Scanning, Etc

· Comparison of Usability, Coverage and Scan Initiation Features

· Comparison of Authentication, Scan Control and Connection Support Features

· Comparison of Advanced and Uncommon Features

For detailed information on the accuracy assessment results, refer to the following reports:

· Benchmark Results – Reflected XSS Detection Accuracy

· Benchmark Results – SQL Injection Detection Accuracy – Unified

· Benchmark Drilldown – Blind & Time Based SQL Injection Detection Accuracy

· Benchmark Drilldown – Error Dependant SQL Injection Detection Accuracy

· The Scan Logs (describing the executing process and configuration of each scanner)

Additional information on the scan logs, the list of untested tools and the abnormal behaviors of scanners can be found in the article appendix sections (at the end of the article):

Appendix B - Section 30 – an appendix that contains a list of tools that were not included in the benchmark

Appendix D - Section 32 – an appendix that describes scanners with abnormal behavior

8. What Changed?

Since the latest benchmark, many open source & commercial tools added new features and improved their detection accuracy.

The following list presents a summary of changes in the detection accuracy of free & open source tools that were tested in the previous benchmark:

· arachni – a dramatic improvement in the detection accuracy of Reflected XSS exposures, and a dramatic improvement in the detection accuracy of SQL Injection exposures (verified on mysql).

· sqlmap – a dramatic improvement in the detection accuracy of SQL Injection exposures (verified on mysql).

· Acunetix Free Edition – a major improvement in the detection accuracy of RXSS exposures.

· Watobo – a major improvement in the detection accuracy of SQL Injection exposures (verified on mysql).

· N-Stalker 2009 FE vs. 2012 FE – although this tool is very similar to N-Stalker 2009 FE, I was surprised to discover that the detection accuracy of N-Stalker 2012 is very different – it detects only a quarter of what N-Stalker 2009 used to detect. Assuming this result is not related to a bug in the product or in my testing procedure, it means that the newer free version is significantly less effective than the previous free version, at least at detecting reflected XSS. A legitimate business decision, true, but surprising nevertheless.

· aidSQL – a major improvement in the detection accuracy of SQL Injection exposures (verified on mysql).

· XSSer – a major improvement in the detection accuracy of Reflected XSS exposures, even though the results were not consistent.

· Skipfish – a slight improvement in the detection accuracy of RXSS exposures (it is currently unknown if the RXSS detection improvement is related to changes in code or to the enhanced testing method), and a slight decrease in the detection accuracy of SQLi exposures (might be related to the different testing environment and the different method used to count the results).

· WebSecurify – a slight improvement in the detection accuracy of RXSS exposures (it is currently unknown if the RXSS detection improvement is related to changes in code or to the enhanced testing method).

· Zed Attack Proxy (ZAP) – Identical results. Any minor difference was probably caused by the testing environment, configuration or minor issues.

· W3AF – a slight improvement in the detection accuracy of RXSS exposures and a slight decrease in the detection accuracy of SQL Injection exposures.

· Netsparker Community Edition – Identical results. Any minor difference was probably caused by the testing environment, configuration or minor issues.

· WebCruiser Free Edition – a minor decrease in accuracy, due to fixing documentation mistakes from the previous benchmark.

9. Initial Conclusions – Open Source vs. Commercial

The following section presents my own personal opinions on the results of the benchmark, and since opinions are beliefs, which are affected by emotions and circumstances, you are entitled to your own.

After testing over 48 open source scanners multiple times, and after comparing the results and experiences to the ones I had after testing 12 commercial ones (and those are just the ones that I reported), I have reached the following conclusions:

· As far as accuracy & features go, the distance between open source tools and commercial tools is not as big as it used to be – tools such as sqlmap, arachni, wapiti, w3af and others are slowly closing the gap. That being said, there is still a significant difference in stability & false positives: most open source tools tend to have more false positives and to be relatively unstable when compared to most commercial tools.

· Some open source tools, even the most accurate ones, are relatively difficult to install & use, and still require fine-tuning in various fields. In my opinion, a non-technical QA engineer will have difficulties using these tools, and as a general rule, I'll recommend using them if your background is relatively technical (consultant, developer, etc). For all the rest, especially non-technical enterprise employees that prefer a decent usage experience - stick with commercial products, with their free versions, or with the simple variations of open source tools.

· If you are using a commercial product, it's best to combine tools that have a wide variety of features with tools that have high detection accuracy. It's possible to use tools that have relatively good scores in both of these aspects, or to use a tool with a wide variety of features alongside another tool that has enhanced accuracy. Yes, this statement can be interpreted as using combinations of commercial and open source tools, and even as using two different commercial tools, so that one tool will complement the other. Budget? Take a look at the cost diversity of the tools before you make any harsh decisions; I promise you'll be surprised.

10. Moral Issues in Commercial Product Benchmarks

While testing the various commercial tools, I have dealt with certain moral issues that I want to share. Many vendors that were aware of this research enhanced their tools in preparation for it, an action I respect, and consider a positive step. Since the testing platform that included most of the tests was available online, preparing for the benchmark was a relatively easy task for any vendor that invested the resources.

So, is the benchmark fair for vendors that couldn’t improve their tools due to various circumstances?

The testing process of a commercial tool is usually much more complicated and restrictive than testing a free or open source tool; it is necessary to contact the vendor to obtain an evaluation license and the latest version of the tool (a process that can take several weeks), and the evaluation licenses are usually restricted to a short evaluation timeframe (usually two weeks). Updating and testing the tools at a future date can thus become a hassle (since some of the process will have to be performed all over again)… but why am I telling you all this?

Simply, because I believe that the relevance of the test I performed for vendors that provided me an extended evaluation period and access to new builds was better; for example, a few days before the latest benchmark, immediately after testing the latest versions of two major vendors, I decided to rescan the platform using the latest versions of all the commercial tools I have, to ensure that the benchmark will be published with the most updated results.

I verified that JSky, WebCruiser, and ParosPro didn't release a new version, and tested the latest versions of AppScan, WebInspect, Acunetix, Netsparker, Sandcat and Nessus.

It made sense that builds that were tested a short while ago (like NTOSpider) were also something that I could rely on to represent the current state of the tool (I hope :) ).

I did, however, have a problem with Cenzic and Burp, two of the first tools that I tested in this research. My evaluation licenses were no longer valid, so I couldn't update the tools to their latest versions and scan again. With 2-3 days left until the end of my planned schedule and a million tasks pending, I simply couldn't afford to go through the evaluation request phase again, despite all of my good intentions and my willingness to sacrifice my spare time to ensure these tools would be properly represented.

Even though the results of some updated products (WebInspect and Nessus being the best examples) didn't change at all, even after I updated them to the latest version, who could say that the result would be the same for other vendors?

So, were the terms unfair to Burp and Cenzic?

Finally, several vendors sent me multiple versions and builds – they all wanted to succeed, a legitimate desire of any human being, even more so for a firm. Apart from the time each test took (a price I was willing to pay at the time), the new builds were sent even in the last day of the benchmark, and afterwards.

But if the new version is better, and more accurate, by limiting the amount of tests I perform for a given vendor, isn't that against what I'm trying to achieve in all my benchmarks, which is to release the benchmark with the most updated results, for all the tools?

(For example, Syhunt, a vendor that did very well in the last benchmark, sent me its final build (2.4.2.5) a day after the deadline, and included a time based SQL injection detection feature in that build. Since I couldn't afford the time anymore, I couldn't test the build, so am I really reflecting the tool's current state in the most accurate manner? But if I had tested this build, shouldn't I have provided the rest of the vendors with the same opportunity?)

One of the questions I believe I can answer – the accuracy question.

A benchmark is, in a very real sense, a competition, and since I take the scientific approach, I believe that the results are absolute, at least for the subject that is being tested. Since I'm not claiming that one tool is "better" than the other in every category, only at the tested criterion, I believe that priorities do not matter – as long as the test really reflects the current situation, the result is reliable.

I leave the interpretation of the results to the reader, at least until I'll cover enough aspects of the tools.

As for the rest of the open issues, I don't have good answers for all of those questions, and although I did my very best in this benchmark, and even exceeded what I thought I was capable of, I will probably have to think of some solutions that will make the terms of the next benchmark equal, even for scanners that were tested in the beginning of the benchmark, and less time consuming than it has been.

11. Verifying The Benchmark Results

The results of the benchmark can be verified by replicating the scan methods described in the scan log of each scanner, and by testing the scanner against WAVSEP v1.0.3.

The latest version of WAVSEP can be downloaded from the web site of project WAVSEP (binary/source code distributions, installation instructions and the test case description are provided in the web site download section):

http://code.google.com/p/wavsep/

12. Notifications and Clarifications

How to use the results of the benchmark

The results of the benchmark clearly show how accurate each tool is in detecting the tested vulnerabilities (SQL Injection on MySQL & Reflected Cross Site Scripting), as long as it is able to locate and scan the vulnerable entry points. The results might even help to estimate how accurate each tool is in detecting related vulnerabilities (for example SQL Injection vulnerabilities which are based on other databases), and determine which exposure instances cannot be detected by certain tools.

However, currently, the results DO NOT evaluate the overall quality of the tool, since they don't include detailed information on the subjects such as crawling quality, technology support, scoping, profiling, stability in extreme cases, tolerance, detection accuracy of other exposures and so on... at least NOT YET.

I highly recommend reading the detailed results, and the appendix that deals with web application scanner evaluation, before getting to any conclusions.

Additional Notifications

During the benchmark, I reported bugs that had a major effect on the detection accuracy to several commercial and open source vendors:

· A performance improvement feature in NTOSpider caused it not to scan many POST XSS test cases, and thus, the detection accuracy of RXSS POST test cases was significantly lower than the RXSS GET detection accuracy. The vendor was notified of this issue, and provided me with a special build that overrides this feature (at least until they have a feature in the GUI to disable this mechanism).

· A similar performance improvement feature in Netsparker caused the same issue; however, the feature could be disabled in Netsparker, and thus, with the support of the relevant personnel at Netsparker, I was able to work around the problem.

· A few bugs in arachni prevented the blind SQL injection diff plugins from working properly. I notified the author, Tasos, of the issue, and he quickly fixed it and released a new version.

· Acunetix RXSS detection result was updated to match the results of the latest free version (one version above the tested commercial version) - Since the tested commercial version of Acunetix was older than the tested free version (20110608 vs 20110711), and since the results of the upgraded free version were actually better than the older commercial version I had tested, I changed the results of the commercial tool to match the ones of the new free version (from 22 to 24 in both the GET & POST RXSS detection scores).

· Changes in results from the previous benchmark might be attributed to enhanced scanning features, and/or to enhanced stability in the test environment & method (connection pool, limited & divided scope).

13. List of Tested Scanners

The following report contains the list of scanners tested in this benchmark, and provides information on the tested version, the tool's vendor/author and the current status of product:

http://sectooladdict-benchmarks.googlecode.com/files/List%20of%20Tested%20Web%20Application%20Vulnerability%20Scanners%20-%20WAVSEP%20Benchmark%202011.pdf

14. Source, License and Technical Details of Tested Scanners

The following report compares the licenses, development technology and sources (home page) of the various scanners:

http://sectooladdict-benchmarks.googlecode.com/files/Details%20of%20Tested%20Web%20Application%20Vulnerability%20Scanners%20-%20WAVSEP%20Benchmark%202011.pdf

15. Comparison of Active Vulnerability Detection Features

The following reports compare the active vulnerability detection features (audit features) of the various tested scanners:

First Report:

http://sectooladdict-benchmarks.googlecode.com/files/Application%20Active%20Scan%20Features%20Comparison%201of2%20-%20WAVSEP%20Benchmark%202011.pdf

Second Report:

http://sectooladdict-benchmarks.googlecode.com/files/Application%20Active%20Scan%20Features%20Comparison%202of2%20-%20WAVSEP%20Benchmark%202011.pdf

Aside from the Count column (which represents the total amount of audit features supported by the tool, not including complementary features such as web server scanning and passive analysis), each column in the report represents an audit feature. The description of each column is presented in the following glossary table:

Title	Description
SQL	Error Dependant SQL Injection
BSQL	Blind & Intentional Time Delay SQL Injection
RXSS	Reflected Cross Site Scripting
PXSS	Persistent / Stored Cross Site Scripting
DXSS	DOM XSS
Redirect	External Redirect / Phishing via Redirection
Bck	Backup File Detection
Auth	Authentication Bypass
CRLF	CRLF Injection / Response Splitting
LDAP	LDAP Injection
XPath	X-Path Injection
MX	MX / SMTP / IMAP Injection
Session Test	Session Identifier Complexity Analysis
SSI	Server Side Include
RFI-LFI	Directory Traversal / Remote File Include / Local File Include (Will be separated into different categories in future benchmarks)
Cmd	Command Injection / OS Command Injection
Buffer	Buffer Overflow
CSRF	Cross Site Request Forgery
A-Dos	Application Denial of Service / RegEx DoS
Privilege Escalation	Privilege Escalation Between Different Roles and User Accounts (Resources / Features)
Format String	Format String Injection
File Upload	File Upload / Insecure File Upload
Code Injection	Code Injection (ASP/JSP/PHP/Perl/etc)
XML Injection	XML / SOAP Injection
Source Code Disclosure	Source Code Disclosure Detection
Integer Overflow	Integer Overflow
Padding Oracle	Padding Oracle Detection / Exploitation
Session Fixation	Session Fixation

16. Comparison of Complementary Scanning Features

The following report compares complementary vulnerability detection features in the tested scanners:

http://sectooladdict-benchmarks.googlecode.com/files/Complementary%20Scan%20Features%20Comparison%20-%20WAVSEP%20Benchmark%202011.pdf

In order to clarify what each column in the report table means, use the following glossary table:

Title	Description
Web Server Hardening	Features that are able to detect Insecure HTTP method support (PUT, Trace, WebDAV), directory listing, robots and cross-domain files information disclosure, version specific vulnerabilities, etc.
CGI Scanning	Default files, common vulnerable applications, etc.
Passive Analysis	Security tests that don’t require any actual attacks, and are instead based on information gathering and analysis of responses, including certificate & cipher tests, content & metadata analysis, mime type analysis, autocomplete detection, insecure transmission of credentials, google hacking, etc.
File / Dir Enumeration	Directory and file enumeration features
Notes and Other Features	Uncommon or Unique features

17. Comparison of Usability and Coverage Features

The following report compares the usability, coverage and scan initiation features of the tested scanners:
http://sectooladdict-benchmarks.googlecode.com/files/List%20of%20Scanner%20Features%20(1%20of%203)%20-%20WAVSEP%20Benchmark%202011%20-%20Final3.pdf

In order to clarify what each column in the report table means, use the following glossary table:

Title	Possible Values
Configuration & Usage Scale	Very Simple - GUI + Wizard; Simple - GUI with simple options, Command line with scan configuration file or simple options; Complex - GUI with numerous options, Command line with multiple options; Very Complex - Manual scanning feature dependencies, multiple configuration requirements
Stability Scale	Very Stable - Rarely crashes, Never gets stuck; Stable - Rarely crashes, Gets stuck only in extreme scenarios; Unstable - Crashes every once in a while, Freezes on a consistent basis; Fragile - Freezes or Crashes on a consistent basis, Fails performing the operation in many cases
Performance Scale	Very Fast - Fast implementation with limited amount of scanning tasks; Fast - Fast implementation with plenty of scanning tasks; Slow - Slow implementation with limited amount of scanning tasks; Very Slow - Slow implementation with plenty of scanning tasks

18. Comparison of Connection and Authentication Features

The following report compares the connection, authentication and scan control features of the tested scanners:
http://sectooladdict-benchmarks.googlecode.com/files/List%20of%20Scanner%20Features%20(2%20of%203)%20-%20WAVSEP%20Benchmark%202011%20-%20Final.pdf

19. Comparison of Advanced Features

The following report contains a comparison of advanced and uncommon scanner features:

http://sectooladdict-benchmarks.googlecode.com/files/List%20of%20Scanner%20Features%20%283%20of%203%29%20-%20WAVSEP%20Benchmark%202011.pdf

20. Detailed Results: Reflected XSS Detection Accuracy

The results of the Reflected Cross Site Scripting (RXSS) accuracy assessment are presented in the following report (the graphical representation of the results is provided at the beginning of the article):

http://sectooladdict-benchmarks.googlecode.com/files/Web%20Application%20Scanner%20RXSS%20Detection%20Accuracy%20-%20WAVSEP%20ScoreChart%202011.pdf

The results that were taken into account only include vulnerable pages linked from the index-xss.jsp index page (the RXSS-GET and/or RXSS-POST directories, in addition to the RXSS-FalsePositive directory). XSS vulnerable entry points in the SQL injection vulnerable pages were not taken into account, since they don’t necessarily represent a unique scenario (or at least, not until the “layered vulnerabilities” scenario is implemented).

21. Detailed Results: SQL Injection Detection Accuracy

The overall results of the SQL Injection accuracy assessment are presented in the following report (the graphical representation of the results is provided at the beginning of the article):

http://sectooladdict-benchmarks.googlecode.com/files/Web%20Application%20Scanner%20SQLi%20Detection%20Accuracy%20-%20WAVSEP%20ScoreChart%202011.pdf

22. Drilldown – Error Based SQL Injection Detection

The results of the Error-Based SQL Injection benchmark are presented in the following report:

http://sectooladdict-benchmarks.googlecode.com/files/Web%20Application%20Scanner%20Error-Based%20SQLi%20Detection%20Accuracy%20-%20WAVSEP%20ScoreChart%202011.pdf

23. Drilldown – Blind & Time Based SQL Injection Detection

The results of the Blind & Time based SQL Injection benchmarks are presented in the following report:

http://sectooladdict-benchmarks.googlecode.com/files/Web%20Application%20Scanner%20Blind%20SQLi%20Detection%20Accuracy%20-%20WAVSEP%20ScoreChart%202011.pdf

24. Technical Benchmark Conclusions – Vendors & Users

While testing the various tools in this benchmark, I dealt with numerous difficulties, witnessed many inconsistent results and noticed that some tools had difficulties optimizing their scanning features on the tested platform. I had, however, also dealt with the other end of the spectrum, and used tools that easily overcame most of the difficulties related to detecting the tested vulnerabilities.

I'd like to share my conclusions with the authors and vendors that are interested in improving their tools, and aren't offended by someone giving advice.

As far as detecting SQL injection exposures, I have noticed that tools that implemented the following features detected more exposures, had fewer false positives, and provided consistent results:

· Time based SQL Injection detection vectors are very effective. They are, however, very tricky to use, since they might be affected by other attacks that are executed simultaneously, or affect the detection of other tests in the same manner. As a result, I recommend that all authors & vendors implement the following behavior in their products: execute time based attacks at the end of the scanning process, after all the other tests are done, while using a reduced number of concurrent connections. Executing other tests in parallel might have a negative effect on the detection accuracy.

· Since the upper/lower timeout values used to determine whether or not a time based exploit was successful may change due to various circumstances, I recommend calculating and re-calculating this value during the scan, and revalidating each time based result independently, after verifying that the timeout values are "normal".

· Implement various payloads of time based attacks – the sleep method is not enough to cover all the databases, and not even all the versions of MySQL.
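The three recommendations above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the logic of any tested scanner: `measure` is a hypothetical callable that sends one request (with or without an injected payload) and returns the elapsed time in seconds, the function names are invented for this example, and the delay expressions are common illustrative ones rather than payloads taken from the benchmark.

```python
import statistics
from typing import Callable, Optional

# Illustrative delay expressions (an assumption of this sketch, not a list
# from the benchmark). MySQL gained SLEEP() in 5.0.12, so older versions
# need a CPU-bound alternative such as BENCHMARK().
DELAY_EXPRESSIONS = {
    "MySQL (5.0.12 and later)": "SLEEP({seconds})",
    "MySQL (older versions)": "BENCHMARK(5000000,MD5('a'))",
    "PostgreSQL": "pg_sleep({seconds})",
    "Microsoft SQL Server": "WAITFOR DELAY '0:0:{seconds}'",
}

# Hypothetical interface: measure(None) sends a benign request,
# measure(payload) sends the injected one; both return elapsed seconds.
Measure = Callable[[Optional[str]], float]


def normal_upper_bound(measure: Measure, samples: int = 5, margin: float = 3.0) -> float:
    """Estimate the upper bound of a normal response time from fresh samples."""
    times = [measure(None) for _ in range(samples)]
    return statistics.mean(times) + margin * statistics.pstdev(times)


def confirm_time_based(measure: Measure, payload: str, delay: float,
                       retries: int = 3) -> Optional[bool]:
    """Revalidate a time based finding several times.

    Returns True if every attempt was delayed, False if any attempt was not,
    and None if the baseline itself was too slow to draw a conclusion.
    """
    for _ in range(retries):
        # Recalculate the threshold before each attempt, since the normal
        # response time may drift during the scan.
        upper = normal_upper_bound(measure)
        if upper >= delay:
            return None  # timings are not "normal"; retry later in the scan
        elapsed = measure(payload)
        if elapsed < delay or elapsed <= upper:
            return False
    return True
```

Such a check would be run at the end of the scan, with few concurrent connections, so that other tests do not distort the measured times.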

25. So What Now?

So now that we have all those statistics, it's time to analyze them properly, and see which conclusions we can reach. Since this process will take time, I have to set some priorities.

In the near future, I will try to achieve the following goals:

· Find a better way to present the vast amount of information on web application scanners features & accuracy. I have been struggling with this issue for almost 2 years, but I think that I finally found a solution that will make the information more useful for the common reader… stay tuned for updates.

· Provide recommendations for the best current method of executing free & open source web application scanners; the most useful combinations, and the tiny tweaks required to achieve the best results.

· Release the new test case categories of WAVSEP that I have been working on. Yep, help needed.

In addition to the short term goals, the following long term goals will still have a high priority:

· Improve the testing framework (WAVSEP); add additional test cases and additional security vulnerabilities.

· Perform additional benchmarks on the framework, and on a consistent basis. I previously aimed for one major benchmark per year, but that formula might completely change, if I manage to work out a few issues around a new initiative I have in this field.

· Integration with external frameworks for assessing crawling capabilities, technology support, etc.

· Publish the results of tests against sample vulnerable web applications, so that some sort of feedback on other types of exposures will be available (until other types of vulnerabilities are implemented in the framework), as well as on features such as authentication support, crawling, etc.

· Gradually develop a framework for testing additional related features, such as authentication support, malformed HTML tolerance, abnormal response support, etc.

I hope that this content will help the various vendors improve their tools, help pen-testers choose the right tool for each task, and in addition, help create some method of testing the numerous tools out there.

Since I have been in this situation before, I know what's coming… so I apologize in advance for any delays in my responses in the next few weeks.

The following resources include additional information on previous benchmarks, comparisons and assessments in the field of web application vulnerability scanners:

· "Webapp Scanner Review: Acunetix versus Netsparker", by Mark Baldwin (commercial scanner comparison, April 2011)

· "Effectiveness of Automated Application Penetration Testing Tools", by Alexandre Miguel Ferreira and Harald Kleppe (commercial & freeware scanner comparison, February 2011)

· "Web Application Scanners Accuracy Assessment", the predecessor of the current benchmark, by Shay Chen (a comparison of 43 free & open source scanners, December 2010)

· "State of the Art: Automated Black-Box Web Application Vulnerability Testing", by Jason Bau, Elie Bursztein, Divij Gupta, John Mitchell (original paper, May 2010)

· "Analyzing the Accuracy and Time Costs of Web Application Security Scanners", by Larry Suto (commercial scanners comparison, February 2010)

· "Why Johnny Can’t Pentest: An Analysis of Black-box Web Vulnerability Scanners", by Adam Doupé, Marco Cova, Giovanni Vigna (commercial & open source scanner comparison, 2010)

· "Web Vulnerability Scanner Evaluation", by AnantaSec (commercial scanner comparison, January 2009)

· "Analyzing the Effectiveness and Coverage of Web Application Security Scanners", by Larry Suto (commercial scanners comparison, October 2007)

· "Rolling Review: Web App Scanners Still Have Trouble with Ajax", by Jordan Wiens (commercial scanners comparison, October 2007)

· "Web Application Vulnerability Scanners – a Benchmark", by Andreas Wiegenstein, Frederik Weidemann, Dr. Markus Schumacher, Sebastian Schinzel (anonymous scanners comparison, October 2006)

27. Thank-You Note

During the research described in this article, I have received help from quite a few individuals and resources, and I’d like to take the opportunity to thank them all.

My thanks go to all the open source tool authors that assisted me in testing the various tools at unreasonable late night hours, to the kind souls that helped me obtain evaluation licenses for commercial products, to the QA, Support and Development teams of commercial vendors, which saved me tons of time and helped me overcome obstacles, and to the various individuals that helped me contact these vendors.

I would also like to continue my tradition, and thank all the information sources that helped me gather the list of scanners over the years, including (but not limited to) information security sources such as PenTestIT (http://www.pentestit.com/), Security Sh3ll (http://security-sh3ll.blogspot.com/), NETpeas Toolswatch Service (http://www.vulnerabilitydatabase.com/toolswatch/), Darknet (http://www.darknet.org.uk/), Packet Storm (http://packetstormsecurity.org/), Help Net Security (http://www.net-security.org/), Astalavista (http://www.astalavista.com/), Google (of course) and many others.

I hope that the conclusions, ideas, information and payloads presented in this research (and the benchmarks and tools that will follow) will be for the benefit of all vendors, open source community projects and commercial vendors alike.

28. Frequently Asked Questions

Q: 60 web application scanners is an awful lot, how many scanners exist?

A: Assuming you are using the same definition for a scanner that I do, then I'm currently aware of 95 web application scanners that can claim to support the detection of generic application level exposures, in a safe and controllable manner, and in multiple URLs (48 free & open source scanners that were tested, 12 commercial scanners that were tested, 25 open source scanners that I didn't test yet, and 10 commercial scanners that slipped my grip). And yes, I'm planning on testing them all.

Q: Why RXSS and SQLi again? Will the benchmarks ever include additional exposures?

A: Yes, they will. In fact, I'm already working on test case categories of two different exposures, and will use them both for my next research. Besides, the last benchmark focused on free & open source products, and I couldn't help myself, I had to test them against each other.

Q: I can't wait for the next research, what can I do to speed things up?

A: I'm currently looking for methods to speed up the processes related to this research, so if you're willing to help, contact me.

Q: What’s with the titles that contain cheesy movie quotes?

A: That's just it - I happen to like cheese. Let's see you coming up with better titles at 4AM.

29. Appendix A – Assessing Web Application Scanners

Although this benchmark contains tons of information, and is very useful as a decision assisting tool, the content within it cannot be used to calculate the accurate ROI (return on investment) of each web application scanner. Furthermore, it can't predict on its own exactly how good the results of each scanner will be in every situation (but it can predict what won't be detected), since there are additional factors that need to be taken into account.

The results in this benchmark could serve as an accurate evaluation formula only if the scanner is used to scan a technology that it supports, pages that it can detect (manual crawling features can be used to overcome many obstacles in this case), and locations without technological barriers that it cannot handle (for example, web application firewalls or anti-CSRF tokens).

In order for us to truly assess the full capability of web application vulnerability scanners, the following features must be tested:

· The entry point coverage of the web application scanner must be as high as possible; meaning, the tool must be able to locate and properly activate (or be manually "taught") all the application entry points (e.g. static & dynamic pages, in-page events, services, filters, etc). Vulnerabilities in an entry point that wasn't located will not be detected. The WIVET project can provide additional information on coverage and support.

· The attack vector coverage of the web application scanner – does it support input vectors such as GET / POST / Cookie parameters? HTTP headers? Parameter Names? Ajax Parameters? Serialized Objects? Each input vector that is not supported means exposures that won't be detected, regardless of the tool's accuracy level (assuming the unsupported attack/input vector is vulnerable).

· The scanner must be able to handle the technological barriers implemented in the application, ranging from authentication mechanisms to automated access prevention mechanisms such as CAPTCHAs and anti-CSRF tokens.

· The scanner must be able to handle any application specific problems it encounters, including malformed HTML (tolerance), stability issues and other limitations. If the best scanner in the world consistently causes the application to crash within a couple of seconds, then it's not useful for assessing the security of that application (in matters that don't relate to DoS attacks).

· The number of features (active & passive) implemented in the web application vulnerability scanner.

· The accuracy level of each and every plugin supported by the web application vulnerability scanner.
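The effect of the first two requirements can be illustrated with a small Python sketch. It is only a simplified model under assumptions: the `Exposure` type and the `detection_ceiling` function are hypothetical names invented for this example, and the model ignores barriers, stability and plugin accuracy. It shows that the share of exposures a scanner can possibly detect is capped by the entry points it located and the input vectors it supports, regardless of how accurate its plugins are.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class Exposure:
    """One vulnerable location (hypothetical model for this sketch)."""
    entry_point: str   # e.g. a page or service path
    input_vector: str  # e.g. "GET", "POST", "Cookie", "Header"


def detection_ceiling(exposures: Iterable[Exposure],
                      located_entry_points: Set[str],
                      supported_vectors: Set[str]) -> float:
    """Return the fraction of exposures the scanner could detect at best."""
    exposures = list(exposures)
    if not exposures:
        return 0.0
    # An exposure is reachable only if its entry point was located AND its
    # input vector is supported; everything else is missed before accuracy
    # even comes into play.
    reachable = [
        e for e in exposures
        if e.entry_point in located_entry_points
        and e.input_vector in supported_vectors
    ]
    return len(reachable) / len(exposures)
```

For example, if an application has four exposures and the scanner locates only two of the three entry points and does not support cookie parameters, it may be left with a ceiling of 0.5 even with perfectly accurate plugins.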

That being said, it's crucial to remember that even in the most ideal scenario, with the absence of human intelligence, scanners can't detect all the instances of exposures that are truly logical – meaning, are related to specific business logic, and thus, are not perceived as an issue by an entity that can't understand the business logic.

But the sheer complexity of the issue does not mean that we shouldn't start somewhere, and that's exactly what I'm trying to do in my benchmarks – create a scientific, accurate foundation for obtaining that goal, with enough investment, over time.

Note that my explanations describe only a portion of the actual tests that should be performed, and I'm sharing them only to emphasize the true complexity of the core issue; I haven't touched stability, bugs, and a lot of other subjects, which may affect the overall result you get.

Additional information on evaluation standards for web application vulnerability scanners can be found in the WASC Web Application Security Scanner Evaluation Criteria web site.

30. Appendix B – A List of Tools Not Included In the Test

The following commercial web application vulnerability scanners were not included in the benchmark, since I didn't manage to get an evaluation version before the article publication deadline, or in the case of one scanner (McAfee), had problems with the evaluation version that I didn't manage to work out before the benchmark's deadline:

Commercial Scanners not included in this benchmark

· N-Stalker Commercial Edition (N-Stalker)

· McAfee Vulnerability Manager (McAfee / Foundstone)

· NeXpose Enterprise Edition Web Application Scanning Features (Rapid7)

· Retina Web Application Scanner (eEye Digital Security)

· WebApp360 (NCircle)

· Core Impact Pro Web Application Scanning Features (Core Impact)

· Parasoft Web Application Scanning Features (a.k.a WebKing, by Parasoft)

· MatriXay Web Application Scanner (DBAppSecurity)

· Falcove (BuyServers ltd, currently Unmaintained)

· Safe3WVS 9.2 Commercial Edition (Safe3 Network Center)

The following open source web application vulnerability scanners were not included in the benchmark, mainly due to time restrictions, but will be included in future benchmarks:

Open Source Scanners not included in this benchmark

· Rabbit VS

· Spacemonkey

· Kayra

· 2gwvs

· Webarmy

· springenwerk

· Mopset 2

· XSSFuzz 1.1

· Witchxtoolv

· PHP-Injector

· XSS Assistant

· Fiddler XSSInspector/XSRFInspector Plugins

· GNUCitizen JAVASCRIPT XSS SCANNER - since WebSecurify, a more advanced tool from the same vendor is already tested in the benchmark.

· Vulnerability Scanner 1.0 (by cmiN, RST) - since the source code contained traces for remotely downloaded RFI lists from locations that do not exist anymore.

The benchmark focused on web application scanners that are able to detect either Reflected XSS or SQL Injection vulnerabilities, can be locally installed, and are also able to scan multiple URLs in the same execution.

As a result, the test did not include the following types of tools:

· Online Scanning Services – Online applications that remotely scan applications, including (but not limited to) Appscan On Demand (IBM), Click To Secure, QualysGuard Web Application Scanning (Qualys), Sentinel (WhiteHat), Veracode (Veracode), VUPEN Web Application Security Scanner (VUPEN Security), WebInspect (online service - HP), WebScanService (Elanize KG), Gamascan (GAMASEC – currently offline), Cloud Penetrator (Secpoint), Zero Day Scan, DomXSS Scanner, etc.

· Scanners without RXSS / SQLi detection features:

o Dominator (Firefox Plugin)

o fimap

o lfimap

o phpBB-RFI Scanner

o DotDotPwn

o LFI (Library-level Fault Injector)

o lfi-scanner

o LFI-Scanner

o lfi-rfi2

o LFI/RFI Checker (astalavista)

o CSRF Tester

o etc

· Passive Scanners (response analysis without verification):

o Watcher (Fiddler Plugin by Casaba Security)

o Skavanger (OWASP)

o Pantera (OWASP)

o Ratproxy (Google)

o CAT The Manual Application Proxy (Context)

o etc

· Scanners of specific products or services (CMS scanners, Web Services Scanners, etc):

o WSDigger

o Sprajax

o ScanAjax

o Joomscan

o wpscan

o Joomlascan

o Joomsq

o WPSqli

o etc

· Web Application Scanning Tools which are using Dynamic Runtime Analysis:

o PuzlBox (the free version was removed from the web site, and is now sold as a commercial product named PHP Vulnerability Hunter)

o Inspathx

o etc

· Uncontrollable Scanners - scanners that can’t be controlled or restricted to scan a single site, since they either receive the list of URLs to scan from Google Dork, or continue and scan external sites that are linked to the tested site. This list currently includes the following tools (and might include more):

o Darkjumper 5.8 (scans additional external hosts that are linked to the given tested host)

o Bako's SQL Injection Scanner 2.2 (only tests sites from a google dork)

o Serverchk (only tests sites from a google dork)

o XSS Scanner by Xylitol (only tests sites from a google dork)

o Hexjector by hkhexon – also falls into other categories

o d0rk3r by b4ltazar

o etc

· Deprecated Scanners - incomplete tools that were not maintained for a very long time. This list currently includes the following tools (and might include more):

o Wpoison (development stopped in 2003, the new official version was never released, although the 2002 development version can be obtained by manually composing the sourceforge URL, which does not appear on the web site: http://sourceforge.net/projects/wpoison/files/ )

o etc

· De facto Fuzzers – tools that scan applications in a similar way to a scanner; but whereas a scanner attempts to conclude whether or not the application is vulnerable (according to some sort of “intelligent” set of rules), the fuzzer simply collects abnormal responses to various inputs and behaviors, leaving the task of concluding to the human user.

o Lilith 0.4c/0.6a (both versions 0.4c and 0.6a were tested, and although the tool seems to be a scanner at first glance, it doesn’t perform any intelligent analysis on the results).

o Spike proxy 1.48 (although the tool has XSS and SQLi scan features, it acts like a fuzzer more than it acts like a scanner – it sends payloads of partial XSS and SQLi, and does not verify that the context of the returned output is sufficient for execution or that the error presented by the server is related to a database syntax injection, leaving the verification task for the user).

· Fuzzers – scanning tools that lack the independent ability to conclude whether a given response represents a vulnerable location, by using some sort of verification method (this category includes tools such as JBroFuzz, Firefuzzer, Proxmon, st4lk3r, etc). Fuzzers that had at least one type of exposure that was verified were included in the benchmark (Powerfuzzer).

· CGI Scanners: vulnerability scanners that focus on detecting hardening flaws and version specific hazards in web infrastructures (Nikto, Wikto, WHCC, st4lk3r, N-Stealth, etc)

· Single URL Vulnerability Scanners - scanners that can only scan one URL at a time, or can only scan information from a google dork (uncontrollable).

o Havij (by itsecteam.com)

o Hexjector (by hkhexon)

o Simple XSS Fuzzer [SiXFu] (by www.EvilFingers.com)

o Mysqloit (by muhaimindz)

o PHP Fuzzer (by RoMeO from DarkMindZ)

o SQLi-Scanner (by Valentin Hoebel)

o Etc.

· Vulnerability Detection Assisting Tools – tools that aid in discovering a vulnerability, but do not detect the vulnerability themselves; for example:

o Exploit-Me Suite (XSS-Me, SQL Inject-Me, Access-Me)

o Fiddler X5s plugin

o XSSRays (chrome Addon)

· Exploiters - tools that can exploit vulnerabilities but have no independent ability to automatically detect vulnerabilities on a large scale. Examples:

o MultiInjector

o XSS-Proxy-Scanner

o Pangolin

o FGInjector

o Absinth

o Safe3 SQL Injector (an exploitation tool with scanning features (pentest mode) that are not available in the free version).

o etc

· Exceptional Cases

o SecurityQA Toolbar (iSec) – various lists and rumors include this tool in the collection of free/open-source vulnerability scanners, but I wasn’t able to obtain it from the vendor’s web site, or from any other legitimate source, so I’m not really sure it fits the “free to use” category.
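The distinction drawn above between "de facto fuzzers" and scanners (reporting any abnormal response versus verifying it) can be sketched in Python. This is a deliberately simplified illustration, not the detection logic of any tool in this list: the function names are hypothetical, the reflection check only looks for the unencoded payload and ignores the surrounding HTML context, and the error patterns are a few well-known database messages chosen for the example.

```python
import html
import re

# A few well-known database syntax error messages (illustrative subset).
DB_SYNTAX_ERRORS = re.compile(
    r"You have an error in your SQL syntax"                   # MySQL
    r"|Unclosed quotation mark after the character string"    # SQL Server
    r"|ORA-01756"                                             # Oracle
    r"|unterminated quoted string at or near",                # PostgreSQL
    re.IGNORECASE,
)


def fuzzer_lead(status: int, body: str, marker: str) -> bool:
    """Fuzzer-style check: any abnormal response is a lead for the user."""
    return status >= 500 or marker in body


def xss_reflection_verified(body: str, payload: str) -> bool:
    """Scanner-style check: the payload must come back unencoded.

    Simplification: a real scanner would also inspect the HTML context
    (attribute, comment, script block) in which the payload lands.
    """
    return payload in body


def sql_error_verified(body: str) -> bool:
    """Scanner-style check: the error must look like a database syntax error."""
    return DB_SYNTAX_ERRORS.search(body) is not None


if __name__ == "__main__":
    payload = '<script>alert("wvs1")</script>'
    encoded_page = "<p>" + html.escape(payload) + "</p>"
    # The marker is reflected, so a fuzzer reports a lead, but the output
    # is HTML-encoded, so the verification step rejects it.
    print(fuzzer_lead(200, encoded_page, "wvs1"))
    print(xss_reflection_verified(encoded_page, payload))
```

A tool that stops at the first function leaves the verification task to the user, which is the behavior that placed Lilith and Spike proxy in the de facto fuzzers category.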

31. Appendix C – WAVSEP Scan Logs

The execution logs, installation steps and configuration used while scanning with the various tools are all described in the following report:

http://sectooladdict-benchmarks.googlecode.com/files/Scan%20Log%20-%20WAVSEP%20Benchmark%202011.pdf

32. Appendix D – Scanners with Abnormal Behavior

The following appendix was published in my previous benchmark, but I decided to include it in the current benchmark as well, mainly because I didn't manage to invest the time to get to the bottom of these mysteries, and didn't see any information from anyone else that did.

During the current & previous assessments, parts of the source code of open source scanners and the HTTP communication of some of the scanners were analyzed; some tools behaved in an abnormal manner that should be reported:

· Priamos IP Address Lookup – The tool Priamos attempts to access “whatismyip.com” (or some similar site) whenever a scan is initiated (verified by channeling the communication through Burp proxy). This behavior might derive from a trojan horse that infected the content on the project web site, so I’m not jumping to any conclusions just yet.

· VulnerabilityScanner Remote RFI List Retrieval (listed among the scanners that were not tested, Appendix B, developed by a group called RST, http://pastebin.com/f3c267935) – In the source code of the tool VulnerabilityScanner (a python script), I found traces of remote access to external web sites for obtaining RFI lists (which might be used to refer the user to external URLs in the list). I could not verify the purpose of this feature since I didn’t manage to activate the tool (yet); in theory, this could be a legitimate list update feature, but since all the lists the tool uses are hardcoded, I didn’t understand the purpose of the feature. Again, I’m not jumping to any conclusions; this feature might be related to the tool’s initial design, which was not fully implemented due to various considerations.

Although I did not verify that any of these features is malicious in nature, these features and behaviors might be abused to compromise the security of the tester’s workstation (or to incriminate him in malicious actions), and thus, require additional investigation to disqualify this possibility.

24 comments:

Miroslav Štampar, August 2, 2011 at 2:06 AM
Not taking false positives into the general score is at least frivolous. Just for consideration. I'll give you a program with 1 line of code (print "target is vulnerable") and it will have 100% of success rate and 100% of false positives and you'll put it at the top of the list. LOL :)

Shay Chen, August 2, 2011 at 2:24 AM
Yep. Tricky Issue, and apart from the ranking formula, I also have other issues:
In fact, in some of the charts, in case two tools have an even score, it will present the one with the highest false positive ratio first (!), regardless of what I write in the query, or define in the form/report. I have been struggling on this issue with MS Access with zero success so far... my punishment for using MS Access to store the information :)

Anonymous, August 2, 2011 at 6:42 PM
2 bad thats a really old copy of Cenzic

stiguru, August 2, 2011 at 8:26 PM
Where is Saint?
Where is Qualys?
Where is Rapid7?
Where is eEye?

I don't get it.

Shay Chen, August 3, 2011 at 4:31 AM
(reply for stiguru)
As far as I know, Qualys only provide SaaS scanning services, and don't supply their product for local tests, one of the prerequisites for all the scanners in this benchmark (which is required in order to make sure that there's no manual intervention in the test).
I didn't manage to get an evaluation license of Rapid7 and eEye vulnerability scanners until the deadline, and wasn't aware of SAINT's web application scanning capabilities (until now, thanks to you), but I'll do my best to test these products in my next research.

Tasos "Zapotek" Laskos, August 3, 2011 at 4:43 AM
@Miroslav Stampar

I didn't see a general score anywhere and the way that the results were presented is certainly not frivolous.

It gives you the FP results individually so that you can glean the sensitivity of the scanner yourself.

As for your example, I doubt that any scanner developer would risk the ridicule.

Disclaimer: My scanner is included in benchmark but it's sitting pretty anyway you slice it, so my comment was more for conversation's sake rather than anything else. :)

Anonymous, August 3, 2011 at 11:38 AM
hi, thank you very much for your work, very interesting! are the test cases (i.e. php scripts) public available (maybe I missed the link?) ?

cheers,
Reiners

Shay Chen, August 3, 2011 at 2:22 PM
Sure. It can be downloaded from http://code.google.com/p/wavsep/

Miroslav Štampar, August 4, 2011 at 12:43 AM
@Tasos "Zapotek" Laskos

i haven't said anything about the test(s) itself - i can sense some hard work was invested into this. i just wanted to say that with 'print "target is vulnerable"' you could get to the top in the given charts.

p.s. damn, i'll do that for the next round of tests :)

Shay Chen, August 4, 2011 at 1:00 AM
There's a good subject discussed by @Tasos and @Miroslav that I'd like to discuss further.
It's related to the position of scanners in the benchmark, and the fact that currently, false positives do not "lower" the position of a scanner in the chart (even though the chart still presents the amount of false positives separately).
A potential flaw in this scoring mechanism was raised by @Miroslav, since in this method, any scanner that reports 100% of the pages it scans as vulnerable will be ranked 1st, even if its false positive ratio is also 100%.
Well, in my opinion, there's a difference between a scanner with a high ratio of false positives (something I let the users decide whether or not to use), and a scanner that reports everything as vulnerable, and two good examples for explaining this scenario are the cases of the tools "Lilith" and "Spike Proxy".
Whenever I tested a tool, I also analyzed its communication (using burp/wireshark), and the payloads it was sending in order to conclude something was vulnerable (assuming its license permitted it). In many cases, the tools "spike proxy" and "Lilith" sent payloads that could not have confirmed vulnerabilities that were reported, and in the case of spike proxy (a tool from 2003), nearly every page was found vulnerable to every plugin the tool had.
As a result, I decided not to include these tools in the benchmark (due to the absence of a logical detection algorithm), and instead, placed them under the category of "de-facto fuzzers" (since they provide the user with leads, without performing verifications themselves), found in Appendix B.
I believe that if any of the other tools had behaved in a similar manner, I would have found out, and placed it under the same category (which isn't a punishment, just a classification).
Is my test perfect? No.
Is my method foolproof? Nonsense.
But I did my best, and I did try to detect these issues, as well as a variety of others.
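To make the scoring concern in this exchange concrete, here is a minimal sketch in Python. It is not the benchmark's actual scoring code: the scanner names, the test-case counts, and the "detection rate minus false positive rate" penalty are all assumptions chosen for illustration. It shows that ranking by detection rate alone puts a scanner that flags everything in first place, while a ranking that penalises false positives does not.

```python
from dataclasses import dataclass


@dataclass
class ScanResult:
    name: str
    detected: int          # vulnerable test cases correctly reported
    vulnerable_cases: int  # total vulnerable test cases
    false_alarms: int      # non-vulnerable test cases reported as vulnerable
    safe_cases: int        # total non-vulnerable (false positive) test cases


def detection_rate(r: ScanResult) -> float:
    return r.detected / r.vulnerable_cases


def false_positive_rate(r: ScanResult) -> float:
    return r.false_alarms / r.safe_cases


def rank_by_detection(results):
    # Ranking that ignores false positives entirely.
    return sorted(results, key=detection_rate, reverse=True)


def rank_with_penalty(results):
    # Hypothetical alternative: subtract the false positive rate.
    return sorted(
        results,
        key=lambda r: detection_rate(r) - false_positive_rate(r),
        reverse=True,
    )


# Hypothetical numbers, not taken from the benchmark.
scanners = [
    ScanResult("careful-scanner", detected=90, vulnerable_cases=100,
               false_alarms=1, safe_cases=10),
    ScanResult("always-vulnerable", detected=100, vulnerable_cases=100,
               false_alarms=10, safe_cases=10),
]

for label, ranking in (("detection only", rank_by_detection(scanners)),
                       ("with FP penalty", rank_with_penalty(scanners))):
    print(label, "->", [r.name for r in ranking])
```

With these made-up inputs, the first ranking is led by the scanner that reports every page as vulnerable, and the second by the scanner that verifies its findings.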
Shay Chen, August 4, 2011 at 3:23 AM
(Answer for @c3dd3c34-bd71-11e0-8eff-000bcdca4d7a)

True, the copy of Cenzic was relatively old, and so was the copy of Burp Suite (even though some of the data, particularly the list of features for Cenzic, was updated from other sources as well).
I touched on the reasons for that in a whole section called "10. Morale Issues in Commercial Product Benchmarks", in which I admit that the test was less fair for precisely these two vendors, and that as a result, I will change the format of my future research to a dynamic score – one that could present the immediate result of any vendor (this will be explained in a different post).
Dru, August 4, 2011 at 6:49 AM
Hi Shay,

Thanks for this! Of course it is impossible to create a perfect webapp test and please everyone with the results, but I feel you have done better than anyone else has. You can tell that you have put a lot of work into this. I'm curious if you had a chance to test the latest version of the BurpSuite (1.4) scanner and how it would rank? Thanks again!
Shay Chen, August 4, 2011 at 7:21 AM
Hi Dru,
I didn't manage to test Burp 1.4; the version tested was 1.3.09, and 1.4 was released less than two months ago (http://www.pentestit.com/2011/06/06/update-burp-suite-v14/), about two months after I finished testing 1.3.09 (again, one of two vendors that had a newer version that wasn't tested – the other one being Cenzic).
I know from rumors that in 1.4 they also implemented a relatively rare feature that is only implemented by 4 other vendors (privilege escalation checks, which as far as I know are only implemented by AppScan, WebInspect, Cenzic Hailstorm and NTOSpider).
Although 4 months is more than enough time to implement significant changes to the tested vulnerability detection mechanisms, I don't want to guess or estimate things I didn't manually test myself.
I intend to contact them officially for my next research in order to test that, but I need to modify the way results are presented to ensure that next time, there won't be any significant time differences for any vendor… which means I have to do some infrastructure work upfront.
Unknown, August 7, 2011 at 10:27 PM
Great overview, thanks for that. Small remark: in the next test, can you include tests for spider functionality (e.g. by using WIVET)? This might give better insight into whether or not some of the scanners can be used in a point-and-shoot scenario.
WLADIMIR, August 8, 2011 at 6:08 PM
I Agree with HERMAN... GREAT OVERVIEW...should have more posts like this
Anonymous, August 17, 2011 at 6:21 AM
Hi Shay, any chance that you could put together and include cost per package next time? (In some kind of base/universal configuration like 1yr license/support/maint./updates, unlimited targets, etc)

Obviously Open Source stuff is going to be $0. But if, for example, the cost of one commercial product is significantly more than a combination of other commercial products, then that tells a good story and presents interesting options as well.

Just as a totally fictitious example: if Burp Pro + NetSparker Commercial + WebInspect together cost less than IBM AppScan Standard, then that also becomes a noteworthy factor for companies, contractors, and analysts.
Shay Chen, August 30, 2011 at 3:15 AM
Hi Guys.
Using WIVET for future tests is on my list (and I hope I'll have enough time to include it in my next research, although I haven't estimated the effort yet). As for prices, I intentionally avoided publishing the prices of the various products, but I will do it eventually, once the research reaches a mature phase (I have all the numbers, but didn't want to associate the research with ROI, at least not yet).
Jordan M Schroeder, September 12, 2011 at 12:59 PM
Incredible data and extremely useful. Despite the flaws inherent in trying to compare vastly different products, you have worked through it all logically. Your process has helped me understand how to evaluate the data and to help me choose the right tools for the job at hand.

Perhaps readers should think about not picking the apps at the top of the lists just because they are at the top, but use your work to choose the best combination of benefits for maximum impact.

For instance, I'm curious about using w3af's spider to feed arachni's scans. I know that arachni has a new spider in v0.3, but the ability to import might solve problems, too.

And it's research work like this site that makes questions like that possible.

I can't thank you enough Shay-Chen.
Lentin Varghese, October 5, 2011 at 5:35 AM
This is an awesome post which gives an almost perfect idea about web application scanners... Thanks a lot Shay-chen... You saved my day by sparing me the Google research...
Unknown, April 30, 2012 at 8:21 AM
The GamaSec Scan SaaS was offline during your verification. I invite you to try it and to share your results, as we believe we have one of the best-performing online SaaS scanning tools on the market, especially against application vulnerabilities.
www.gamasec.com
Leo Romero, May 17, 2012 at 4:16 PM
Good job ;)
[+] SAlu2
KB, February 14, 2013 at 8:46 PM
Hi Shay, I didn't find any note about integration of any of these tools with QA testing frameworks like Silk or Selenium. Have you done any research on that?