Monday, July 30, 2012

SaaS Scanners (Qualys) vs. Commercial Scanners

The results of Qualys - a SaaS scanning service provider - were added to http://www.sectoolmarket.com.
Qualys got great scores in WIVET, SQLi & RXSS, but didn't detect the RFI/Path Traversal cases, and I'm currently not sure whether that's due to my testing methodology, bugs, or the absence of appropriate plugins.
Link to the 2012 benchmark: http://bit.ly/LloTfL (Qualys results are only included in sectoolmarket)

The Diviner - Clairvoyance in the Digital Frontier


The Diviner
Digital Clairvoyance
Server-Side Source Code and Memory Divination

How to gain insight into the server-side source code and memory structure of any application, using black box techniques and without relying on any security exposures.

POC is implemented as a ZAP proxy extension, developed by Hacktics ASC.




Introduction
There have been a LOT of quality infosec publications lately - blog posts, articles, videos and whitepapers. Even though I try my best, I admit it's hard for me to keep up.

Although this post is one of these publications, I'll admit right away that the title sounds a bit confusing, maybe even scary - I'm aware of that, since that's the response I got from many individuals.

So what's so special in this post that should make you want to invest 5 minutes of your precious time to read it?

I could tell you stories about research and development work that's been going on for more than a year, or mention the fact that it contains an entirely new concept in hacking, but I think I'll take the direct approach with this one:

Using a new technology that relies on black box techniques, the server-side source code of any application can be stolen, the server side memory can be mapped, and so can the data flow of server side values.
The technique is already implemented in a new tool, does not rely on any security exposures, and works regardless of any existing security enhancements.

No introductions, obscure concepts or murky waters. Just facts - Get Code, Get Memory Map, No Security, Any Application.

Let's assume for a moment that the proclamations are true - so how can this information be used in penetration tests?

Although the posts in this blog have recently focused on automated scanning, it's never too late to correct the faults. Any veteran knows that a tester's focus should always be the manual testing process, and this new information, when properly presented to a tester, can dramatically enhance a manual penetration test:
  • Optimization of the manual testing process - allow the tester to make better decisions, faster and test entry points that are more likely to be vulnerable first.
  • Gaining Intel - enable the tester to understand how a certain page / entry point behaves under various conditions, by viewing a representation of the server-side source code, memory and cross-entry-point processes.
  • Locate complex vulnerabilities - locate leads for vulnerabilities that require access to multiple entry points, while overriding session and database values, with various prerequisites and in extreme scenarios - vulnerabilities that cannot be detected by automated tools, and are hard to locate even in manual assessments.
  • Think about it… viewing the server-side source code of any component… criteria or not, it's simply awesome.


In addition, if the information can be delivered in a standard format to a black box web application scanner, it can enhance the coverage of the tool to include potential events and behaviors that only occur under extreme or rare conditions.

And what enables us to gather this information using nothing but black box techniques?

Well, I can only define it as... umm... breadcrumbs. Many tiny, seemingly useless pieces of information.

So if gaining insight into the server side, reducing the time necessary to perform many types of tests, and being able to locate vulnerabilities that nobody else can detect without sheer luck is of any interest to you, hang on a bit.

And just to make sure you're not losing track, here's one way to present it:

Activating Diviner's Clairvoyance feature - viewing a representation of the server side code

Viewing the Dynamic Server Memory & Processes Map Generated by Diviner


The Problem – The Limitations of Manual Pentesting
The process of manual penetration testing is a process of trial and error, which is composed of event-triggering attempts, behavior analysis and deduction;

Through a process of trial and error, the tester learns how a certain application entry point responds to specific input, access patterns and extreme conditions, locates behaviors that might be caused by potential vulnerabilities, and verifies (or rules out) the existence of these vulnerabilities through exploits, comparisons, etc.

Since there are dozens of potential generic application-level attacks (read the lists in OWASP, WASC and CWE if this number sounds exaggerated), excluding the use of scanners and fuzzers and with the exception of very small applications, this process can only be manually performed on part of the tested application entry points, and relies heavily on experience, intuition, methodology and sometimes luck.

The point I am trying to make is this - currently, there is an inefficient use of time in the process of manual penetration testing.

Don't jump to conclusions or take it personally... let me explain my intention:

Even though efficient information gathering enables the tester to narrow the list of tests that should be performed on each application, entry point, page or parameter - it still includes a lot of tests to perform, often more than the tester can do in the time allocated to the test.

Furthermore, since most of the global information gathering processes rely on information disclosure, passive information gathering and fingerprinting, the tester needs to manually gather information on specific targets prior to testing them, or perform the test "blindly", while relying on other indicators.

Take SQL injection, for example - one of the most common tests penetration testers perform. In order to be truly certain that a given location is (or isn't) vulnerable, the tester needs to receive different kinds of feedback. Sometimes a visible or hidden error makes the task simple (blablabla.SQLException); sometimes the tester needs to dig deeper and detect content differentiation, or compare responses to inputs that contain arithmetic operations (id=4-2 vs. id=5-3). When the tested entry point does not provide any feedback, he might be required to use payloads designed to delay the execution of SQL statements, and if an exposure with similarly obscure behavior affects an offline process or an indirectly affected backend server, he/she might even need to inject payloads that execute an exploit that alters content (risky) or sends a notification to an external entity (mail, ping, etc.).

Assuming the assessment method is a black box assessment, since there are various types of databases and syntax injection contexts, the tester will need to use a lot of payloads to truly verify the issue - in each field, and in each location.
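The arithmetic-comparison trick mentioned above can be reduced to a simple decision rule. Here is a minimal sketch of just that rule - the HTTP fetching is omitted, the function only compares previously captured response bodies, and the class and method names are my own, not part of any tool:

```java
public class ArithmeticSqliCheck {

    // If the response for id=2 equals the response for id=4-2 (the same
    // value once the database evaluates the expression), yet differs from
    // the response for id=4-3 (a different value), the parameter is
    // probably being evaluated server-side - a classic SQLi indicator.
    public static boolean likelyInjectable(String baselineBody,
                                           String sameValueExprBody,
                                           String differentValueExprBody) {
        return baselineBody.equals(sameValueExprBody)
                && !baselineBody.equals(differentValueExprBody);
    }
}
```

In practice, each comparison would be repeated with several expression pairs to rule out coincidental content changes.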

Scanners attempt to tackle this issue by performing various tests on a wide range of targets, but they decide on their own whether or not a location is vulnerable, and are currently far from performing these tests in a sufficient number of extreme or complex scenarios.

Fuzzers on the other hand can store the different responses and behaviors of multiple entry points, but don't provide out-of-the-box support for complex processes or complex analysis methods, are usually not application-aware, and present the information in a way that is hard to digest.

The problem, however, could be handled using another method:
Divination attacks, a crossbreed between automated testing and human deduction, provide an alternate (or complementary) route:

Consider the methods required to detect the following complex vulnerability:

"SQL injection vulnerability, in which the *attack payload* is injected into a server variable in the *registration phase*, stored in the *database*, but only affects the application in the *event of writing an exception into a database log* (the vulnerable code segment), which only occurs in a module that generates the *monthly report* for a user, which requires *authentication*, while the log triggering exception requires the user to *directly access* the last phase of a multiphase report generation process while skipping the rest of the phases in the flow (forceful browsing)."

In other words, a vulnerability that affects the application indirectly, and only when certain extreme scenarios occur.

Although a talented (or lucky) tester might be able to detect it in a limited scope, it's unlikely to be detected by a black box automated vulnerability scanner, passive security scanner, or any other black-box tool… that is, unless a certain process makes it possible…

Divination Attacks
When using the general term "Divination", this article refers to the following interpretation:

"Divination is the attempt to gain insight into a question or situation by way of an occultic standardized process or ritual. Used in various forms for thousands of years, diviners ascertain their interpretations of how a querent should proceed by reading signs, events, or omens." - Wikipedia's Definition for Divination.

For those of you that read this section first, and for those that got confused from the introduction, please, let me clarify: I am not proposing to hire the practitioners of witchcraft to participate in penetration tests.

I am however, proposing the following solution to the time management problem:
Inspect the direct and indirect effect of each parameter, on each page, with every possible sequence and under every possible condition, before deciding which attack to perform, and where.

Since obtaining this information manually is not feasible, the process needs to be, at least in some aspects, automated.

And how can we obtain this information using an automated process?

Execute Scenarios -> Isolate Behaviors -> Perform Verifications -> Interpret -> GUI

Assume that the interception proxy contains the following requests in its request history:

In order to analyze the effect of a given input parameter on other entry points (and on the origin entry point), we need to send a value to the target parameter, and then access another entry point - in order to see the effect (for example, send a value in the username input parameter to request 4, and then access request 6 to see if there was any special effect).

The process must be repeated for the next "exit point", while sending another value (identical or otherwise) to the target parameter, prior to accessing the "exit point".
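This entry/exit-point probe can be sketched as follows: a unique marker value is sent to the target parameter, every candidate "exit point" is fetched, and the exit points whose responses contain the marker are recorded. The names here are illustrative only - this is not Diviner's actual implementation:

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class CrossEntryPointProbe {

    // Given the responses of each candidate exit point, captured *after*
    // a unique marker was injected into the target parameter, return the
    // exit points that reflect the marker - i.e. the entry point's
    // observable sphere of influence.
    public static Set<String> affectedExitPoints(String marker,
            Map<String, String> exitPointResponses) {
        Set<String> affected = new TreeSet<>();
        for (Map.Entry<String, String> e : exitPointResponses.entrySet()) {
            if (e.getValue().contains(marker)) {
                affected.add(e.getKey());
            }
        }
        return affected;
    }
}
```

A real probe would use a fresh, random marker per injection so that reflections from earlier probes cannot pollute the results.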


The result of this analysis might change due to various factors, such as:
  • Authentication - Authenticate before accessing the entry point, before accessing the "exit point" (a.k.a target), or not at all.
  • Multiple Sessions - When an entry point responds by replacing the session identifier, the scenario could continue using the old session identifier (assuming it was not invalidated), or using the new one.
  • History Requirements – Certain entry points might require the execution of previous entry points using a shared session identifier. For example, testing a parameter sent to the fourth phase of a multiphase process might require access to previous entry points using the same session identifier, with, or without authentication.
  • Input Type - The target "exit point" and "entry point" might respond differently to other types of input (e.g. input with random values, valid values, invalid syntax characters, etc).
  • Required Tokens – Certain behaviors might only occur when a required token is sent to the entry point (or not sent to the entry point) – for example, the existence of a timestamp or anti-CSRF token might affect each entry point in different ways.
  • Invalid Access – accessing pages without meeting their "requirements" might still generate a "beneficial" behavior – for example, accessing a page without a valid anti-CSRF token might trigger a response that reuses a server variable that can be affected, and thus, expose the entry point to attacks.
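Enumerating the scenario space described by the factors above amounts to a cartesian product. A toy sketch - the factor names and values are illustrative, and far fewer than an actual analysis would cover:

```java
import java.util.ArrayList;
import java.util.List;

public class ScenarioEnumerator {

    // A tiny slice of the scenario space: each combination of
    // authentication, session handling and input type yields a distinct
    // scenario that must be executed against the entry/exit point pair.
    public static List<String> scenarios() {
        String[] auth = {"no-auth", "auth-before-entry", "auth-before-exit"};
        String[] session = {"keep-old-session", "use-new-session"};
        String[] input = {"valid-value", "random-value", "syntax-chars"};
        List<String> all = new ArrayList<>();
        for (String a : auth)
            for (String s : session)
                for (String i : input)
                    all.add(a + "/" + s + "/" + i);
        return all;
    }
}
```

Even this toy version yields 18 scenarios per entry/exit-point pair - which is exactly why the process must be automated.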


So in order to truly analyze the effect of the parameter on the various entry points of the application, we need to try everything (or at the very least, try a lot of scenarios), and we need to do it for as many input parameters as possible, against as many entry/exit points as possible, and in various scenarios.

Furthermore, the behavior itself might vary according to the scenario, input and in-page logic: it can be input reflection, an exception, a certain valid response, a time delay, content differentiation or anything else; the behaviors we are interested in are those that can be traced back to a certain process, memory allocation, potential issue or specific line of code.

The information gathered in such a process will be composed of a lot of behaviors, which vary per page, per input, and per scenario.

These "behaviors" can then be presented to the tester in a simple, visual form, which will enable him to decide which behaviors he should inspect manually.

Don't get me wrong - I am not suggesting that we limit the inspection only to the information presented by such a process - I'm merely stating that it is wise to focus on this information first, and verify the various leads it provides before using the hardcore manual approach. After using this approach for some time, I can clearly state the following:

The information provided by the process, when used by a tester, can transform even a very complex vulnerability into a low hanging fruit.

And that's not all. The collection of behaviors can also be "converted" into other useful forms, such as the ones presented in the following sections.

Source Code Divination
Source code divination is a new concept and approach (can also be referred to as source code fingerprinting).

Think about it - we use fingerprinting techniques to identify web servers, content management systems, operating systems, web application firewalls, and more.

Why not use the same approach to identify specific lines of code? Why not use it to detect all the lines of code, or at the very least, a large portion of the server code?

Nearly all of us classify source code disclosure, or attacks that can obtain the server source code as severe exposures (at least to some extent), and claim in the reports that we provide to customers that attackers can harness this information to enhance their attacks, learn about the system's structure and identify potential flaws in it.

If a large portion of the application's source code could be obtained using accurate "fingerprinting", wouldn't that lead to the same result?

In order to explain how this information can be obtained, let's use an example:

Connection pool exhaustion (or consumption) is one of the many forms of application denial-of-service attacks. It occurs when an attacker intentionally accesses an entry point (page/web service, etc.) that requires a database connection pool, using multiple threads – more threads than the maximum number of connections in the pool. The attack will delay the responses of entry points that rely on the pool, but won't affect entry points that don't use it (assuming the number of threads doesn't exhaust other resources).
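The detection side of such a pool-exhaustion probe can be reduced to a latency comparison: measure a control entry point's response times before, and then while, flooding threads hold the suspected pool's connections. A sketch of just the decision logic - the flooding itself is omitted, and the threshold factor is an arbitrary choice of mine:

```java
import java.util.Arrays;

public class PoolExhaustionCheck {

    static long median(long[] samples) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }

    // The control entry point likely shares the exhausted pool when its
    // median latency under load grows well beyond the baseline median.
    public static boolean sharesPool(long[] baselineMillis,
                                     long[] underLoadMillis) {
        return median(underLoadMillis) > 3 * median(baselineMillis);
    }
}
```

Medians are used rather than averages so that a single slow outlier doesn't produce a false positive.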

Although this behavior is an exposure in its own right, it also leads to the following conclusion:

It is highly likely that somewhere in the entry point's code, a connection is obtained from a connection pool, and since in many cases, a connection pool is a mechanism used to interact with databases, it's highly likely that the source code is similar to the following (jsp sample):

try {
    // obtain a connection from the pool via the container-managed DataSource
    Connection conn = dataSource.getConnection();
    …
} catch (…) {…}

Of course – this connection pool might serve a different type of resource, but using additional verifications we might be able to increase the level of certainty – for example, by identifying erroneous database responses in the same entry point, or even detecting certain exposures in other application entry points.

The same approach can be used to convert other behaviors to the lines of code that might have caused them, and since the previous process gathered a lot of behaviors – these can be converted into a fair amount of code - pseudo code that can be presented using any specific syntax, and enable the tester to understand how a certain page behaves – prior to testing that page.

For example, input sent from one page (the "source" page), but reflected in another (the "target" page), is probably shared through a session variable, file or database field. The origin can be isolated by accessing the target page using a different session identifier, but with the identical process used to access it before (login, history, etc.) - with the exception of the source page;

If the reflected input is not present in the target page, the probability for the existence of the following lines of code in the source page and target page increases:

Source Page:
String input1 = request.getParameter("input1");
session.setAttribute("sessionValue1", input1 );

Target Page:
out.println(session.getAttribute("sessionValue1"));

If, however, the reflected input had still been present in the verification scenario, then the matching source code will probably include database access, file access or static server variables – and specific aspects of these behaviors can be isolated in turn (insert statements are more likely to exist in pages that rapidly grow in size, update statements in pages with a relatively static size and persistent changes, etc.).
At the end of the process, after performing additional verifications and tests, the options with the highest probability will be selected and presented to the user.

And how will this code be sorted? Which lines will appear first?

Although the sorting problem has many solutions, one of the main solutions is probably "delay-of-service" attacks (yes, I said delay, not deny).

Presented in the research "Temporal Session Race Conditions", these attacks were originally meant to delay the execution of specific lines of code in order to extend the lifespan of temporary session variables – but they can also be used to sort some of the code, by inspecting whether exceptions or conditional behaviors occur instead of the delay, before the delay, after the delay, or not at all.

For example, performing a connection pool exhaustion attack on a page while simultaneously sending an error generating value to the same vulnerable page will provide a potentially important piece of information – which code is executed first: the code that attempts to obtain a connection from the pool, or the code that is prone to the exception.
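That ordering deduction boils down to a timing comparison: while the pool is held exhausted, send the error-generating value and see whether the error comes back immediately or only once the delay elapses. A minimal sketch - the function name, labels and inputs are mine, for illustration:

```java
public class CodeOrderInference {

    // While the connection pool is held exhausted, an error-generating
    // value is sent to the same page. An immediate error response means
    // the exception-prone code executes before the pool access; an error
    // that only arrives after the artificial delay means it executes after.
    public static String inferOrder(long errorResponseMillis,
                                    long artificialDelayMillis) {
        return errorResponseMillis < artificialDelayMillis
                ? "exception-code-before-pool-access"
                : "exception-code-after-pool-access";
    }
}
```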

Note - Although this method isn't exactly "safe", it will probably enhance the results more than other methods for sorting divined lines of code.

Like fingerprinting, this information might not be 100% accurate (although it can be VERY accurate, if the process is performed properly and thoroughly), but it can still be very beneficial for the purpose of the test – just like other forms of fingerprinting.

I won't expand on the subject of source code divination in this post (I plan to discuss it further in separate posts), but it's already implemented in the Diviner extension discussed in the following sections.



Memory Structure Divination and Cross Entry-Point Effects
In the previous process, we discussed how an identified behavior (such as an exception or input reflection) can be classified as persistent or temporary – by reproducing the scenario that caused it using a different session identifier, an identical process, and without accessing the "entry point" (source page). This process, alongside additional verifications, allowed us to conclude whether a behavior is persistent, temporary or something else.

Although not all behaviors rely on specific variables stored on the server side, some do, and from these behaviors we can conclude how and where the server stores some of the content.
By cross-referencing the information obtained from interesting scenarios discovered in the process, we can even locate multiple entry points that affect the same database tables, fields, session variables and static variables, and thus construct a general structure of database tables and session attributes.
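The persistent-vs-temporary deduction reduces to two observations: does the behavior reproduce under the original session, and does it survive a replay under a fresh session identifier (same steps, minus the source page)? A sketch of that classification - the labels are my own:

```java
public class StorageClassifier {

    // Replay the scenario that produced the behavior with a fresh session
    // identifier. A behavior that survives the session change is backed by
    // persistent storage (database, file, static variable); one that does
    // not is backed by a session variable.
    public static String classify(boolean seenWithOriginalSession,
                                  boolean seenWithFreshSession) {
        if (!seenWithOriginalSession) return "not-reproduced";
        return seenWithFreshSession ? "persistent-storage" : "session-variable";
    }
}
```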


It's key to understand that the process does not verify the existence of any exposure or attempt to exploit any vulnerability; instead, it simply uses a method of deduction to attempt to present what's going on behind the scenes, in order for this information to enhance the abilities of a tester, or a scanner.

The Diviner Extension
During the last year, I collaborated with a number of individuals (especially @Secure_ET, various colleagues and the OWASP ZAP project) so that these ideas would not remain a theory… and after numerous late-night brainstorming sessions, various incarnations and a long development period – we have an initial version that works (beta phase).

The Diviner platform – an active information gathering platform that implements many of the previously described concepts – is implemented as a ZAP proxy extension, and can be downloaded from the following address:

It can already illustrate server-side behaviors and processes, contains features such as the task list/advisor, which provides invaluable leads to potential exposures, presents a partial map of the server-side memory, and presents a partial representation of the server-side code.

The extension is deployed using a Windows installer (or in binary format for other operating systems), and requires Java 1.7.x and ZAP 1.4.0.1 in order to run properly.

Furthermore, since it attempts to identify behaviors that result from valid & invalid scenarios, and can't guess what is valid on its own, it must be used after a short manual crawling process that covers the important application sections with valid values.

It was tested mostly on small-scale applications (around 100 parameters, give or take 50) – including real-life applications – and although it will probably work on larger applications (it's not stuck in the database analysis process – be patient), due to various optimizations (and sacrifices) we haven't yet made, it's recommended not to exceed that size.

We can currently identify around 20 different lines of code, but have plans to implement tests that identify others, some with high probability, and some with absolute certainty.

We haven't yet implemented features that sort the lines of code (and thus currently rely on default positioning), but plan on implementing them in the future (with restrictions that will prevent their use for actual denial/delay-of-service attacks).

We have many additional experimental features that aren't mature enough, but are already working on refining them for the future versions.

We don't perform any form of automated vulnerability scanning, but plan on exporting the interesting leads to a format that can be used by external scanners to detect exposures in these abnormal scenarios.
Bottom line - It's not perfect yet, but it's already very useful, and can already help testers locate exposures that can't be located using other means, and make better decisions - quicker.

Acknowledgements
The diviner project was funded by Hacktics ASC.
The following individuals assisted me in various ways, and deserve acknowledgment for their contribution:

Eran Tamari (The lead developer) - for the countless hours of development, the sheer determination, and most of all, for being a true believer.

Simon Bennetts (psiinon) and Axel Neumann - The project leaders of the OWASP Zed Attack Proxy (ZAP) project - for providing support, useful advice and adjustments that made the creation of Diviner possible.

Liran Sheinbox (Developer) - Diviner's Payload Manager (alpha).

Alex Mor, Oren Ofer and Michal Goldstein (Developers) - for their contribution to the development of Diviner's content differentiation analysis features (alpha).

Alex Ganelis, Tsachi Itschak and Lior Suliman (Developers) - Diviner Installer, ZAP Integration and various modifications.

Zafrir Grosman - material design.

The Flying Saucer Draught Emporium Bar in Houston, TX - for whatever substance that triggered the inspiration.

Friday, July 13, 2012

The 2012 Web Application Scanner Benchmark


Top 10:
The Web Application Vulnerability Scanners Benchmark, 2012
Commercial & Open Source Scanners
An Accuracy, Coverage, Versatility, Adaptability, Feature and Price Comparison of 60 Commercial & Open Source Black Box Web Application Vulnerability Scanners

By Shay Chen
Information Security Consultant, Researcher and Instructor
sectooladdict-$at$-gmail-$dot$-com
July 2012
Assessment Environments: WAVSEP 1.2, ZAP-WAVE (WAVSEP integration), WIVET v3-rev148

Table of Contents
1. Introduction
2. List of Tested Web Application Scanners
3. Benchmark Overview & Assessment Criteria
4. A Glimpse at the Results of the Benchmark
5. Test I - Scanner Versatility - Input Vector Support
6. Test II – Attack Vector Support – Counting Audit Features
7. Introduction to the Various Accuracy Assessments
8. Test III – The Detection Accuracy of Reflected XSS
9. Test IV – The Detection Accuracy of SQL Injection
10. Test V – The Detection Accuracy of Path Traversal/LFI
11. Test VI – The Detection Accuracy of RFI (XSS via RFI)
12. Test VII - WIVET - Coverage via Automated Crawling
13. Test VIII – Scanner Adaptability - Crawling & Scan Barriers
14. Test IX – Authentication and Usability Feature Comparison
15. Test X – The Crown Jewel - Results & Features vs. Pricing
16. Additional Comparisons, Built-in Products and Licenses
17. What Changed?
18. Initial Conclusions – Open Source vs. Commercial
19. Verifying The Benchmark Results
20. So What Now?
21. Recommended Reading List: Scanner Benchmarks
22. Thank-You Note
23. FAQ - Why Didn't You Test NTO, Cenzic and N-Stalker?
24. Appendix A – List of Tools Not Included In the Test

1. Introduction
Detailed Result Presentation at
Tools, Features, Results, Statistics and Price Comparison
A Step by Step Guide for Choosing the Right Web Application Vulnerability Scanner for *You*
A Perfectionist Guide for Optimal Use of Web Application Vulnerability Scanners

Getting the information was the easy part. All I had to do was invest a couple of years in gathering the list of tools, and a couple more in documenting their various features. It's really a daily routine - you read a couple of posts in newsgroups in the morning, and a couple of blogs in the evening. Once you get used to it, it's fun, and even quite addictive.

Then came the "best" fantasy, and with it, the inclination to test the proclaimed features of all the web application vulnerability scanners against each other, only to find out that things are not that simple, and that finding the "best", if there is such a tool, was not an easy task.
Inevitably, I tried searching for alternative assessment models - methods of measurement that would handle the imperfections of the previous assessments.

I tried to change the perspective, add tests (hundreds of them - 940+, to be exact), examine different aspects, and even make parts of the test process obscure, and now I'm finally ready for another shot.

In spite of everything I had invested in past research, due to the focus I had on features and accuracy, and the policy I used when interacting with the various vendors, it was difficult, especially for me, to gain insights from the mass of data that would enable me to choose, and more importantly, properly use the various tools in real-life scenarios.

Is the most accurate scanner necessarily the best choice for a point-and-shoot scenario? And what good will it do if it can't scan an application due to a specific scan barrier it can't handle, or because it does not support the input delivery method?

I needed to gather other pieces of the puzzle, and even more importantly, I needed a method, or more accurately, a methodology.

I'm sorry to disappoint you, dear reader, so early in the article, but I still don't have a perfect answer or one recommendation... But I sure am much closer than I ever was, and although I might not have the answer, I have many answers, and a very comprehensive, logical and clear methodology for employing the use of all the information I'm about to present.

In the previous benchmarks, I focused on assessing 3 major aspects of web application scanners, which revolved mostly around features & accuracy, and even though the information was very interesting, it wasn't necessarily useful, at least not in all scenarios.

So I decided to take it to the edge, but since I had already reached the number of 60 scanners, it was hard to make an impression with a couple of extra tools, so instead, I focused my efforts on aspects.

This time, I compared 10 different aspects of the tools (or 14, if you consider non competitive charts), and chose the collection with the aim of providing practical tools for making a decision, and getting a glimpse of the bigger picture.

Let me assure you - this time, the information is presented in a manner that is very helpful, is easy to navigate, and is supported by presentation platforms, articles and step by step methodologies.

Furthermore, I wrapped it all in a summary that includes the major results and features in relation to the price, for those of us who prefer the overview and avoid the drill-down - information and insights that, I believe, will help testers invest their time in better-suited tools, and consumers properly invest their money, in the long term or the short term (but not necessarily both*).

As mentioned earlier, this research covers various aspects for the latest versions of 11 commercial web application scanners, and the latest versions of most of the 49 free & open source web application scanners. It also covers some scanners that were not covered in previous benchmarks, and includes, among others, the following components and tests:

A Price Comparison - in Relation to the Rest of the Benchmark Results
Scanner Versatility - A Measure for the Scanner's  Support of Protocols & Input Delivery Vectors
Attack Vector Support - The Amount & Type of Active Scan Plugins (Vulnerability Detection)
Reflected Cross Site Scripting Detection Accuracy
SQL Injection Detection Accuracy
Path Traversal / Local File Inclusion Detection Accuracy
Remote File Inclusion Detection Accuracy (XSS/Phishing via RFI)
WIVET Score Comparison - Automated Crawling / Input Vector Extraction
Scanner Adaptability - Complementary Coverage Features and Scan Barrier Support
Authentication Features Comparison
Complementary Scan Features and Embedded Products
General Scanning Features and Overall Impression
License Comparison and General Information

And just before we delve into the details, one last tip: don't focus solely on the charts - if you want to really understand what they reflect, dig in.
Lists and charts first, detailed description later.

2. List of Tested Web Application Scanners

The following commercial scanners were included in the benchmark:
The following new free & open source scanners were included in the benchmark:
IronWASP v0.9.1.0

The updated versions of the following free & open source scanners were re-tested in the benchmark:
Zed Attack Proxy (ZAP) v1.4.0.1, sqlmap v1.0-Jul-5-2012 (Github), W3AF 1.2-rev509 (SVN), Acunetix Free Edition v8.0-20120509, Safe3WVS v10.1 FE (Safe3 Network Center), WebSecurify v0.9 (free edition - the new commercial version was not tested), Syhunt Mini (Sandcat Mini) v4.4.3.0, arachni v0.4.0.3, Skipfish 2.07b, N-Stalker 2012 Free Edition v7.1.1.121 (N-Stalker), Watobo v0.9.8-rev724 (a few new WATOBO 0.9.9 pre versions were released a few days before the publication of the benchmark, but I didn't manage to test them in time)

Different aspects of the following free & open source scanners were tested in the benchmark:
VEGA 1.0 beta (Subgraph), Netsparker Community Edition v1.7.2.13, Andiparos v1.0.6, ProxyStrike v2.2, Wapiti v2.2.1, Paros Proxy v3.2.13, Grendel Scan v1.0

The results were compared to those of unmaintained scanners tested in previous benchmarks:
PowerFuzzer v1.0, Oedipus v1.8.1 (v1.8.3 is around somewhere), Scrawler v1.0, WebCruiser v2.4.2 FE (corrections), Sandcat Free Edition v4.0.0.1, JSKY Free Edition v1.0.0, N-Stalker 2009 Free Edition v7.0.0.223, UWSS (Uber Web Security Scanner) v0.0.2, Grabber v0.1, WebScarab v20100820, Mini MySqlat0r v0.5, WSTool v0.14001, crawlfish v0.92, Gamja v1.6, iScan v0.1, LoverBoy v1.0, DSSS (Damn Simple SQLi Scanner) v0.1h, openAcunetix v0.1, ScreamingCSS v1.02, Secubat v0.5, SQID (SQL Injection Digger) v0.3, SQLiX v1.0, VulnDetector v0.0.2, Web Injection Scanner (WIS) v0.4, Xcobra v0.2, XSSploit v0.5, XSSS v0.40, Priamos v1.0, XSSer v1.5-1 (version 1.6 was released but I didn't manage to test it), aidSQL 02062011 (a newer revision exists in the SVN but was not officially released)
For a full list of commercial & open source tools that were not tested in this benchmark, refer to the appendix.

3. Benchmark Overview & Assessment Criteria
The benchmark focused on testing commercial & open source tools that are able to detect (and not necessarily exploit) security vulnerabilities on a wide range of URLs, and thus, each tool tested was required to support the following features:
·         The ability to detect Reflected XSS and/or SQL Injection and/or Path Traversal/Local File Inclusion/Remote File Inclusion vulnerabilities.
·         The ability to scan multiple URLs at once (using either a crawler/spider feature, URL/Log file parsing feature or a built-in proxy).
·         The ability to control and limit the scan to internal or external host (domain/IP).

The testing procedure of all the tools included the following phases:
Feature Documentation
The features of each scanner were documented and compared, according to documentation, configuration, plugins and information received from the vendor. The features were then divided into groups, which were used to compose various hierarchical charts.
Accuracy Assessment
The scanners were all tested against the latest version of WAVSEP (v1.2, integrating ZAP-WAVE), a benchmarking platform designed to assess the detection accuracy of web application scanners, which was released with the publication of this benchmark. The purpose of WAVSEP’s test cases is to provide a scale for understanding which detection barriers each scanning tool can bypass, and which common vulnerability variations can be detected by each tool.
·         The various scanners were tested against the following test cases (GET and POST attack vectors):
o   816 test cases that were vulnerable to Path Traversal attacks.
o   108 test cases that were vulnerable to Remote File Inclusion (XSS via RFI) attacks.
o   66 test cases that were vulnerable to Reflected Cross Site Scripting attacks.
o   80 test cases that contained Error Disclosing SQL Injection exposures.
o   46 test cases that contained Blind SQL Injection exposures.
o   10 test cases that were vulnerable to Time Based SQL Injection attacks.
o   7 different categories of false positive RXSS vulnerabilities.
o   10 different categories of false positive SQLi vulnerabilities.
o   8 different categories of false positive Path Traversal / LFI vulnerabilities.
o   6 different categories of false positive Remote File Inclusion vulnerabilities.
·        The benchmark included 8 experimental RXSS test cases and 2 experimental SQL Injection test cases, and although the scan results of these test cases were documented in the various scans, their results were not included in the final score, at least for now.
·         In order to ensure result consistency, the directory of each exposure sub-category was individually scanned multiple times using various configurations, usually with a single thread and a scan policy that included only the relevant plugins.
In order to ensure that the detection features of each scanner were truly effective, most of the scanners were tested against an additional benchmarking application that was prone to the same vulnerable test cases as the WAVSEP platform, but had a different design, slightly different behavior and a different entry point format, in order to verify that no signatures were used, and that any improvement was due to the enhancement of the scanner's attack tree.
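To give a concrete feel for what a WAVSEP test case checks (the actual test cases are JSP pages; this is an illustrative Python sketch with hypothetical page builders), a vulnerable RXSS case reflects input verbatim, while a false-positive category reflects it encoded:

```python
import html

def vulnerable_page(user_input: str) -> str:
    # Emulates a WAVSEP-style RXSS test case: the parameter is
    # reflected into the HTML response without any encoding.
    return "<html><body>Hello " + user_input + "</body></html>"

def false_positive_page(user_input: str) -> str:
    # Emulates a false-positive category: the same reflection,
    # but HTML-encoded, so no injected script can execute.
    return "<html><body>Hello " + html.escape(user_input) + "</body></html>"

payload = "<script>alert(1)</script>"
assert payload in vulnerable_page(payload)          # a scanner should flag this
assert payload not in false_positive_page(payload)  # ...but not this one
```

A scanner that flags both pages would score on the red (false positive) bar as well as the green one.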



Attack Surface Coverage Assessment
In order to assess the scanners' attack surface coverage, the assessment included tests that measure the efficiency of the scanner's automated crawling mechanism (input vector extraction), and feature comparisons meant to assess its support for various technologies and its ability to handle different scan barriers.
This section of the benchmark also included the WIVET test (Web Input Vector Extractor Teaser), in which scanners were executed against a dedicated application that can assess their crawling mechanism in the aspect of input vector extraction. The specific details of this assessment are provided in the relevant section.
Public tests vs. Obscure tests
In order to make the test as fair as possible, while still enabling the various vendors to show improvement, the benchmark was divided into tests that were publicly announced, and tests that were obscure to all vendors:
·         Publicly announced tests: the active scan feature comparison, and the detection accuracy assessment of SQL Injection and Reflected Cross Site Scripting, composed of test cases which were published as part of WAVSEP v1.1.1.
·         Tests that were obscure to all vendors until the moment of the publication: the various new groups of feature comparisons, the WIVET assessment, and the detection accuracy assessment of Path Traversal / LFI and Remote File Inclusion (XSS via RFI), implemented as 940+ test cases in WAVSEP 1.2 (a new version that was only published alongside this benchmark).

The results of the main test categories are presented within three graphs (commercial graph, free & open source graph, unified graph), and the detailed information of each test is presented in a dedicated section in benchmark presentation platform at http://www.sectoolmarket.com.

Now that we're finally done with the formalities, let's get to the interesting part... the results.

4. A Glimpse at the Results of the Benchmark
The presentation of results in this benchmark, alongside the dedicated website (http://www.sectoolmarket.com/) and a series of supporting articles and methodologies ([placeholder]), is designed to help the reader make a decision - to choose the proper product/s or tool/s for the task at hand, within the limits of time and budget.

For those of us who can't wait and want a glimpse at the summary of the unified results, a dedicated page is available at the following links:

Price & Feature Comparison of Commercial Scanners
http://sectoolmarket.com/price-and-feature-comparison-of-web-application-scanners-commercial-list.html
Price & Feature Comparison of a Unified List of Commercial, Free and Open Source Products


Some of the sections might not be clear to some readers at this stage, which is why I advise you to read the rest of the article prior to analyzing this summary.

5. Test I - Scanner Versatility - Input Vector Support
The first assessment criterion was the number of input vectors each tool can scan (and not just parse).

Modern web applications use a variety of sub-protocols and methods for delivering complex inputs from the browser to the server. These methods include standard input delivery methods such as HTTP querystring parameters and HTTP body parameters, modern delivery methods such as JSON and XML, and even binary delivery methods for technology specific objects such as AMF, Java serialized objects and WCF.
Since the vast majority of active scan plugins rely on input that is meant to be injected into client originating parameters, supporting the parameter (or rather, the input) delivery method of the tested application is a necessity.

Although the charts in this section don't necessarily represent the most important score, input vector support is the most important prerequisite for the scanner to comply with when scanning a specific technology.

Reasoning: An automated tool can't detect a vulnerability in a given parameter, if it can't scan the protocol or mimic the application's method of delivering the input. The more vectors of input delivery that the scanner supports, the more versatile it is in scanning different technologies and applications (assuming it can handle the relevant scan barriers, supports necessary features such as authentication, or alternatively, contains features that can be used to work around the specific limitations).
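The idea can be sketched with a hypothetical helper that places the same payload into three delivery methods; a scanner limited to the first two would never exercise a JSON-consuming endpoint (names and payload are illustrative):

```python
import json
from urllib.parse import urlencode

PAYLOAD = "' OR '1'='1"  # an illustrative SQLi probe

def inject_querystring(params: dict, name: str) -> str:
    # Classic GET delivery: payload placed in a querystring parameter.
    return urlencode({**params, name: PAYLOAD})

def inject_form_body(params: dict, name: str) -> str:
    # Standard POST body (application/x-www-form-urlencoded).
    return urlencode({**params, name: PAYLOAD})

def inject_json_body(body: dict, name: str) -> str:
    # Modern delivery method: the same payload inside a JSON field.
    return json.dumps({**body, name: PAYLOAD})

# A scanner that only understands querystrings will never reach the
# vulnerable 'user' field of an endpoint that consumes JSON:
print(inject_querystring({"user": "admin"}, "user"))
print(inject_json_body({"user": "admin"}, "user"))
```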

The detailed comparison of the scanners' support for various input delivery methods is documented in the following section of sectoolmarket (recommended - too many scanners in the chart):

The following chart shows how versatile each scanner is in scanning different input delivery vectors (and although not entirely accurate - different technologies):

The Number of Input Vectors Supported – Commercial Tools




The Number of Input Vectors Supported – Free & Open Source Tools


The Number of Input Vectors Supported – Unified List



6. Test II – Attack Vector Support – Counting Audit Features
The second assessment criterion was the number of audit features each tool supports.

Reasoning: An automated tool can't detect an exposure that it can't recognize (at least not directly, and not without manual analysis), and therefore, the number of audit features will affect the amount of exposures that the tool will be able to detect (assuming the audit features are implemented properly, that vulnerable entry points will be detected, that the tool will be able to handle the relevant scan barriers and scanning prerequisites, and that the tool will manage to scan the vulnerable input vectors).

For the purpose of the benchmark, an audit feature was defined as a common generic application-level scanning feature, supporting the detection of exposures which could be used to attack the tested web application, gain access to sensitive assets or attack legitimate clients.

The definition of the assessment criterion rules out product specific exposures and infrastructure related vulnerabilities, while unique and extremely rare features were documented and presented in a different section of this research, and were not taken into account when calculating the results. Exposures that were specific to Flash/Applet/Silverlight and Web Services Assessment (with the exception of XXE) were treated in the same manner.

The detailed comparison of the scanners' support for various audit features is documented in the following section of sectoolmarket:

The Number of Audit Features in Web Application Scanners – Commercial Tools




The Number of Audit Features in Web Application Scanners – Free & Open Source Tools


The Number of Audit Features in Web Application Scanners – Unified List



So once again, now that we're done with the quantity, let's get to the quality…

7. Introduction to the Various Accuracy Assessments
The following sections present the results of the detection accuracy assessments performed for Reflected XSS, SQL Injection, Path Traversal and Remote File Inclusion (RXSS via RFI) - four of the most commonly supported features in web application scanners. Although the detection accuracy of a specific exposure might not reflect the overall condition of the scanner on its own, it is a crucial indicator of how good a scanner is at detecting specific vulnerability instances.
The various assessments were performed against the various test cases of WAVSEP v1.2, which emulate different common test case scenarios for generic technologies.
Reasoning: a scanner that is not accurate enough will miss many exposures, and might classify non-vulnerable entry points as vulnerable. These tests aim to assess how good each tool is at detecting the vulnerabilities it claims to support, in a supported input vector, located in a known entry point, without any restrictions that can prevent the tool from operating properly.
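As a rough sketch of how the per-vulnerability accuracy charts can be computed (the function and field names are hypothetical, not WAVSEP's actual scoring code):

```python
def detection_score(detected: int, total_cases: int,
                    fp_categories_hit: int, fp_categories: int) -> dict:
    # Hypothetical scoring along the lines of the benchmark charts:
    # the GREEN bar is the vulnerable-case detection ratio, and the
    # RED bar is the ratio of false-positive categories flagged.
    return {
        "accuracy": round(100.0 * detected / total_cases, 2),
        "false_positive_ratio": round(100.0 * fp_categories_hit / fp_categories, 2),
    }

# e.g. a scanner that found 60 of the 66 RXSS cases and flagged
# 1 of the 7 RXSS false-positive categories:
score = detection_score(60, 66, 1, 7)
print(score)  # {'accuracy': 90.91, 'false_positive_ratio': 14.29}
```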

8. Test III – The Detection Accuracy of Reflected XSS
The third assessment criterion was the detection accuracy of Reflected Cross Site Scripting, a common exposure which is the 2nd most commonly implemented feature in web application scanners, and the one in which I noticed the greatest improvement in the various tested web application scanners.

The comparison of the scanners' reflected cross site scripting detection accuracy is documented in detail in the following section of sectoolmarket:


Result Chart Glossary
Note that the GREEN bar represents the vulnerable test case detection accuracy, while the RED bar represents the false positive categories detected by the tool (each category may correspond to more actual instances than the bar presents, when compared to the detection accuracy bar).

The Reflected XSS Detection Accuracy of Web Application Scanners – Commercial Tools



The Reflected XSS Detection Accuracy of Web Application Scanners – Open Source & Free Tools



The Reflected XSS Detection Accuracy of Web Application Scanners – Unified List



9. Test IV – The Detection Accuracy of SQL Injection
The fourth assessment criterion was the detection accuracy of SQL Injection, one of the most famous exposures and the most commonly implemented attack vector in web application scanners.

The evaluation was performed on an application that uses MySQL 5.5.x as its data repository, and thus, will reflect the detection accuracy of the tool when scanning an application that uses similar data repositories.

The comparison of the scanners' SQL injection detection accuracy is documented in detail in the following section of sectoolmarket:

Result Chart Glossary
Note that the GREEN bar represents the vulnerable test case detection accuracy, while the RED bar represents the false positive categories detected by the tool (each category may correspond to more actual instances than the bar presents, when compared to the detection accuracy bar).


The SQL Injection Detection Accuracy of Web Application Scanners – Commercial Tools



The SQL Injection Detection Accuracy of Web Application Scanners – Open Source & Free Tools



The SQL Injection Detection Accuracy of Web Application Scanners – Unified List



Although there are many changes in the results since the last benchmark, both of these exposures (SQLi, RXSS) were previously assessed, so I believe it's time to introduce something new... something none of the tested vendors could have prepared for in advance...

10. Test V – The Detection Accuracy of Path Traversal/LFI
The fifth assessment criterion was the detection accuracy of Path Traversal (a.k.a Directory Traversal), a newly implemented feature in WAVSEP v1.2, and the third most commonly implemented attack vector in web application scanners.

The reason it was tagged along with Local File Inclusion (LFI) is simple - many scanners don't differentiate between inclusion and traversal, and neither do a few online vulnerability documentation sources. In addition, the results obtained from the tests performed on the vast majority of tools lead to the same conclusion - many plugins listed under the name LFI detected the path traversal test cases.

While implementing the path traversal test cases and consuming nearly every relevant piece of documentation I could find on the subject, I decided to take the current path, in spite of some acute differences some of the documentation sources suggested (but I did implement an infrastructure in WAVSEP for "true" inclusion exposures).

The point is not to get into a discussion of whether or not path traversal, directory traversal and local file inclusion should be classified as the same vulnerability, but simply to explain why in spite of the differences some organizations / classification methods have for these exposures, they were listed under the same name (In sectoolmarket - path traversal detection accuracy is listed under the title LFI).

The evaluation was performed on a WAVSEP v1.2 instance hosted on Windows XP, and although there are specific test cases meant to emulate servers that run with low-privileged OS user accounts (using the servlet context file access method), many of the test cases emulate web servers that run with administrative user accounts.

[Note - in addition to the WAVSEP installation, to produce identical results to those of this benchmark, a file by the name of content.ini must be placed in the root installation directory of the Tomcat server - which is different from the root directory of the web server]

Although I didn't perform the path traversal scans on Linux for all the tools, I did perform the initial experiments on Linux, and even a couple of verifications on Linux for some of the scanners. As weird as it sounds, I can clearly state that the results were significantly worse, and although I won't get the opportunity to discuss the subject in this benchmark, I might handle it in the next.

In order to assess the detection accuracy of different path traversal instances, I designed a total of 816 OS-adapting path traversal test cases (meaning - the test cases adapt themselves to the OS and server they are executed on, in the aspects of file access delimiters and file access paths). I know it might seem like a lot, and I guess I did get carried away with the perfectionism, but you will be surprised to see that these tests really represent common vulnerability instances, and not necessarily super extreme scenarios, and that the results of the tests did prove the necessity.
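To illustrate the "OS-adapting" idea from the attacking side (a hypothetical generator, not WAVSEP's actual implementation), a traversal check might pick its target file and path delimiter per OS and then emit common payload variations:

```python
def traversal_payloads(os_name: str, depth: int = 3):
    # Hypothetical sketch: choose a well-known readable target file
    # and path delimiter per OS, then emit common traversal
    # variations a scanner should attempt.
    if os_name == "windows":
        target, sep = "boot.ini", "\\"
    else:
        target, sep = "etc/passwd", "/"
    hop = ".." + sep
    plain = hop * depth + target
    encoded = plain.replace(".", "%2e").replace(
        sep, "%2f" if sep == "/" else "%5c")   # fully URL-encoded variant
    return [
        plain,                        # plain traversal
        encoded,                      # encoding-based filter evasion
        plain + "%00.jpg",            # legacy null-byte suffix bypass
    ]

for p in traversal_payloads("linux"):
    print(p)
```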

The tests were designed to emulate various combinations of the following conditions and restrictions:



If you take a closer look at the detailed scan-specific results at www.sectoolmarket.com, you'll notice that some scanners were completely unaffected by the response content type and HTTP code variations, while other scanners were dramatically affected by the variety (gee, it's nice to know that I didn't write them all for nothing... :) ).

In reality, there were supposed to be more test cases, primarily because I intended to test injection entry points in which the input only affected the filename without the extension, or was injected directly into the directory name. However, due to the sheer amount of tests and the deadline I had for this benchmark, I decided to delete (literally) the test cases that handled these anomalies, and focus on test cases in which the entire filename/path was affected. That being said, I might publish these test cases in future versions of WAVSEP (they amount to a couple of hundred).

The comparison of the scanners' path traversal detection accuracy is documented in detail in the following section of sectoolmarket:

Result Chart Glossary
Note that the GREEN bar represents the vulnerable test case detection accuracy, while the RED bar represents the false positive categories detected by the tool (each category may correspond to more actual instances than the bar presents, when compared to the detection accuracy bar).


The Path Traversal / LFI Detection Accuracy of Web Application Scanners – Commercial Tools



The Path Traversal / LFI Detection Accuracy of Web Application Scanners – Open Source & Free Tools



The Path Traversal / LFI Detection Accuracy of Web Application Scanners – Unified List



And what of LFI's evil counterpart, Remote File Inclusion?
(yeah yeah, I know, it was path traversal...)

11. Test VI – The Detection Accuracy of RFI (XSS via RFI)
The sixth assessment criterion was the detection accuracy of Remote File Inclusion (or more accurately, vectors of RFI that can result in XSS or Phishing - and currently, not necessarily in server-side code execution), a newly implemented feature in WAVSEP v1.2, and one of the most commonly implemented attack vectors in web application scanners.
I didn't originally plan to assess the detection accuracy of RFI in this benchmark; however, since I implemented a new structure in WAVSEP that enables me to write a lot of test cases faster, I couldn't resist the urge to try it... and thus found a new way to decrease the amount of sleep I get each night.
The interesting thing I found was that although RFI is supposed to work a bit differently than LFI/Path Traversal, many LFI/Path Traversal plugins effectively detected RFI exposures, and in some instances, the tests for both of these vulnerabilities were actually implemented in the same plugin (usually named "file inclusion"); thus, while scanning for Traversal/LFI/RFI, I usually activated all the relevant plugins in the scanner, and lo and behold - got results from the LFI/Path Traversal plugins that even the dedicated RFI plugins did not detect.
In order to assess the detection accuracy of different remote file inclusion exposures (again, RXSS/Phishing via RFI vectors), I designed a total of 108 remote file inclusion test cases.
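The detection logic behind such test cases can be sketched as follows (the marker, URL and helper names are hypothetical): the scanner injects a URL it controls into the inclusion parameter, and checks whether content it serves ends up in the application's response:

```python
MARKER = "wavsep-rfi-marker-1337"  # hypothetical unique string served
# from a scanner-controlled URL such as http://scanner-host/rfi.txt

def rfi_payload(scanner_url: str) -> str:
    # The scanner injects a URL it controls into the inclusion parameter.
    return scanner_url + "/rfi.txt"

def is_vulnerable(response_body: str) -> bool:
    # If the application fetched and embedded the remote file, the
    # marker appears in the response - an XSS/Phishing-grade RFI.
    return MARKER in response_body

# Simulated vulnerable behavior: the app embeds the remote content.
simulated_response = "<html>" + MARKER + "</html>"
assert is_vulnerable(simulated_response)
assert not is_vulnerable("<html>clean page</html>")
```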
The tests were designed to emulate various combinations of the following conditions and restrictions:



Just like the case of path traversal, in reality there were supposed to be more XSS via RFI test cases, primarily because I intended to test injection entry points in which the input only affected the filename without the extension, or was injected directly into the directory name. However, due to the sheer amount of tests and the deadline I had for this benchmark, I decided to delete (literally) the test cases that handled these anomalies, and focus on test cases in which the entire filename/path was affected. That being said, I might publish these test cases in future versions of WAVSEP (they amount to dozens).

[Note: Although the tested versions of Appscan and Nessus contain RFI detection plugins, they did not support the detection of XSS via RFI.]

The comparison of the scanners' remote file inclusion detection accuracy is documented in detail in the following section of sectoolmarket:

Result Chart Glossary
Note that the GREEN bar represents the vulnerable test case detection accuracy, while the RED bar represents the false positive categories detected by the tool (each category may correspond to more actual instances than the bar presents, when compared to the detection accuracy bar).


The RFI (XSS via RFI) Detection Accuracy of Web Application Scanners – Commercial Tools



The RFI (XSS via RFI) Detection Accuracy of Web Application Scanners – Open Source & Free Tools



The RFI (XSS via RFI) Detection Accuracy of Web Application Scanners – Unified List


And after covering all those accuracy aspects, it's time to cover a totally different subject - Coverage.

12. Test VII - WIVET - Coverage via Automated Crawling
The seventh assessment criterion was the scanner's WIVET score, which is related to coverage.

The concept of coverage can mean a lot of things, but in general, what I'm referring to is the ability of the scanner to increase the attack surface of the tested application - to locate additional resources and input delivery methods to attack.

Although a scanner can increase the attack surface in a number of ways, from detecting hidden files to exposing device-specific interfaces, this section of the benchmark focuses on automated crawling and efficient input vector extraction.

This aspect of a scanner is extremely important in point-and-shoot scans, scans in which the user does not "train" the scanner to recognize the application structure, URLs and requests, either due to time/methodology restrictions, or because the user is not a security expert who knows how to properly combine manual crawling with the scanner.

In order to evaluate these aspects of scanners, I used a wonderful OWASP Turkey project called WIVET (Web Input Vector Extractor Teaser). The WIVET project is a benchmarking project written by an application security specialist named Bedirhan Urgun, and released under the GPL2 license.

The project is implemented as a web application which aims to "statistically analyze web link extractors", by measuring the amount of input vectors extracted by each scanner while crawling the WIVET website, in order to assess how well each scanner can increase the coverage of the attack surface.

Plainly speaking, the project simply measures how well a scanner is able to crawl the application, and how well it can locate input vectors, by presenting a collection of challenges that contain links, parameters and input delivery methods that the crawling process should locate and extract.

Although WIVET used to have an online instance, with my luck, by the time I decided to use it the online version was already gone... so I checked out the latest subversion revision from the project's Google Code website (v3-revision148), installed FastCGI on an IIS server (Windows XP), copied the application files to a directory called wivet under the C:\Inetpub\wwwroot\ directory, and started the IIS default website.

In order for WIVET to work, the scanner must crawl the application while consistently using the same session identifier in its crawling requests, while avoiding the 100.php logout page (which initializes the session, and thus the results). The results can then be viewed by accessing the application index page with the session identifier used during the scan.
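The two crawling constraints can be sketched as follows (the cookie name is an assumption; WIVET is a PHP application, so a PHPSESSID-style identifier is plausible, but verify against your installation):

```python
FIXED_SESSION = "PHPSESSID=0123456789abcdef"  # assumed cookie name/value
LOGOUT_PAGE = "100.php"

def should_crawl(url: str) -> bool:
    # Skip the logout page, which resets the session (and the score).
    return LOGOUT_PAGE not in url

def crawl_headers() -> dict:
    # Every crawling request must carry the same session identifier;
    # otherwise the extracted vectors are spread over many sessions
    # and no single session accumulates a score.
    return {"Cookie": FIXED_SESSION}

assert not should_crawl("http://wivet.local/pages/100.php")
assert should_crawl("http://wivet.local/pages/1.php")
```

This is essentially what the Fiddler filter described below enforced for scanners that could not pin a cookie or exclude a URL on their own.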

A very nice idea that makes the assessment process easy and effective; however, for me, things weren't that easy. Although some scanners did work properly with the platform, many scanners did not receive any score, even though I configured them exactly according to the recommendations (valid session identifier and logout URL exclusion). After a careful examination, I discovered the source of my problem: some of the scanners don't send the predefined session identifier in their crawling requests (even though it's explicitly defined in the product), and others simply ignore URL exclusions (under certain conditions).

Since even without these bugs, not all the scanners supported URL exclusions (the 100.php logout page) and predefined cookies, I had to come up with a solution that would enable me to test all of them. So I changed the WIVET platform a little bit by deleting the link to the logout page (100.php) from the main menu page (menu.php), and forwarded the communication of the vast majority of scanners through a Fiddler instance in which I defined a valid WIVET session identifier (using the filter features). In extreme scenarios in which the scanner did not support an upstream proxy, I defined the WIVET website as a proxy in an IE browser, loaded Fiddler (so it would forward the communication to the system-defined proxy - WIVET), defined Burp as a transparent proxy that forwards the communication to Fiddler (an upstream proxy), and scanned Burp instead of the WIVET application (the scanner scans Burp, which forwards the communication to Fiddler, which forwards the communication to the system-defined proxy - the WIVET website).

These solutions seemed to work for most vendors - that is, until I discovered two more bugs that caused them not to work for another small group of products...

The first bug was related to the emulation of modern browser behavior when interpreting the relative context of links in a frameset (browsers use the link's target frame as the path basis, but some scanners used the path basis of the link's origin page), and the other bug was related to another browser emulation issue - some scanners did not manage to submit forms without an action property (while a browser usually submits such a form to the same URL the form originated from).

I managed to solve the first bug by editing the menu page and manually adding additional links with an alternate context (added "pages/" to all URLs) to the same WIVET pages, while the second bug was reported to some vendors (and was handled by them).

Finally, some of the scanners had bugs that I did not manage to isolate in the given timeframe, and thus, I didn't manage to get any WIVET score for them (a list of these products is presented at the end of this section).
However, the vast majority of the scanners did get a score, which can be viewed in the following charts and links.

The comparison of the scanners' WIVET score is documented in detail in the following section of sectoolmarket:
http://sectoolmarket.com/wivet-score-unified-list.html

The WIVET Score of Web Application Scanners – Commercial Tools


The WIVET Score of Web Application Scanners – Free and Open Source Tools


The WIVET Score of Web Application Scanners – Unified List


It is important to clarify that due to these scanner bugs (and the current WIVET structure) - low scores and non-existing scores might differ once minor bugs are fixed, but the scores presented in this chart are currently all I can offer.

The following scanners didn't manage to get a WIVET score at all (even after all the adjustments and enhancements I tried). This does not mean that their score is necessarily low, or that there isn't any possible way to execute them in front of WIVET - simply that there isn't a simple method of doing it (at least not one that I discovered):
Syhunt Mini (Sandcat Mini), Webcruiser, IronWASP, Safe3WVS free edition, N-Stalker 2012 free edition, Vega, Skipfish.
In addition, I didn't try scanning WIVET with various unmaintained scanners, with scanners that didn't have a spider feature (WATOBO in the assessed version, Ammonite, etc.), or with the following assessed tools: Nessus, sqlmap.
It's crucial to note that scanners with Burp-log parsing features (such as sqlmap and IronWASP) can effectively be assigned the WIVET score of Burp, and that scanners with internal proxy features (such as ZAP, Burpsuite, Vega, etc.) can be used with the crawling mechanisms of other scanners (such as Acunetix FE). As a result of both of these conclusions, any scanner that supports either of those features can be assigned the WIVET score of any scanner in the possession of the tester (by using the crawling mechanism of a scanner through a proxy such as Burp, in order to generate scan logs).

13. Test VIII – Scanner Adaptability - Crawling & Scan Barriers
By using the seemingly irrelevant term "adaptability" in relation to scanners, I'm actually referring to the scanner's ability to adapt and scan the application, despite different technologies, abnormal crawling requirements and varying scan barriers, such as Anti-CSRF tokens, CAPTCHA mechanisms, platform specific tokens (such as required viewstate values) or account lock mechanisms.

Although not necessarily a measurable quality, the ability of the scanner to handle different technologies and scan barriers is an important prerequisite, and in a sense, almost as important as being able to scan the input delivery method.

Reasoning: An automated tool can't detect a vulnerability in a point-and-shoot scenario if it can't locate & scan the vulnerable location due to the lack of support for a certain browser add-on, the lack of support for extracting data from certain non-standard vectors, or the lack of support for overcoming a specific barrier, such as a required token or challenge. The more barriers the scanner is able to handle, the more useful it is when scanning complex applications that employ various technologies and scan barriers (assuming it can handle the relevant input vectors, supports necessary features such as authentication, or has a feature that can be used to work around the specific limitations).
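Anti-CSRF tokens illustrate what handling such a barrier involves (the field name and helpers here are hypothetical): re-fetch the form before each injection, extract the current token, and replay it alongside the payload so the request passes the barrier:

```python
import re

def extract_csrf_token(page_html: str):
    # Hypothetical barrier handler: pull the current anti-CSRF token
    # out of a freshly fetched copy of the form.
    match = re.search(r'name="csrf_token"\s+value="([^"]+)"', page_html)
    return match.group(1) if match else None

def build_attack_params(token: str, field: str, payload: str) -> dict:
    # Replay the fresh token alongside the injected value, so the
    # request is accepted and the payload reaches the tested code.
    return {"csrf_token": token, field: payload}

page = '<form><input name="csrf_token" value="a1b2c3"></form>'
params = build_attack_params(extract_csrf_token(page), "q", "<script>x</script>")
print(params["csrf_token"])  # a1b2c3
```

A scanner without such a mechanism sends a stale or missing token, the application rejects every injected request, and the vulnerable code is never exercised.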

The following charts show how many types of barriers each scanner claims to be able to handle (these features were not verified, and the information currently relies on documentation or vendor-supplied information):

The Adaptability Score of Web Application Scanners – Commercial Tools


The Adaptability Score of Web Application Scanners – Free and Open Source Tools


The Adaptability Score of Web Application Scanners – Unified List


The scanners' support for the various barriers is documented in detail in the following section of sectoolmarket:



14. Test IX – Authentication and Usability Feature Comparison
Although supporting the authentication required by the application seems like a crucial quality, in reality, certain scanner chaining features can make up for the lack of support for certain authentication methods, by employing a 3rd party proxy to authenticate on the scanner's behalf.

For example, if we wanted to use a scanner that does not support NTLM authentication (but does support an upstream proxy), we could define the relevant credentials in Burp Suite FE, and define it as an upstream proxy for the tested scanner.

However, chaining the scanner to an external tool that supports the authentication still has some disadvantages, such as potential stability issues, thread limitations and inconvenience.
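The workaround above boils down to a simple configuration pattern, sketched here (host, port and field names are assumptions): the scanner holds no credentials at all, and the local proxy, configured with the NTLM credentials, performs the handshake on its behalf.

```python
# Sketch of the NTLM workaround: the scanner only needs plain
# upstream-proxy support; the proxy (e.g. a local Burp listener with
# platform authentication configured) authenticates to the target.

def scanner_upstream_proxy(proxy_host="127.0.0.1", proxy_port=8080):
    """Values to enter in the scanner's upstream-proxy settings.
    No credentials are stored in the scanner itself - the upstream
    proxy authenticates for us."""
    return {"host": proxy_host, "port": proxy_port, "credentials": None}

settings = scanner_upstream_proxy()
```

Every request the scanner sends is then relayed and authenticated by the proxy, at the cost of the stability and threading drawbacks mentioned above.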

The following comparison table shows which authentication methods and features are supported by the various assessed scanners:

15. Test X – The Crown Jewel - Results & Features vs. Pricing
Finally, after reading through all the sections and charts, and analyzing the different aspects in which each scanner was measured, it's time to expose the price (at least for those of you that did manage to resist the temptation to access this link at the beginning).

The important thing to notice, specifically in relation to commercial scanner pricing, is that each product might be a bundle of several semi-independent products that cover different aspects of the assessment process, which are not necessarily related to web application security. These products currently include web service scanners, flash application scanners and CGI scanners (SAST and IAST features were not included on purpose).

In short, the scanner's price might (or might not) reflect a set of products that could have been priced separately as independent products.

Another issue to pay attention to is the type of license acquired. In general, I did not cover non-commercial prices in this comparison, and in addition, did not include any vendor-specific bundles, sales, discounts or sales pitches. I presented the base prices listed on the vendor's website or provided to me by the vendor, according to a total of 6 predefined categories, which are, in fact, combinations of the following concepts:
Consultant Licenses: although there isn't a commonly accepted term, I defined "Consultant" licenses as licenses that fit the common requirements of a consulting firm - scanning an unrestricted amount of IP addresses, without any boundaries or limitations.

Limited Enterprise Licenses: Any license that allowed scanning an unlimited but restricted set of addresses (for example - internal network addresses or organization-specific assets) was defined as an enterprise license, which might not be suited for a consultant, but will usually suffice for an organization interested in assessing its own applications.
Website/Year - a license to install the software on a single station and use it for a single year against a single IP address (the exception to this rule is Netsparker, in which the per-website price reflects 3 websites).
Seat/Year - a license to install the software on a single station and use it for a single year.
Perpetual Licenses - pay once, and it's yours (might still be limited by seat, website, enterprise or consultant restrictions). The vendor's website usually includes additional prices for optional support and product updates.

The various prices can be viewed in the dedicated comparison in sectoolmarket, available in the following address:

It is important to remember that these prices might change, vary, or be affected by numerous variables, from special discounts and sales to a vendor's strategic decision to invest in you as a customer or a beta testing site.

16. Additional Comparisons, Built-in Products and Licenses
While in the past I used to present additional information in external PDF files, with the new presentation platform I am now able to present the information in a medium that is much easier to use and analyze. Although anyone can access the root URL of sectoolmarket and search the various sections on their own, I decided to provide a short summary of additional lists and features that were not covered in a dedicated section of this benchmark, but were still documented and published in sectoolmarket.

List of Tools
The list of tools tested in this benchmark, and in the previous benchmarks, can be accessed through the following link:
Additional Features
Complementary scan features that were not evaluated or included in the benchmark:
·         Complementary Scan Features
·         General Scanner Features

In order to clarify what each column in the report table means, use the following glossary table:
Configuration & Usage Scale:
·         Very Simple - GUI + Wizard
·         Simple - GUI with simple options, Command line with scan configuration file or simple options
·         Complex - GUI with numerous options, Command line with multiple options
·         Very Complex - Manual scanning feature dependencies, multiple configuration requirements

Stability Scale:
·         Very Stable - Rarely crashes, Never gets stuck
·         Stable - Rarely crashes, Gets stuck only in extreme scenarios
·         Unstable - Crashes every once in a while, Freezes on a consistent basis
·         Fragile - Freezes or Crashes on a consistent basis, Fails performing the operation in many cases

Performance Scale:
·         Very Fast - Fast implementation with limited amount of scanning tasks
·         Fast - Fast implementation with plenty of scanning tasks
·         Slow - Slow implementation with limited amount of scanning tasks
·         Very Slow - Slow implementation with plenty of scanning tasks

Scan Logs
In order to access the scan logs and detailed scan results of each scanner, simply access the scan-specific information for that scanner, by clicking on the scanner version in the various comparison charts:
·         http://sectoolmarket.com/

17. What Changed?
Since the previous benchmark, many open source & commercial tools have added new features and improved their detection accuracy.

The following list presents a summary of changes in the detection accuracy of commercial tools that were tested in the previous benchmark (+new):
·         IBM AppScan - no significant changes, new results for Path Traversal and WIVET.
·         WebInspect - a dramatic improvement in the detection accuracy of SQLi and XSS (fantastic result!), new results for Path Traversal, RFI (fantastic result!), and WIVET (fantastic result!)
·         Netsparker - no significant changes, new results for Path Traversal and WIVET.
·         Acunetix WVS - a dramatic improvement in the detection accuracy of SQLi (fantastic result!) and XSS (fantastic result!), and new results for Path Traversal, RFI and WIVET.
·         Syhunt Dynamic - a dramatic improvement in the detection accuracy of XSS (fantastic result!) and SQLi, and new results for Path Traversal, RFI and WIVET.
·         Burp Suite - a dramatic improvement in the detection accuracy of XSS and SQLi (fantastic result!), and new results for Path Traversal and WIVET.
·         ParosPro - New results for Path Traversal and WIVET.
·         JSky - New results for RFI, Path Traversal and WIVET.
·         WebCruiser - No significant changes.
·         Nessus - a dramatic improvement in the detection accuracy of Reflected XSS, potential bug in the LFI/RFI detection features.
·         Ammonite - New results for RXSS, SQLi, RFI and Path Traversal (fantastic result!)
The following list presents a summary of changes in the detection accuracy of free and open source tools that were tested in the previous benchmark (+new):
·         Zed Attack Proxy (ZAP) – a dramatic improvement in the detection accuracy of Reflected XSS exposures (fantastic result!), in addition to new results for Path Traversal and WIVET.
·         IronWASP - New results for SQLi, XSS, Path Traversal and RFI (fantastic result!).
·         arachni – an improvement in the detection accuracy of Reflected XSS exposures (mainly due to the elimination of false positives), but a decrease in the accuracy of SQL injection exposures (due to additional false positives being discovered). There are also new results for RFI, Path Traversal (incomplete due to a bug), and WIVET.
·         sqlmap – a dramatic improvement in the detection accuracy of SQL Injection exposures (fantastic result!).
·         Acunetix Free Edition – a dramatic improvement in the detection accuracy of Reflected XSS exposures, in addition to a new WIVET result.
·         Syhunt Mini (Sandcat Mini) - a dramatic improvement in the detection accuracy of both XSS (fantastic result!) and SQLi. New results for RFI.
·         Watobo – Identical results, in addition to new results for Path Traversal and WIVET. The author did not test the latest Watobo version, which was released a few days before the publication of this benchmark.
·         N-Stalker 2012 FE – no significant changes, although it seems that the decreased accuracy is actually an unhandled bug in the release (unverified theory).
·         Skipfish – insignificant changes that probably result from the testing methodology and/or testing environment. New results for Path Traversal, RFI and WIVET.
·         WebSecurify – a major improvement in the detection accuracy of RXSS exposures, and new results for Path Traversal and WIVET.
·         W3AF – a slight increase in the SQL Injection detection accuracy. New results for Path Traversal (fantastic result!), RFI and WIVET.
·         Netsparker Community Edition – New results for WIVET.
·         Andiparos & Paros – New results for WIVET.
·         Wapiti – New results for Path Traversal, RFI and WIVET.
·         ProxyStrike – New results for WIVET (Fantastic results for an open source product! again!)
·         Vega - New results for Path Traversal, RFI and WIVET.
·         Grendel Scan – New results for WIVET.

18. Initial Conclusions – Open Source vs. Commercial
The following section presents my own personal opinions on the results, and is not based purely on accurate statistics, like the rest of the benchmark.

After testing various versions of over 51 open source scanners on multiple occasions, and after comparing the results and experiences to those I had after testing 15 commercial ones (including tools tested in the previous benchmarks and tools I did not report), I have reached the following conclusions:
·         As far as accuracy & features go, the distance between open source tools and commercial tools is insignificant; open source tools already rival, and in some rare cases even exceed, the capabilities of commercial scanners (and vice versa).

·         Although most open source scanners have not yet adjusted to support applications that use new technologies (AJAX, JSON, etc.), recent advancements in the crawler of ZAP proxy (not tested in the benchmark, and which might be reused by other projects), and the input vectors supported by a new project named IronWASP, are a great beginning to that process. On the other hand, most of the commercial vendors have already adjusted to some of the new technologies, and can be used to scan them in a variety of models.

·         The automated crawling capability of most commercial scanners is significantly better than that of open source projects, making these tools better for point-and-shoot scenarios... the difference, however, is not significant for some open source projects, which can "import" or employ the crawling capabilities of a free version of a commercial product (this requires some experience with certain tools - probably more suited for a consultant than a QA engineer).

·         Some open source tools, even the most accurate ones, are relatively difficult to install & use, and still require fine-tuning in various fields, particularly stability. Other open source projects however, improved over the last year, and enhanced their user experience in many ways.

19. Verifying The Benchmark Results
The results of the benchmark can be verified by replicating the scan methods described in the scan log of each scanner, and by testing the scanner against WAVSEP v1.2 and WIVET v3-revision148.
The same methodology can be used to assess vulnerability scanners that were not included in the benchmark.
The latest version of WAVSEP can be downloaded from the web site of project WAVSEP (binary/source code distributions, installation instructions and the test case description are provided in the web site download section):

The latest version of WIVET can be downloaded from the project web site, or preferably, checked-out from the project subversion repository:
svn checkout http://wivet.googlecode.com/svn/trunk/ wivet-read-only
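For those replicating the tests, the setup can be sketched as follows (the workspace path and the Tomcat webapps directory are assumptions; adjust them to your own environment):

```shell
# Prepare a workspace for the verification environment.
set -e
WORKDIR="$HOME/benchmark-verify"
mkdir -p "$WORKDIR"
cd "$WORKDIR"
# 1. Check out WIVET (crawler coverage test cases):
#      svn checkout http://wivet.googlecode.com/svn/trunk/ wivet-read-only
# 2. Deploy WAVSEP into the servlet container, e.g.:
#      cp wavsep.war /var/lib/tomcat6/webapps/
# 3. Point the scanner at http://localhost:8080/wavsep/ and at the
#    WIVET instance, then compare against the published scan logs.
echo "workspace ready: $WORKDIR"
```

Once both applications are deployed, the scan configuration described in each scanner's log can be replayed against them.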

20. So What Now?
So now that we have all those statistics, it's time to analyze them properly, and see what conclusions we can draw. I have already started writing a couple of articles that will make the information easy to use, and defined a methodology that will explain exactly how to use it. Analyzing the results, however, will take me some time, since most of my time in the next few months will be invested in another project I'm working on (to be released soon), one I've been working on for the past year.

Since I didn't manage to test all the tools I wanted, I might update the results of the benchmark soon with additional tools (so you can think of it as a dynamic benchmark), and I will surely update the results in sectoolmarket (made some promises).

If you want to get notifications on new scan results, follow my blog or twitter account, and I'll do my best to tweet notifications when I find the time to perform some major updates.

Since I have been in this situation before, I know what's coming… so I apologize in advance for any delays in my responses in the next few weeks, especially during August.

21. Recommended Reading List: Scanner Benchmarks
The following resources include additional information on previous benchmarks, comparisons and assessments in the field of web application vulnerability scanners:
·         "SQL Injection through HTTP Headers", by Yasser Aboukir (an analysis and enhancement of the 2011 60 scanners benchmark, with a different approach for interpreting the results, March 2012)
·         "The Scanning Legion: Web Application Scanners Accuracy Assessment & Feature Comparison", one of the predecessors of the current benchmark, by Shay Chen (a comparison of 60 commercial & open source scanners, August 2011)
·         "Building a Benchmark for SQL Injection Scanners", by Andrew Petukhov (a commercial & opensource scanner SQL injection benchmark with a generator that produces 27680 (!!!) test cases, August 2011)
·         "Webapp Scanner Review: Acunetix versus Netsparker", by Mark Baldwin (commercial scanner comparison, April 2011)
·         "Effectiveness of Automated Application Penetration Testing Tools", by Alexandre Miguel Ferreira and Harald Kleppe (commercial & freeware scanner comparison, February 2011)
·         "Web Application Scanners Accuracy Assessment", one of the predecessors of the current benchmark, by Shay Chen (a comparison of 43 free & open source scanners, December 2010)
·         "State of the Art: Automated Black-Box Web Application Vulnerability Testing" (Original Paper), by Jason Bau, Elie Bursztein, Divij Gupta, John Mitchell (May 2010) – original paper
·         "Analyzing the Accuracy and Time Costs of Web Application Security Scanners", by Larry Suto (commercial scanners comparison, February 2010)
·         "Why Johnny Can’t Pentest: An Analysis of Black-box Web Vulnerability Scanners", by Adam Doup´e, Marco Cova, Giovanni Vigna (commercial & open source scanner comparison, 2010)
·         "Web Vulnerability Scanner Evaluation", by AnantaSec (commercial scanner comparison, January 2009)
·         "Analyzing the Effectiveness and Coverage of Web Application Security Scanners", by Larry Suto (commercial scanners comparison, October 2007)
·         "Rolling Review: Web App Scanners Still Have Trouble with Ajax", by Jordan Wiens (commercial scanners comparison, October 2007)
·         "Web Application Vulnerability Scanners – a Benchmark" , by Andreas Wiegenstein, Frederik Weidemann, Dr. Markus Schumacher, Sebastian Schinzel (Anonymous scanners  comparison, October 2006)

22. Thank-You Note
During the research described in this article, I have received help from plenty of individuals and resources, and I’d like to take the opportunity to thank them all.

I might be reusing the texts, due to the late night hour and the constant lack of sleep I have been through in the last couple of months, but I mean every word that is written here.

For all the open source tool authors that assisted me in testing the various tools in unreasonable late night hours and bothered to adjust their tools for me, discuss their various features and invest their time in explaining how I can optimize their use,
For the kind souls that helped me obtain evaluation licenses for commercial products, for the CEO's, Marketing Executives, QA engineers, Support and Development teams of commercial vendors, which saved me tons of time, supported me throughout the process, helped me overcome obstacles and proved to me that the process of interacting with a commercial vendor can be a pleasant one, and for the various individuals that helped me contact these vendors.
I can't thank you enough, and wish you all the best.

For the information sources that helped me gather the list of scanners over the years, and gain knowledge, ideas, and insights, including (but not limited to) information security sources such as Security Sh3ll (http://security-sh3ll.blogspot.com/), PenTestIT (http://www.pentestit.com/), The Hacker News (http://thehackernews.com/), Toolswatch (http://www.vulnerabilitydatabase.com/toolswatch/), Darknet (http://www.darknet.org.uk/), Packet Storm (http://packetstormsecurity.org/), Google (of course), Twitter (my latest addiction) and many other great sources that I have used over the years to gather the list of tools.

I hope that the conclusions, ideas, information and payloads presented in this research (and the benchmarks and tools that will follow) will contribute to all the vendors, projects and most importantly, testers that choose to rely on them.

23. FAQ - Why Didn't You Test NTO, Cenzic and N-Stalker?
Prior to the benchmark, I made an important decision. I decided to go through official channels, and either contact vendors and work with them, or use public evaluation versions of relatively simple products. I had a huge amount of tasks, and needed the vendors' support to cut the learning curve of understanding how to optimize the tools. I was determined to meet my deadline, didn't have any time to spare, and was willing to make certain sacrifices to meet my goals.

As for why specific vendors were not included, this is the short answer:
NTO: I only managed to get in touch with NTO about two weeks before the benchmark publication. I didn't have any luck contacting the guys I worked with in the previous benchmarks, but was eventually contacted by Kim Dinerman. She was nice and polite, and apologized for the time the process took. After I explained what timeframe they had for enhancing the product (an action performed by other commercial vendors as well, in order to prepare for the publicly known tests of the benchmark), they decided that the timeframe and circumstances didn't provide an even opportunity, and chose not to participate.
I admit that by the time they contacted me, I was so loaded with tasks that I was somewhat relieved, even though I was curious and wanted to assess their product. That being said, I decided prior to the benchmark that I would respect the decisions of vendors, even if it would cause me to not get to a round scanner number.

N-Stalker: I finally received a valid N-Stalker license one day before the publication of the benchmark - a couple of days after the final deadline I had for accepting any tool. I decided to give it a shot, in case it would be a simple process; however, with my luck, I immediately discovered a bug that prevented me from properly assessing the product and its features, and unlike the rest of the tests, which were performed with a sufficient timeframe... this time, I had no time to find a workaround. I decided not to publish the partial results I had (I did not want to create the wrong impression or hurt anyone's business), and notified the vendor of the bug and of my decision.
The vendor, for his part, thanked me for the bug report, and promised to look into the issue. Sorry guys... I wanted to test them too... next benchmark.

Cenzic: the story of Cenzic is much simpler than the rest. I simply didn't manage to get in touch, and even though I did have access to a license, I decided prior to the benchmark not to take that approach. As I mentioned earlier, I decided to respect the vendor decisions, and not to assess their product without their support.

24. Appendix A – List of Tools Not Included In the Test
The following commercial web application vulnerability scanners were not included in the benchmark, due to deadlines and time restrictions on my part, and in the case of specific vendors, for other reasons.
Commercial Scanners not included in this benchmark
·         N-Stalker Commercial Edition (N-Stalker)
·         Hailstorm (Cenzic)
·         NTOSpider (NTO)
·         McAfee Vulnerability Manager (McAfee / Foundstone)
·         Retina Web Application Scanner (eEye Digital Security)
·         SAINT Scanner Web Application Scanning Features (SAINT co.)
·         WebApp360 (NCircle)
·         Core Impact Pro Web Application Scanning Features (Core Impact)
·         Parasoft Web Application Scanning Features (a.k.a WebKing, by Parasoft)
·         MatriXay Web Application Scanner (DBAppSecurity)
·         Falcove (BuyServers ltd, currently Unmaintained)
·         Safe3WVS 13.1 Commercial Edition (Safe3 Network Center)
The following open source web application vulnerability scanners were not included in the benchmark, mainly due to time restrictions, but might be included in future benchmarks:
Open Source Scanners not included in this benchmark
·         Vanguard
·         WebVulScan
·         SQLSentinel
·         XssSniper
·         Rabbit VS
·         Spacemonkey
·         Kayra
·         2gwvs
·         Webarmy
·         springenwerk
·         Mopset 2
·         XSSFuzz 1.1
·         Witchxtoolv
·         PHP-Injector
·         XSS Assistant
·         Fiddler XSSInspector/XSRFInspector Plugins
·         GNUCitizen JAVASCRIPT XSS SCANNER - since WebSecurify, a more advanced tool from the same vendor is already tested in the benchmark.
·         Vulnerability Scanner 1.0 (by cmiN, RST) - since the source code contained traces of remotely downloaded RFI lists from locations that no longer exist.
The benchmark focused on web application scanners that are able to detect either Reflected XSS or SQL Injection vulnerabilities, can be locally installed, and are also able to scan multiple URLs in the same execution.
As a result, the test did not include the following types of tools:
·         Online Scanning Services – Online applications that remotely scan applications, including (but not limited to) Appscan On Demand (IBM), Click To Secure, QualysGuard Web Application Scanning (Qualys), Sentinel (WhiteHat), Veracode (Veracode), VUPEN Web Application Security Scanner (VUPEN Security), WebInspect (online service - HP), WebScanService (Elanize KG), Gamascan (GAMASEC – currently offline), Cloud Penetrator (Secpoint),  Zero Day Scan, DomXSS Scanner, etc.
·         Scanners without RXSS / SQLi detection features:
o   Dominator (Firefox Plugin)
o   fimap
o   lfimap
o   DotDotPawn
o   lfi-rfi2
o   LFI/RFI Checker (astalavista)
o   CSRF Tester
o   etc
·         Passive Scanners (response analysis without verification):
o   Watcher (Fiddler Plugin by Casaba Security)
o   Skavanger (OWASP)
o   Pantera (OWASP)
o   Ratproxy (Google)
o   etc
·         Scanners of specific products or services (CMS scanners, Web Services Scanners, etc):
o   WSDigger
o   Sprajax
o   ScanAjax
o   Joomscan
o   wpscan
o   Joomlascan
o   Joomsq
o   WPSqli
o   etc
·         Web Application Scanning Tools which are using Dynamic Runtime Analysis:
o   PuzlBox (the free version was removed from the web site, and is now sold as a commercial product named PHP Vulnerability Hunter)
o   Inspathx
o   etc
·         Uncontrollable Scanners - scanners that can’t be controlled or restricted to scan a single site, since they either receive the list of URLs to scan from Google Dork, or continue and scan external sites that are linked to the tested site. This list currently includes the following tools (and might include more):
o   Darkjumper 5.8 (scans additional external hosts that are linked to the given tested host)
o   Bako's SQL Injection Scanner 2.2 (only tests sites from a google dork)
o   Serverchk (only tests sites from a google dork)
o   XSS Scanner by Xylitol (only tests sites from a google dork)
o   Hexjector by hkhexon – also falls into other categories
o   d0rk3r by b4ltazar
o   etc
·         Deprecated Scanners - incomplete tools that were not maintained for a very long time. This list currently includes the following tools (and might include more):
o   Wpoison (development stopped in 2003, the new official version was never released, although the 2002 development version can be obtained by manually composing the sourceforge URL which does not appear in the web site- http://sourceforge.net/projects/wpoison/files/ )
o   etc
·         De facto Fuzzers – tools that scan applications in a similar way to a scanner, but where the scanner attempts to conclude whether or not the application is vulnerable (according to some sort of “intelligent” set of rules), the fuzzer simply collects abnormal responses to various inputs and behaviors, leaving the task of concluding to the human user.
o   Lilith 0.4c/0.6a (both versions 0.4c and 0.6a were tested, and although the tool seems to be a scanner at first glimpse, it doesn’t perform any intelligent analysis on the results).
o   Spike proxy 1.48 (although the tool has XSS and SQLi scan features, it acts more like a fuzzer than a scanner – it sends payloads of partial XSS and SQLi, and does not verify that the context of the returned output is sufficient for execution, or that the error presented by the server is related to a database syntax injection, leaving the verification task to the user).
·         Fuzzers – scanning tools that lack the independent ability to conclude whether a given response represents a vulnerable location, by using some sort of verification method (this category includes tools such as JBroFuzz, Firefuzzer, Proxmon, st4lk3r, etc). Fuzzers that had at least one type of exposure that was verified were included in the benchmark (Powerfuzzer).
·         CGI Scanners: vulnerability scanners that focus on detecting hardening flaws and version specific hazards in web infrastructures (Nikto, Wikto, WHCC, st4lk3r, N-Stealth, etc)
·         Single URL Vulnerability Scanners - scanners that can only scan one URL at a time, or can only scan information from a google dork (uncontrollable).
o   Havij (by itsecteam.com)
o   Hexjector (by hkhexon)
o   Simple XSS Fuzzer [SiXFu] (by www.EvilFingers.com)
o   Mysqloit (by muhaimindz)
o   PHP Fuzzer (by RoMeO from DarkMindZ)
o   SQLi-Scanner (by Valentin Hoebel)
o   Etc.
·         Vulnerability Detection Assisting Tools – tools that aid in discovering a vulnerability, but do not detect the vulnerability themselves; for example:
o   Exploit-Me Suite (XSS-Me, SQL Inject-Me, Access-Me)  
o   XSSRays (chrome Addon)
·         Exploiters - tools that can exploit vulnerabilities but have no independent ability to automatically detect vulnerabilities on a large scale. Examples:
o   MultiInjector
o   XSS-Proxy-Scanner
o   Pangolin
o   FGInjector
o   Absinth
o   Safe3 SQL Injector (an exploitation tool with scanning features (pentest mode) that are not available in the free version).
o   etc
·         Exceptional Cases
o   SecurityQA Toolbar (iSec) – various lists and rumors include this tool in the collection of free/open-source vulnerability scanners, but I wasn’t able to obtain it from the vendor’s web site, or from any other legitimate source, so I’m not really sure it fits the “free to use” category.