Security Tools Benchmarking: December 2010

Well, it’s finally done. What I originally thought would take only a couple of days, and ended up doing for the past 9 months, is finally ready for release, and it’s titled:

Web Application Scanners Accuracy Assessment

Freeware & Open Source Scanners

Comparison & Assessment of 43 Free & Open Source Black Box Web Application Vulnerability Scanners

By Shay Chen

Information Security Consultant, Researcher and Instructor

http://sectooladdict.blogspot.com/

sectooladdict -$at$- gmail -$dot$- com

December 2010

Assessment Environment: WAVSEP 1.0 (http://code.google.com/p/wavsep/)

Introduction

I’ve been collecting them for years, trying to get my hands on anything that was released within the genre. It started as a necessity, transformed into a hobby, and eventually turned into a relatively huge collection… But that’s when the problems started.

While back in 2005 I could barely find freeware web application scanners, by 2008 I had SO MANY of them that I couldn’t decide which ones to use. By 2010 the collection became so big that I came to the realization that I HAVE to choose.

I started searching for benchmarks in the field, but at the time I only located benchmarks that focused on comparing commercial web application scanners (with the exception of one benchmark that also covered 3 open source web application scanners), leaving the freeware & open source scanners in uncharted territory:

http://www.virtualforge.de/index.php/en/library/white-papers/web-application-vulnerability-scanners-a-benchmark_en.html (Anonymous scanners)
http://anantasec.blogspot.com/2009/01/web-vulnerability-scanners-comparison.html (commercial scanners)
http://www.cs.ucsb.edu/~adoupe/static/black-box-scanners-dimva2010.pdf (mostly commercial, but including W3AF, paros and grendel-scan)
http://ha.ckers.org/files/Accuracy_and_Time_Costs_of_Web_App_Scanners.pdf (commercial scanners)

By 2010 I had over 50 tools, so I eventually decided to test them myself using the same model used in previous benchmarks (a big BIG mistake).

I initially tested the various tools against a vulnerable ASP.net web application and came to conclusions as to which tool is the “best”… and if it weren’t for my curiosity, that probably would have been the end of it, and my conclusions might have misled many more.

I decided to test the tools against another vulnerable web application, just to make sure the results were consistent, and arbitrarily selected “Insecure Web App” (a vulnerable JEE web application) as the second target… and to my surprise, the results of the tests against it were VERY different.

Some of the tools that were efficient in the test against the vulnerable ASP.net application (which will stay anonymous for the time being) didn’t function very well and missed many exposures, while some of the tools that I previously classified as “useless” detected exposures that NONE of the other tools found.

After performing an in-depth analysis of the different vulnerabilities in the tested applications, I came to the conclusion that although the applications included similar classes of exposures (SQL Injection, RXSS, Information disclosure, etc), the properties and restrictions of the exposure instances were VERY different in each application.

That’s when it dawned on me that the different methods that tools use to discover security exposures might be efficient for detecting certain common instances of a vulnerability while simultaneously being inefficient for detecting other instances of the same vulnerability, and that tools with “lesser” algorithms or different approaches (which might appear to be less effective at first) might be able to fill the gap.

So the question remains… Which tool is the best? Is there one that surpasses the others? Can there be only one?

I decided to find out…

It started as a bunch of test cases, and ended as a project containing hundreds of scenarios (currently focusing on Reflected XSS and SQL Injection) that will hopefully help in unveiling the mystery.

(A PDF version of this benchmark will be available shortly in the WAVSEP project home page at http://code.google.com/p/wavsep/)

Thank-You Note

Before I describe project WAVSEP and the results of the first scanner benchmark performed using it, I’d like to thank all the tool developers and vendors that shared freeware & open source tools with the community over the years; if it weren’t for the long hours they’ve invested and their generosity in sharing their creations, my job (and that of others in my profession) would have been much harder.

I’d like to express my sincere gratitude to Shimi Volkovich (http://il.linkedin.com/pub/shimi-volkovich/20/173/263), for taking the time to design the logo I’ll soon be using.

I would also like to thank all the sources that helped me gather the list of scanners over the years, including (but not limited to) information security sources such as PenTestIT (http://www.pentestit.com/), Security Sh3ll (http://security-sh3ll.blogspot.com/), Security Database (http://www.security-database.com/), Darknet (http://www.darknet.org.uk/), Packet Storm (http://packetstormsecurity.org/), Help Net Security (http://www.net-security.org/), Astalavista (http://www.astalavista.com/), Google (of course) and many others that I have neglected to mention due to my failing memory.

I hope that the conclusions, ideas, information and payloads presented in this research (and the benchmarks and tools that will follow) will benefit all vendors, and specifically help the open source community to locate code sections that all tool vendors could assimilate to improve their products; to that end, I’ll try to contact each vendor in the next few weeks, in order to notify them of source code that could be assimilated into their product to make it even better (on the basis of the development technology and the license of each code section).

Phase I – The “Traditional” Benchmark

Testing the scanners against vulnerable training & real life applications.

As I mentioned earlier, in the initial phase of the benchmark I tested the various scanners against different vulnerable “training” applications (OWASP InsecureWebApp, a vulnerable .Net application and a simple vulnerable application I have written myself), and tested many of them against real life applications (ASP.Net applications, Java applications based on Spring, web applications written in PHP, etc).

I decided not to publish the results just yet, and for a damn good reason which I did not predict in the first place; nevertheless, the initial process was very helpful because it helped me to learn about the different aspects of the tools: features, vulnerability list, coverage, installation processes, configuration methods, usage, adaptability, stability, performance and a bunch of other aspects.

I have found VERY interesting results that prove that certain old scanners might provide great benefits in many cases that many modern projects will not handle properly.

The process also enabled me to verify that the various tools actually support their proclaimed features (which I have literally done for the vast majority of the tools, using proxies, sniffers and other experiments), and even get a general measure of their accuracy and capabilities.

However, after seeing the diversity of results across different applications and technologies, and after dealing with the countless challenges that came along the way, I discovered several limitations and even a fundamental flaw in testing the accuracy, coverage, stability and performance of scanners in this manner (I had managed to test around 50 free and open source scanners by this point, as insane and unbelievable as this number might sound):

We may be able to estimate the general capabilities of a scanner from the number of REAL exposures that it located, the number of exposures that it missed (false negatives) and the number of FALSE exposures (false positives) it identified as security exposures, BUT the output of such a process will very much depend on the type of exposures that exist in the tested application, how well each scanner is adapted to the tested application technology, and which special cases of exposures and barriers exist in the tested application.

A scanner that will be very useful for scanning PHP web sites might completely fail the task of scanning an ASP.Net web application, and a tool perfectly suited for that task might crash when faced with certain application behaviors, or be useless in detecting a special case of a specific vulnerability that is not supported by the tool.
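To make the three figures above concrete, here is a minimal Python sketch of how they could be turned into ratios. The `ScanResult` structure, its field names and the sample counts are illustrative assumptions, not values taken from the benchmark reports.

```python
from dataclasses import dataclass


@dataclass
class ScanResult:
    """Hypothetical per-scanner counts; the field names are assumptions."""
    detected: int      # real exposures the scanner reported
    missed: int        # real exposures it failed to report (false negatives)
    false_alarms: int  # non-vulnerable cases it reported (false positives)
    decoy_cases: int   # non-vulnerable cases it was tested against


def detection_ratio(r: ScanResult) -> float:
    """Share of the real exposures that the scanner located."""
    total = r.detected + r.missed
    return r.detected / total if total else 0.0


def false_positive_ratio(r: ScanResult) -> float:
    """Share of the non-vulnerable cases that were wrongly reported."""
    return r.false_alarms / r.decoy_cases if r.decoy_cases else 0.0


# Made-up counts, for illustration only.
sample = ScanResult(detected=33, missed=33, false_alarms=1, decoy_cases=4)
print(detection_ratio(sample))       # 0.5
print(false_positive_ratio(sample))  # 0.25
```

Two scanners can share the same ratios and still detect entirely different exposure instances, which is why such figures alone say little about a specific target.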

I guess what I’m trying to say is this:

There are many forms and variations to each security exposure, and in order to prove my point, I’ll use the example of reflected cross site scripting;

Locations vulnerable to reflected cross site scripting might appear in many forms; they may require the attacker to send a whole HTML tag as a part of the crafted link, require the injection of an HTML event (in case the input-affected-output is printed in the context of a tag and the usage of tag-composing-characters is restricted), they may appear in locations vulnerable to SQL injection (and thus restrict the use of certain characters, or even require the usage of initial payloads that “disable” the SQL injection vulnerability first), require browser specific payloads or even direct injection of javascript/vbscript (in case the context is within a script tag, certain HTML events or even in the context of certain properties), and these cases are only a fragment of the whole list!
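The variety of contexts described above can be illustrated with a short Python sketch that reports where a harmless marker string ends up in a response. It uses only the standard library's `html.parser`; the class name, the context labels and the sample page are illustrative assumptions and do not describe how any of the tested scanners work.

```python
from html.parser import HTMLParser


class ContextFinder(HTMLParser):
    """Records the contexts in which a marker string is reflected."""

    def __init__(self, marker):
        super().__init__(convert_charrefs=True)
        self.marker = marker
        self.contexts = set()
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
        for name, value in attrs:
            if value and self.marker in value:
                # Event handlers (onclick, onerror, ...) run script directly,
                # so they call for a different payload than plain attributes.
                kind = "event-attribute" if name.startswith("on") else "attribute"
                self.contexts.add(kind)

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self.marker in data:
            self.contexts.add("script" if self._in_script else "html-body")


def reflection_contexts(page, marker):
    """Return the set of context labels in which `marker` appears."""
    finder = ContextFinder(marker)
    finder.feed(page)
    finder.close()
    return finder.contexts


# Hypothetical response that echoes the same input in three places.
page = (
    '<html><body><p>Hello MARK1</p>'
    '<input value="MARK1">'
    '<script>var a = "MARK1";</script></body></html>'
)
print(sorted(reflection_contexts(page, "MARK1")))
# ['attribute', 'html-body', 'script']
```

Each label corresponds to a different kind of payload: a whole tag for the HTML body, an attribute break-out or event for attribute values, and direct script injection inside a script block.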

So, how can the tester know which of these cases is handled by each scanner from the figures and numbers presented in a general benchmark?

I believe he can’t. No matter how solid the difference appears, he really can’t.

Such information may allow him to root out useless tools (tools that miss even the most obvious exposures), and even identify what appears to be a significant difference in the accuracy of locating certain exposure instances, but the latter might have looked very different if the tested applications had been prone to certain exposure instances that are the specialty of a different scanner, or had included a technological barrier that requires a specific feature or behavior to bypass.

Thus, I have come to believe that the only way I could truly provide useful information to testers on the accuracy and coverage of freely available web application scanners is by writing detailed test cases for different exposures, starting with some core common exposures such as SQL Injection, cross site scripting and maybe a couple of others.

And thus, I have ended up investing countless nights in the development of a new test-case based evaluation application, designed specifically to test the support of each tool for detecting MANY different cases of certain common exposures.

The results of the original benchmark (against the vulnerable training web applications) will be published separately in a different article (since by now, many of them have been updated, and the results require modifications).

Phase II - Project WAVSEP

After documenting and testing the features of every free & open source web application scanner and scan script that I could get my hands on, I discovered that the most common features were Reflected Cross Site Scripting (RXSS) and SQL Injection (SQLi) detection. I decided to focus my initial efforts on these two vulnerabilities, and develop a platform that could truly evaluate how good each scanner is at detecting them, which tool combinations provide the best results and which tool can bypass the largest number of detection barriers.

Project WAVSEP (Web Application Vulnerability Scanner Evaluation Project) was implemented as a set of vulnerable JSP pages; each page implementing a unique test case.

A test case is defined as a unique combination of the following elements:

A certain instance of a given vulnerability.
Attack vectors with certain input origins (either GET or POST values, and in the future, also URL/path, cookie, various headers, file upload content and other origins).

Currently, only GET and POST attack vectors are covered, since most scanners support only GET and POST vectors (future versions of WAVSEP will include support for additional databases, additional response types, additional detection barriers, additional attack vector origins and additional vulnerabilities).

Project WAVSEP currently consists of the following test cases:

64 Reflected XSS test cases (32 GET cases, 32 POST cases -> 66 total vulnerabilities)
130 SQL Injection test cases, most of them implemented for MySQL & MSSQL (65 GET cases, 65 POST cases -> 136 total vulnerabilities)

o The list of test cases includes vulnerable pages that respond with 500 HTTP errors, 200 HTTP Responses with erroneous text, 200 HTTP Responses with differentiation or completely identical 200 HTTP responses.

o 80 out of 136 cases are simple SQL injection test cases (500 & 200 erroneous HTTP responses), and 56 are Blind SQL Injection test cases (valid and identical 200 HTTP responses).

7 different categories of false positive Reflected XSS vulnerabilities (GET OR POST).
10 different categories of false positive SQL Injection vulnerabilities (GET OR POST).
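The four response behaviors listed for the SQL Injection pages can be told apart by comparing a baseline response with the response to an injected request. The following Python sketch shows one way to do that; the `Response` structure, the labels and the short list of error hints are assumptions made for illustration, and a real scanner would need a far richer set of signatures.

```python
from dataclasses import dataclass


@dataclass
class Response:
    """Hypothetical minimal view of an HTTP response."""
    status: int
    body: str


# Assumed error fragments; the first is the usual MySQL syntax error text.
ERROR_HINTS = ("you have an error in your sql syntax", "sqlexception")


def classify(baseline: Response, injected: Response) -> str:
    """Label how a page reacted to an injected value."""
    if injected.status == 500:
        return "500-error"
    if any(hint in injected.body.lower() for hint in ERROR_HINTS):
        return "200-with-error-text"
    if injected.body != baseline.body:
        return "200-with-differentiation"
    return "200-identical"


normal = Response(200, "<p>Welcome back</p>")
print(classify(normal, Response(500, "")))                     # 500-error
print(classify(normal, Response(200, "<p>No results</p>")))    # 200-with-differentiation
print(classify(normal, Response(200, "<p>Welcome back</p>")))  # 200-identical
```

The last label is the hardest to work with: since the response content gives nothing away, the scanner has to rely on some other signal to confirm the injection.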

Each exposure category in WAVSEP contains an index page with descriptions of the different barriers in the test cases, structures of sample detection payloads and examples of such payloads.

A general description of each test case is also available in the following excel spreadsheet: http://code.google.com/p/wavsep/downloads/detail?name=VulnerabilityTestCases.xlsx&can=2&q=

Those that wish to verify the results of the benchmark can download the latest source code of project WAVSEP (including the list of test cases and their description) from the project’s web site:

http://code.google.com/p/wavsep/

Benchmark Overview

As mentioned before, the benchmark focused on testing free & open source tools that are able to detect (and not necessarily exploit) security vulnerabilities on a wide range of URLs, and thus, each tool tested needed to support the following features:

Either open source or free to use, so that open source projects and vendors generous enough to contribute to the community will benefit from the benchmark first.
The ability to detect Reflected XSS and/or SQL Injection vulnerabilities.
The ability to scan multiple URLs at once (using either a crawler/spider feature, URL/Log file parsing feature or a built-in proxy).
The ability to control and limit the scan to internal or external host (domain/IP).
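The last requirement, limiting a scan to a given host, boils down to a check like the one in the following Python sketch. The function names, the host and port, and the sample URLs are assumptions for illustration; only the `index-xss.jsp` page name comes from WAVSEP itself.

```python
from urllib.parse import urlsplit


def in_scope(url, allowed_hosts):
    """True when the URL's host is one of the explicitly allowed hosts."""
    host = urlsplit(url).hostname  # already lower-cased by urlsplit
    return host is not None and host in allowed_hosts


def filter_scope(urls, allowed_hosts):
    """Keep only the URLs a scope-restricted scan may request."""
    return [u for u in urls if in_scope(u, allowed_hosts)]


# Hypothetical crawl results: one internal page and one external link.
found = [
    "http://localhost:8080/wavsep/index-xss.jsp",
    "http://example.com/some/external/page",
]
print(filter_scope(found, {"localhost"}))
# ['http://localhost:8080/wavsep/index-xss.jsp']
```

A scanner that lacks such a restriction may follow links onto external sites, which is the reason the "uncontrollable" tools were left out of the test.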

As a direct implication, the test did NOT include the tools listed in Appendix A – A List of Tools Not Included In The Test.

The purpose of WAVSEP’s test cases is to provide a scale for understanding which detection barriers each scanning tool can bypass, and which vulnerability variations can be detected by each tool.

The Reflected Cross Site Scripting vulnerable pages are pretty standard & straightforward, and should provide reliable basis for assessing the detection capabilities of different scanners.

However, it is important to remember that the SQL Injection vulnerable pages used a MySQL database as a data repository, and thus the SQL Injection detection results only reflect the detection of SQL Injection vulnerabilities in this type of database; the results might vary when the back end data repository is different (a theory that will be verified in the next benchmark).

Description of Comparison Tables

The list of tools tested in this benchmark is organized within the following reports:

List of Tested Scanners (http://wavsep.googlecode.com/files/List%20of%20Tested%20Web%20Application%20Vulnerability%20Scanners%20v1.0.pdf)

Source, License and Technical Details of Tested Scanners (http://wavsep.googlecode.com/files/Details%20of%20Tested%20Web%20Application%20Vulnerability%20Scanners%20v1.0.pdf)

For those of you that wish to get straight to the point, the results of the accuracy assessment are organized within the following reports:

Benchmark Results – Reflected XSS Detection Accuracy (http://wavsep.googlecode.com/files/Web%20Application%20Scanner%20RXSS%20Detection%20Accuracy%20v1.0.pdf)

Benchmark Results – SQL Injection Detection Accuracy – Total (http://wavsep.googlecode.com/files/Web%20Application%20Scanner%20SQLi%20Detection%20Accuracy%20v1.0.pdf)

Benchmark Drilldown – Blind SQL Injection Detection (http://wavsep.googlecode.com/files/Web%20Application%20Scanner%20SQLi%20Detection%20Accuracy%20-%20Blind%20v1.0.pdf)

Benchmark Drilldown – Erroneous SQL Injection Detection (http://wavsep.googlecode.com/files/Web%20Application%20Scanner%20SQLi%20Detection%20Accuracy%20-%20ErrorBased%20v1.0.pdf)

Additional information was gathered during the benchmark, including information related to the different features of various scanners. These details are organized in the following reports, and might prove useful when searching for tools for specific tasks or tests:

Comparison of Active Vulnerability Detection Features (http://wavsep.googlecode.com/files/Active%20Vulnerability%20Detection%20Features%20Comparison%20v1.0.pdf)

Comparison of Complementary Scanning Features - Passive Analysis, CGI Scanning, Brute Force, etc (http://wavsep.googlecode.com/files/Complementary%20Scan%20Features%20Comparison%20v1.0.pdf)

Comparison of Usability, Coverage and Scan Initiation Features (http://wavsep.googlecode.com/files/List%20of%20Scanner%20Features%20%281%20of%203%29%20v1.0.pdf)

Comparison of Authentication, Scan Control and Connection Support Features (http://wavsep.googlecode.com/files/List%20of%20Scanner%20Features%20%282%20of%203%29%20v1.0.pdf)

Comparison of Advanced and Uncommon Features (http://wavsep.googlecode.com/files/List%20of%20Scanner%20Features%20%283%20of%203%29%20v1.0.pdf)

Information regarding the scan logs, list of untested tools and abnormal behaviors of scanners can be found in the article appendix sections:

The following appendix report contains a list of scanners that were not included in the test:

Appendix A – A List of Tools not included in the Test (The end of the article)

The scan logs (describing the executing process and configuration of each scanner) can be viewed in the following appendix report: Appendix B – WAVSEP Scanning Logs (http://wavsep.googlecode.com/files/WavsepScanLogs%20v1.0.pdf)

During the benchmark, certain tools with abnormal behavior were identified; the list of these tools is presented in the following appendix report:

Appendix C – Scanners with Abnormal Behavior (The end of the article)

List of Tested Scanners

The following report (PDF) contains the list of scanners tested in this benchmark, in addition to their version, their author and their status: http://wavsep.googlecode.com/files/List%20of%20Tested%20Web%20Application%20Vulnerability%20Scanners%20v1.0.pdf

For those of you that want a quick glimpse, the following scanners were tested in the benchmark:

Acunetix Web Vulnerability Scanner (Free Edition), aidSQL, Andiparos, arachni, crawlfish, Gamja, Grabber, Grendel Scan, iScan, JSKY Free Edition, LoverBoy, Mini MySqlat0r, Netsparker Community Edition, N-Stalker Free Edition, Oedipus, openAcunetix, Paros Proxy, PowerFuzzer, Priamos, ProxyStrike, Sandcat Free Edition, Scrawler, ScreamingCSS, ScreamingCobra, Secubat, SkipFish, SQID (SQL Injection Digger), SQLiX, sqlmap, UWSS(Uber Web Security Scanner), VulnDetector, W3AF, Wapiti, Watobo, Web Injection Scanner (WIS), WebCruiser Free Edition, WebScarab, WebSecurify, WSTool, Xcobra, XSSer, XSSploit, XSSS, ZAP.

Source, License and Technical Details of Tested Scanners

The following report (PDF) contains a comparison of licenses, development technology and sources (home page) of different scanners: http://wavsep.googlecode.com/files/Details%20of%20Tested%20Web%20Application%20Vulnerability%20Scanners%20v1.0.pdf

Comparison of Active Vulnerability Detection Features

The following report (PDF) contains a comparison of active vulnerability detection features in the various scanners: http://wavsep.googlecode.com/files/Active%20Vulnerability%20Detection%20Features%20Comparison%20v1.0.pdf

Aside from the Count column (which represents the total amount of active vulnerability detection features supported by the tool, not including complementary features such as web server scanning and passive analysis), each column in the report represents an active vulnerability detection feature, which translates to the exposure presented in the following list:

SQL – SQL Injection

BSQL – Blind SQL Injection

RXSS – Reflected Cross Site Scripting

PXSS – Persistent / Stored Cross Site Scripting

DXSS – DOM XSS

Redirect – External Redirect / Phishing via Redirection

Bck – Backup File Detection

Auth – Authentication Bypass

CRLF – CRLF Injection / Response Splitting

LDAP – LDAP Injection

XPath – X-Path Injection

MX – MX Injection

Session Test – Session Identifier Complexity Analysis

SSI – Server Side Include

RFI-LFI – Directory Traversal / Remote File Include / Local File Include (Will be separated into different categories in future benchmarks)

Cmd – Command Injection / OS Command Injection

Buffer – Buffer Overflow / Integer Overflow (Will be separated into different categories in future benchmarks)

CSRF – Cross Site Request Forgery

A-Dos – Application Denial of Service / RegEx DoS

Comparison of Complementary Scanning Features

The following report (PDF) contains a comparison of complementary vulnerability detection features in the various scanners: http://wavsep.googlecode.com/files/Complementary%20Scan%20Features%20Comparison%20v1.0.pdf

In order to clarify what each column in the report table means, use the following interpretation:

Web Server Hardening – plugins that scan for HTTP method support (Trace, WebDAV), directory listing, Robots and cross-domain information disclosure, version specific vulnerabilities, etc.

CGI Scanning - Default files, common vulnerable applications, etc.

Passive Analysis – security tests that don’t require any actual attacks, and are based instead on information gathering and analysis of responses, including certificate & cipher tests, gathering of comments, mime type analysis, autocomplete detection, insecure transmission of credentials, google hacking, etc.

File Enumeration – directory and file enumeration features.

Comparison of Usability and Coverage Features

The following report (PDF) contains a comparison of usability, coverage and scan initiation features of different scanners:

http://wavsep.googlecode.com/files/List%20of%20Scanner%20Features%20%281%20of%203%29%20v1.0.pdf

Configuration & Usage Scale

Very Simple - GUI + Wizard

Simple - GUI with simple options, Command line with scan configuration file or simple options

Complex - GUI with numerous options, Command line with multiple options

Very Complex - Manual scanning feature dependencies, multiple configuration requirements

Stability Scale

Very Stable - Rarely crashes, Never gets stuck

Stable - Rarely crashes, Gets stuck only in extreme scenarios

Unstable - Crashes every once in a while, Freezes on a consistent basis

Fragile – Freezes or Crashes on a consistent basis, Fails performing the operation in many cases

(Unlike the accuracy values presented in the benchmark for W3AF, which are up to date, the stability values for W3AF represent the condition of 1.0-RC3, and not 1.0-RC4; the values will be updated in the next benchmark, after the new version has been thoroughly tested)

Performance Scale

Very Fast - Fast implementation with limited amount of scanning tasks

Fast - Fast implementation with plenty of scanning tasks

Slow - Slow implementation with limited amount of scanning tasks

Very Slow - Slow implementation with plenty of scanning tasks

Comparison of Connection and Authentication Features

The following report (PDF) contains a comparison of connection, authentication and scan control features of different scanners:

http://wavsep.googlecode.com/files/List%20of%20Scanner%20Features%20%282%20of%203%29%20v1.0.pdf

Comparison of Advanced Features

The following report (PDF) contains a comparison of advanced and uncommon scanner features:

http://wavsep.googlecode.com/files/List%20of%20Scanner%20Features%20%283%20of%203%29%20v1.0.pdf

Benchmark Results – Reflected XSS Detection Accuracy

The results of the Reflected Cross Site Scripting (RXSS) benchmark are presented in the following report (PDF format):

http://wavsep.googlecode.com/files/Web%20Application%20Scanner%20RXSS%20Detection%20Accuracy%20v1.0.pdf

The results only include vulnerable pages linked from the index-xss.jsp index page (RXSS-GET or RXSS-POST directories, in addition to the RXSS-FalsePositive directory). XSS Vulnerable locations in the SQL injection vulnerable pages were not taken into account, since they don’t necessarily represent a unique scenario (or at least not until the “layered vulnerabilities” scenario will be implemented).

Benchmark Results – SQL Injection Detection Accuracy

The overall results of the SQL Injection benchmark are presented in the following report (PDF format):

http://wavsep.googlecode.com/files/Web%20Application%20Scanner%20SQLi%20Detection%20Accuracy%20v1.0.pdf

Benchmark Drilldown – Erroneous SQL Injection Detection

The results of the Error-Based SQL Injection benchmark are presented in the following report (PDF format):

http://wavsep.googlecode.com/files/Web%20Application%20Scanner%20SQLi%20Detection%20Accuracy%20-%20ErrorBased%20v1.0.pdf

Benchmark Drilldown – Blind SQL Injection Detection

The results of the Blind SQL Injection benchmark are presented in the following report (PDF format):

http://wavsep.googlecode.com/files/Web%20Application%20Scanner%20SQLi%20Detection%20Accuracy%20-%20Blind%20v1.0.pdf

Initial Analysis & Conclusions

After performing an initial analysis on the data, I have come to a simple conclusion as to which combination of tools will be the most effective in detecting Reflected XSS vulnerabilities in the public (unauthenticated) section of a tested web site, while providing the least amount of false positives:

Netsparker CE (42 cases), alongside Acunetix Free Edition (38 cases, including case 27 which is missed by Netsparker), alongside Skipfish (detects case 12 which is missed by both tools). I’d also recommend executing N-Stalker on small applications since it is able to detect certain cases that none of the other tested tools can (but the XSS scanning feature is limited to 100 URLs).

Using Sandcat or Proxy Strike alongside Burp Spider/Paros Spider/External Spider can help detect additional potentially vulnerable locations (cases 10, 11, 13-15 and 17-21) that could be manually verified by a human tester.

So combining four tools will give the best possible result of RXSS detection in the unauthenticated section of an application, using today’s free & open source tools… WOW, it took some time to get to that conclusion. However, scanning the public section of the application is one thing, and scanning the internal section (authenticated section) of the application is another; effectively scanning the authenticated section requires various features such as authentication support, URL scanning restrictions, manual crawling (in case damage might be caused from crawling certain URLs), etc; so the conclusions for the public section are not necessarily fit for the internal section.
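Choosing a combination like this can be sketched as a greedy set-cover pass over the case numbers each tool detected. In the Python sketch below, the scanner names and case sets are hypothetical placeholders rather than benchmark data, and greedy selection is a swapped-in heuristic, not necessarily the way the conclusion above was reached.

```python
def pick_combination(detected, max_tools):
    """Greedily pick tools that add the most not-yet-covered cases.

    `detected` maps a tool name to the set of case numbers it found.
    Returns the chosen tool names (in order) and the cases they cover.
    """
    remaining = dict(detected)
    chosen, covered = [], set()
    while remaining and len(chosen) < max_tools:
        best = max(remaining, key=lambda name: len(remaining[name] - covered))
        gain = remaining[best] - covered
        if not gain:
            break  # no remaining tool adds anything new
        chosen.append(best)
        covered |= gain
        del remaining[best]
    return chosen, covered


# Hypothetical detection results, for illustration only.
results = {
    "scanner_a": {1, 2, 3, 4},
    "scanner_b": {3, 4, 5},
    "scanner_c": {6},
    "scanner_d": {1, 2},
}
tools, cases = pick_combination(results, max_tools=4)
print(tools)          # ['scanner_a', 'scanner_b', 'scanner_c']
print(sorted(cases))  # [1, 2, 3, 4, 5, 6]
```

This only maximizes coverage. Weighing false positives, as the recommendation above does, would need an extra penalty term or a manual review of the candidates.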

During the next few days, I’ll try and analyze the results and come to additional conclusions (internal RXSS scanning, external & internal SQLi scanning, etc). Simply check my blog in a few days to see which conclusions were already published.

An updated benchmark document will be released in the WAVSEP project homepage after each addition, conclusion or change.

A comment about accuracy and inconsistent results

During the benchmark, I have executed each tool more than once, and on rare occasions, dozens of times. I have discovered that some of the tools have inconsistent results in certain fields (particularly SQL injection). The following tools produced inconsistent results in the SQLi detection field: Skipfish (my guess is the inconsistencies are related to crawling problems and connection timeouts), Oedipus, and probably a couple of others that I can’t remember.

It is important to note that the 100% Reflected XSS detection ratio that Sandcat and ProxyStrike produce comes with a huge amount of false positives, a fact that signifies that the detection algorithm works more like a passive scanner (such as watcher by casaba), and less like an active intelligent scanner that verifies that the injection returned is sufficient to exploit the exposure in the given scope. This conclusion does not necessarily pinpoint anything about other features of these scanners (for example, the SQL injection detection module of proxystrike is pretty decent), or presume that the XSS scanning features of these tools are “useless”; on the contrary, these tools can be used as means to obtain more leads for human verification, and can be very useful in the right context.
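The difference between the two detection styles can be shown with a small Python sketch. The probe string, the token and both check functions are hypothetical and deliberately simplistic; they are not taken from Sandcat, ProxyStrike or any other tool, and only illustrate why flagging every reflection produces false positives.

```python
import html

# Hypothetical probe; TOKEN is the unique part a scanner could search for.
TOKEN = "wavsepprobe"
PROBE = "<script>alert('" + TOKEN + "')</script>"


def reflection_only_check(body):
    """Flags any response that echoes the token, encoded or not."""
    return TOKEN in body


def verifying_check(body):
    """Flags only responses that return the probe with its markup intact."""
    return PROBE in body


# A page that HTML-encodes its input, and a page that echoes it raw.
safe_page = "<p>Results for: " + html.escape(PROBE) + "</p>"
vulnerable_page = "<p>Results for: " + PROBE + "</p>"

print(reflection_only_check(safe_page), verifying_check(safe_page))
# True False
print(reflection_only_check(vulnerable_page), verifying_check(vulnerable_page))
# True True
```

The reflection-only check catches every vulnerable page but also reports the encoded one, which matches the trade-off described above: more leads for a human tester, at the cost of more false positives.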

Furthermore, the 100% SQL Injection detection ratio of Wapiti needs to be further investigated since andiparos produced the same ratio when the titles of the various pages contained the word SQL (which is part of the reason that in the latest version of WAVSEP, this word does not appear anywhere).

Additional conclusions will follow.

So What Now?

So now that we have plenty of statistics to analyze, and a new framework for testing scanners, it’s time to discuss the next phases.

Although the calendar tells me that it took me 9 months to conduct this research, in reality, it took me a couple of years to collect all the tools, learn how to install and use them, gather everything that was freely available for more than 5 minutes and test them all together.

However, since my research led me to develop a whole framework for benchmarking (aside from the WAVSEP project which was already published), I believe (or at least hope) that thanks to the platform, future benchmarks will be much easier to conduct, and in fact, I’m planning on updating the content of the web site (http://sectooladdict.blogspot.com/) with additional related content on a regular basis.

In addition to different classes of benchmarks, the following goals will be in the highest priority:

Improve the testing framework (WAVSEP); add additional test cases and additional security vulnerabilities.
Perform additional benchmarks on the framework, and on a consistent basis. I'm currently aiming for one major benchmark per year, although I might start with twice per year, and a couple of initial releases that might come even sooner.
Publish the results of tests against sample vulnerable web applications, so that some sort of feedback on other types of exposures will be available (until other types of vulnerabilities will be implemented in the framework), as well as features such as authentication support, crawling, etc.
Gradually develop a framework for testing additional related features, such as authentication support, malformed HTML tolerance, abnormal response support, etc.
Integration with external frameworks for assessing crawling capabilities, technology support, etc.

I hope that this content will help the various vendors improve their tools, help pen-testers choose the right tool for each task, and in addition, help create some method of testing the numerous tools out there.

The different vendors will receive an email message from an email address designated for communicating with them. I urge them to try and contact me through that address, and not using alternative means, so I’ll be able to set my priorities properly. I apologize in advance for any delays in my responses in the next few weeks.

Appendix A – A List of Tools Not Included In the Test

The benchmark focused on web application scanners that are free to use (freeware and/or open source), are able to detect either Reflected XSS or SQL Injection vulnerabilities, and are also able to scan multiple URLs in the same execution.

As a direct implication, the test did NOT include the following types of tools:

· Commercial scanners - The commercial versions of AppScan, WebInspect, Cenzic, NTOSpider, Acunetix, Netsparker, N-Stalker, WebCruiser, Sandcat and many other commercial tools that I failed to mention. Any tool in the benchmark that carries the same commercial name is actually a limited free version of the same product, and the results do not refer to (or even necessarily reflect on) the full product.
· Online Scanning Services – Online applications that remotely scan applications, including (but not limited to) Zero Day Scan, Appscan On Demand, Click To Secure, QualysGuard Web Application Scanning (Qualys), Sentinel (WhiteHat), Veracode (Veracode), VUPEN Web Application Security Scanner (VUPEN Security), WebInspect (online service - HP), WebScanService (Elanize KG), Gamascan (GAMASEC – currently offline), etc.
· Scanners without RXSS / SQLi detection features, including (but not limited to):

o LFIMap

o phpBB-RFI Scanner

o DotDotPwn

o CSRF Tester

o etc

· Passive Scanners (response analysis without verification), including (but not limited to):

o Watcher (Fiddler Plugin by Casaba Security)

o Skavenger (OWASP)

o Pantera (OWASP)

o Ratproxy (Google)

o Etc

· Scanners for specific products or services (CMS scanners, Web Services Scanners, etc), including (but not limited to):

o WSDigger

o Sprajax

o ScanAjax

o Joomscan

o Joomlascan

o Joomsq

o WPSqli

o etc

· White box & Code Review Application Scan Tools, including (but not limited to):

o PuzlBox

o Inspathx

o etc

· Uncontrollable Scanners - scanners that can’t be controlled or restricted to scan a single site, since they either receive the list of URLs to scan from Google Dork, or continue and scan external sites that are linked to the tested site. This list currently includes the following tools (and might include more):

o Darkjumper 5.8 (scans additional external hosts that are linked to the given tested host)

o Bako's SQL Injection Scanner 2.2 (only tests sites from a google dork)

o Serverchk (only tests sites from a google dork)

o XSS Scanner by Xylitol (only tests sites from a google dork)

o Hexjector (by hkhexon) – also falls into other categories

o etc

· Deprecated Scanners - incomplete tools that were not maintained for a very long time. This list currently includes the following tools (and might include more):

o Wpoison (development stopped in 2003 and the new official version was never released, although the 2002 development version can be obtained by manually composing the sourceforge URL, which does not appear on the web site: http://sourceforge.net/projects/wpoison/files/ )

o etc

· De facto Fuzzers – tools that scan applications in a similar way to a scanner. However, whereas a scanner attempts to conclude whether or not the application is vulnerable (according to some sort of “intelligent” set of rules), a fuzzer simply collects abnormal responses to various inputs and behaviors, leaving the task of drawing conclusions to the human user.

o Lilith 0.4c/0.6a (both versions 0.4c and 0.6a were tested, and although the tool seems to be a scanner at first glance, it doesn’t perform any intelligent analysis of the results).

o Spike proxy 1.48 (although the tool has XSS and SQLi scan features, it acts more like a fuzzer than like a scanner: it sends partial XSS and SQLi payloads, and does not verify that the context of the returned output is sufficient for execution or that the error presented by the server is related to a database syntax injection, leaving the verification task to the user).

· Fuzzers – scanning tools that lack the independent ability to conclude whether a given response represents a vulnerable location, by using some sort of verification method (this category includes tools such as JBroFuzz, Firefuzzer, Proxmon, st4lk3r, etc). Fuzzers that had at least one type of exposure that was verified were included in the benchmark (Powerfuzzer).
· CGI Scanners: vulnerability scanners that focus on detecting hardening flaws and version specific hazards in web infrastructures (Nikto, Wikto, WHCC, st4lk3r, N-Stealth, etc)
· Single URL Vulnerability Scanners - scanners that can only scan one URL at a time, or can only scan information from a google dork (uncontrollable).

o Havij (by itsecteam.com)

o Hexjector (by hkhexon)

o Simple XSS Fuzzer [SiXFu] (by www.EvilFingers.com)

o Mysqloit (by muhaimindz)

o PHP Fuzzer (by RoMeO from DarkMindZ)

o SQLi-Scanner (by Valentin Hoebel)

o Etc.

· The following scanners:

o sandcatCS 4.0.3.0 - since Sandcat 4.0 free edition, a more advanced tool from the same vendor, is already tested in the benchmark.

o GNUCitizen JAVASCRIPT XSS SCANNER - since WebSecurify, a more advanced tool from the same vendor, is already tested in the benchmark.

o Vulnerability Scanner 1.0 (by cmiN, RST) - since the source code contained traces of code that remotely downloads RFI lists from locations that do not exist anymore. I might attempt to test it anyway in the next benchmark.

o XSSRays 0.5.5 - I might attempt to test it in the next benchmark.

o XSSFuzz 1.1 - I might attempt to test it in the next benchmark.

o XSS Assistant - I might attempt to test it in the next benchmark.

· Vulnerability Detection Helpers – tools that aid in discovering a vulnerability, but do not detect the vulnerability themselves; for example:

o Exploit-Me Suite (XSS-Me, SQL Inject-Me, Access-Me)

o Fiddler X5s plugin

· Exploiters - tools that can exploit vulnerabilities but have no independent ability to automatically detect vulnerabilities on a large scale. Examples:

o MultiInjector

o XSS-Proxy-Scanner

o Pangolin

o FGInjector

o Absinthe

o Safe3 SQL Injector (an exploitation tool with scanning features (pentest mode) that are not available in the free version).

o etc

· Exceptional Cases

o SecurityQA Toolbar (iSec) – various lists and rumors include this tool in the collection of free/open-source vulnerability scanners, but I wasn’t able to obtain it from the vendor’s web site, or from any other legitimate source, so I’m not really sure it fits the “free to use” category.
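The scanner/fuzzer distinction used in the list above can be illustrated with a minimal Python sketch. It is not taken from any of the listed tools: the payload, the function names and the sample responses are all hypothetical, and a real scanner verifies the output context far more carefully than a plain substring check. The fuzzer-style check reports any deviation from a baseline response, while the scanner-style check reports only when the payload is returned unencoded.

```python
import html

# Hypothetical marker payload; real tools use many context-specific payloads.
PAYLOAD = "<script>alert(1)</script>"


def fuzzer_flags(response_body: str, baseline_body: str) -> bool:
    """Fuzzer-style check: report any response that differs from the baseline.

    No attempt is made to decide whether the difference is exploitable;
    that judgement is left to the human user.
    """
    return response_body != baseline_body


def scanner_flags(response_body: str, payload: str = PAYLOAD) -> bool:
    """Scanner-style check (simplified): report only if the payload is intact.

    If the application HTML-encodes the payload, the raw string is absent
    from the response and the location is not reported.
    """
    return payload in response_body


# Hypothetical responses from a page that echoes a user-supplied name.
baseline = "<p>Hello, guest</p>"
encoded = f"<p>Hello, {html.escape(PAYLOAD)}</p>"  # output is HTML-encoded
reflected = f"<p>Hello, {PAYLOAD}</p>"             # output is echoed raw
```

With these samples, the fuzzer-style check flags both the encoded and the raw response (each differs from the baseline), whereas the scanner-style check flags only the raw reflection.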

Appendix B – WAVSEP Scanning Logs

The execution logs, installation steps and configuration used while scanning with the various tools are all described in the following report (PDF format):

http://wavsep.googlecode.com/files/WavsepScanLogs%20v1.0.pdf

Appendix C – Scanners with Abnormal Behavior

During the assessment, parts of the source code of open source scanners and the HTTP communication of some of the scanners were analyzed; some tools behaved in an abnormal manner that should be reported:

· Priamos IP Address Lookup – The tool Priamos attempts to access “whatismyip.com” (or some similar site) whenever a scan is initiated (verified by channeling the communication through Burp proxy). This behavior might derive from a trojan horse that infected the content on the project web site, so I’m not jumping to any conclusions just yet.
· VulnerabilityScanner Remote RFI List Retrieval (the tool is listed among the scanners that were not tested in Appendix A, and was developed by a group called RST; http://pastebin.com/f3c267935) – In the source code of the tool VulnerabilityScanner (a Python script), I found traces of remote access to external web sites for obtaining RFI lists (which might be used to refer the user to external URLs that appear in the list). I could not verify the purpose of this feature since I haven’t managed to activate the tool (yet); in theory, this could be a legitimate list update feature, but since all the lists the tool uses are hardcoded, I didn’t understand the purpose of the feature. Again, I’m not jumping to any conclusions; this feature might be related to the tool’s initial design, which was not fully implemented due to various considerations. I’ll try to drill deeper in the next benchmark (and hopefully manage to test the tool’s accuracy as well).

Although I did not verify that any of these features is malicious in nature, these features and behaviors might be abused to compromise the security of the tester’s workstation (or to implicate the tester in malicious actions), and thus require additional investigation to rule out this possibility.
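For readers who want to repeat this kind of check, the following is a rough Python sketch of how out-of-scope requests could be picked out of a list of URLs captured by an intercepting proxy. It is only an illustration under stated assumptions: the function name, the log format (one URL per entry) and the sample URLs are hypothetical, and it does not reflect the export format of Burp or of any other specific proxy.

```python
from urllib.parse import urlsplit


def unexpected_hosts(requested_urls, target_host):
    """Return the sorted list of hosts contacted other than the target host."""
    hosts = set()
    for url in requested_urls:
        host = urlsplit(url).hostname  # None if the URL has no host part
        if host and host.lower() != target_host.lower():
            hosts.add(host.lower())
    return sorted(hosts)


# Hypothetical list of URLs requested by a scanner during a single scan.
captured = [
    "http://localhost:8080/wavsep/index.jsp",
    "http://localhost:8080/wavsep/page.jsp?input=test",
    "http://www.whatismyip.com/",
]

suspicious = unexpected_hosts(captured, "localhost")
```

Any host that appears in the result was contacted by the tool even though it is not the tested application, which is a reason to inspect the tool more closely rather than proof of malicious intent.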

Sunday, December 26, 2010

Web Application Scanner Benchmark (v1.0)