WAVSEP 2013/2014 Score Chart:
The Web Application Vulnerability Scanners Benchmark
Commercial, SAAS & Open Source Scanners
An Accuracy, Coverage, Versatility, Adaptability,
Feature and Price Comparison of 63 Black Box Web Application Vulnerability Scanners and SAAS
Services
Part I
Information Security Researcher, Analyst, Tool Author and Speaker
February 2014
Assessment Environments: WAVSEP 1.5,
WIVET
v3-rev148, ZAP-WAVE (WAVSEP integration), various
undisclosed verification platforms
sectooladdict-{at}-gmail-{dot}-com
Table of Contents
1. Introduction
2. List of Tested Web Application Scanners
3. Benchmark Overview & Assessment Criteria
4. A Glimpse at the Results of the Benchmark
5. SURPRISE, SURPRISE!
6. How to Read and Use the Results - IMPORTANT
7. Test I - Scanner Versatility - Input Vector Support
8. Test II - WIVET - Coverage via Automated Crawling
9. Introduction to the Various Accuracy Assessments
10. Test III – The Detection Accuracy of Unvalidated
Redirect (NEW!)
11. Test IV – The Detection Accuracy of Backup/Hidden Files (NEW!)
12. Test V – The Detection Accuracy of Path Traversal/LFI
13. Test VI – The Detection Accuracy of RFI (XSS via RFI)
14. Test VII – The Detection Accuracy of Reflected XSS
15. Test VIII – The Detection Accuracy of SQL Injection
16. Test IX – Attack Vector Support – Counting Audit
Features
17. Test X – Scanner Adaptability - Crawling & Scan
Barriers
18. Test XI – Authentication and Usability Feature
Comparison
19. Test XII – The Crown Jewel - Results & Features vs.
Pricing
20. Additional Comparisons, Built-in Products and Licenses
21. What Changed?
22. Initial Conclusions – Open Source vs. Commercial
23. Verifying The Benchmark Results
24. So What Now?
25. Recommended Reading List: Scanner Benchmarks
26. Acknowledgments
27. Appendix A – List of Tools Not Included In the Test
Detailed result presentation at www.sectoolmarket.com:
Tools, Features, Results, Statistics and Price Comparison
(Delete Cache Prior to Viewing)
A Step by Step Guide for Choosing the Right Web
Application Vulnerability Scanner for *You*
It is fashionably late, but the time eventually came.
Months and months of research finally came to fruition with
the publication of the yearly WAVSEP benchmark, the fourth one in the
series.
It's been a very exciting year for the project… with many
new things happening.
I'd like to share some of those, as they put into perspective
how the project is progressing:
I've noticed the project has been included in the continuous
integration processes of various commercial vendors, and lately even in
similar processes of open source projects (for example, ZAP).
The same commercial vendors, as well as colleagues and people
I met at conferences around the world, brought to my attention that various government
institutes and agencies worldwide use the platform to assess
vulnerability scanners, often as their main assessment platform.
I was contacted by many organizations in the financial and
technology sectors that asked me to help them do the same, and I found some time
to enhance the platform for that purpose.
I also received source code contributions from multiple
projects and individuals, as well as support from volunteers, feedback, and
plenty of inspiration.
I even began receiving phone calls, on multiple occasions,
from "angels", relevant companies and investors around the globe who
wanted to know whether or not to invest in vulnerability detection initiatives
and products.
With all the support, contributions, and data collected in
this research over the years, I believe that a subject that has so far remained
obscure could soon be determined –
a simple process that will enable evaluating the customized
ROI per product:
the Return on Investment (ROI) from each product in the
category.
While we're definitely not there yet, not in this article
anyway, with each publication there are fewer and fewer missing pieces, and the
data collected while preparing this publication closed a significant portion
of the gap.
The assessment covers 12 different aspects of the
tools (or 16, if you count the non-competitive charts), including two new
attack vectors that were not assessed in the past (!), and this time, each aspect
was assigned a recommended priority that readers can use for evaluation.
The research also managed to finally breach the
traditional level 60 cap (the best metaphor a gamer could come up with
at 5 AM) and add three additional products to the assessment, for a total of 63
different web application vulnerability scanners, including some that were
never assessed in the past, and with the potential to add more in the near future.
These include a total of 14 commercial products
and SAAS services, as well as 49 free and/or open source
projects.
Following its tradition, the research focused on the main module
that is usually associated with the term "web application vulnerability
scanner", and this time, it is in our interest to define this module
properly, as well as the difference between it and other modules that may be
associated with the same title.
Although the term "web application scanner" has meant
different things over the years, I believe that dividing its various
functionalities into modules can help explain the focus of this research,
as well as properly classify and evaluate the contribution of the various modules
in the future.
Since I didn't find any dominant classification, I am going
to use a descriptive one for the purpose of this research.
Black-Box web application scanners may contain any of the
following modules:
(*) Generic Application-Level Vulnerability Detection
Module: a collection of features that attempt to identify generic exposures
in the application layer, without prior knowledge about the application and its
structure, and while potentially overcoming barriers along the way. This
module is the primary focus of this research.
(*) Known Application-Level / Web Server Vulnerability
Detection Module: commonly classified as a CGI scanner (a bit old school
for my taste) or a web server scanner, but often using the same classification
as the above module – the collection of features that falls under this category
attempts to identify vulnerabilities that are known (and/or were published) in
an off-the-shelf product. This module is NOT covered by this research.
Additional modules may include a "Generic Vulnerability
Exploitation Module", a "Known Vulnerability Exploitation Module",
a "web site infection" detection module, and others. These too are
not covered by this research, and although many of the tested projects/products
contain more than one of these module types, they are also often implemented as separate products.
So now that we're done clarifying and classifying, as always,
one last tip:
A lot of the information gathered in this research cannot be
presented in graphs, so if you're seeking the more significant content,
you'll have to dig in past the charts and graphs. If you read 3 graphs and
can already declare a winner, you're missing some good stuff along the way.
Try the sections in the main menu with all the fancy words beside
them… they usually do the trick.
Update:
During the assessment of Qualys it is highly likely that an optimization mechanism affected the scan results of POST test cases (compared to WAVSEP 2012 results). Although in the case of other vendors disabling similar mechanisms solved the problem, in the case of Qualys this optimization mechanism could not be disabled via the configuration interface. We are currently trying to find solutions to the problem.
2. List of Tested Web Application Scanners
The following commercial scanners were covered
in the benchmark:
(*) Acunetix WVS v9.0, build 20140113 (Acunetix)
(*) NTOSpider v6.0, builds 773/778 (NT OBJECTives)
(*) Netsparker v3.1.7.0 (Netsparker Ltd, p.k.a. Mavituna Security)
(*) IBM AppScan v9.0.0.999 & v8.8.0.0, build 466 (IBM)
(*) WebInspect v10.1.177.0, SecureBase 4.11.00 (HP)
(*) Syhunt Dynamic v5.0.0.7 RC2 (Syhunt)
(*) Burp Suite v1.5.20 (Portswigger)
(*) N-Stalker Enterprise Edition X, build 10.13.11.31 (N-Stalker)
(*) WebCruiser v2.7.0 EE (Janus Security)
The following SAAS services were assessed in
the benchmark:
(*) ScanToSecure - during January 2014 (Netsparker Ltd)
The previous results of the following commercial
scanners were included in the benchmark, since they were not updated
since the previous benchmark (website):
The following commercial scanners will be updated
soon:
(*) Nessus (Tenable Network Security) - Web Scanning Features
The latest versions of the following free/open source
scanners were re-tested:
(*) Zed Attack Proxy (ZAP) v2.2.2 (OWASP)
(*) IronWASP v0.9.7.4 (Lavakumar Kuppan)
(*) W3AF v1.6, revision 5460aa0377 (The W3AF team)
(*) arachni v0.4.6 (Tasos Laskos)
(*) Skipfish v2.10b (Google)
(*) WATOBO v0.9.19 (Andreas Schmidt)
(*) VEGA v1.1 beta, build 108 (Subgraph)
(*) Wapiti v2.3.0 (Nicolas Surribas)
(*) XSSer v1.6-1 (OWASP)
(*) Netsparker Community Edition v3.1.6.0 (Netsparker Ltd)
(*) N-Stalker 2012 Free Edition v10.13.11.31 (N-Stalker)
(*) Syhunt Mini v4.4.3.0 (Syhunt, p.k.a. Sandcat Mini)
New aspects of the following open source scanners
were tested in the benchmark:
(*) Andiparos v1.0.6 (Compass Security)
(*) Paros Proxy v3.2.13 (Milescan)
The previous results of the following free
scanners were included in the benchmark, since they were not updated
since the previous benchmark (website):
(*) Acunetix Free Edition v8.0-20120509 (Acunetix)
(*) N-Stalker 2009 Free Edition v7.0.0.223 (N-Stalker)
(*) WebSecurify v0.9, latest free edition (GNUCITIZEN)
(*) Sandcat Free Edition v4.0.0.1 (Syhunt)
(*) WebCruiser v2.4.2 FE
(*) JSKY Free Edition v1.0.0
(*) Scrawler v1.0 (HP)
(*) Safe3WVS v10.1 FE (Safe3 Network Center)
The results of the following open source scanners
were included but not re-verified:
(*) sqlmap v1.0-Jul-5-2012 (Github) – already achieved mastery in its supported features
(*) DSSS (Damn Simple SQLi Scanner) v0.1h – 0.2h exists, will be tested in the future
(*) aidSQL 02062011 – a newer version was released on 2013-05-27, will be tested in the future
The results were compared to those of unmaintained
scanners tested in the past:
(*) ProxyStrike v2.2
(*) Grendel Scan v1.0
(*) PowerFuzzer v1.0
(*) Oedipus v1.8.1 (v1.8.3 is around somewhere)
(*) Xcobra v0.2
(*) XSSploit v0.5
(*) UWSS (Uber Web Security Scanner) v0.0.2
(*) Grabber v0.1
(*) WebScarab v20100820
(*) Mini MySqlat0r v0.5
(*) WSTool v0.14001
(*) crawlfish v0.92
(*) Gamja v1.6
(*) iScan v0.1
(*) LoverBoy v1.0
(*) openAcunetix v0.1
(*) ScreamingCSS v1.02
(*) Secubat v0.5
(*) SQID (SQL Injection Digger) v0.3
(*) SQLiX v1.0
(*) VulnDetector v0.0.2
(*) Web Injection Scanner (WIS) v0.4
(*) XSSS v0.40
(*) Priamos v1.0
For a full list of commercial & open source tools that
were not tested in this benchmark, refer to the appendix.
3. Benchmark Overview & Assessment Criteria
The benchmark focused on testing commercial & open
source tools that are able to detect (and not necessarily exploit)
security vulnerabilities on a wide range of URLs, and thus, each tool tested
was required to support the following features:
(*) The ability to detect Reflected XSS and/or
SQL Injection and/or Path Traversal/Local File Inclusion/Remote File Inclusion
vulnerabilities.
(*) The ability to scan multiple URLs at once
(using either a crawler/spider feature, a URL/log file parsing feature or a
built-in proxy).
(*) The ability to control and limit the scan to
an internal or external host (domain/IP).
The testing procedure of all the tools included the
following phases:
Feature Documentation
The features of each scanner were documented and compared
according to documentation, configuration, plugins and information received
from the vendor. The features were then divided into groups, which were
used to compose various hierarchical charts.
Accuracy Assessment
The fact that a scanner supports a certain category of
tests does not say anything about HOW WELL it is able to detect the
supported issues. The purpose of the accuracy assessment is to see how
effective each scanner is in detecting a variety of vulnerabilities, and to see
whether the detection logic "settles" for simple scenarios or
covers a collection of common and advanced scenarios.
The scanners were all tested against the latest version of WAVSEP (v1.5), a benchmarking platform designed to assess the
detection accuracy of web application scanners, which was released alongside
the publication of this benchmark.
The purpose of WAVSEP’s test cases is to provide a scale for
understanding which detection barriers each scanning tool can bypass, and which
common vulnerability variations can be detected by each tool.
WAVSEP 1.5 includes test cases from ZAP-WAVE, code
contributions from various volunteers, and a collection of 250+ NEW
test cases for two new exposures: unvalidated redirect and obsolete/hidden
files.
The various scanners were tested against the following test
cases (GET/POST):
- 60 test cases that were vulnerable to Phishing via Unvalidated Redirect.
- 184 test cases that included Hidden, Obsolete and Backup files.
- 816 test cases that were vulnerable to Path Traversal attacks.
- 108 test cases that were vulnerable to (XSS via) Remote File Inclusion attacks.
- 66 test cases that were vulnerable to Reflected Cross Site Scripting attacks.
- 80 test cases that contained Error Disclosing SQL Injection exposures.
- 46 test cases that contained Blind SQL Injection exposures.
- 10 test cases that were vulnerable to Time Based SQL Injection attacks.
The various scanners were also tested against a variety
of false positive scenarios:
- 9 different categories of false positive Unvalidated Redirect vulnerabilities.
- 3 different categories of false positive Obsolete/Hidden/Backup files.
- 8 different categories of false positive Path Traversal / LFI vulnerabilities.
- 6 different categories of false positive Remote File Inclusion vulnerabilities.
- 7 different categories of false positive Reflected XSS vulnerabilities.
- 10 different categories of false positive SQL Injection vulnerabilities.
Overall, the test bed amounts to 1413 test
cases – 1370 vulnerable test cases covering 6 different attack vectors,
plus the 43 false positive categories listed above – each test case
simulating a different and unique scenario that may exist in an
application.
Although the testing platform included a variety of
experimental test cases for similar and different vulnerabilities (DOM-XSS,
information disclosure issues, etc), these were not included in the scope of
the benchmark, and their results did not affect the final score.
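For readers who want to reproduce the per-category figures used throughout the charts, the bookkeeping is deliberately simple: each category score is the percentage of that category's vulnerable test cases a tool detected, with the false positive categories it flagged tracked separately. A minimal sketch with hypothetical tallies (not the actual tabulation scripts behind sectoolmarket):

```python
# How the per-category figures are derived (a sketch with hypothetical tallies,
# not the actual tabulation scripts behind sectoolmarket).
def category_score(detected_cases, total_vulnerable_cases, fp_categories_flagged):
    """Return (detection accuracy in %, false positive categories flagged)."""
    accuracy = 100.0 * detected_cases / total_vulnerable_cases
    return round(accuracy, 2), fp_categories_flagged

# Example: a hypothetical scanner detecting 61 of the 66 Reflected XSS cases
# while triggering 1 of the 7 Reflected XSS false positive categories:
print(category_score(61, 66, 1))   # -> (92.42, 1)
```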
Attack Surface Coverage Assessment
In order to assess the scanners' attack surface coverage, the
assessment included tests that measure the efficiency of the scanner's
automated crawling mechanism (input vector extraction), and feature
comparisons meant to assess its support for various technologies and its
ability to handle different scan barriers.
This section of the benchmark also included the WIVET test (Web Input Vector
Extractor Teaser v3-rev148), in which scanners were executed against a
dedicated application that can assess their crawling mechanism efficiency in
the aspect of input vector extraction. The specific details of this assessment
are provided in the relevant section.
Result Verification
In order to ensure result consistency, the directory of
each exposure subcategory was individually scanned multiple times using
various configurations (for the vast majority of tested products),
usually using a single thread and a scan policy that only included the
relevant plugins.
In order to ensure that the detection features of each
scanner were truly effective, most of the scanners were tested against
an additional benchmarking application that was prone to the same
vulnerable test cases as the WAVSEP platform, but had a different
design, slightly different behavior and different entry point format, in order
to verify that no signatures were used, and that any improvement was due to the
enhancement of the scanner's attack tree.
Furthermore, in order to verify that all WIVET
results were reliable, the vast majority of tools were also tested
against an unpublished online version of WIVET that included additional enhancements
that prevent pre-adaptation to the platform URLs (http://wivet.webscantest.com/).
Finally, since the test was performed with the aid of
several volunteers, some results were verified by more than one person and on
multiple environments.
Making the Results Useful to Vendors
In order to help vendors understand which scenarios were "missed"
by their products, the list of identified test cases was documented in
detail for each class of vulnerabilities, and the list of test cases that
were missed can be deduced from that list. Since WAVSEP contains detailed
documentation on each and every test case, this information can help vendors
identify their weaknesses and cover prominent scenarios.
Refer to the scan description section (click the version
link) of each scanner in http://www.sectoolmarket.com
to locate exactly which test cases were identified by each scanner.
Public tests vs. Obscure tests
In order to make the test as fair as possible, while still
enabling the various vendors to show improvement, the benchmark was divided
into tests that were publicly announced, and tests that were obscure
to all vendors:
(*) Publicly announced tests: the various feature comparisons,
the WIVET assessment and the detection accuracy assessments of SQL
Injection, Reflected Cross Site Scripting, Path
Traversal/LFI and (XSS via) Remote File Inclusion were well
known to all vendors, and already published as a part of WAVSEP v1.2 (which had been
available online for the last year and a half).
(*) Tests that were obscure to all vendors until
the moment of publication: the detection accuracy assessments of Unvalidated
Redirect and Obsolete/Hidden File Detection, implemented as 256+
NEW test cases in WAVSEP 1.5 (a new version that was only
published alongside this benchmark).
The results of the main test categories are presented within
three graphs (commercial/SAAS graph, free & open source graph, unified
graph), and the detailed information of each test is presented in a dedicated
section of the benchmark presentation platform at http://www.sectoolmarket.com.
4. A Glimpse at the Results of the Benchmark
The presentation of results in this benchmark, alongside
the dedicated result presentation website (http://www.sectoolmarket.com/) and a series of supporting articles and methodologies, is
designed to help the reader make a decision - to choose the
proper product/s or tool/s for the task at hand, within the limits of time
and budget.
A summary of the most significant results can be seen in the
following links, and filtered according to the product license
(commercial/opensource):
Price & Feature Comparison of Commercial Scanners
Price & Feature Comparison of a Unified List of
Commercial, Free and Open Source Products
Some of the sections might not be clear to some of
the readers at this phase, especially since many of them contain new
conclusions and new results, which is why I advise both veterans and newcomers to
read the rest of the article, prior to analyzing this summary.
5. SURPRISE SURPRISE
Although, generally speaking, the vast majority of products
improve their results from benchmark to benchmark, and this time is no different,
this benchmark also has an above-average amount of conflicting results.
More than a few tools that got high results in previous
benchmark categories got lower results in this one – in the same
categories – although nothing in the test environment had changed.
Furthermore, some of the new tests were met with… surprising
difficulty by the vast majority of the tools in the industry, leading me
to believe that many products in the industry have grown to a size that may be
challenging to maintain in the coming years.
The overall problem is related to product testing and maintenance
– the fact that software bugs may cause a variety of crucial features not to function
for long periods of time, without anyone being aware of it.
The cost of the mitigating processes to the vendor (or of
their absence, to the consumer!) may be very high, and the fact that it's very difficult
for the consumer to identify such issues, especially on a periodic basis, can
have a major effect.
It's hard to miss… all you need to do is take a look at
a couple of the "new" charts, and even at some of the
"traditional" WAVSEP charts, to notice this issue, which I will discuss
in detail in some of the sections.
This phenomenon is something I will probably analyze
in a future publication, and it should be a reason for concern, especially since,
unless certain precautions are taken, it will probably become more severe with
time.
6. How to Read and Use the Benchmark Results
The practical reader, the one who wants to make use of the
information provided in this research to his advantage, can use the following
guidelines for interpreting the results, and the following steps to get to
practical decisions:
(*) Although it's tempting to look only at the tools at the
top, it's important to remember that insignificant differences in
results are just that – insignificant – and should be treated
accordingly. The benchmark can never cover every single scenario, and a few
percentage points don't always make a product better in a category (although many
probably do). I would therefore recommend that a reader evaluating
a tool figure out whether or not the tool has a good score in an
assessment and in general, instead of falling into the 100% trap. That
being said, a perfect score certainly isn't bad, so don't take it the other way
around either.
(*) When trying to figure which tool you should use, try the
following simple methodology:
1. Input Vector and Scan Barrier Support
Figure out if the input delivery method (Test I)
used by the application or applications you are using is supported by
the scanners you are evaluating. Do the same for the various security
mechanisms, technologies and scan barriers that are used in
the application (Test X). The scanner won't work at all, or will provide
little value, if it doesn't support those.
[Note: pentesters should probably go for a tool that
supports enough of those, as the technological barriers they may encounter
vary, while other organizations may use tools that support only what they need]
2. Crawling & Input Vector Extraction
If you use scanners mainly in a point-and-shoot scenario,
and prefer as much automation as possible, a high WIVET score will be the
second most important feature you should follow.
[Note: for the most part, most pentesters can deal
with a reasonable score as well, although a high one will certainly help, while
organizations and QA/DEV departments really need a tool with a high score in
this category – especially in 2014]
3. Vulnerability Detection Features and Accuracy
It's hard to say what's more important – so try and keep
those in balance. The more accurate and the more feature rich – the better.
Bear in mind that an accuracy difference of 1%, 5% or even 10% is NOT necessarily
significant, although larger differences might be.
4. Price
There's no point in buying a product that can't run, isn't automated
enough for you (in case you need that), isn't accurate at all (and will only result
in extra work for you), or doesn't have enough features to justify the price,
but once all that is out of the way, price is your next criterion. Bear in mind
that you can usually negotiate, and that from time to time, prices change.
5. All the rest
Some features may be special, such as platform-specific
capabilities, result documentation features, or complementary features that can
make your life easier, configure your WAF, generate reports for your manager or
get you a free trip to Mars.
Some of these features may even tip the scale at the expense
of other features, but in the long run, try to stick to that order.
Also note that these are general guidelines, and that if
this choice is significant, you might want to consult with an expert to help
you evaluate which tools match your needs.
7. Test I - Versatility - Input Vector Support
As I mentioned in previous posts from 2012,
after investigating the field of DAST for the past five years, I consider the
scanner's support for the tested application input delivery method
to be the single MOST significant aspect in the selection process of any
scanner.
Reasoning:
the input delivery method (a.k.a the input vector) is the method
used by the HTML/Flash/Applet/Silverlight application to deliver user-originating
input from the client to the server.
These "formats" include common formats such as:
(*) Query String Parameters
(URL?param1=value1&param2=value2)
(*) HTTP Body Parameters (param1=value1&param2=value2)
And "modern" formats such as:
(*) JSON Arrays ({"param1":"value1","param2":"value2"})
(*) XML Elements and Attributes (<element attribute="value1">value2</element>)
These methods may also include binary delivery methods for
technology specific objects such as AMF, Java serialized objects and WCF, as
well as many other input delivery methods.
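To make the difference between these delivery methods concrete, the following minimal Python sketch (using the third-party requests library; the endpoint and parameter names are hypothetical) submits the same two parameters through four of the text-based vectors listed above - exactly the locations a scanner must be able to inject its test payloads into:

```python
# Same user-controlled values, four different input delivery methods
# (hypothetical endpoint/parameters; requires the third-party 'requests' library).
import requests

BASE = "http://app.example.com/endpoint"
values = {"param1": "value1", "param2": "value2"}

# (1) Query string parameters: /endpoint?param1=value1&param2=value2
requests.get(BASE, params=values)

# (2) HTTP body parameters (application/x-www-form-urlencoded)
requests.post(BASE, data=values)

# (3) JSON object in the request body: {"param1": "value1", "param2": "value2"}
requests.post(BASE, json=values)

# (4) XML elements/attributes in the request body
xml_body = '<request><param1 attr="value1">value2</param1></request>'
requests.post(BASE, data=xml_body, headers={"Content-Type": "application/xml"})
```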
Since the majority of attacks rely on malicious
input being delivered through input parameters to the
application, a scanner that is not able to deliver those values to most of the
application server entry points WILL NOT be a good choice.
An automated tool can't detect vulnerabilities in a given
parameter, if it can't scan the protocol or mimic the application's method of
delivering the input.
In fact, lack of support for the dominant input vector used
by the application can make the scanner NEARLY USELESS for that specific
application (without diminishing how useful it may be for other types of
applications).
While organizations that stick with specific development
technologies only need to verify that the scanner they use supports
the input delivery method used by their applications, since in 2013/2014
there is a vast collection of different input delivery methods, versatility
becomes a major issue for pentesters, and to some extent for
organizations that rapidly develop applications in different technologies.
Although the position in this section's charts doesn't
necessarily represent the most important score, it is the most
important prerequisite for a scanner to fulfill when scanning a
specific technology.
Therefore, the first assessment criterion of this benchmark
is the number of input vectors each tool can scan (not just
parse), which is a major component in the scanner versatility score.
Important Note
Although, it may seem logical that a scanner that supports
an input delivery method will do so consistently, some scanners support
for an input vector may be limited to SOME of the vulnerability
detection plugins, while the rest may be supported only for basic input
delivery methods.
I became aware of this condition after a thorough research,
and unfortunately, at the moment there is no sure way to verify which detection
capabilities of scanners are actually supported for each input vector, at least
not on a large scale, and for the vast majority of scanners.
Since WAVSEP test cases are implemented with either query string
or HTTP body parameters, only the support for these vectors was actually
verified, and the rest of the information in this section derives from a
thorough research that covered the vendor proclaimed results, source code (when
possible) and feature documentation.
Future versions of WAVSEP may include test cases to verify
the support of scanners for different input vectors.
Before viewing the charts that represent the versatility
of different vulnerability scanners, it may be a good time to mention interesting
features of two products which are related to this category.
This proclamation does not mean that the author takes a
stand as to which product is "the best" (a conclusion that anyone
who read my previous benchmarks knows very well not to expect), just that the
approach these products take to classify attacks, manage scan scope
and present the information to the user can be very beneficial in many
situations.
The products I refer to are NTOSpider and Acunetix,
and to some extent IronWASP, ZAP and Burp (and products
with similar features, in case I forgot any), each taking an interesting
approach to input vector support and scan scope management:
(*) NTOSpider enables the user to manage which
input vectors should be tested for each attack, therefore presenting which vectors
are supported for each attack, information which is very hard to
obtain from documentation:
(*) Acunetix presents which attacks are performed
per directory, schema, file, etc:
Other tools contained interesting features (with no
attack-per-vector info) that provided control over which input vectors
will be scanned:
IronWASP
input delivery method scope selection in scan wizard:
OWASP ZAP input delivery method scope selection in the
configuration window:
Similar features were verified in Burp Suite Pro,
and may exist in other products as well.
The more vectors of input delivery that the scanner
supports, the more versatile it is in scanning different technologies and
applications (assuming it can handle the relevant scan barriers, supports
necessary features such as authentication, or alternatively, contains features
that can be used to work around the specific limitations).
The detailed comparison of the scanners support for various
input delivery methods is documented in detail in the following section of
sectoolmarket: http://www.sectoolmarket.com/input-vector-support-unified-list.html
The following charts show how versatile each scanner is in
scanning different input delivery vectors (and, although not entirely comprehensive,
different technologies):
Result Update (29/03/2014): Appscan, ZAP and arachni reported support for additional input vectors AFTER the original benchmark publication (in the same tested versions). The current charts include these updates, alongside others.
The Number of Input Vectors Supported – Commercial Tools & SAAS
Versatility of Open Source Scanners vs. Commercial Scanners in
2014
The vast majority of open source tools tested in 2012 (with
the exception of IronWASP) did not support vectors besides the basic
GET/POST/Header/Cookie vectors, making the task of using them against "modern"
applications that rely on JSON/XML/etc impractical.
However, as the graph proves, certain open source vendors
invested efforts in supporting additional input delivery methods in their
vulnerability scanning features, and thus, these scanners can be used effectively
against applications with "modern" input vectors and technologies.
Although this scenario is rare, and by no means
representative, the careful inspector will even identify
input delivery methods that are only supported by certain open source
projects (for example, ZAP's support for GWT), although the same goes the other
way around for many vectors supported by commercial vendors.
8. Test II - WIVET - Crawling Coverage
The second assessment criterion was focused on assessing crawling
coverage features, which included the various discovery methods used
to increase the attack surface of the tested application: to locate
additional resources and input delivery methods to attack.
Although scanners can increase the attack surface in a number
of ways, from detecting hidden files to exposing device-specific interfaces
(mobile, tablet, etc.), this assessment focused on the automated
crawling capabilities and input vector extraction coverage of the various scanners
(as opposed to the input vector scanning support measured in
the previous section), and is primarily represented
using the scanner's WIVET score.
This aspect of a scanner is extremely important in
point-and-shoot scans, scans in which the user does not "train" the
scanner to recognize the application structure, URLs and requests, either due
to time/methodology restrictions, or when the user is not a security expert
that knows how to properly use manual crawling with the scanner.
Although users that can afford "training" the scanner
to recognize the URL and input sources in the application (by using it as a
proxy, for example) don't necessarily require enhanced crawling coverage, organizations
and individuals that prefer or require using the web
application scanner in an automated manner (point-and-shoot)
should consider the crawling coverage / input vector extraction to be of highest
importance, second only to the support of the scanner for testing
the necessary input delivery vectors.
As mentioned earlier, in order to evaluate these aspects of
scanners, I used a project called WIVET (Web Input Vector Extractor Teaser); the WIVET project is a
benchmarking project that was written by Bedirhan Urgun and released under the GPL2
license.
The project is implemented as a web application which aims
to "statistically analyze web link extractors", and measures the
number of input vectors extracted by each scanner scanning the WIVET website.
Plainly speaking, the project simply measures how well a scanner is able to crawl the application, and how well can it locate input vectors, by presenting a collection of challenges that contain links, parameters and input delivery methods that the crawling process should locate and extract.
In order for WIVET to work, the scanner must crawl the
application while consistently using the same session identifier in its
crawling requests, and while avoiding the 100.php logout page (which
initializes the session, and thus the results).
The results can then be viewed by accessing the application
index page, while using the session identifier used during the scan.
During the tests I used a variety of workarounds designed to
"assist" scanners with missing proxy/cookie customization features to
scan WIVET, usually by scanning through a proxy that forwarded the communication to
WIVET while adding consistent session identifiers and restricting access to
the logout page.
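For scanners that lacked these controls, the kind of helper described above can be approximated in a few lines; the sketch below (standard library only, hypothetical WIVET address and session id - not the exact proxy used in this benchmark) relays every GET request to the WIVET installation with a fixed session cookie and blocks the 100.php logout page:

```python
# A minimal WIVET scan helper (a sketch, not the exact proxy used in the benchmark):
# point the scanner at http://127.0.0.1:8008/ and each GET request is relayed to
# the WIVET installation with a constant session cookie, while the logout page
# (100.php), which would reset the results, is blocked.
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

WIVET_BASE = "http://127.0.0.1/wivet"   # hypothetical WIVET installation
SESSION_ID = "deadbeefcafe1337"         # fixed session id used to view the score later

class WivetRelay(BaseHTTPRequestHandler):
    def do_GET(self):
        if "100.php" in self.path:              # never let the crawler log out
            self.send_response(403)
            self.end_headers()
            return
        req = urllib.request.Request(WIVET_BASE + self.path)
        req.add_header("Cookie", "PHPSESSID=" + SESSION_ID)
        try:
            with urllib.request.urlopen(req) as resp:
                body = resp.read()
                self.send_response(resp.status)
                self.send_header("Content-Type", resp.headers.get("Content-Type", "text/html"))
                self.end_headers()
                self.wfile.write(body)
        except urllib.error.HTTPError as err:
            self.send_response(err.code)
            self.end_headers()

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8008), WivetRelay).serve_forever()
```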
The scan configuration used with each scanner against WIVET
was documented in detail in the scanners "scan log", and the comparison
of the scanners' WIVET score is presented in the following section of
sectoolmarket:
http://sectoolmarket.com/wivet-score-unified-list.html
Result Update (29/03/2014): the impressive 96% result of Webinspect can be achieved by selecting the "depth first" mode in the scan wizard. The default option in the wizard is slightly less efficient, but still yields a great result that competes with the best result of any other scanner (94%).
Due to technical difficulties and time constraints, the WIVET
results of ScanToSecure are not yet included; it can be assumed
to have the same score as Netsparker, since that is the engine at its core.
The WIVET Score of Web Application Scanners – Free and Open
Source Tools
Although the scan success rate was much higher than in previous
years, still, some of the scanners were not able to scan this platform despite
all my efforts. The score of these projects will be updated as soon as they
enhance their crawling mechanisms enough to scan WIVET.
It's crucial to remind the reader that scanners with
Burp-log parsing features (such as sqlmap and IronWASP) can effectively be
assigned the WIVET score of Burp, and also that scanners with internal
proxy features (such as ZAP, Burp, etc.) can be used with the crawling
mechanisms of other scanners (such as Netsparker CE).
Thus, any scanner that supports any of those features can be
artificially "enhanced" and assigned the WIVET score of any other scanner
in the possession of the tester.
9. Introduction to the Accuracy Assessments
The following sections present the results of the detection
accuracy assessments performed for *Unvalidated Redirect*, *Old,
Backup and Unreferenced Files*, *Path Traversal / LFI*, *(XSS via)
Remote File Inclusion*, *Reflected XSS* and *SQL
Injection*, six of the most commonly supported features in web
application scanners.
Since two of these assessments are *NEW* to this
yearly benchmark (the backup files and unvalidated redirect accuracy
assessments - which were not disclosed to the various vendors prior to the
publication of this benchmark), two more were new in the 2012 benchmark
(the path traversal/LFI and the remote file inclusion accuracy assessments),
and two existed in the benchmark from day one (SQL injection and reflected XSS)
– there's an interesting combination of results that can help assess the
overall scanner's performance.
Sure - the detection accuracy of a specific exposure might
not reflect the overall condition of the scanner on its own, but the careful
reader can go back and analyze previous benchmarks to identify
patterns, and as always, these results serve as a crucial indicator for how
good a scanner is at detecting specific vulnerability instances.
The various assessments were performed against the various
test cases of WAVSEP v1.5, which emulate different common test case
scenarios for generic technologies.
Reasoning:
a scanner that is not accurate enough will not be able to identify many
exposures, and might classify non-vulnerable entry points as vulnerable. These
tests aim to assess how good each tool is at detecting the vulnerabilities it
claims to support, in a supported input vector, located in a
known entry point, without any restrictions that could prevent the tool from
operating properly.
These accuracy assessments were also performed under optimal
conditions (or at least as optimal as we could create), since the purpose
was to see how well the detection logic functions, with no interference from
various barriers that can affect it in applications.
Such optimal conditions included scanning relatively small
groups of URLs, using a limited amount of threads, defining optimal
configuration entries (in some cases), and so on.
Therefore, to reproduce these results, it is necessary to
follow the exact instructions listed in the various scan logs included in sectoolmarket.
10. Test III - Unvalidated Redirect Detection
The third assessment criterion was the detection accuracy of
Unvalidated Redirect, a common exposure which is also a commonly
implemented feature in web application scanners, and most importantly, a NEW
TEST in WAVSEP which the vendors were not aware of prior to the
publication of this article.
It's also included in OWASP TOP 10 2010
and in OWASP
TOP 10 2013, and represents a continued effort to make WAVSEP as compliant
as possible with the various OWASP TOP 10 lists.
This score chart is different from the rest because unlike
the rest of the detection accuracy charts, it calculates the score only based
on QueryString/GET test cases, and does not take into account the HTTP POST
test cases.
The reason to include only GET test cases in the score
calculation is related to the properties of an unvalidated redirect attack:
It's essentially a phishing enhancing attack which relies
on web site redirection features that redirect the browser to user-controlled
addresses sent in the input. These attacks eventually redirect the user to an
attacker controlled website, while misleading even cautious users that verify
the domain address prior to accessing a link.
For example:
Original URL -
Abused URL -
A case could be made that since submitting
malicious redirect values in POST parameters requires the user to first access
an HTML form on an attacker-controlled website, there's no
point in performing this attack at all, since the user already
"trusted" the attacker's website.
In fact, this statement is well ingrained in the
perception of many tool authors, who usually don't submit any redirect
payloads in POST parameters.
Several arguments can be made against that perception:
(*) Detecting persistent unvalidated redirect
attacks (like persistent XSS attacks) in which the payload is
"injected" into the database and affects other users, may very well
justify sending redirect payloads in POST parameters.
(*) Detecting session-hosted unvalidated redirect
attacks and pages in the actual website that embed externally
supplied URLs in a form that will later be submitted using POST may
justify performing POST tests as well.
Regardless of whether the argument is true or not, due to
the lack of support for POST unvalidated redirect tests in most of the tested
products, I decided not to include the POST test cases in this benchmark,
despite the fact that they are already included in WAVSEP, and despite the various
scenarios in which testing POST parameters with unvalidated redirect payloads
may lead to valid vulnerabilities (persistent redirect, session redirect,
reprinted redirect form, etc).
The POST test cases may however be included in the next
benchmark, in one way or the other, and the full results are already included
in the relevant scan logs of sectoolmarket.
In order to assess the detection accuracy of different unvalidated
redirect instances, I used a total of 30-60 test cases (for 302
redirection, and even for JS redirection). I also used a bunch of false positive
test cases, to see how permissive the detection process is.
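To illustrate what such a check boils down to (a hedged sketch only, not the detection logic of any tested scanner; the probe host, endpoint and parameter name are hypothetical), a tool can inject an absolute URL it controls and verify that the response actually sends the browser off-site:

```python
# A simplified unvalidated redirect probe (illustrative only; real scanners also
# cover meta-refresh/JS variants, relative-URL filters, header injection, etc.).
import requests
from urllib.parse import urlparse

PROBE_HOST = "scanner-probe.example.org"    # "attacker-controlled" host (hypothetical)

def looks_redirectable(url, param):
    payload = "http://" + PROBE_HOST + "/"
    resp = requests.get(url, params={param: payload}, allow_redirects=False)
    location = resp.headers.get("Location", "")
    # 302-style redirect whose Location points at the injected external host
    if 300 <= resp.status_code < 400 and urlparse(location).netloc == PROBE_HOST:
        return True
    # Very naive check for client-side (JavaScript) redirection to the injected host
    return "location" in resp.text.lower() and PROBE_HOST in resp.text

# Example (hypothetical WAVSEP-style endpoint):
# looks_redirectable("http://localhost:8080/wavsep/.../Redirect-Case01.jsp", "target")
```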
The comparison of the scanners' unvalidated redirect
detection accuracy is documented in detail in the following section of
sectoolmarket:
Result Chart Glossary
Note that the GREEN bar represents the vulnerable test case
detection accuracy, while the RED bar
represents the false positive categories detected by the tool (which may
correspond to more instances than what the bar actually presents, when compared to
the detection accuracy bar).
The Unvalidated Redirect Detection Accuracy of Commercial/SAAS
Scanners
The Unvalidated Redirect Detection Accuracy of
Opensource/Free Scanners
The Unvalidated Redirect Detection Accuracy of Scanners –
Unified List
11. Test IV - Backup/Hidden File Detection
The fourth assessment criterion was the detection accuracy
of Old, Backup and Unreferenced Files, a very common exposure,
that may lead to source code and configuration theft, which is also a commonly
implemented feature in web application scanners, and once again, a NEW TEST
in WAVSEP which the vendors were not aware of prior to the publication of this
article.
This is also the test in which the results are MOST
SURPRISING.
To make it clear, this test assessed the capability of
scanners to locate backup files with non-executable extensions, compressed
versions of files and directories that developers may have forgotten, sequential
files or copies of files and directories that are remnants of various
development tests, and additional leftovers that may lead to source code
and configuration disclosure.
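As a rough illustration of the technique being graded here (a toy sketch only; real scanners use much larger suffix/prefix lists, per-directory permutations and smarter response differencing, and all names below are hypothetical):

```python
# A toy old/backup/hidden file prober - a sketch of the general idea only.
import requests

BACKUP_SUFFIXES = [".bak", ".old", "~", ".copy", "1", ".zip", ".tar.gz"]

def probe_backups(base_url, resource):
    """Try common leftover variants of a known resource, e.g. 'index.jsp'."""
    baseline = requests.get(base_url + "/this-page-should-not-exist-1337")
    hits = []
    for suffix in BACKUP_SUFFIXES:
        candidate = resource + suffix              # e.g. index.jsp.bak, index.jsp~
        resp = requests.get(base_url + "/" + candidate)
        # Crude heuristic: a 200 whose body differs from the "not found" baseline
        if resp.status_code == 200 and resp.content != baseline.content:
            hits.append(candidate)
    return hits

# Example (hypothetical target):
# print(probe_backups("http://localhost:8080/app", "index.jsp"))
```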
For those of you that doubt the importance of this vector, it's
an exposure that as a pen-tester I personally abused to download the entire
source code of banks, e-commerce web sites, and credit card companies, obtained
connection strings and hard-coded credentials from obsolete source
code fragments and configuration files, as well as located numerous hidden
entry points that were vulnerable to exposures that the rest of the
application was not prone to.
What I'm trying to say is that while some instances of this
exposure may yield insignificant results, a severe instance can mean the
"game is over" for the application, and expose every server-side
vulnerability or hidden credential to the attacker.
Back in the old days, I used a collection of tools and
lists to identify such issues;
I made heavy use of Sensepost's Wikto
with customized lists of files and extensions; I used the backup/hidden
file detection features of the earliest published version of W3AF to download
the source code of several banks, and from time to time, even suffered
through the false positives of the mythical Paros Proxy obsolete file
detection features.
However, since then, many open source and commercial tools
mastered those attacks, and tried to make the detection task easier.
But as the results obviously show,
something bad happened along the way, which is not necessarily related
to this specific vulnerability, as much as it is related to a major
problem that affects the entire automated vulnerability
detection industry.
Insufficient Implementation of TDD
If there's any obvious conclusion that the reader can
draw from this benchmark, this is probably it:
There is a serious problem in (and therefore insufficient
use of) the implementation of TDD in the development of many web
application vulnerability scanners:
Test Driven
Development is a development process in which the software developers
invest efforts in writing unit tests for code modules, often even prior to
writing the modules themselves, and in which the build process of the product
uses these tests to verify the code modules function properly, and that there
aren't any unexpected behaviors.
TDD is usually very costly to implement, but in my
opinion, pays in the long run – and in many aspects.
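To make the idea concrete in this context, here is a minimal, self-contained sketch (Python's built-in unittest, hypothetical names - not any vendor's actual test suite) of the kind of automated regression test that would catch a detection routine silently breaking: it spins up a tiny known-vulnerable fixture and asserts that a toy reflection check still flags it.

```python
# A minimal regression-test sketch for a detection routine (hypothetical names).
# The point is the shape of the test, not the sophistication of the check.
import threading
import unittest
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class VulnerableEcho(BaseHTTPRequestHandler):
    """Fixture: reflects the 'q' parameter without encoding (known-vulnerable)."""
    def do_GET(self):
        query = urllib.parse.urlparse(self.path).query
        q = urllib.parse.parse_qs(query).get("q", [""])[0]
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(("<html>" + q + "</html>").encode())
    def log_message(self, *args):   # keep test output quiet
        pass

def detects_reflection(url):
    """The 'plugin' under test: inject a marker and look for an unencoded echo."""
    marker = "<zzwavsep>"
    with urllib.request.urlopen(url + "?q=" + urllib.parse.quote(marker)) as resp:
        return marker in resp.read().decode()

class DetectionRegressionTest(unittest.TestCase):
    def setUp(self):
        self.server = HTTPServer(("127.0.0.1", 0), VulnerableEcho)
        threading.Thread(target=self.server.serve_forever, daemon=True).start()
        self.url = "http://127.0.0.1:%d/page" % self.server.server_port

    def tearDown(self):
        self.server.shutdown()

    def test_plugin_still_detects_known_vulnerable_fixture(self):
        self.assertTrue(detects_reflection(self.url))

if __name__ == "__main__":
    unittest.main()
```

A build that runs such tests would flag a plugin that stopped working after an engine change, instead of shipping it broken for years.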
Now don't get me wrong, I'm certain that almost all
vendors use TDD to some extent; however, after experiencing what I have in
this benchmark, I'm also certain it's probably insufficient (at least for some
products).
And honestly, I find it very hard to blame the
vendors.
Allow me to elaborate:
There is a lot of competition in this product category,
and new features are often rushed to market as soon as possible. It also
takes a major effort to write unit-tests that include network
communication and scanning, and to review the results, even for a
single vulnerability detection plugin.
Although it makes sense that the same outcome could be
accomplished using traditional QA processes, which may very
well be true for small to mid scale projects, one need only look at the insane
number of plugins and features in products like Qualys, Appscan,
Webinspect and W3AF to understand the futility of leaving all
the testing to humans.
Imagine how much effort it would take to manually test
that 200 generic detection plugins function properly… Implementing
unit-tests for all those modules isn't a small investment either.
And what about 50000 signature-based product
specific vulnerabilities? How long will it take to manually test that (or
develop unit-tests to verify) those features work?
During the testing process, I have seen plugins in several
tools which were actually named after the various extensions
of obsolete files I was trying to detect in WAVSEP, and still,
scanning the platform with some or all of them did not yield results for many
tools.
My assumption is that the same problem is also responsible
for the results of tools that got 100% in previous benchmarks, and got
different results in this bulk of tests, even though the testing framework
(WAVSEP/WIVET) did not include any changes in the test cases scanned.
My Assumption:
The various plugins and features are based on a scan
engine, and changes made to the engine (or plugins) may cause some of them to
malfunction.
Since there wasn't a unit test (or other pre/post build
test method) for those plugins, newer versions were released while those
plugins were not functioning, maybe even for years, and without
anybody knowing about it.
Not so scary when considering, let's say, small scale
projects,
but VERY scary when you consider a product
update that causes many plugins to malfunction in a scanner with 50000
plugins, which is released after the organization tested it
successfully and used it for years, and while the official recommendation of
the vendor was to install the update.
The vendor may never know, and the customer/user may only
discover the issue after vulnerabilities that the product was supposed to
identify are exploited.
Customers that are currently not aware of a problem,
vendors that may never be, and entities that can abuse that problem
are a terrible combination… No malice intended.
In order to assess the detection accuracy of different old/backup/hidden
file instances, I used a total of 184 test cases (many of them simulating
files created on Windows XP / Windows 7 developer stations, as well as on
common Linux flavors such as Ubuntu, Debian and Fedora). I also used three main
groups of false positive behaviors, each representing real-life scenarios that
vulnerability scanners can encounter.
The comparison of the scanners' old, backup and unreferenced
files detection accuracy is documented in detail in the following section of
sectoolmarket:
Note: as
mentioned earlier, I saw various features in several of the tested tools that
were supposed to identify additional results, but for some reason did not
function. My current assumption (and that's all that is – my assumption) is
that the reason is related to bugs in the engine or the module of those tools.
As luck (or the lack of it) would have it, the same problem seemed
to persist for many vendors in that specific category of tests.
Disclaimer:
The results of OWASP ZAP in the obsolete file detection test
were obtained using an external ZAP extension called Good-Old-Files (GoF -
included in ZAP built-in marketplace).
The extension was written by a colleague of mine by the name
of Michal Goldstein, and, like the previous extension authors' work, was
originally inspired by various modules in W3AF.
She was not aware
of the benchmark, or of the fact that I was assessing her project, and when I built
the testing platform, I used input from a collection of tools and sources to
build the benchmark test-bed, including GoF/W3AF.
Those of you that believe that might have affected the
testing process may feel free to ignore the results of that tool.
Result Chart Glossary
Note that the GREEN bar represents the vulnerable test case
detection accuracy, while the RED bar
represents the false positive categories detected by the tool (which may
correspond to more instances than what the bar actually presents, when compared to
the detection accuracy bar).
The Old/Backup/Hidden File Detection Accuracy of
Commercial/SAAS Scanners
The Old/Backup/Hidden File Detection Accuracy of
Opensource/Free Scanners
The Old/Backup/Hidden File Detection Accuracy of Scanners –
Unified List
12. Test V - Path Traversal / LFI Detection
The fifth assessment criterion is identical to the previous
benchmark - the detection accuracy of Path Traversal (a.k.a Directory
Traversal), an assessment feature that was implemented in WAVSEP v1.2, and
tested in the 2012
benchmark for the first time.
It's also the third most commonly implemented attack vector
in web application scanners, and a significant attack vector in its own right.
Many scanners had a difficult time locating a variety of
traversal test cases in 2012, but this time, the results show a significant
improvement in the results of many of the tools, proving that many vendors
invested major efforts in improving their products.
Path Traversal vs. Local File Inclusion – Reminder
As I explained in the past, the reason Path Traversal was
tagged along with Local File Inclusion (LFI) is simple - many scanners don't
differentiate between inclusion and traversal, and only a
few online vulnerability documentation sources do. In addition, the results
obtained from the tests performed on the vast majority of tools lead to the
same conclusion - many plugins listed under the name LFI detected the path
traversal test cases.
While implementing the path traversal test cases in 2012 and
consuming nearly every relevant piece of documentation I could find on the
subject, I decided to take the current path, in spite of some acute
differences some of the documentation sources suggested (although I did
implement an infrastructure in WAVSEP for "true" inclusion
exposures).
The point is not to get into a discussion of
whether or not path traversal, directory traversal and local file inclusion
should be classified as the same vulnerability, but simply to explain why in
spite of the differences some organizations / classification methods have for
these exposures, they were listed under the same name.
The evaluation was performed on a WAVSEP v1.2
instance that was hosted on a Windows XP VM, and although there are specific
test cases meant to emulate servers that run with a low-privileged OS
user account (using the servlet context file access method), many of the
test cases emulate web servers that run with administrative user
accounts.
[Note - in addition to the WAVSEP installation, to reproduce
results identical to those of this benchmark, a file by the name of
content.ini must be placed in the root installation directory of the Tomcat
server - which is different from the root directory of the web server.
It's also crucial to install WAVSEP on Windows, and to run
the Tomcat server with administrative privileges, as some of the test cases
rely on Windows-specific paths or require access to directories outside of
the web server scope]
In order to assess the detection accuracy of different path
traversal instances, I used a total of 816 path traversal test cases,
and a bunch of false positive test cases as well.
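For illustration, the essence of such a check is sending traversal sequences in a parameter and looking for the content of a file that should not be reachable; below is a bare-bones sketch (hypothetical endpoint and a tiny payload sample - WAVSEP's own test cases additionally rely on the content.ini marker file and the Windows-specific setup described in the note above):

```python
# A bare-bones path traversal / LFI probe (sketch only; real scanners use far
# larger payload lists, encodings, null-byte tricks and response differencing).
import requests

# Tiny payload sample aimed at a Windows-hosted target, per the setup above.
PAYLOADS = [
    "../../../../../../../windows/win.ini",
    "..\\..\\..\\..\\..\\..\\..\\windows\\win.ini",
]
SIGNATURES = ["[fonts]", "[extensions]"]    # sections expected inside win.ini

def probe_traversal(url, param):
    findings = []
    for payload in PAYLOADS:
        resp = requests.get(url, params={param: payload})
        if any(sig in resp.text.lower() for sig in SIGNATURES):
            findings.append(payload)
    return findings

# Example (hypothetical WAVSEP-style endpoint):
# probe_traversal("http://localhost:8080/wavsep/.../Traversal-Case01.jsp", "target")
```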
The comparison of the scanners' path traversal detection
accuracy is documented in detail in the following section of sectoolmarket:
Note:
During the testing of the development version of W3AF
(the latest stable I could get was 1.2 which was tested in 2012, and the
current development version was 1.6+) I experienced several bugs, specifically
bugs that prevented the scanner from scanning HTML forms submitted using HTTP
POST (or in short, POST parameters).
One of these bugs was related to the LFI/Path Traversal
detection plugin, which caused the scan to crash whenever it was used,
after detecting only a few vulnerable test cases.
I tried various methods to overcome that bug artificially,
but failed to do so, so I was not able to obtain the actual results of the
latest version of W3AF, and thus decided to use the results from the
previous benchmark to represent its score.
The bugs were reported to the project leader, and hopefully,
will be fixed in the future.
I had similar issues trying to use the various LFI/RFI
plugins of Qualys, and unfortunately, wasn't able to overcome them and
get an actual score by the publication of this benchmark (which is why Qualys
is absent from the LFI/RFI charts). I'm currently not sure whether the reason is a
bug in the product or in the configuration used during my testing process.
Result Chart Glossary
Note that the GREEN bar represents the vulnerable test case
detection accuracy, while the RED bar
represents the false positive categories detected by the tool (which may
correspond to more instances than what the bar actually presents, when compared to
the detection accuracy bar).
Result Update (29/03/2014): The results of arachni were improved from 30.88% to 100% (!!!) according to vendor recommendations provided AFTER the original benchmark publication, by using the source code disclosure plugin, in addition to the local file inclusion and path traversal plugins, after verifying that the plugin behavior is relevant to the exposure (the name may deceive), and while using the same version.
The results of Webinspect were likewise improved from 72.06% to 91.18% by using a custom configuration provided by the vendor AFTER the original benchmark publication, using the same tested version, which included the following plugins:
i. 10287 – Local File Include
ii. 10271 – Local File Inclusion/Reading Vulnerability
iii. 10272 – Possible Local File Inclusion/Reading Vulnerability
iv. 11327 – LFI Tomcat
v. 11332 – LFI IIS
The Path Traversal / LFI Detection Accuracy of Commercial/SAAS Scanners
13. Test VI - (XSS via) RFI Detection
The sixth assessment criterion was, again, identical to the 2012 benchmark - the detection accuracy of Remote File Inclusion (or more accurately, vectors of RFI that can result in XSS or Phishing - and currently, not necessarily in server code execution), an assessment suite implemented in WAVSEP v1.2, which was tested in the 2012 benchmark for the first time, with interesting results indeed.
A reminder - although in the 2012 benchmark several products identified the vulnerable test cases properly, some products with RFI detection features ignored them completely.
Obviously, 1.5 years after the 2012 publication, that's no
longer the case for the vast majority of vendors; the detection accuracy and
support for (XSS via) RFI was dramatically improved in many tools, and
we – the users – can reap the rewards in penetration tests.
In order to assess the detection accuracy of different
remote file inclusion exposures, I used a total of 108 (xss via) remote
file inclusion test cases, and as always, a bunch of false positive
cases that represent common scenarios.
The comparison of the scanners' (xss via) remote file
inclusion detection accuracy is documented in detail in the following section
of sectoolmarket:
Result Chart Glossary
Note that the GREEN bar represents the vulnerable test case detection accuracy, while the RED bar represents the false positive categories detected by the tool (each category may correspond to more actual instances than the bar presents, when compared to the detection accuracy bar).
The (XSS via) RFI Detection Accuracy of Commercial/SAAS Scanners
The (XSS via) RFI Detection Accuracy of Opensource/Free
Scanners
The (XSS via) RFI Detection Accuracy of Scanners -Unified
List
14. Test VII - Reflected XSS Detection
The seventh assessment criterion has been a part of the
yearly WAVSEP assessment for four years now (!), and the results of the various
vendors that maintain their tools emphasize that well.
As the title suggests, this section deals with the detection
accuracy of Reflected Cross Site Scripting, a very common
exposure which is the 2nd most commonly implemented feature in web application vulnerability
scanners.
The assessment was performed using 66 different
Reflected XSS test cases and a bunch of false positive test cases, and while ignoring
the results of the various experimental RXSS test cases included in WAVSEP 1.5
(although the "experimental" results are included in most of the
individual tools scan logs in sectoolmarket).
There's not much to say in this section that wasn't already
said in previous articles and benchmarks, except to present the current (and
generally IMPRESSIVE) results of the various maintained products /
projects.
The comparison of the scanners' reflected cross site
scripting detection accuracy is documented in detail in the following section
of sectoolmarket:
Note
Bugs in certain products seemed to affect their detection
accuracy for Reflected XSS, since in the past, these products obtained higher
results (notably arachni/W3AF).
Result Chart Glossary
Note that the GREEN bar represents the vulnerable test case detection accuracy, while the RED bar represents the false positive categories detected by the tool (each category may correspond to more actual instances than the bar presents, when compared to the detection accuracy bar).
Note:
During the assessment of Qualys it is highly likely that an optimization mechanism affected the scan results of POST test cases (compared to WAVSEP 2012 results). Although in the case of other vendors disabling similar mechanisms solved the problem, in the case of Qualys this optimization mechanism could not be disabled via the configuration interface. We are currently trying to find solutions to the problem.
The Reflected XSS Detection Accuracy of Commercial/SAAS
Scanners
The Reflected XSS Detection Accuracy of Opensource/Free
Scanners
The Reflected XSS Detection Accuracy of Scanners - Unified List
15. Test VIII - SQL Injection Detection
The eighth assessment criterion was the good old SQL Injection detection accuracy, another assessment suite that's been with us for the last four years (!) of WAVSEP benchmarks.
As one of the most famous exposures (and powerful attacks) and
the most commonly implemented attack vector in web application scanners, it's
also one of the aspects in which maintained projects showed the greatest
improvement over the years.
Although the release of WAVSEP 1.5 includes optional vulnerable SQL injection test cases that were adjusted to support other databases (such as MSSQL, ORACLE, etc – contributed thanks to the endless generosity of the ZAP team members), due to time constraints, the evaluation was only performed on an application that used MySQL 5.5.x as its data repository, and thus can only reflect the detection accuracy of the tool when scanning an application that uses similar data repositories.
My assumption, however, is that the detection results of error-based test cases and behavior-based test cases will be nearly identical if the underlying database is different, but that there will be a difference for some of the tested tools in test cases that require time-based detection methods (in which some scanners may not support using the appropriate database-specific time-delaying function).
The comparison of the scanners' SQL injection detection
accuracy is documented in detail in the following section of sectoolmarket:
Result Chart Glossary
Note that the GREEN bar represents the vulnerable test case detection accuracy, while the RED bar represents the false positive categories detected by the tool (each category may correspond to more actual instances than the bar presents, when compared to the detection accuracy bar).
Note:
During the assessment of Qualys it is highly likely that an optimization mechanism affected the scan results of POST test cases (compared to WAVSEP 2012 results). Although in the case of other vendors disabling similar mechanisms solved the problem, in the case of Qualys this optimization mechanism could not be disabled via the configuration interface. We are currently trying to find solutions to the problem.
The SQL Injection Detection Accuracy of Commercial/SAAS
Scanners
The SQL Injection Detection Accuracy of Opensource/Free Scanners
The SQL Injection Detection Accuracy of Scanners – Unified
List
16. Test IX - Attack Vector Support
The ninth assessment criterion is the number of audit
features each tool supports.
For the purpose of the benchmark, an audit feature was
defined as a common generic application-level scanning feature,
supporting the detection of exposures which could be used to attack the tested
web application, gain access to sensitive assets or attack legitimate clients.
The definition of the assessment criterion rules out product
specific exposures and infrastructure related vulnerabilities, while unique and
extremely rare features were documented and presented in a different section of
this research, and were not taken into account when calculating the results.
Reasoning:
An automated tool can't detect an exposure without a code module designed to identify the issue, and therefore the number of audit features will affect the type (and amount) of exposures that the tool will be able to detect (assuming the audit features are implemented properly, that vulnerable entry points will be detected, that the tool will be able to handle the relevant scan barriers and scanning prerequisites, and that the tool will manage to scan the vulnerable input vectors).
Although I typically place the assessment of supported
audit features in a position of higher importance in the benchmark, my
current research led me to make some changes.
I still consider the amount of supported generic
vulnerability detection features (a.k.a audit plugins) to be a very
significant aspect, probably more than I ever did.
Unfortunately, I came to the conclusion that the current list the WAVSEP project documents is a drop in the ocean.
WAVSEP currently contains information on which scanners
are relatively more audit-feature rich – relative, as in
relation to other projects, not to the actual variety of attacks out there.
Although "relative" may still be very
useful to the consumer, in my opinion, it's not as useful to the industry as
I had hoped.
Originally, when I created the list of supported audit
plugins which is currently used (and covers 32 attack categories at
the moment), I composed it from the list of plugins that were commonly supported
by scanners at the time (2009-2010).
Although the list was somewhat limited, and by no means representative of the overall list of attacks that scanners should detect (and hopefully will one day be able to detect), it was enough to represent the differences between the products.
Five years passed – and many things changed.
Numerous new generic application-level attacks
were invented, published or re-classified.
Projects like CWE, CAPEC, the OWASP Testing Guide, the OWASP Attacks and Vulnerabilities lists, WASC and others added more and more attack classifications, and that's without taking into account the numerous vectors that were published in blogs, conferences and competitions, which often didn't get the attention they deserved.
While the commonly implemented scanning features in scanners were usually derived from feature demands, attack vectors receiving higher levels of "popularity" and publicity, vulnerabilities that the vendors (and to some extent the users) perceived to be the most common or severe, and sometimes some vendor-specific "exotic" vectors, there was never a roadmap that would clarify to consumers what was MORE important for vendors to support.
So after figuring that out, prior to the benchmark, I decided to expand my list of attacks and vulnerabilities, so I could properly map the contribution of the various tools against the overall risk map, and during the research stages that preceded this publication, I started researching which vectors scanners can potentially identify actually exist, and which of those are supported by the individual scanners.
Well, it went pretty well…
In fact, it went so well that so far I classified 227
distinct application attacks;
227 attacks, not including multipliers due to
persistent/session/indirect states, and I'm not even done mapping and
classifying them.
Needless to say, that's a lot of mapping tasks for each
individual product.
In fact, the effort of classifying and prioritizing those
vectors while verifying which products supported them was so high,
that I had to postpone their publication, or else the research you are
currently reading might not have been published any time soon.
So, at the moment, this section describes the relative
support for various audit features, and the rest of the content collected
during the research will have to wait for another publication.
The detailed comparison of the scanners' support for various audit features is documented in detail in the following section of sectoolmarket:
Note
The audit-feature count results of Webinspect may change in the coming days due to additional verification processes I'm currently conducting. If eventually there are any changes, I will announce them using the comparison's dedicated twitter account: @sectoolmarket
The Number of Audit Features in Scanners – Commercial/SAAS
Tools
The Number of Audit Features in Scanners – Opensource/Free Tools
The Number of Audit Features in Scanners – Unified List
17. Test X - Adaptability - Scan Barriers
Applications may contain a variety of mechanisms and technologies that could pose a barrier to a scanner – and in fact, prevent it from being effective when scanning the application.
Scan barriers such as Anti-CSRF tokens, CAPTCHA mechanisms,
platform specific tokens (such as required viewstate values) or account lock
mechanisms have already become an integral part of many applications.
Complicated RIA client technologies such as Flash, Applets and Silverlight are
certainly not rare.
Although not necessarily a measurable quality, the ability of the scanner to handle different technologies and scan barriers is an important prerequisite, and in a sense, almost as important as being able to scan the input delivery method.
Reasoning:
An automated tool can't detect a vulnerability in a point-and-shoot scenario if it can't locate and scan the vulnerable location due to the lack of support for a certain browser add-on, the lack of support for extracting data from certain non-standard vectors, or the lack of support for overcoming a specific barrier, such as a required token or challenge. The more barriers the scanner is able to handle, the more useful it is when scanning complex applications that employ various technologies and scan barriers.
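As a rough illustration of what handling such a barrier involves (a sketch of my own, not the logic of any tested product; the URLs and the csrf_token field name are hypothetical), a scanner typically has to fetch the form, extract the anti-CSRF token, and replay it with every probe:

import re
import urllib.parse
import urllib.request

def submit_with_csrf_token(form_url, action_url, data):
    """Fetch the form, copy its anti-CSRF token into the request, then submit.
    Without this step, a scanner's injected requests are simply rejected."""
    html = urllib.request.urlopen(form_url).read().decode("utf-8", "replace")
    match = re.search(r'name="csrf_token"\s+value="([^"]+)"', html)
    if match:
        data = dict(data, csrf_token=match.group(1))
    body = urllib.parse.urlencode(data).encode()
    return urllib.request.urlopen(action_url, body)

# Hypothetical usage: inject an XSS probe into a token-protected form.
# submit_with_csrf_token("http://target.local/profile", "http://target.local/profile/save",
#                        {"nickname": "<script>alert(1)</script>"})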
The detailed comparison of the scanners' support for various barriers is documented in detail in the following section of sectoolmarket:
The following charts show how many types of barriers each product
claims to be able to handle (note that many of these features were not
verified, and the information currently relies on documentation, research and
vendor supplied information):
The Adaptability Score of Commercial/SAAS Scanners
The Adaptability Score of Opensource/Free Scanners
The Adaptability Score of Web Application Scanners – Unified
List
18. Test XI - Authentication/Usability
Although supporting the authentication method required by the application seems like a crucial quality (and certainly is a convenient feature), in reality, certain scanner proxy chaining features can make up for the lack of support for most authentication methods, by employing a 3rd party proxy to authenticate on the scanner's behalf.
For example, if we wanted to use a scanner that does not support NTLM authentication (but does support an upstream proxy), we could define the relevant credentials in Burp Suite FE, and define it as an upstream proxy for the scanner we intend to use.
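A minimal sketch of that chaining idea, assuming (purely for illustration) that a local intercepting proxy such as Burp listens on 127.0.0.1:8080 and has been configured to perform the NTLM handshake against the target on its own:

import urllib.request

# Send every request through the local intercepting proxy; the proxy performs
# the NTLM authentication upstream, so the tool itself never needs to support it.
upstream = urllib.request.ProxyHandler({
    "http": "http://127.0.0.1:8080",
    "https": "http://127.0.0.1:8080",
})
opener = urllib.request.build_opener(upstream)
response = opener.open("http://intranet.example.local/protected/page")
print(response.status, len(response.read()))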
However, chaining the scanner to an external tool that
supports the authentication still has some disadvantages, some of them major, such
as reduced performance, potential stability issues, thread
limitation and general inconvenience.
The following comparison table shows which authentication
methods and features are supported by the various assessed scanners:
19. Test XII - Results/Features vs. Pricing
The following assessment is in fact a summary of the
important results, in comparison to the product price and features.
This section will probably be the most useful one for anyone looking to purchase a commercial or SAAS solution, or debating whether or not to use open source products instead.
As I mentioned in the introduction, since web application
scanners might actually be a bundle of several semi-independent
products (generic vulnerability scanner, known vulnerability scanner,
infection scanner, etc), it's very important to notice which modules are
included in each offer, especially in relation to commercial scanner pricing.
WAVSEP currently focuses on assessing the generic
vulnerability scanning module of web application scanners, and whatever it
is you're paying might be relative to the rest of the modules the
product contains (or does not contain), in case you actually need those.
In short, the scanner price might (or might not) reflect a
set of products that could probably have been priced separately as independent
products.
For your convenience, I invested some effort in mapping which
of these products contain additional modules, although some classification of
modules might still be missing.
The mapped modules include generic web-app
scanning modules, generic web service scanning modules, flash
application scanning modules and CGI scanning modules
(a.k.a web server scanning modules or known vulnerability scanning modules).
The mapped categories don't yet include SAST
and IAST scanning modules, Applet/Silverlight scanning modules, website
infection scanning modules and additional categories which may be mapped in the
future.
Another important issue to pay attention to is the type
of license acquired.
In general, I did not cover non-commercial prices in this comparison, and in addition, did not include any vendor-specific bundles, discounts or sales pitches.
I presented the base prices listed in the vendor website or
provided to me by the vendors, according to a total of 6 predefined categories,
which are in fact, combinations of the following concepts:
Consultant Licenses:
although there isn't a commonly accepted term, I defined "Consultant"
licenses as licenses that fit the common requirements of a consulting firm -
scanning an unrestricted amount of IP addresses, without any boundaries or
limitations.
Limited Enterprise Licenses: Any license that allowed scanning an unlimited but
restricted set of addresses (for example - internal network addresses or
organization-specific assets) was defined as an enterprise license, which might
not be suited for a consultant, but will usually suffice for an organization
interested in assessing its own applications.
Website/Year
- a license to install the software on a single station and use it for one year
against a single IP address (the exception to this rule is Netsparker, in which
the price per website reflects 3 Websites).
Seat/Year
- a license to install the software on a single station and use it for a single
year.
Perpetual Licenses
- pay once, and it's yours (might still be limited by seat, website, enterprise
or consultant restrictions). The vendor's website usually includes additional
prices for optional support and product updates.
The various prices can be viewed in the dedicated comparison
in sectoolmarket, available in the following address:
It is important to remember that these prices might change,
vary or be affected by numerous variables, from special discounts and sales to
a strategic conscious decision of vendors to invest in you as a customer or as
a beta testing site.
20. Additional Comparisons
The following section contains additional information on the
tested tools that was documented throughout the research, and may be of use to
the reader.
List of Tools
The list of tools tested in this benchmark, and in the
previous benchmarks, can be accessed through the following link:
Additional Features
Complementary scan features that were not evaluated or
included in the benchmark:
In
order to clarify what each column in the report table means, use the following
glossary table:
Title | Possible Values
Configuration and Usage Scale | Very Simple - GUI + Wizard; Simple - GUI with simple options, Command line with scan configuration file or simple options; Complex - GUI with numerous options, Command line with multiple options; Very Complex - Manual scanning feature dependencies, multiple configuration requirements
Stability Scale | Very Stable - Rarely crashes, Never gets stuck; Stable - Rarely crashes, Gets stuck only in extreme scenarios; Unstable - Crashes every once in a while, Freezes on a consistent basis; Fragile - Freezes or Crashes on a consistent basis, Fails performing the operation in many cases
Performance Scale | Very Fast - Fast implementation with limited amount of scanning tasks; Fast - Fast implementation with plenty of scanning tasks; Slow - Slow implementation with limited amount of scanning tasks; Very Slow - Slow implementation with plenty of scanning tasks
Scan Logs
In order to access the scan logs and detailed scan results
of each scanner, simply access the scan-specific information for that scanner,
by clicking on the scanner version in the various comparison charts: http://sectoolmarket.com/
21. What Changed?
Since the latest benchmark, many open source and commercial tools added new features and improved their detection accuracy.
The following list presents a summary of changes in the
detection accuracy and coverage of Commercial tools that were tested in
the previous benchmark (+new):
(*) NTOSpider – NTOSpider's last assessment took place in 2011, and since then there has been a significant improvement in all the test categories, as well as new results for tests not performed in 2011. It also came out FIRST in the WIVET category (along with 3 other products) and the XSS category (along with 10 others), and got high scores in many others. The rankings it got for the new tests (redirect/backup) were mixed.
(*) N-Stalker (Commercial Edition) – The commercial edition of N-Stalker was assessed in this benchmark for the first time. The only comparable result was the XSS result of the free version tested in 2012, and in that case, there was a significant improvement. The rest of the results placed it at various ranks.
(*) Qualys – Qualys was first tested in 2012. Since then, the WIVET score didn't change (it is still one of the highest), and there are some new test results as well, but the SQL Injection and Reflected XSS results are actually worse, due to what I currently attribute to temporary bugs, either in the product or (less likely) in my testing procedure.
(*) ScanToSecure – Another new SAAS service which is
assessed for the first time, and got results that were almost identical to
those of Netsparker.
(*) Netsparker (Commercial Edition) – Netsparker's results were improved in almost every single category. The WIVET score was slightly improved (one of the highest), it came out FIRST in the Reflected XSS (along with 10 others) and Remote File Inclusion (along with 4 others) categories, dramatically improved the previous Local File Inclusion results (one of the highest results), and got great results in many other tests. Like the vast majority of the products in the industry, its results were somewhat mixed in the new tests (backup/redirect).
(*) WebInspect – WebInspect significantly improved its scores in various categories: It was the only winner in the client/barrier coverage feature comparison, came out FIRST in the WIVET assessment (along with 3 others), the Remote File Inclusion (along with 4 others), Reflected XSS (along with 10 others) and SQL Injection (along with 4 others) categories, got a surprisingly high score in the new (and secret) Unvalidated Redirect category (highest among commercial scanners), and plenty of other high scores in different categories, but didn't get a good score in the backup/hidden file detection assessment.
(*) AppScan – AppScan too significantly improved its
scores in various categories: It was the only winner in the Local
File Inclusion and Supported Audit Features categories, got one of
the highest WIVET scores, came out FIRST in the SQL Injection (along
with 4 others), Reflected XSS (along with 10 others) and Remote File
Inclusion (along with 4 others) categories, got plenty of high scores in
other categories, but got mixed results in the new tests (backup/hidden files
and unvalidated redirect).
(*) Acunetix WVS (Commercial Edition) – Acunetix
slightly improved the results from the previous benchmarks, and got some very
interesting new results: it got the BEST SCORE in the NEW Backup/Hidden
Files category among commercial scanners (and some would argue, in total),
came out FIRST in WIVET (along with 3 others), SQL Injection
(along with 4 others), Reflected XSS (along with 10 others), got great
results in many other categories, but didn't get a good score in the new
unvalidated redirect category.
(*) Syhunt Dynamic – Syhunt dramatically
improved their WIVET score (came out FIRST, along with 3 others),
and slightly improved other scores as well (LFI, etc). They got a mixed result
when scanning backup/hidden files, and didn't have a plugin to scan unvalidated
redirect test cases (at least as far as I could tell).
(*) Burp Suite Pro – Burp is the undisputed
winner of the (overall) versatility category, was the only winner
in the input vector support category (followed closely by NTO, and less
closely by Appscan, Webinspect and IronWASP), got one of the highest
scores in detecting Backup/Hidden Files (relative), and decent
scores in many other categories. It also came out FIRST in the SQL
Injection (along with 4 others) and Reflected XSS (along with 10
others) categories, and dramatically improved its RFI score, but alas,
didn't get a good score in the WIVET test (same as last year).
(*) WebCruiser – No significant changes
compared to previous versions in the tested categories.
(*) ParosPro – was not retested, since no
updates were released, so it has identical results.
(*) JSky – was not retested, since no updates
were released, so it has identical results.
(*) Ammonite
– was not retested, since no updates were released, so it has identical
results.
The following list presents a summary of changes in the
detection accuracy and coverage of Opensource/Free tools that were
tested in the previous benchmark:
(*) ZAP – ZAP significantly improved almost all of its results. It implemented a new AJAX crawling feature that dramatically improved its WIVET score (highest among opensource) – but this feature is optional and requires time to use. It came out FIRST in the Reflected XSS category (along with 10 others), got one of the highest scores in SQL Injection, Remote File Inclusion and Local File Inclusion, as well as decent results in many other categories. If you take into account the external GoF plugin, ZAP is also the winner of the Backup/Hidden file detection category, although I'm leaving that interpretation to the reader. ZAP, however, didn't get a good score when tested against the unvalidated redirect test cases.
(*) IronWASP – Although IronWASP too had a new AJAX crawling feature, it was released too late for me to test it properly, and in my opinion, required a little more polishing (although rumors say it gets an insane WIVET score). It did, however, make a clean (and unexpected) takeaway by being the only winner in the new and hidden Unvalidated Redirect category, with an impressive score that detected test cases that no other tool did. It also co-won the Reflected XSS category (along with 10 others), and got some great results in many other tests. Due to technical difficulties, I still don't have a WIVET score for it, but hopefully will have one soon.
(*) Skipfish – Skipfish is back in the game. Although previous versions were relatively buggy, the currently tested version had very impressive results, notable result consistency (which unfortunately I did not measure), and a dramatic improvement in almost every test category I used it in. It got very impressive results in many categories, and also relatively high results in the unvalidated redirect category.
(*) Vega – Vega was definitely a surprising player in this benchmark. It came out FIRST in both Reflected XSS (along with 10 others) and Remote File Inclusion (along with 4 others). It got a fantastic WIVET result for an open source tool (the best opensource result without using a visible browser – something no other opensource tool with a good result did – worth reusing for other java tools), and got very impressive results in both the Local File Inclusion (although with lots of false positives) and SQL Injection categories. Sadly enough, it didn't have plugins for unvalidated redirect or backup/hidden files that I could test.
(*) Arachni – Anyone who installs and uses the latest version of Arachni will immediately notice a significant improvement in usability – a very impressive improvement if I might add, and probably the most consistent behavior I saw, which unfortunately I did not measure (the idea behind the "AutoThrottle" feature is very interesting – and probably responsible for some of the consistent results, since it got the same results regardless of how many URLs it scanned – very rare in this industry). However, a bug in the XSS plugins seemed to reduce its score in that category in comparison to the previous assessment, and another bug caused the backup/hidden file detection plugin not to function at all. It still came out FIRST in the Remote File Inclusion test (along with 4 others), improved some other results, got the third best score in the NEW Unvalidated Redirect category (along with Webinspect), and also got me thinking about how easy it would be to start a new SAAS business just by using it.
(*) W3AF – The development version of W3AF had
several bugs that affected its score, and in fact, some results were actually
worse than the last benchmark (bugs were reported to the vendor). It did
however, still manage to surprise and get the best score for an opensource tool
in the Unvalidated Redirect category (and second best score in that
category in total), a relatively good result in the Backup/Hidden File
detection category, and a couple of other results that were impressive,
especially in the context of the open-source industry (wivet, features).
(*) WATOBO – WATOBO significantly improved
both its SQL Injection and Reflected XSS scores, got the same
scores in LFI, and got above average (relative) results in backup/hidden file
detection (which were generally bad to mediocre for most tools), but at the time
of the test did not have any RFI or Unvalidated Redirect features I could test.
(*) WAPITI – Those who recall this tool, which got surprisingly high scores in previous benchmarks, will be delighted to know that the project has recently been revived and that a new version was released. It got relatively good results (an impressive WIVET score for an opensource tool), as well as improvements in almost every category. It did, however, have a hard time with the Backup/Hidden file category, in which it got a low score.
(*) N-Stalker 2012 FE – Significantly improved its Reflected XSS score compared to the previous benchmark.
(*) Netsparker Community/Free Edition – got some slight improvements in some of its scores, and still has one of the best WIVET scores for a free tool, but overall, there were no major changes compared to the previous benchmark.
(*) SQLMap, WebSecurify, Acunetix FE and a couple of other
projects were not retested, and most of the features of Syhunt Mini, AndiParos
and Paros were not retested (although the latter three got some new results for
unvalidated redirect and backup/hidden files).
22. Opensource vs. Commercial - Insights?
The conclusions I have this year in relation to the open source vs. commercial tools enigma are not as decisive as in the previous year.
Part of that is because I haven't yet completed all the analysis processes I planned, and part of it is because there really was a significant improvement in the open source industry (and that's without taking lightly the significant improvements that took place in the commercial sector).
Projects such as ZAP and IronWASP started supporting
scanning input delivery methods of modern web applications, including
JSON/AJAX, XML, and even nearly unique vectors such as OData and GWT, that even
most commercial vendors don't support.
Projects like W3AF have long been almost as feature-rich as Webinspect and Appscan (although they still lack stability), Vega is coming closer to having a crawling mechanism that can produce results similar to those of a commercial vendor, and if I were Qualys (or any other cloud vendor), I would watch the improvement of the Arachni project CLOSELY. Seriously – install it and give it a shot… the results don't fully convey the maturity level it has reached.
However, in sheer numbers, as an overall solution, most open
source tools still lag a bit behind some of the major commercial
players, at least if you take into account all the categories… although I admit
that I don't say that with the same confidence as I did before, and I believe
that further analysis is required to get to a practical conclusion.
23. Verifying the Benchmark Results
The results of the benchmark can be verified by replicating
the scan methods described in the scan log of each scanner (accessible in sectoolmarket through the version link
of each product), and by testing the scanner against WAVSEP v1.5 (obtained from the sourceforge WAVSEP repository)
and WIVET v3-revision148.
The same methodology can be used to assess vulnerability
scanners that were not included in the benchmark.
24. So What's Next?
During this research, which I have been conducting for the
past 18 months or so (7 of those just to gather the results you are currently
seeing), I gathered a ton of information.
Due to my consistently tight schedule, too many adventurous endeavors
and the fact that I didn't want to delay the publication any longer, I didn't
publish A LOT of content that was gathered, so in the next couple of weeks I'm
going to try and wrap it up so it could come to fruition ASAP…in my opinion,
the conclusions from the unpublished content can be very interesting for the
technological trends in this industry.
The benchmark was branded as part I for a reason, and
although I might add the results of additional products soon, in the upcoming
weeks, I plan to focus on trying to see how much effort will be required to
release part II, which will have a very different result format compared
to the typical WAVSEP benchmark.
25. Recommended Read-List: Benchmarks
The following resources include additional information on
previous benchmarks, comparisons and assessments in the field of web
application vulnerability scanners:
(*) "HackMiami Web
Application Scanner 2013 PwnOff", by James Ball, Alexander Heid, Rod
Soto (a comparison of 5 web application scanners published at the HackMiami
2013 conference).
(*) "Top
10: The Web Application Vulnerability Scanners Benchmark, 2012", one
of the predecessors of the current benchmark, by Shay Chen (a comparison of 60
commercial and open source scanners, July 2012)
(*)"Enemy
of the State: A State-Aware Black-Box Web Vulnerability Scanner", by Adam
Doup´e, Ludovico Cavedon, Christopher Kruegel, and Giovanni Vigna (a comparison
of 3 scanners published in 2012).
(*)"SQL Injection through HTTP Headers", by Yasser Aboukir (an analysis and enhancement of the 2011
60 scanners benchmark, with a different approach for interpreting the results,
March 2012)
(*)"The Scanning Legion: Web Application Scanners Accuracy
Assessment & Feature Comparison",
one of the predecessors of the current benchmark, by Shay Chen (a comparison of 60
commercial and open source scanners, August 2011)
(*)"Building a Benchmark for SQL Injection Scanners", by Andrew Petukhov (a commercial and opensource scanner
SQL injection benchmark with a generator that produces 27680 (!!!) test
cases, August 2011)
(*)"Webapp
Scanner Review: Acunetix versus Netsparker", by Mark Baldwin
(commercial scanner comparison, April 2011)
(*)"Effectiveness of
Automated Application Penetration Testing Tools", by
Alexandre Miguel Ferreira and Harald Kleppe (commercial and freeware scanner
comparison, February 2011)
(*)"Web Application Scanners Accuracy Assessment", one of the predecessors of the current benchmark, by
Shay Chen (a comparison of 43
free and open source scanners, December 2010)
(*)"State of the Art: Automated Black-Box Web Application
Vulnerability Testing" (Original Paper),
by Jason Bau, Elie Bursztein, Divij Gupta, John Mitchell (May 2010) – original
paper
(*)"Analyzing the Accuracy and Time Costs of Web Application
Security Scanners", by Larry Suto (commercial
scanners comparison, February 2010)
(*)"Why Johnny Can’t Pentest: An Analysis of Black-box Web
Vulnerability Scanners", by
Adam Doup´e, Marco Cova, Giovanni Vigna (commercial and open source scanner
comparison, 2010)
(*)"Web Vulnerability Scanner Evaluation", by AnantaSec (commercial scanner comparison, January
2009)
(*)"Analyzing the Effectiveness and Coverage of Web Application
Security Scanners", by Larry Suto (commercial
scanners comparison, October 2007)
(*)"Rolling
Review: Web App Scanners Still Have Trouble with Ajax", by Jordan Wiens (commercial scanners comparison,
October 2007)
(*)"Web Application Vulnerability Scanners – a Benchmark" , by Andreas Wiegenstein, Frederik Weidemann, Dr.
Markus Schumacher, Sebastian Schinzel (Anonymous scanners comparison, October
2006)
26. Acknowledgements
While performing the research described in this article, I
have received help from plenty of individuals and resources, and I’d like to
take the opportunity to acknowledge them all.
To the researchers Ozhan Sisic and Sharath Unni, who contributed content and results to the assessment, and did so at the expense of their own time, within dense timeframes, and often at unreasonable hours.
To the various additional volunteers that did their best to
assist me whenever they could, especially to the ones that chose to stay
anonymous.
To the various members of Denim Group, and especially Dan Cornell, who assisted throughout the project, adapted their excellent platform ThreadFix to fit my needs, and enabled me to handle nearly unreadable results and share information with the volunteers that participated in the tests around the world.
To the various entities and projects that contributed code to WAVSEP, including (but not limited to) the various authors of the ZAP project and Lavakumar Kuppan from the IronWASP project.
To Dan Kuykendall from NT OBJECTives, who permitted me to use their online enhanced adaptation of WIVET as an additional verification mechanism for the local WIVET results.
To all the open source tool authors who assisted me in testing the various tools at unreasonably late night hours, bothered to adjust their tools for me, discussed their various features and invested their time in explaining how I could optimize their use.
To the CEOs, Product Managers, Marketing Executives, QA engineers, Support Personnel and Development teams of commercial vendors, who saved me tons of time, supported me throughout the process, helped me overcome obstacles and made my experience a pleasant one.
To the various information sources that helped me gather the
list of scanners over the years, spread the news about the previous benchmarks,
and gain knowledge, ideas, and insights, including (but not limited to) information
security sources such as Security Sh3ll (http://security-sh3ll.blogspot.com/), PenTestIT (http://www.pentestit.com/), The Hacker News (http://thehackernews.com/), Toolswatch (http://www.vulnerabilitydatabase.com/toolswatch/), Darknet (http://www.darknet.org.uk/), Packet Storm (http://packetstormsecurity.org/), Google (of course), Twitter (and the never-ending
list of favorites I keep there) and many other great sources that I have used over the years to gather the list of tools.
I can't thank you all enough, and wish you all the best.
27. Appendix A: Tools That Were Not Included
The following commercial web application vulnerability scanners were not included in the benchmark, due to deadlines and time restrictions on my part:
Commercial Scanners not included in this benchmark
(*)Websure
The following open source web application
vulnerability scanners were not included in the benchmark, mainly
due to time restrictions, but might be included in future benchmarks:
Open Source Scanners not included in this benchmark
(*)Spacemonkey
(*)Vulnerability Scanner 1.0 (by cmiN, RST)
The following is a partial list of SAAS scanners that were not included in the benchmark, mainly due to time restrictions, but they might be included in future benchmarks:
SAAS Online Scanning Services
Appscan On Demand (IBM), Click To Secure, Sentinel
(WhiteHat), Veracode (Veracode), Quatrashield, Veracode Dynamic Analysis, edgescan, VUPEN Web Application Security Scanner (VUPEN Security), WebInspect (online
service - HP), WebScanService (Elanize KG), Gamascan (GAMASEC – currently
offline), Cloud Penetrator (Secpoint), Zero Day Scan, DomXSS Scanner, Golem
Technologies, etc.
Web Application Testing Tools which are using Dynamic
Runtime Analysis (IAST):
(*)Seeker
(Quotium)
(*)Appscan
Glassbox (IBM)
(*)Contrast (Contrast Security)
The benchmark focused on web application scanners that are
able to detect at least Reflected XSS or SQL Injection vulnerabilities, can be
locally installed, and are also able to scan multiple URLs in the same
execution.
As a result, the test did not include the following
types of tools:
Scanners without RXSS / SQLi detection features:
(*)LFI/RFI Checker (astalavista)
(*)Etc.
Passive Scanners (response analysis without verification):
(*)Etc.
Scanners of specific products or services (CMS scanners, Web
Services, etc):
(*)WSDigger
(*)Sprajax
(*)ScanAjax
(*)Joomscan
(*)wpscan
(*)Joomlascan
(*)Joomsq
(*)WPSqli
(*)Etc.
Uncontrollable Scanners
Scanners that can’t be controlled or restricted to scan a
single site, since they either receive the list of URLs to scan from Google
Dork, or continue and scan external sites that are linked to the tested site.
This list currently includes the following tools (and might include more):
(*)Darkjumper 5.8 (scans additional external hosts
that are linked to the given tested host)
(*)Bako's SQL Injection Scanner 2.2 (only
tests sites from a google dork)
(*)Serverchk (only tests sites from a google dork)
(*)XSS Scanner by Xylitol (only tests sites
from a google dork)
(*)Hexjector by hkhexon – also falls into
other categories
(*)d0rk3r by b4ltazar
Deprecated Scanners
Incomplete tools that were not maintained for a very long
time; currently includes the following tools (and might include more):
(*)Wpoison (development stopped in 2003, the new
official version was never released, although the 2002 development version can
be obtained by manually composing the sourceforge URL which does not appear in
the web site- http://sourceforge.net/projects/wpoison/files/ )
De facto Fuzzers
Tools that scan applications in a similar way to a scanner, but whereas a scanner attempts to conclude whether or not the application is vulnerable (according to some sort of "intelligent" set of rules), a fuzzer simply collects abnormal responses to various inputs and behaviors, leaving the task of concluding to the human user.
(*)Lilith 0.4c/0.6a (both versions 0.4c and 0.6a were
tested, and although the tool seems to be a scanner at first glimpse, it
doesn’t perform any intelligent analysis on the results).
(*)Spike proxy 1.48 (although the tool has XSS and SQLi scan features, it acts like a fuzzer more than it acts like a scanner – it sends payloads of partial XSS and SQLi, and does not verify that the context of the returned output is sufficient for execution or that the error presented by the server is related to a database syntax injection, leaving the verification task to the user).
Fuzzers
Scanning tools that lack the independent ability to conclude
whether a given response represents a vulnerable location, by using some sort
of verification method (this category includes tools such as JBroFuzz,
Firefuzzer, Proxmon, st4lk3r, etc). Fuzzers that had at least one type of
exposure that was verified were included in the benchmark (Powerfuzzer).
CGI Scanners
Vulnerability scanners that focus on detecting hardening
flaws and version specific hazards in web infrastructures (Nikto, Wikto, WHCC,
st4lk3r, N-Stealth, etc)
Single URL Vulnerability Scanners
Scanners that can only scan one URL at a time, or can only
scan information from a google dork (uncontrollable):
(*)Havij (by itsecteam.com)
(*)Hexjector (by hkhexon)
(*)Simple XSS Fuzzer [SiXFu] (by www.EvilFingers.com)
(*)Mysqloit (by muhaimindz)
(*)PHP Fuzzer (by RoMeO from DarkMindZ)
(*)SQLi-Scanner (by Valentin Hoebel)
(*)Etc.
Vulnerability Detection Toolkits
Tools that aid in discovering vulnerabilities, but do not
detect the vulnerability themselves; for example:
Exploitation Tools
Tools that can exploit vulnerabilities without any
independent ability to automatically detect vulnerabilities on a large scale.
Examples:
(*)MultiInjector
(*)XSS-Proxy-Scanner
(*)Pangolin
(*)FGInjector
(*)Absinth
(*)Safe3 SQL Injector (an exploitation tool with
scanning features (pentest mode) that are not available in the free
version)
(*)Etc.
Exceptional Cases
(*)SecurityQA Toolbar (iSec) – various lists and
rumors include this tool in the collection of free/open-source vulnerability
scanners, but I wasn’t able to obtain it from the vendor’s web site, or from
any other legitimate source, so I’m not really sure it fits the “free to use”
category.
Great article and analysis. Thanks for sharing.
I'm speechless man.
Absolutely amazing! Thank you so much for your hard work! You are making an enormous contribution to the state of web scanners out there.
I'm a little confused about the false positive bar. In some areas you show 100% being false positives but then also say 18% being accurately detected. How is this possible?
The percentage score of the false positive ratio is relative to the false positive test cases, not to the whole.
For example, for backup/hidden file detection (which is the category the percentages you mentioned came from) - there are 184 different REAL vulnerable hidden/backup files, and 3 categories (not necessarily single files) of false positive behaviors.
In the case of the scanner that got this score, it means it found 18% of the 184 real vulnerabilities, but found numerous false positives that fall into each one of the 3 false positive categories (it identifies dynamic 200-404 responses, custom 200/404 responses and default file responses as hidden files).
The same idea applies to all the rest of the tests - false positive test cases represent application behavior categories that may be identified as a vulnerability - and these test cases are separated from the actual vulnerable test cases.
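Put as a small calculation sketch (the detected count below is an assumption chosen only to be consistent with the quoted 18%; the category counts are the ones from the reply above):

# Detection accuracy is measured against the vulnerable test cases,
# while the false positive bar is measured against the false positive categories.
real_cases = 184        # real vulnerable hidden/backup files
detected = 33           # hypothetical count, roughly 18% of the real cases
fp_categories = 3       # false positive behavior categories in the suite
fp_triggered = 3        # the scanner flagged all three categories

detection_accuracy = detected / real_cases            # ~0.18 -> the green bar
false_positive_ratio = fp_triggered / fp_categories   # 1.00  -> the red bar
print(round(detection_accuracy * 100), "%", round(false_positive_ratio * 100), "%")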
Hi,
Can you please share with me your expert comments on whether any of the opensource tools listed above is Linux friendly?
Apart from this can you please share with me the benchmark comparison for Kali Linux and OpenVas.
Thank you in anticipation.
Regards,
Dhruv Trehan
All the java ones should be fairly linux friendly (ZAP, Vega, etc) - and in fact, many of them should already be included within Kali linux.
The reason Kali isn't included is that it's a collection of tools, not a scanner - and in fact, it contains many of the tools assessed (check the various kali menus and you'll recognize the names).
As for OpenVAS, it's a network/infrastructure penetration testing tool - not an application penetration testing tool (at least not in focus), and therefore was not in the scope of this benchmark.
I am missing BeyondTrust's (former eEye) Web Application Scanner or BeyondSaas.
My company has expressed interest in a new cloud based scanner from Detectify. Would it be possible to add this tool to your future evaluations?