Leveraging multistage hybrid source code analysis in Lucent Sky AVM

September 18, 2015

Lucent Sky AVM uses “hybrid source code analysis” to scan applications. This post first explains how different types of static analysis work, then shows how Lucent Sky AVM uses a multistage hybrid approach to automate finding and fixing vulnerabilities.

Most SAST (static application security testing) tools identify vulnerabilities by first creating a “flow graph” (a model that represents the logic of the application), then applying security rules to the flow graph. There are two ways to generate the flow graph: analyzing the source code of the application, or analyzing its binary files. Each approach has its own benefits and restrictions, and how each is implemented has a major impact on its effectiveness.
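As a rough illustration of "applying security rules to a flow graph", the sketch below models a tiny data-flow graph as a dictionary and applies one rule: data from an untrusted source must not reach a sensitive sink. The graph contents, the node names, and the `find_tainted_paths` helper are all made up for this example; real SAST engines build far richer graphs.

```python
from collections import deque

# Hypothetical flow graph: each key is a value or call in the program, and
# each entry in its list is a node that receives data from it.
FLOW_GRAPH = {
    "request.getParameter('id')": ["userId"],
    "userId": ["query"],
    "query": ["statement.executeQuery(query)"],
    "config.getTimeout()": ["timeout"],
    "timeout": ["statement.setQueryTimeout(timeout)"],
}

# Hypothetical security rule: untrusted sources must not reach these sinks.
SOURCES = {"request.getParameter('id')"}
SINKS = {"statement.executeQuery(query)"}


def find_tainted_paths(graph, sources, sinks):
    """Return every path along which data flows from a source to a sink."""
    findings = []
    for source in sorted(sources):
        queue = deque([[source]])
        while queue:
            path = queue.popleft()
            node = path[-1]
            if node in sinks:
                findings.append(path)
                continue
            for successor in graph.get(node, []):
                if successor not in path:  # guard against cycles
                    queue.append(path + [successor])
    return findings


findings = find_tainted_paths(FLOW_GRAPH, SOURCES, SINKS)
for path in findings:
    print(" -> ".join(path))
```

Only the request parameter path is reported. The timeout value also reaches a statement call, but it neither starts at a listed source nor ends at a listed sink, so the rule does not fire.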

Direct source code analysis vs. abstracted source code analysis

The primary benefit of analyzing source code is that applications can be analyzed before they can be compiled. This allows SAST to be used at an earlier stage of the SDLC. The main restriction is reduced coverage, because functions that call into binary libraries (such as .dll and .jar files) cannot be analyzed accurately.

Most SAST tools that analyze the source code of an application do not analyze the actual source code. Instead, they first create an abstraction of the source code, and then analyze the abstraction. Because the abstraction is language-neutral (i.e., similar code in C# and Java is represented by the same abstraction), this mechanism allows SAST vendors to build a single analysis engine that works across multiple programming languages.

However, this abstraction process comes with a major downside. No matter how accurate the abstraction is, some information about the original source code is lost during the process, which causes more false positives and false negatives. This is where direct source code analysis shines.

Direct source code analysis does not abstract the source code. Instead, it analyzes source code in its original form. Not many analysis engines do this, because doing so requires language-specific engineering for each language the engine supports. However, by analyzing the source code directly, the analysis engine can retain the full detail of both the code and the context in which each vulnerability occurs.
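The information loss described above can be seen on a very small scale with Python's standard `ast` and `tokenize` modules. This is only an analogy: Python's syntax tree stands in for a vendor's language-neutral abstraction, and the sample statements and helper names are invented for the example.

```python
import ast
import io
import tokenize

# Two different pieces of source text that express the same computation.
SOURCE_A = "total = (price * quantity)  # TODO: validate quantity\n"
SOURCE_B = "total = price*quantity\n"


def abstract(source):
    """Reduce source text to an abstraction (positions are omitted by default)."""
    return ast.dump(ast.parse(source))


def comments_in(source):
    """Read the original text token by token, keeping what the tree drops."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [tok.string for tok in tokens if tok.type == tokenize.COMMENT]


# Both snippets collapse to the same abstraction, so an analysis that only
# sees the abstraction cannot tell which one the developer wrote.
same_abstraction = abstract(SOURCE_A) == abstract(SOURCE_B)

# Working on the original text keeps the developer's comment.
comments = comments_in(SOURCE_A)

print(same_abstraction)
print(comments)
```

The comment and the redundant parentheses are exactly the kind of context that disappears once the code has been abstracted, and that an engine working on the original text can still see.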

Direct bytecode analysis vs. decompiled bytecode analysis

The major reason for conducting a bytecode or binary analysis is the ability to analyze an application and its referenced libraries without access to the complete source code (or access to any source code in some cases). The downside is that binary files, even with their corresponding debug symbols, lack some information that is only present in the source code.

There are two common implementations of bytecode analysis: decompiled bytecode analysis and direct bytecode analysis. Decompiled bytecode analysis uses a bytecode decompiler to translate the bytecode into a language-neutral abstraction, and then analyzes this abstraction. This is similar to how abstracted source code analysis works, but because decompilation adds an extra layer of abstraction, more source code information is lost.

Direct bytecode analysis works by creating the “flow graph” directly from the bytecode. Similar to direct source code analysis, this approach produces fewer false positives and, with the help of debug symbols, preserves more information about the original source code and the vulnerabilities found in it.
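To give a feel for reading bytecode directly, the sketch below uses Python's standard `dis` module to walk the instructions of a compiled code object and flag references to risky built-ins. It is an analogy only: Python bytecode stands in for .NET or Java bytecode, the code object's line table plays the role of debug symbols, and `RISKY_NAMES` and `risky_references` are names invented for this example.

```python
import dis

# Stand-in for a third-party library that ships without source code.
# The text is compiled here only to obtain bytecode; it is never executed,
# and the analysis below looks only at the compiled instructions.
LIBRARY_CODE = compile(
    "payload = input()\nresult = eval(payload)\n", "<third-party>", "exec"
)

# Hypothetical rule: flag any reference to these built-ins.
RISKY_NAMES = {"eval", "exec"}


def risky_references(code):
    """Scan bytecode instructions and report risky names with line numbers."""
    # The line table is the closest Python analogue to debug symbols.
    line_starts = dict(dis.findlinestarts(code))
    current_line = None
    findings = []
    for instr in dis.get_instructions(code):
        current_line = line_starts.get(instr.offset, current_line)
        if instr.opname in ("LOAD_NAME", "LOAD_GLOBAL") and instr.argval in RISKY_NAMES:
            findings.append((instr.argval, current_line))
    return findings


findings = risky_references(LIBRARY_CODE)
for name, line in findings:
    print(f"reference to {name} at line {line}")
```

No decompilation step is involved: the instructions are inspected as they are, and the line table is what allows a finding to be tied back to a location in the original source.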

Multistage hybrid analysis

Lucent Sky AVM uses a multistage hybrid analysis that combines direct bytecode analysis and direct source code analysis. Direct source code analysis allows Lucent Sky AVM to understand the context of a vulnerability and the developer’s original intent when the vulnerability was introduced. This enables the generation of the correct mitigation (“Instant Fix”) at the correct location in the source code. Direct bytecode analysis not only improves the accuracy of the analysis and lowers false positives, but also enables Lucent Sky AVM to identify vulnerabilities in referenced third-party libraries and, in most cases, even mitigate them.
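The general idea of combining the two stages can be sketched as follows. This is not Lucent Sky AVM's implementation, which this post does not describe at that level; it is a toy Python example in which a source stage records where a risky call is written and a bytecode stage confirms that the compiled code really references it. All function names and the `RISKY_CALLS` rule are assumptions made for illustration.

```python
import ast
import dis

SOURCE = "payload = input()\nresult = eval(payload)\n"

# Hypothetical rule shared by both stages.
RISKY_CALLS = {"eval", "exec"}


def source_stage(source):
    """Locate risky calls in the source, keeping the exact line and column."""
    return {
        (node.func.id, node.lineno): node.col_offset
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id in RISKY_CALLS
    }


def bytecode_stage(source):
    """Collect the risky names that the compiled bytecode actually loads."""
    code = compile(source, "<example>", "exec")  # compiled, never executed
    return {
        instr.argval
        for instr in dis.get_instructions(code)
        if instr.opname in ("LOAD_NAME", "LOAD_GLOBAL")
        and instr.argval in RISKY_CALLS
    }


def hybrid(source):
    """Keep source-level findings that the bytecode stage also confirms."""
    confirmed = bytecode_stage(source)
    return [
        {"call": name, "line": line, "column": column}
        for (name, line), column in sorted(source_stage(source).items())
        if name in confirmed
    ]


findings = hybrid(SOURCE)
print(findings)
```

In this toy version the source stage supplies the precise location where a fix would be placed, while the bytecode stage acts as a second opinion that can filter out findings the compiled code does not support.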

By investing in the painstaking engineering effort that direct source code analysis and direct bytecode analysis require, Lucent Sky AVM is able to bring greater efficiency to the vulnerability mitigation process. We did the hard work, so you don’t have to.