Ghidra: NSA Reverse Engineering Software

Ghidra is a software reverse engineering (SRE) framework developed by NSA's Research Directorate. This framework includes a suite of full-featured, high-end software analysis tools that enable users to analyze compiled code on a variety of platforms, including Windows, macOS, and Linux. Capabilities include disassembly, assembly, decompilation, debugging, emulation, graphing, and scripting, along with hundreds of other features. Ghidra supports a wide variety of processor instruction sets and executable formats and can be run in both user-interactive and automated modes. Users may also develop their own Ghidra plug-in components and/or scripts using the exposed API. In addition, there are numerous ways to extend Ghidra, such as new processors, loaders/exporters, automated analyzers, and new visualizations.

In support of NSA's Cybersecurity mission, Ghidra was built to solve scaling and teaming problems on complex SRE efforts, and to provide a customizable and extensible SRE research platform. NSA has applied Ghidra SRE capabilities to a variety of problems that involve analyzing malicious code and generating deep insights for NSA analysts who seek a better understanding of potential vulnerabilities in networks and systems.

Log4j Vulnerability Mitigation

Please read! Several CVE security vulnerabilities have been published for log4j, which Ghidra uses for logging. The known issues are resolved in log4j 2.17.1. We strongly encourage anyone using a previous version of Ghidra, or a build from source, to remediate this issue by either upgrading to the latest Ghidra version, 10.1.2, or patching their current installation.

To patch your current Ghidra installation:

Delete any log4j jar files in <install_dir>/Ghidra/Framework/Generic/lib.

Replace them with the log4j 2.17.1 jar files: log4j-api-2.17.1.jar and log4j-core-2.17.1.jar.

Update the log4j version references to 2.17.1 in <install_dir>/Ghidra/Features/GhidraServer/data/classpath.frag.

You can obtain these jar files from the Ghidra 10.1.2 release or from Maven Central:

https://repo1.maven.org/maven2/org/apache/logging/log4j/log4j-api/2.17.1/log4j-api-2.17.1.jar

https://repo1.maven.org/maven2/org/apache/logging/log4j/log4j-core/2.17.1/log4j-core-2.17.1.jar

Details of the vulnerabilities can be found in CVE-2021-44228, CVE-2021-44832, CVE-2021-45046, and CVE-2021-45105.
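
Checking whether an installation still needs the patch can be scripted. The following Python sketch only reports log4j jars older than 2.17.1 in <install_dir>/Ghidra/Framework/Generic/lib; it deletes and edits nothing. It assumes jar names follow the log4j-api-X.Y.Z.jar / log4j-core-X.Y.Z.jar pattern of the 2.17.1 file names listed above, and the function names are illustrative rather than part of Ghidra.

```python
import re
from pathlib import Path

# First log4j release that the notes above treat as fixed.
FIXED_VERSION = (2, 17, 1)

# Assumed naming pattern, e.g. "log4j-core-2.14.1.jar".
LOG4J_JAR = re.compile(r"^log4j-(api|core)-(\d+)\.(\d+)\.(\d+)\.jar$")


def log4j_version(jar_name):
    """Return (artifact, (major, minor, patch)), or None if the name doesn't match."""
    m = LOG4J_JAR.match(jar_name)
    if m is None:
        return None
    return m.group(1), tuple(int(part) for part in m.group(2, 3, 4))


def is_outdated(jar_name):
    """True if jar_name is a log4j api/core jar older than FIXED_VERSION."""
    parsed = log4j_version(jar_name)
    # Tuples compare numerically, so 2.9.1 correctly sorts before 2.17.1.
    return parsed is not None and parsed[1] < FIXED_VERSION


def outdated_log4j_jars(install_dir):
    """List outdated log4j jar names in <install_dir>/Ghidra/Framework/Generic/lib."""
    lib = Path(install_dir) / "Ghidra" / "Framework" / "Generic" / "lib"
    return sorted(p.name for p in lib.glob("log4j-*.jar") if is_outdated(p.name))
```

For example, outdated_log4j_jars("/opt/ghidra") (a hypothetical install path) returns the names of any jars that still need to be replaced; an empty list means no pre-2.17.1 api/core jars were found under that naming pattern.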

What's New in Ghidra 10.1

The not-so-fine print: Please Read!

Ghidra 10.1 is fully backward compatible with project data from previous releases. However, programs and data type archives which are created or modified in 10.1 will not be usable by an earlier Ghidra version.

This release includes many new features and capabilities, performance improvements, quite a few bug fixes, and many pull-request contributions. Thanks to all those who have contributed their time, thoughts, and code. The Ghidra user community thanks you too!

NOTE: Programs imported with a Ghidra beta version, or with code built directly from source outside of a release tag, may not be compatible and may contain flaws that have since been corrected. Any programs analyzed with a beta should be considered experimental and should be re-imported and analyzed with a release version. For example, the Ghidra 10.1 beta had an import flaw affecting symbol demangling that could not be corrected. Programs imported with previous release versions should upgrade correctly through various automatic upgrade mechanisms. Any program you will continue to reverse engineer should be imported fresh with a release version or with a build you trust that contains the latest code fixes.

NOTE: Ghidra Server: The Ghidra 10.1 server is compatible with Ghidra 9.2 and later Ghidra clients. Ghidra 10.1 clients are compatible with all 9.x servers.

Distribution

The Ghidra distribution has been enhanced to allow building native executables directly from a release distribution. The distribution currently provides Linux 64-bit, Windows 64-bit, and macOS x86 binaries. If you have another platform, for example a macOS M1-based system or a Linux variant, the support/buildNatives script can build the decompiler, demangler, and legacy PDB executables for your platform. You will need a version of Gradle that supports building for your platform and a working compiler for your environment. Not every platform can be supported, because the platform must first be supported by Gradle. Building additional native executables has been tested for Linux ARM 64-bit, Linux x86 variants, and macOS ARM 64-bit.

Please see the "Building Ghidra Native Components" section of the Installation Guide for additional information.

Debugger

Pure Emulation

There is a new action, Emulate Program (next to the Debug Program button), that launches the current program in Ghidra's p-code emulator. This is not a new "connector." Rather, it starts a blank trace with the current program mapped in. The user can then step using the usual "Emulate Step" actions in the "Threads" window. In general, this is sufficient to run simple experiments or to step through local regions of code. To modify emulated machine state, use the "Watches" window; at the moment, no other provider can modify emulated machine state.

This is also very useful in combination with the "P-code Stepper" window (this plugin must be added manually via File->Configure). A language developer can, for example, assemble an instruction that needs testing, start emulating with the cursor at that instruction, and then step individual p-code ops in the "P-code Stepper" window.
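
To make "stepping individual p-code ops" concrete, here is a toy Python model of a p-code stepper. It is not Ghidra's emulator or its API. The op subset (COPY, INT_ADD, INT_MULT), the one-value-per-varnode state (real p-code state is byte-addressed), the register offsets, and the two-op translation of the example instruction are all simplifying, hypothetical assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, Optional, Tuple


@dataclass(frozen=True)
class Varnode:
    space: str   # "const", "register", or "unique"
    offset: int  # for "const", the constant value itself
    size: int    # size in bytes


@dataclass(frozen=True)
class PcodeOp:
    opcode: str
    output: Optional[Varnode]
    inputs: Tuple[Varnode, ...]


class ToyPcodeEmulator:
    """Simplified: stores one integer per (space, offset) instead of raw bytes."""

    def __init__(self) -> None:
        self.state: Dict[Tuple[str, int], int] = {}

    @staticmethod
    def _mask(vn: Varnode) -> int:
        return (1 << (8 * vn.size)) - 1

    def read(self, vn: Varnode) -> int:
        if vn.space == "const":
            return vn.offset & self._mask(vn)
        return self.state.get((vn.space, vn.offset), 0)

    def write(self, vn: Varnode, value: int) -> None:
        # Truncate to the output size: p-code integer arithmetic wraps around.
        self.state[(vn.space, vn.offset)] = value & self._mask(vn)

    def step(self, op: PcodeOp) -> None:
        """Execute exactly one p-code op."""
        args = [self.read(v) for v in op.inputs]
        if op.opcode == "COPY":
            result = args[0]
        elif op.opcode == "INT_ADD":
            result = args[0] + args[1]
        elif op.opcode == "INT_MULT":
            result = args[0] * args[1]
        else:
            raise NotImplementedError(op.opcode)
        self.write(op.output, result)

    def run_stepwise(self, ops) -> Iterator[Tuple[PcodeOp, Dict[Tuple[str, int], int]]]:
        """Yield each op together with a snapshot of the state after it runs."""
        for op in ops:
            self.step(op)
            yield op, dict(self.state)


# Hypothetical register offsets and a hypothetical translation of
# LEA EAX, [EBX + EBX*2] into two p-code ops.
EAX = Varnode("register", 0x0, 4)
EBX = Varnode("register", 0xC, 4)
TMP = Varnode("unique", 0x100, 4)
LEA_OPS = [
    PcodeOp("INT_MULT", TMP, (EBX, Varnode("const", 2, 4))),
    PcodeOp("INT_ADD", EAX, (EBX, TMP)),
]
```

Writing 5 to EBX and iterating over run_stepwise(LEA_OPS) shows the temporary holding 10 after the first op and EAX holding 15 after the second, which is the kind of per-op view the "P-code Stepper" window provides for real instructions.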

Raw Hex for Live Memory

We've added a variant of the "Bytes" window for dynamic traces, allowing live memory to be viewed as hex, ASCII, etc. The window includes the same background coloring, navigation, and tracking actions as the "Dynamic Listing". To open this window, select Window -> Bytes -> Memory.

LLDB Support

Working toward debugging macOS targets, we've added support for LLDB. Currently, some effort is required on the user's end to clone, patch, and build LLDB with Java language bindings. Once that is done, the new LLDB connectors can be used in the normal fashion. While intended for macOS, these connectors also work on Linux and may work on Windows. This offers an alternative for those who prefer LLDB to GDB.

Decompiler

Many changes have been made to improve the readability of decompiler output. These include producing else-if syntax in control flow and reducing casts when typedefs are involved. In addition, pointer calculation during sub-expression elimination has been improved, and a new API has been added for iterating over and accessing the decompiler's output syntax tokens.

Data Types

Support for zero-length data types and components has been improved, although such types will continue to report a non-zero length via the DataType.getLength() method. Code and features that can support zero-length data types must use the DataType.isZeroLength() method to identify this case. DataType.isZeroLength() is no longer synonymous with DataType.isNotYetDefined(), which is intended to identify data types (i.e., structures and unions) whose components have not yet been specified. Along these same lines, Ghidra now allows zero-element arrays to be defined. The API methods supporting a trailing flex-array on structures have been removed in favor of zero-element array components, and existing flex-array instances will be upgraded accordingly within Programs and Data Type Archives. The static method DataTypeComponent.usesZeroLengthComponent(DataType) may be used to determine whether a zero-length component will be used for a specific data type. Because of the overlapping behavior of zero-length components, a data type for which isNotYetDefined() returns true will not produce a zero-length component.
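
For readers coming from C, the layout that a trailing zero-element array describes can be mimicked with Python's ctypes, which accepts zero-length array types. This sketch only illustrates that layout: the component occupies no bytes and sits at the very end of the structure, where variable-length data begins. The Packet type and read_payload helper are hypothetical and unrelated to Ghidra's DataType API.

```python
import ctypes
import struct


class Packet(ctypes.Structure):
    # Hypothetical header: a 32-bit length followed by a zero-element array,
    # the C equivalent being `uint8_t data[0];` (or C99 `uint8_t data[];`).
    _fields_ = [
        ("length", ctypes.c_uint32),
        ("data", ctypes.c_uint8 * 0),
    ]


def read_payload(buf: bytearray) -> bytes:
    """Read the variable-length bytes that follow the fixed-size header."""
    header = Packet.from_buffer(buf)
    payload_type = ctypes.c_uint8 * header.length
    # The zero-element component marks where the trailing data starts.
    return bytes(payload_type.from_buffer(buf, Packet.data.offset))


# Native byte order ("=") so that ctypes.c_uint32 reads the value back correctly.
raw = bytearray(struct.pack("=I", 3) + b"abc")
```

Here ctypes.sizeof(Packet) is 4 even though the data component is declared, and read_payload(raw) returns b"abc".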

Parsing of C header files has been improved to correctly extract data type definitions, including corrected sizeof() handling, simplification of expressions to constants in many places such as array sizes and enumeration values, and handling of type declarations within function and structure declarations. We have re-parsed most of the included data type archives to take advantage of these changes, and plan to update the archives to more recent versions of the header files in the near future.
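
Simplifying an expression to a constant means that a value written as an expression in a header, such as an enumerator or an array bound, is evaluated to a plain integer. The Python sketch below shows the idea with the standard ast module; it is not Ghidra's C parser. It handles only integer literals without C suffixes, previously defined names, a handful of C operators (using C's truncating division), and sizeof applied to a single-word type. The TYPE_SIZES table is an assumption standing in for the target platform's type sizes.

```python
import ast
import operator

# Assumed type sizes; a real parser would take these from the target platform.
TYPE_SIZES = {"char": 1, "short": 2, "int": 4, "long": 8}


def _c_div(a: int, b: int) -> int:
    # C integer division truncates toward zero (Python's // floors).
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: _c_div,
    ast.Mod: lambda a, b: a - b * _c_div(a, b),  # remainder takes the dividend's sign
    ast.LShift: operator.lshift,
    ast.RShift: operator.rshift,
    ast.BitOr: operator.or_,
    ast.BitAnd: operator.and_,
    ast.BitXor: operator.xor,
}
UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos, ast.Invert: operator.invert}


def fold_constant(expr: str, names: dict) -> int:
    """Reduce a C constant expression to an int, or raise ValueError."""

    def ev(node: ast.AST) -> int:
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return node.value
        if isinstance(node, ast.Name) and node.id in names:
            return names[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in BINARY_OPS:
            return BINARY_OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in UNARY_OPS:
            return UNARY_OPS[type(node.op)](ev(node.operand))
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id == "sizeof" and len(node.args) == 1
                and isinstance(node.args[0], ast.Name)
                and node.args[0].id in TYPE_SIZES):
            return TYPE_SIZES[node.args[0].id]
        raise ValueError(f"not a supported constant expression: {expr!r}")

    return ev(ast.parse(expr, mode="eval").body)


def resolve_enum(members) -> dict:
    """members: (name, expr or None) pairs; None means previous value + 1."""
    values, next_value = {}, 0
    for name, expr in members:
        value = next_value if expr is None else fold_constant(expr, values)
        values[name] = value
        next_value = value + 1
    return values
```

For instance, resolving the members F_READ = 1 << 0, F_WRITE = 1 << 1, F_RW = F_READ | F_WRITE, an implicit F_NEXT, and F_BITS = sizeof(int) * 8 yields 1, 2, 3, 4, and 32, and an array bound such as 16 * sizeof(short) folds to 32.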

Mach-O Binary Import

Mach-O binary import has been greatly improved, including handling of relocation pointer chains, support for newer Objective-C class structures with RelativePointers, additional load commands, and support for more recent versions of dyld and kernel caches including split-file dyld_shared_cache variants.
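
A relocation pointer chain stores fixups as a linked list inside the data itself: each pointer-sized slot carries both the value to fix up and the distance to the next slot in the chain, so a loader only needs the offset of the chain's first slot. The Python sketch below walks one such chain. Its bit layout is modeled on the DYLD_CHAINED_PTR_64 rebase format from Apple's <mach-o/fixup-chains.h> (36-bit target, 8-bit high8, 12-bit next counted in 4-byte strides, bind flag in bit 63); treat that layout as an assumption to verify against the header. The sketch does not apply a slide and skips bind slots rather than resolving them.

```python
import struct
from typing import Iterator, Tuple

STRIDE = 4  # assumed byte stride of the `next` field for this format
TARGET_MASK = (1 << 36) - 1


def walk_rebase_chain(page: bytes, start: int) -> Iterator[Tuple[int, int]]:
    """Yield (slot_offset, pointer_value) for each rebase slot in one chain."""
    offset = start
    while True:
        (raw,) = struct.unpack_from("<Q", page, offset)
        is_bind = raw >> 63
        next_delta = (raw >> 51) & 0xFFF
        if not is_bind:
            target = raw & TARGET_MASK
            high8 = (raw >> 36) & 0xFF
            # high8 supplies the top byte of the final pointer (slide not applied).
            yield offset, (high8 << 56) | target
        if next_delta == 0:
            return  # a zero `next` field ends the chain
        offset += next_delta * STRIDE
```

With a rebase slot at offset 0 whose next field is 2, a bind slot at offset 8, and a final rebase slot at offset 16, the walk yields the two rebase slots and skips the bind.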

Android

Import and analysis of almost the entire existing set of Android binaries, up to version 12.x, is now supported. The supported binary types include: Android Run-Time (ART), Ahead-of-Time (OAT)/ELF, Dalvik Executables (DEX), Compact DEX (CDEX), Verified DEX (VDEX), Boot Image, and Boot Loader formats. Also included are Sleigh modules for DEX files covering each major release of Android, since the optimized instructions vary across versions. When importing DEX files, you can now select the Dalvik language appropriate to the Android release, which results in better analysis.
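
One clue for choosing the Dalvik language is the DEX format version, stored in the file's 8-byte magic: "dex\n", three ASCII digits, and a NUL byte. The Python sketch below reads it. The version-to-release table is an assumption based on the AOSP DEX format documentation (035 before Android 7.0, 037 from 7.0, 038 from 8.0, 039 from 9.0); the sketch does not cover CDEX or the other container formats, and it does not select a Ghidra language itself.

```python
from typing import Tuple

# Assumed mapping of DEX format version to the Android release that introduced it.
DEX_VERSION_RELEASE = {
    "035": "before 7.0",
    "037": "7.0",
    "038": "8.0",
    "039": "9.0",
}


def dex_version(data: bytes) -> Tuple[str, str]:
    """Return (format_version, android_release_hint) from a DEX header."""
    magic = data[:8]
    # Standard DEX magic: b"dex\n" + three version digits + b"\x00".
    if len(magic) != 8 or magic[:4] != b"dex\n" or magic[7] != 0:
        raise ValueError("not a standard DEX file")
    version = magic[4:7].decode("ascii")
    return version, DEX_VERSION_RELEASE.get(version, "unknown")
```

For a header beginning with b"dex\n038\x00", dex_version returns ("038", "8.0").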

Performance Improvements

There have been many performance improvements to import, analysis, program database access, many API calls, and the user interface.

Symbol performance in Ghidra has been significantly improved. Specifically, new database indexes were created to speed up finding primary symbols and lookups by combinations of name, namespace, and address.
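
The benefit of such indexes is that a lookup goes straight to the matching records instead of scanning every symbol. The in-memory Python model below illustrates the idea with one index per lookup key; it is not Ghidra's database schema, and the Symbol and SymbolIndex types are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Symbol:
    name: str
    namespace: str
    address: int
    primary: bool = False


class SymbolIndex:
    """Maintains secondary indexes so common lookups avoid a full scan."""

    def __init__(self) -> None:
        self._by_name_ns: Dict[Tuple[str, str], List[Symbol]] = defaultdict(list)
        self._by_address: Dict[int, List[Symbol]] = defaultdict(list)
        self._primary: Dict[int, Symbol] = {}

    def add(self, sym: Symbol) -> None:
        # Every index is updated on insert; reads then cost one dict lookup.
        self._by_name_ns[(sym.name, sym.namespace)].append(sym)
        self._by_address[sym.address].append(sym)
        if sym.primary:
            self._primary[sym.address] = sym

    def find(self, name: str, namespace: str) -> List[Symbol]:
        return list(self._by_name_ns.get((name, namespace), ()))

    def find_at(self, name: str, namespace: str, address: int) -> Optional[Symbol]:
        # Narrow by (name, namespace) first, then match the address.
        return next((s for s in self.find(name, namespace) if s.address == address), None)

    def symbols_at(self, address: int) -> List[Symbol]:
        return list(self._by_address.get(address, ()))

    def primary_at(self, address: int) -> Optional[Symbol]:
        return self._primary.get(address)
```

The trade-off is the usual one for indexes: each insert touches several structures so that lookups by name and namespace, by address, or for the primary symbol at an address stay cheap.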

Processors

Many processors received improvements and bug fixes, including x86, ARM, AARCH64, SPARC, PPC, SH4, RISC-V, and 6502.

DWARF

Support has been added for loading DWARF debug information from a separate file during import. In addition, data type information contained in the separate debug file can be loaded without applying it to a program, enabling the use of debug information from a related version of the binary.
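
On Linux, separate debug files are commonly located through either a GNU build ID or a .gnu_debuglink file name recorded in the binary. The Python sketch below computes the candidate paths that the GDB documentation describes for both conventions, assuming the default /usr/lib/debug directory. Ghidra's own search locations and settings may differ, and the function names are illustrative.

```python
from pathlib import PurePosixPath
from typing import List

DEBUG_ROOT = PurePosixPath("/usr/lib/debug")  # assumed global debug directory


def build_id_path(build_id_hex: str) -> PurePosixPath:
    """The first byte of the build ID names a subdirectory; the rest names the file."""
    bid = build_id_hex.lower()
    return DEBUG_ROOT / ".build-id" / bid[:2] / f"{bid[2:]}.debug"


def debuglink_candidates(executable: str, debuglink: str) -> List[PurePosixPath]:
    """Candidate locations for a .gnu_debuglink name, in search order."""
    exe_dir = PurePosixPath(executable).parent  # assumes an absolute path
    return [
        exe_dir / debuglink,
        exe_dir / ".debug" / debuglink,
        DEBUG_ROOT / exe_dir.relative_to("/") / debuglink,
    ]
```

For /usr/bin/ls with a debuglink of ls.debug, the candidates are /usr/bin/ls.debug, /usr/bin/.debug/ls.debug, and /usr/lib/debug/usr/bin/ls.debug.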

Bug Fixes and Enhancements

Numerous other bug fixes and improvements are fully listed in the ChangeHistory file.

https://www.nsa.gov/ghidra