This document discusses the mechanisms within Ghidra which mitigate
the impact of language modifications to existing user program
files. There are two general classes of language modifications
which can be supported by the language translation capabilities within
Ghidra:
Any program opened within Ghidra whose language has had a version change or has been replaced by a new implementation will be forced to upgrade. This will prevent such a program file from being opened as immutable and will impose a delay due to the necessary re-disassembly of all instructions.
In addition to a forced upgrade, Ghidra's Set Language capability will allow
a user to make certain transitions between similar language
implementations. Such transitions are generally facilitated via a
default translator, although certain limitations are imposed based upon
address space sizes and register mappings.
A language's version is specified as a <major>.<minor> number pair (e.g.,
1.0). The decision to advance the major or minor version number
should be based upon the following criteria:
Anytime the major version number is advanced, the minor version
number should be reset to zero.
Only major version changes utilize a Language
Translator to facilitate the language transition.
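The versioning rules above can be sketched in a few lines of Python. This is only an illustration of the policy as described here, not Ghidra code: the names LanguageVersion, bump, upgrade_required and translator_required are hypothetical and do not correspond to any Ghidra API.

```python
from typing import NamedTuple


class LanguageVersion(NamedTuple):
    """Hypothetical model of a <major>.<minor> language version pair."""
    major: int
    minor: int

    @classmethod
    def parse(cls, text: str) -> "LanguageVersion":
        # "1.0" -> LanguageVersion(major=1, minor=0)
        major, minor = text.strip().split(".")
        return cls(int(major), int(minor))


def bump(version: LanguageVersion, major_change: bool) -> LanguageVersion:
    """Advance a version; a major bump resets the minor number to zero."""
    if major_change:
        return LanguageVersion(version.major + 1, 0)
    return LanguageVersion(version.major, version.minor + 1)


def upgrade_required(stored: LanguageVersion, current: LanguageVersion) -> bool:
    """Any version change forces the program to be upgraded."""
    return stored != current


def translator_required(stored: LanguageVersion, current: LanguageVersion) -> bool:
    """Only a major version change involves a Language Translator."""
    return stored.major != current.major
```

For example, a program stored with version 1.3 and opened against version 1.4 would need an upgrade (re-disassembly) but no translator, while a move from 1.3 to 2.0 would need both.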
When eliminating an old language the following must be accomplished:
Before eliminating a language, a corresponding "old" language file
must be generated and stored somewhere within Ghidra's languages
directory (the core/languages/old
directory has been established for this purpose). In
addition, a simple or custom Language Translator
must be established to facilitate the language migration to the
replacement language.
An old-language file may be generated automatically while the
language still exists using the GenerateOldLanguagePlugin
configured into Ghidra's project window. In addition, if
appropriate, a draft simple Language Translator specification can
be generated, provided the replacement language is also available.
To generate an old-language file and optionally a draft simple
translator specification:
An old-language specification file is used to describe the essential
elements of a language needed to instantiate an old program using that
language and to facilitate translation to a replacement language.
The specification file is an XML file which identifies a language's description, address spaces and named registers. Since it should be generated using the GenerateOldLanguagePlugin, its syntax is not defined here.
Sample Old-Language Specification File:
<?xml version="1.0" encoding="UTF-8"?>
<language version="1" endian="big">
    <description>
        <name>MyOldProcessorLanguage</name>
        <processor>MyOldProcessor</processor>
        <family>Motorola</family>
        <alias>MyOldProcessorLanguageAlias1</alias>
        <alias>MyOldProcessorLanguageAlias2</alias>
    </description>
    <spaces>
        <space name="ram" type="ram" size="4" default="yes" />
        <space name="register" type="register" size="4" />
        <space name="data" type="code" size="4" />
    </spaces>
    <registers>
        <context_register name="contextreg" offset="0x40" bitsize="8">
            <field name="ctxbit1" range="1,1" />
            <field name="ctxbit0" range="0,0" />
        </context_register>
        <register name="r0" offset="0x0" bitsize="32" />
        <register name="r1" offset="0x4" bitsize="32" />
        <register name="r2" offset="0x8" bitsize="32" />
        <register name="r3" offset="0xc" bitsize="32" />
        <register name="r4" offset="0x10" bitsize="32" />
    </registers>
</language>
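To sanity-check a generated old-language file before committing it, the file can be inspected with any XML parser. The following Python sketch reads an abbreviated copy of the sample above and summarizes its address spaces, registers and context fields. It is not how Ghidra itself loads the file; the function name summarize_old_language is hypothetical, and the element and attribute names are assumed to match the sample.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the sample old-language specification shown above.
OLD_LANGUAGE_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<language version="1" endian="big">
    <description>
        <name>MyOldProcessorLanguage</name>
        <processor>MyOldProcessor</processor>
    </description>
    <spaces>
        <space name="ram" type="ram" size="4" default="yes" />
        <space name="register" type="register" size="4" />
        <space name="data" type="code" size="4" />
    </spaces>
    <registers>
        <context_register name="contextreg" offset="0x40" bitsize="8">
            <field name="ctxbit1" range="1,1" />
            <field name="ctxbit0" range="0,0" />
        </context_register>
        <register name="r0" offset="0x0" bitsize="32" />
        <register name="r3" offset="0xc" bitsize="32" />
    </registers>
</language>"""


def summarize_old_language(xml_bytes: bytes) -> dict:
    """Collect the names a translator would have to account for."""
    root = ET.fromstring(xml_bytes)
    spaces = list(root.iter("space"))
    return {
        "name": root.findtext("description/name"),
        "version": root.get("version"),
        "endian": root.get("endian"),
        # space name -> size attribute as given in the file
        "spaces": {s.get("name"): int(s.get("size")) for s in spaces},
        "default_space": next(
            (s.get("name") for s in spaces if s.get("default") == "yes"), None
        ),
        # register name -> (offset, bitsize); base 0 accepts the 0x prefix
        "registers": {
            r.get("name"): (int(r.get("offset"), 0), int(r.get("bitsize")))
            for r in root.iter("register")
        },
        "context_fields": [f.get("name") for f in root.iter("field")],
    }


summary = summarize_old_language(OLD_LANGUAGE_XML)
```

Comparing such a summary against the replacement language is one way to spot spaces or registers that will need an explicit mapping in the translator.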
A language translator facilitates the renaming of address spaces
and the relocation/renaming of registers. In addition, stored
register values can be transformed, although limited knowledge is
available for decision making. Language changes in instruction and
subconstructor pattern matching are handled through the process of
re-disassembly. Three forms of translators are
supported:
Sample Simple Translator Specification File:
<?xml version="1.0" encoding="UTF-8"?>
<language_translation>

    <from_language version="1">MyOldProcessorLanguage</from_language>
    <to_language version="1">MyNewProcessorLanguage</to_language>

    <!--
        Obsolete space will be deleted with all code units in that space.
    -->
    <delete_space name="data" />

    <!--
        Spaces whose name has changed can be mapped over
    -->
    <map_space from="ram" to="ram" />

    <!--
        Registers whose name has changed can be mapped (size and offset changes are allowed)
        The map_register may include a size attribute although it is ignored.
    -->
    <map_register from="r0" to="cr0" />
    <map_register from="r1" to="cr1" />

    <!--
        All existing processor context can be cleared
    -->
    <clear_all_context/>

    <!--
        A specific context value can be painted across all of program memory
        NOTE: sets occur after clear_all_context
    -->
    <set_context name="ctxbit0" value="1"/>

    <!--
        Force a specific Java class which extends
        ghidra.program.util.LanguagePostUpgradeInstructionHandler
        to be invoked following translation and re-disassembly to allow for more
        complex instruction context transformations/repair.
    -->
    <post_upgrade_handler class="ghidra.program.language.MyOldNewProcessorInstructionRepair" />

</language_translation>
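The effect of the delete_space, map_space and map_register elements can be illustrated with a small Python sketch that applies them to lists of old space and register names. This is a simplified model, not Ghidra's translator implementation: the function name plan_translation is hypothetical, and the sketch assumes that any space or register without an explicit mapping carries over under the same name.

```python
import xml.etree.ElementTree as ET

# Reduced copy of the sample translator specification shown above.
TRANSLATOR_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<language_translation>
    <from_language version="1">MyOldProcessorLanguage</from_language>
    <to_language version="1">MyNewProcessorLanguage</to_language>
    <delete_space name="data" />
    <map_space from="ram" to="ram" />
    <map_register from="r0" to="cr0" />
    <map_register from="r1" to="cr1" />
</language_translation>"""


def plan_translation(xml_bytes, old_spaces, old_registers):
    """Return (space_plan, register_plan) as old-name -> new-name dicts.

    Deleted spaces are omitted from space_plan. Unmapped names are assumed
    (for this sketch only) to keep the same name in the new language.
    """
    root = ET.fromstring(xml_bytes)
    deleted = {e.get("name") for e in root.iter("delete_space")}
    space_map = {e.get("from"): e.get("to") for e in root.iter("map_space")}
    reg_map = {e.get("from"): e.get("to") for e in root.iter("map_register")}

    space_plan = {
        name: space_map.get(name, name)
        for name in old_spaces
        if name not in deleted
    }
    register_plan = {name: reg_map.get(name, name) for name in old_registers}
    return space_plan, register_plan


space_plan, register_plan = plan_translation(
    TRANSLATOR_XML,
    old_spaces=["ram", "register", "data"],
    old_registers=["r0", "r1", "r2"],
)
```

With the inputs shown, the "data" space drops out of the plan, r0 and r1 are renamed to cr0 and cr1, and r2 is left untouched under the stated carry-over assumption.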
Translator Limitations
The current translation mechanism does not handle the potential need
for complete re-disassembly and associated auto-analysis.