Update READMEs and documentation (#87)
Co-authored-by: jonschz <jonschz@users.noreply.github.com>
Showing 7 changed files with 353 additions and 233 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
# Contributing | ||
|
||
Your contributions are very much appreciated! If you want to work on this tool, we recommend you do the following: | ||
1. Set up a virtual environment in this directory. | ||
2. Install this project within itself in editable mode: `pip install -e .` | ||
3. Install the dev requirements: `pip install -r requirements-tests.txt` | ||
|
||
If you also have a decompilation project, we recommend the following: | ||
1. Set up a _separate_ virtual environment in your decompilation project. | ||
2. Inside that virtual environment, `pip install -e path/to/your/local/reccmp/repository`. | ||
|
||
This way, you can easily run your latest `reccmp` changes against your decompilation project. | ||
|
||
## Testing | ||
|
||
`isledecomp` comes with a suite of tests based on `pytest`. A number of them can be run out of the box: | ||
```bash | ||
pytest . | ||
``` | ||
|
||
As of this writing, some of the tests still depend on the [LEGO Island decompilation project](https://github.com/isledecomp/isle). You will need a copy of the _original_ LEGO Island binaries to run all tests. Pass the path to the original `LEGO1.DLL` with the `--lego1` option: | ||
```bash | ||
pytest . --lego1=/path/to/LEGO1.DLL | ||
``` | ||
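A command-line option such as `--lego1` is typically wired into `pytest` through a `conftest.py`. The following is a minimal sketch of how that could look, not necessarily how this project implements it: the helper `lego1_path_or_none` and the fixture name `lego1_path` are hypothetical, while `pytest_addoption`, `request.config.getoption` and `pytest.skip` are standard `pytest` features.

```python
# conftest.py (illustrative sketch; names other than the pytest hooks are assumptions)
import os

import pytest


def pytest_addoption(parser):
    """Register the --lego1 option so that `pytest . --lego1=...` is accepted."""
    parser.addoption(
        "--lego1",
        action="store",
        default=None,
        help="Path to the original LEGO1.DLL",
    )


def lego1_path_or_none(option_value):
    """Return the given path if it points to an existing file, otherwise None."""
    if option_value and os.path.isfile(option_value):
        return option_value
    return None


@pytest.fixture(scope="session")
def lego1_path(request):
    """Provide the path to LEGO1.DLL, or skip the test when it is unavailable."""
    path = lego1_path_or_none(request.config.getoption("--lego1"))
    if path is None:
        pytest.skip("requires --lego1=/path/to/LEGO1.DLL")
    return path
```

With a setup like this, tests that request the `lego1_path` fixture are skipped rather than failed when the original binary is not supplied, which is consistent with a subset of the tests running out of the box.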
|
||
## Linting and formatting | ||
|
||
In order to keep the Python code clean and consistent, we use `pylint` and `black`: | ||
|
||
* Run `pylint`: `pylint reccmp` | ||
* Check formatting without making changes: `black --check reccmp` | ||
* Apply formatting: `black reccmp` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,255 +1,76 @@ | ||
# LEGO Island Decompilation Tools | ||
# Reccmp Decompilation Toolchain | ||
|
||
Accuracy to the game's original code is the main goal of the [decompilation project](https://github.com/isledecomp/isle). To facilitate the decompilation effort and maintain overall quality, we have devised a set of annotations, to be embedded in the source code, which allow us to automatically verify the accuracy of re-compiled functions' assembly, virtual tables, variable offsets and more. | ||
|
||
In order for contributions to be accepted, the annotations must be used in accordance with the rules outlined here. Proper use is enforced by [GitHub Actions](/.github/workflows) which run the Python tools found in this folder. It is recommended to integrate these tools into your local development workflow as well. | ||
|
||
# Overview | ||
|
||
We are continually working on extending the capabilities of our "decompilation language" and the toolset around it. Some of the following annotations have not made it into formal verification and thus are not technically enforced on the source code level yet (marked as **WIP**). Nevertheless, it is recommended to use them since it is highly likely they will eventually be fully integrated. | ||
|
||
## Functions | ||
|
||
All non-inlined functions in the code base with the exception of [3rd party code](https://github.com/isledecomp/isle/tree/master/3rdparty) must be annotated with one of the following markers, which include the module name and address of the function as found in the original binaries. This information is then used to compare the recompiled assembly with the original assembly, resulting in an accuracy score. Functions in a given compilation unit must be ordered by their address in ascending order. | ||
|
||
The annotations can be attached to the function implementation, which is the most common case, or use the "comment" syntax (see examples below) for functions that cannot be referred to directly (such as templated, synthetic or non-inlined inline functions). The latter should only ever appear in `.h` files. | ||
|
||
### `FUNCTION` | ||
|
||
Functions with a reasonably complete implementation which are not templated or synthetic (see below) should be annotated with `FUNCTION`. | ||
|
||
``` | ||
`reccmp` (recompilation comparison) is a collection of tools for decompilation projects. It was born from the [decompilation of LEGO Island](https://github.com/isledecomp/isle). Functions and data are matched based on comments in the source code. For example: | ||
```cpp | ||
// FUNCTION: LEGO1 0x100b12c0 | ||
MxCore* MxObjectFactory::Create(const char* p_name) | ||
{ | ||
// implementation | ||
} | ||
// FUNCTION: LEGO1 0x100140d0 | ||
// MxCore::IsA | ||
``` | ||
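To make the marker format concrete, here is a minimal sketch of how such comments could be recognized. It is not `reccmp`'s actual parser: the names `Marker`, `MARKER_RE` and `find_markers` are hypothetical, and the sketch only handles single-line comments of the form `// KIND: MODULE 0xADDRESS`.

```python
import re
from typing import Iterator, NamedTuple


class Marker(NamedTuple):
    kind: str      # e.g. "FUNCTION"
    module: str    # e.g. "LEGO1"
    address: int   # address in the original binary


# Assumed shape of an annotation line: "// KIND: MODULE 0xADDRESS"
MARKER_RE = re.compile(
    r"^\s*//\s*(?P<kind>[A-Z]+):\s*(?P<module>\w+)\s+(?P<addr>0x[0-9a-fA-F]+)\s*$"
)


def find_markers(source: str) -> Iterator[Marker]:
    """Yield one Marker for every annotation comment found in the source text."""
    for line in source.splitlines():
        match = MARKER_RE.match(line)
        if match:
            yield Marker(match["kind"], match["module"], int(match["addr"], 16))


example = """\
// FUNCTION: LEGO1 0x100b12c0
MxCore* MxObjectFactory::Create(const char* p_name)
{
    // implementation
}
// FUNCTION: LEGO1 0x100140d0
// MxCore::IsA
"""

for marker in find_markers(example):
    print(f"{marker.kind} in {marker.module} at {marker.address:#x}")
```

Ordinary comments such as `// implementation` or `// MxCore::IsA` do not match the pattern, so only the two `FUNCTION` markers are reported.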
|
||
### `STUB` | ||
|
||
Functions with no or a very incomplete implementation should be annotated with `STUB`. These will not be compared to the original assembly. | ||
|
||
``` | ||
// STUB: LEGO1 0x10011d50 | ||
LegoCameraController::LegoCameraController() | ||
{ | ||
// TODO | ||
} | ||
``` | ||
|
||
### `TEMPLATE` | ||
|
||
Templated functions should be annotated with `TEMPLATE`. Since the goal is to eventually have a full accounting of all the functions present in the binaries, please make an effort to find and annotate every function of a templated class. | ||
|
||
``` | ||
// TEMPLATE: LEGO1 0x100c0ee0 | ||
// list<MxNextActionDataStart *,allocator<MxNextActionDataStart *> >::_Buynode | ||
This allows you to automatically verify the accuracy of re-compiled functions, virtual tables, variable offsets and more. See [here](docs/annotations.md) for the full syntax. | ||
// TEMPLATE: LEGO1 0x100c0fc0 | ||
// MxStreamListMxDSSubscriber::~MxStreamListMxDSSubscriber | ||
At the moment, C++ compiled to 32-bit x86 with old versions of MSVC (like 4.20) is supported. Work on support for newer MSVC versions is in progress - testing and bug reports are greatly appreciated. Other compilers, languages and architectures are not supported at the moment, but feel free to contribute if you wish to do so! | ||
// TEMPLATE: LEGO1 0x100c1010 | ||
// MxStreamListMxDSAction::~MxStreamListMxDSAction | ||
``` | ||
|
||
### `SYNTHETIC` | ||
|
||
Synthetic functions should be annotated with `SYNTHETIC`. A synthetic function is generated by the compiler; most common is the "scalar deleting destructor" found in virtual tables. Other cases include default destructors and assignment operators. Note: `SYNTHETIC` takes precedence over `TEMPLATE`. | ||
|
||
``` | ||
// SYNTHETIC: LEGO1 0x10003210 | ||
// Helicopter::`scalar deleting destructor' | ||
// SYNTHETIC: LEGO1 0x100c4f50 | ||
// MxCollection<MxRegionLeftRight *>::`scalar deleting destructor' | ||
// SYNTHETIC: LEGO1 0x100c4fc0 | ||
// MxList<MxRegionLeftRight *>::`scalar deleting destructor' | ||
``` | ||
|
||
### `LIBRARY` | ||
|
||
Functions located in 3rd party libraries should be annotated with `LIBRARY`. Since the goal is to eventually have a full accounting of all the functions present in the binaries, please make an effort to find and annotate every function of every statically linked library, including the MSVC standard libraries. | ||
|
||
``` | ||
// LIBRARY: ISLE 0x4061b0 | ||
// _MemPoolInit@4 | ||
## Getting started | ||
// LIBRARY: ISLE 0x406520 | ||
// _MemPoolSetPageSize@8 | ||
### Installing / upgrading `reccmp` | ||
1. (Recommended) Set up and activate a virtual Python environment in the directory of your recompilation project (the exact commands depend on your operating system and shell). | ||
2. Install `reccmp`: `pip install git+https://github.com/isledecomp/reccmp` | ||
// LIBRARY: ISLE 0x406630 | ||
// _MemPoolSetBlockSizeFS@8 | ||
``` | ||
The next steps differ based on what kind of project you have. | ||
## Virtual tables | ||
### Contributing to a project that already uses `reccmp` | ||
1. Compile the C++ project. | ||
2. Run `reccmp-project detect --search-path path/to/folder/with/original/binaries`. | ||
3. If there is no `reccmp-build.yml` after building: Navigate to the recompiled binaries folder and run `reccmp-project detect --what recompiled`. | ||
4. Look into `reccmp-project.yml` to see what the target is called. | ||
5. Run `reccmp-reccmp --target <YOURTARGET>`. You should see a list of functions and other matched items, each with its match percentage. | ||
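The steps above can be sketched as a small helper script. This is only an illustration under stated assumptions: the function names `plan_commands` and `run_plan` are hypothetical, the first `detect` call is assumed to run from the project root, and the comparison is assumed to run from the directory containing `reccmp-build.yml`. Only the commands and flags quoted in the steps are used.

```python
import subprocess
from pathlib import Path


def plan_commands(project_dir, build_dir, originals_dir, target):
    """Return (working directory, argv) pairs for the steps after compiling."""
    project_dir, build_dir = Path(project_dir), Path(build_dir)
    plan = [
        # Step 2: locate the original binaries (assumed to run from the project root).
        (project_dir, ["reccmp-project", "detect", "--search-path", str(originals_dir)]),
    ]
    # Step 3: only needed if the build did not produce reccmp-build.yml.
    if not (build_dir / "reccmp-build.yml").is_file():
        plan.append((build_dir, ["reccmp-project", "detect", "--what", "recompiled"]))
    # Step 5: run the comparison for the target named in reccmp-project.yml.
    plan.append((build_dir, ["reccmp-reccmp", "--target", target]))
    return plan


def run_plan(plan):
    """Execute each planned command in order, stopping at the first failure."""
    for cwd, argv in plan:
        subprocess.run(argv, cwd=cwd, check=True)
```

For example, `run_plan(plan_commands(".", "build", "path/to/folder/with/original/binaries", "LEGO1"))` would run the whole sequence, where `LEGO1` stands in for whatever target name `reccmp-project.yml` lists.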
Classes with a virtual table should be annotated using the `VTABLE` marker, which includes the module name and address of the virtual table. Additionally, virtual function declarations should be annotated with a comment indicating their relative offset. Please use the following example as a reference. | ||
### Setting up an existing decompilation project that has not used `reccmp` before | ||
``` | ||
// VTABLE: LEGO1 0x100dc900 | ||
class MxEventManager : public MxMediaManager { | ||
public: | ||
MxEventManager(); | ||
virtual ~MxEventManager() override; | ||
virtual void Destroy() override; // vtable+0x18 | ||
virtual MxResult Create(MxU32 p_frequencyMS, MxBool p_createThread); // vtable+0x28 | ||
``` | ||
1. Run `reccmp-project create --originals path/to/original --scm`. This generates two files `reccmp-project.yml` and `reccmp-user.yml`; the latter will automatically be added to the `.gitignore`. | ||
2. Annotate one function of your existing project as shown above and recompile. Note that the recompiled binary should have the same file name as the original. | ||
3. Navigate to your recompiled binary and run `reccmp-project detect --what recompiled`. A file `reccmp-build.yml` will be generated. This file should also be user-specific (see below on how to have the build toolchain generate it automatically). | ||
4. Look into `reccmp-project.yml` to see what the target is called. | ||
5. Run `reccmp-reccmp --target <YOURTARGET>` from the same directory. If all goes well, you will see the match percentage of the function you annotated above. | ||
## Class size | ||
### Fresh project | ||
Classes should be annotated using the `SIZE` marker to indicate their size. If you are unsure about the class size in the original binary, please use the currently available information (known member variables) and detail the circumstances in an extra comment if necessary. | ||
1. Run `reccmp-project create --originals path/to/original/binary --cmake-project` | ||
2. You will see a lot of new files. Set up your C++ compiler and compile the project defined by `CMakeLists.txt`, ideally into a sub-directory like `./build`. Advice on building with old MSVC versions can be found at the [LEGO Island Decompilation project](https://github.com/isledecomp/isle). | ||
3. Look into `reccmp-project.yml` to see what the target is called. | ||
4. Navigate to the build directory and run `reccmp-reccmp --target <YOURTARGET>`. | ||
``` | ||
// SIZE 0x1c | ||
class MxCriticalSection { | ||
public: | ||
MxCriticalSection(); | ||
~MxCriticalSection(); | ||
static void SetDoMutex(); | ||
``` | ||
## Tooling | ||
Furthermore, add `DECOMP_SIZE_ASSERT(MxCriticalSection, 0x1c)` to the respective `.cpp` file (if the class has no dedicated `.cpp` file, use any appropriate `.cpp` file where the class is used). | ||
|
||
## Member variables | ||
|
||
Member variables should be annotated with their relative offsets. | ||
|
||
``` | ||
class MxDSObject : public MxCore { | ||
private: | ||
MxU32 m_sizeOnDisk; // 0x8 | ||
MxU16 m_type; // 0xc | ||
char* m_sourceName; // 0x10 | ||
undefined4 m_unk0x14; // 0x14 | ||
``` | ||
|
||
## Global variables | ||
|
||
Global variables should be annotated using the `GLOBAL` marker, which includes the module name and address of the variable. | ||
|
||
``` | ||
// GLOBAL: LEGO1 0x100f456c | ||
MxAtomId* g_jukeboxScript = NULL; | ||
// GLOBAL: LEGO1 0x100f4570 | ||
MxAtomId* g_pz5Script = NULL; | ||
// GLOBAL: LEGO1 0x100f4574 | ||
MxAtomId* g_introScript = NULL; | ||
``` | ||
|
||
## Strings | ||
|
||
String values should be annotated using the `STRING` marker, which includes the module name and address of the string. | ||
|
||
``` | ||
inline virtual const char* ClassName() const override // vtable+0x0c | ||
{ | ||
// STRING: LEGO1 0x100f03fc | ||
return "Act2PoliceStation"; | ||
} | ||
``` | ||
|
||
# Tooling | ||
|
||
Use `pip` to install the required packages to be able to use the Python tools found in this folder: | ||
|
||
``` | ||
pip install -e . | ||
``` | ||
|
||
All scripts will become available to use in your terminal with the `reccmp-` prefix. The example usages below assume that the retail binaries have been copied to `./legobin`. | ||
All scripts will become available to use in your terminal with the `reccmp-` prefix. Note that these scripts need to be executed in the directory where `reccmp-build.yml` is located. | ||
* [`decomplint`](/reccmp/tools/decomplint.py): Checks the decompilation annotations (see above) | ||
* e.g. `reccmp-decomplint --module LEGO1 LEGO1` | ||
* [`isledecomp`](/reccmp/isledecomp): A library that implements a parser to identify the decompilation annotations (see above) | ||
* [`reccmp`](/reccmp/reccmp): Compares an original binary with a recompiled binary, provided a PDB file. For example: | ||
* Display the diff for a single function: `reccmp-reccmp --verbose 0x100ae1a0 legobin/LEGO1.DLL build/LEGO1.DLL build/LEGO1.PDB .` | ||
* Generate an HTML report: `reccmp-reccmp --html output.html legobin/LEGO1.DLL build/LEGO1.DLL build/LEGO1.PDB .` | ||
* Create a base file for diffs: `reccmp-reccmp --json base.json --silent legobin/LEGO1.DLL build/LEGO1.DLL build/LEGO1.PDB .` | ||
* Diff against a base file: `reccmp-reccmp --diff base.json legobin/LEGO1.DLL build/LEGO1.DLL build/LEGO1.PDB .` | ||
* [`reccmp`](/reccmp/tools/asmcmp.py): Compares an original binary with a recompiled binary, provided a PDB file. For example: | ||
* Display the diff for a single function: `reccmp-reccmp --target LEGO1 --verbose 0x100ae1a0` | ||
* Generate an HTML report: `reccmp-reccmp --target LEGO1 --html output.html` | ||
* Create a base file for diffs: `reccmp-reccmp --target LEGO1 --json base.json --silent` | ||
* Diff against a base file: `reccmp-reccmp --target LEGO1 --diff base.json` | ||
* [`stackcmp`](/reccmp/tools/stackcmp.py): Compares the stack layout for a given function that almost matches. | ||
* e.g. `reccmp-stackcmp legobin/BETA10.DLL build_debug/LEGO1.DLL build_debug/LEGO1.pdb . 0x1007165d` | ||
* e.g. `reccmp-stackcmp --target BETA10 0x1007165d` | ||
* [`roadmap`](/reccmp/tools/roadmap.py): Compares symbol locations in an original binary with the same symbol locations of a recompiled binary | ||
* [`verexp`](/reccmp/tools/verexp.py): Verifies exports by comparing the exports of the original DLL and the recompiled DLL | ||
* [`vtable`](/reccmp/tools/vtable.py): Asserts virtual table correctness by comparing a recompiled binary with the original | ||
* e.g. `reccmp-vtable legobin/LEGO1.DLL build/LEGO1.DLL build/LEGO1.PDB .` | ||
* e.g. `reccmp-vtable --target LEGO1` | ||
* [`datacmp`](/reccmp/tools/datacmp.py): Compares global data found in the original with the recompiled version | ||
* e.g. `reccmp-datacmp legobin/LEGO1.DLL build/LEGO1.DLL build/LEGO1.PDB .` | ||
* e.g. `reccmp-datacmp --target LEGO1` | ||
## Testing | ||
## Ghidra Import | ||
`isledecomp` comes with a suite of tests. Install `requirements-tests.txt` and run it like this: | ||
There are existing scripts to import the information from the decompilation into [Ghidra](https://github.com/NationalSecurityAgency/ghidra). See the relevant [README](reccmp/ghidra_scripts/README.md) for additional information. | ||
``` | ||
pip install -r requirements-tests.txt | ||
pytest . | ||
``` | ||
|
||
## Tool Development | ||
|
||
In order to keep the Python code clean and consistent, we use `pylint` and `black`: | ||
|
||
`pip install -r requirements-tests.txt` | ||
|
||
### Run pylint (ignores build and virtualenv) | ||
|
||
`pylint reccmp` | ||
|
||
### Check Python code formatting without rewriting files | ||
|
||
`black --check reccmp` | ||
|
||
### Apply Python code formatting | ||
|
||
`black reccmp` | ||
|
||
# Modules | ||
The following is a list of all the modules found in the annotations (e.g. `// FUNCTION: [module] [address]`) and which binaries they refer to. See [this list of all known versions of the game](https://www.legoisland.org/wiki/LEGO_Island#Download). | ||
|
||
## Retail v1.1.0.0 (v1.1) | ||
* `LEGO1` -> `LEGO1.DLL` | ||
* `CONFIG`-> `CONFIG.EXE` | ||
* `ISLE` -> `ISLE.EXE` | ||
|
||
These modules are the most important ones and refer to the English retail version 1.1.0.0 (often shortened to v1.1), which is the most widely released one. These are the ones we attempt to decompile and match as best as possible. | ||
|
||
## BETA v1.0 | ||
|
||
* `BETA10` -> `LEGO1D.DLL` | ||
|
||
The Beta 1.0 version contains a debug build of the game. While it does not have debug symbols, it still has a number of benefits: | ||
* It is built with less or no optimisation, leading to better decompilations in Ghidra | ||
* Far fewer functions are inlined by the compiler, so it can be used to recognise inlined functions | ||
* It contains assertions that tell us original variable names and code file paths | ||
|
||
It is therefore advisable to search for the corresponding function in `BETA10` when decompiling a function in `LEGO1`. Finding the correct function can be tricky, but is usually worth it, especially for longer functions. | ||
|
||
Unfortunately, some code has been changed after this beta version was created. Therefore, we are not aiming for a perfect binary match of `BETA10`. In case of discrepancies, `LEGO1` (as defined above) is our "gold standard" for matching. | ||
|
||
### Re-compiling a beta build (**WIP**) | ||
|
||
If you want to match the code against `BETA10`, use the following `cmake` setup to create a debug build: | ||
``` | ||
cmake <path-to-source> -G "NMake Makefiles" -DCMAKE_BUILD_TYPE=Debug -DISLE_USE_SMARTHEAP=OFF | ||
``` | ||
**TODO**: If you can figure out how to make a debug build with SmartHeap enabled, please add it here. | ||
## Best practices | ||
If you want to run scripts to compare your debug build to `BETA10` (e.g. `reccmp`), it is advisable to add a copy of `LEGO1D.DLL` to `/legobin` and rename it to `BETA10.DLL`. | ||
We have established some best practices that have no impact on `reccmp`'s output, but have made a positive impact on the LEGO Island decompilation. We have listed them [here](docs/recommendations.md) for convenience. | ||
### Finding matching functions | ||
This is not a recipe, but rather a list of things you can try. | ||
* If you are working on a virtual function in a class, try to find the class' vtable. Many (but not all) classes implement `ClassName()`. These functions are usually easy to find by searching the memory for the string consisting of the class name. Keep in mind that not all child classes override this function, so if the function you found is used in multiple vtables (or if you found multiple `ClassName()`-like functions), make sure you actually have the parent's vtable. | ||
* If that does not help, you can try to walk up the call tree and try to locate a function that calls the function you are interested in. | ||
* Assertions can also help you - most `.cpp` file names have already been matched based on `BETA10`, so you can search for the name of your `.cpp` file and check all the assertions in that file. While that does not find all functions in a given source file, it usually finds the more complex ones. | ||
* _If you have found any other strategies, please add them here._ | ||
## Contributing | ||
## Others (**WIP**) | ||
* `ALPHA` (only used twice) | ||
Feel free to contribute to this project if you are interested! More information can be found at [CONTRIBUTING.md](./CONTRIBUTING.md). |