Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved behavior for indirect name lookup #89

Merged
merged 2 commits into from
Mar 2, 2025

Conversation

disinvite
Copy link
Collaborator

There are a few different types of "indirect" calls:

1. call dword ptr [g_unk0x101013b0 (DATA)]
2. call dword ptr [->Matrix4::Clear (FUNCTION)]
3. call dword ptr [->GDI32.DLL:CreateFontA (FUNCTION)]

The path to get the name depends on what we're looking at:

In 1, we are calling the function pointer at the given variable, which may or may not have an initial value. 2 is a virtual method but we are calling it this way instead of loading the vptr base into ecx and using an offset. 3 is an imported function from a DLL.

In all cases, the address in the instruction is a pointer to the actual address we want to call. In 1, we have the variable in our database and it's more useful to use that as the name. (The initial value may be irrelevant.) For 2, we can get to the annotated function by reading the pointer address from the binary and looking up that value in our database.

We can do the same for 3, except that the pointer location does not have an address. It is a relative virtual address. When the image is loaded, this value should be updated to point at the location of the imported function, but in the file it looks like this:

                    ********************************************
                    *       POINTER to EXTERNAL FUNCTION       *
                    ********************************************
                    undefined DirectDrawCreate()
         undefined    AL:1      <RETURN>
                    5  DirectDrawCreate  <<not bound>>
                    PTR_DirectDrawCreate_1010b32c     XREF[1]: DirectDrawCreate:100d3...
   1010b32c 1e be      addr    DDRAW.DLL::DirectDrawCreate
            10 00

0x10be1e is the RVA to the IMAGE_IMPORT_BY_NAME struct for this function.

This currently works for asm sanitize by the hack of loading the RVA as if it were an address. The problem comes later when we want to iterate through all entities in the database. We have to filter out the fake addresses so they are ignored by roadmap or the ghidra import scripts.

We could patch the binary (in memory) so the RVA becomes a real address, then use that as the entity with the DLL name. What I did instead is change the name replacement function to recognize an indirect lookup and use a different attribute of the import entity for this situation.

That change required these steps:

  • Keep indirect and direct lookups separate during asm sanitize. The reason is that the same address can produce two different names depending on the context. For example:
mov esi, dword ptr [__imp__timeGetTime@0 (IMPORT)]

versus

call dword ptr [->WINMM.DLL:timeGetTime (FUNCTION)]
  • The binary lookup was a dependency of asm sanitize. This is now used directly by the name replacement function and it specifically returns a pointer instead of allowing a read of any size.

  • We now have EntityType.IMPORT specifically for imports from .idata. Previously we used POINTER as the type. We also set the import_name attribute to contain the DLL name and function name. This attribute is used by the name replacement function.

  • Loading and matching of imports is a lot simpler because we don't need to set up the aforementioned hack with the RVA.

We also have a lot of new comments and tests surrounding name replacement and asm sanitize. The only changes to the generated assembly are:

  • POINTER is now IMPORT for the various __imp__ entities.
  • The placeholder offset numbers are bumped by one in functions with indirect calls where we can't come up with a name. Previously the placeholder was shared between direct and indirect references if there was not a better name to use.

BETA10 had one minor drop in accuracy on the library function _amsg_exit. No change to LEGO1.

@jonschz
Copy link
Collaborator

jonschz commented Mar 2, 2025

I currently get an AssertionError on BETA10 from the following steps:

  • Build the current isle master
  • Create a JSON base file from the current reccmp master
  • Switch to this PR branch
  • Run a diff against the previous file.

It come from core.py, line 584. The orig_func that has no orig_func.name is given by

func_addr ='270538476'
orig_addr = '270616384'
kvstore = '{"import_name": "KERNEL32.DLL:SetConsoleMode", "size": 4, "type": 7}'

@jonschz
Copy link
Collaborator

jonschz commented Mar 2, 2025

Update: Looks like this can be fixed by doing the following in core.py, line 584:

            # Check whether the thunk destination is a matched symbol
            if orig_func.recomp_addr not in recomp_thunks:
                best_name = orig_func.best_name()
                assert best_name is not None
                self._db.create_orig_thunk(orig_thunk, best_name)
                continue

And then changing db.py, line 88, to

        for key in ("computed_name", "name", "import_name"):

Not sure if that has any unintended side effects, I couldn't find any.

@jonschz
Copy link
Collaborator

jonschz commented Mar 2, 2025

BETA10 had one minor drop in accuracy on the library function _amsg_exit. No change to LEGO1.

The regression can be fixed by adding

// GLOBAL: BETA10 0x101faf70
// _aexit_rtn

to library_msvc.h.

@disinvite
Copy link
Collaborator Author

I think we can solve it very simply by just giving the import entity a name. The recomp imports in the PDB already have names like __imp__InterlockedIncrement@4. I would guess that the number is the stack adjustment corresponding to the number of parameters.

We don't have the granularity in the DB API to set the name attribute only if it is not already set. That means we would lose the PDB values for now, but could still have: __imp__InterlockedIncrement.

@jonschz
Copy link
Collaborator

jonschz commented Mar 2, 2025

Sounds reasonable to me. Feel free to merge after addressing this issue.

Copy link
Collaborator

@jonschz jonschz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. While testing I found another issue that is already present on master - I'll open a separate ticket for that.

@disinvite disinvite merged commit c51e57b into isledecomp:master Mar 2, 2025
11 checks passed
@disinvite disinvite deleted the indirect3 branch March 2, 2025 20:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants