Skip to content

pfelf: add (*Section||*Prog).MmapData() #450

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

pfelf: add (*Section||*Prog).MmapData() #450

wants to merge 1 commit into from

Conversation

florianl
Copy link
Contributor

Add MmapData() that returns mmap backed Data of the respective Prog or Section.

Possible option as described in #352 (comment) to mmap .gopclntab.

Add MmapData() that returns mmap backed Data of the respective Prog or Section.

Possible option as described in #352 (comment) to mmap `.gopclntab`.

Signed-off-by: Florian Lehner <florian.lehner@elastic.co>
@fabled
Copy link
Contributor

fabled commented Apr 26, 2025

Add MmapData() that returns mmap backed Data of the respective Prog or Section.

Possible option as described in #352 (comment) to mmap .gopclntab.

So designing the API for mmapping is the critical portion of the problem. And I would prefer not to special case it to .gopclntab handling. I believe mmaping any shared object (also PE for dotnet unwinder) should be favored if:

  • the file is large
  • we know that there will be several reads done on the file
  • mmap can be done

In the above case it would probably make sense to mmap the whole file in one go, and have all of pfelf and its existing methods return slices to the mmapped memory. This would significantly reduce memory overhead on handling any large ELF file. Also the pfelf internally should use mmapped data for symbol lookups etc, if possible. IIRC, just converting the current virtual memory read APIs to read from mmap area should be sufficient for this.

In otherwords, the caller of pfelf should not care if mmap or read is being used to access the file. That should be internal implementation detail. And the high level API should work identically regardless of which method of access is used.

Or does anyone see problems with the above approach?

I believe the main issues to sort would also be:

  • using lot of address space (but should be still insignificant amount on 64-bit architectures, and I think we are not looking to support 32-bits?)
  • not sure if there is usage that the pfelf returned byte arrays are kept beyond calling Close on the pfelf? (this should not be a problem if using exp/mmap which uses finalizer to unmap)
  • do we need to care if there's two separate pfelf instances mmapping the same file?

Also unix.Mmap internally requires locking and registering mappings in a go map. I wonder if we could instead use exp/mmap (https://pkg.go.dev/golang.org/x/exp/mmap) which does not require this and seems to provide a bit more useful API in the first place.

@florianl
Copy link
Contributor Author

florianl commented May 2, 2025

Thanks for the feedback 🙏

I agree, that the API of pfelf should not change and if the file is mmaped or not should be an internal implementation detail. Also switching to exp/mmap sounds reasonable to me and also expanding the approach so that other use cases can benefit from it as well. The idea with this first approach was, to figure out the constraints, limits and potential issues with mmap.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants