SegFault from Double Freeing Converted Columns #224

Open
1 task done
samf25 opened this issue Feb 26, 2025 · 14 comments
Labels
bug Something isn't working

Comments

@samf25

samf25 commented Feb 26, 2025

Check duplicate issues.

  • Checked for duplicates

Goal

I want to preface this by saying my reasoning could be wrong here since the seg fault was quite vague.

The destructor of one of my k4FWCore::Transformers seg faults when it is run in the same steering file as a Wrapped Marlin Processor. The Transformer is pretty simple: it tries to remove duplicate Tracks made by a Tracking algorithm and adds the non-duplicates to a subset Collection. (Here's the Repo)

Later on in the steering file, I have a Wrapped Processor with an EDM4hep2LcioConv attached. That converter converts the subset collection and, at some point during the running of the code, I get a segmentation fault that refers to the destructor of ACTSDuplicateRemoval.

My best guess is that somehow the converter is moving the collection and when the destructor goes through to clean up, it tries to free something that no longer exists.

Operating System and Version

Alma Linux 9

compiler

gcc 13

The version of the key4hep stack

k4fwcore-1.2

Package Version

k4marlinwrapper-0.11

Reproducer

git clone https://github.com/samf25/TrackPerfWorkspace.git
cd TrackPerfWorkspace
git checkout switchK4FWC
cd packages
rm -r ACTSTracking TrackPerf
git clone https://github.com/samf25/ACTSTracking.git
cd ACTSTracking
git checkout multiK4FWC
cd ../..

apptainer shell docker://ghcr.io/muoncollidersoft/mucoll-sim:master
source /opt/spack/opt/spack/linux-almalinux9-x86_64/gcc-11.5.0/mucoll-stack-master-fldyu2usa43rdect3x4xyibuzww5ptwz/setup.sh

mkdir build
cd build
cmake .. -DCMAKE_INSTALL_PREFIX=../install
cmake --build . -j $(nproc) -t install
cd ..

export PATH=$PWD/install/bin:$PATH
export LD_LIBRARY_PATH=$PWD/install/lib:$PWD/install/lib64:$LD_LIBRARY_PATH
export ROOT_INCLUDE_PATH=$PWD/install/include:$ROOT_INCLUDE_PATH
export PYTHONPATH=$PWD/install/python:$PYTHONPATH
export CMAKE_PREFIX_PATH=$PWD/install:$CMAKE_PREFIX_PATH

export LD_LIBRARY_PATH=/opt/spack/opt/spack/linux-almalinux9-x86_64/gcc-11.5.0/acts-32.1.0-yacvft6qr5l5ra67k7ss4mnvfbju6qza/lib:/opt/spack/opt/spack/linux-almalinux9-x86_64/gcc-11.5.0/vdt-0.4.4-243z3wcxk4gkd7j2jlxcrdfa27bq46az/lib:/opt/spack/opt/spack/linux-almalinux9-x86_64/gcc-11.5.0/xerces-c-3.3.0-7udplkbcss57oci6gkpochnq4yatplgw/lib:/opt/spack/opt/spack/linux-almalinux9-x86_64/gcc-11.5.0/gsl-2.8-w5snzrtm5dmifsgmjycqz6ldj5ofuul6/lib:/opt/spack/opt/spack/linux-almalinux9-x86_64/gcc-11.5.0/clhep-2.4.7.1-pi7m3sbrmz5ex4audlodc5vaqc2o4zg7/lib/:$LD_LIBRARY_PATH

cd example
k4run reproduceError.py

Additional context

No response

samf25 added the bug label Feb 26, 2025
@andresailer
Collaborator

Dear @samf25

Thank you for your report. However, your description is a bit vague.

Can you
a) post the segfault you get (and the log file from your run)
b) post all the commands we would need to run the reproducer, so that we can reproduce this without effort?

Step-by-step instructions to reproduce the issue.
If possible, as a self-contained list of instructions starting from a clean shell
git checkout, setup environment (Geant4/ROOT version, LCG/iLCSoft/Key4hep release), cmake, build, run...
Don't forget to attach any required input files

Thanks!

@samf25
Author

samf25 commented Feb 27, 2025

I've updated the reproducer part of the post. There were so many seg faults that the first few were cleared as the terminal updated. Here is (from what I can understand) the important part of the stack trace:

#90 0x00007f765ae780c2 in ACTSDuplicateRemoval::~ACTSDuplicateRemoval() () from /home/sferrar2/Mar2Gau/TrackPerfWorkspace/digiInstall/lib/libACTSTrackingPlugins.so
#91 0x00007f7676a16e2d in __run_exit_handlers () from /lib64/libc.so.6
#92 0x00007f7676a16f70 in exit () from /lib64/libc.so.6
#93 0x00007f7676f6a689 in Py_Exit (sts=3) at Python/pylifecycle.c:2944
#94 0x00007f7676f6feaf in handle_system_exit () at Python/pythonrun.c:771
#95 _PyErr_PrintEx (tstate=0x7f7677250738 <_PyRuntime+166328>, set_sys_last_vars=set_sys_last_vars@entry=1) at Python/pythonrun.c:781
#96 0x00007f7676f6fee5 in PyErr_PrintEx (set_sys_last_vars=set_sys_last_vars@entry=1) at Python/pythonrun.c:876
#97 0x00007f7676f6fefa in PyErr_Print () at Python/pythonrun.c:882
#98 0x00007f7676f705b3 in _PyRun_SimpleFileObject (fp=fp@entry=0x65ea10, filename=filename@entry=0x7f767651a330, closeit=closeit@entry=1, flags=flags@entry=0x7fffdbffab28) at Python/pythonrun.c:446
#99 0x00007f7676f708ab in _PyRun_AnyFileObject (fp=0x65ea10, filename=filename@entry=0x7f767651a330, closeit=closeit@entry=1, flags=flags@entry=0x7fffdbffab28) at Python/pythonrun.c:79
#100 0x00007f7676f8fd00 in pymain_run_file_obj (skip_source_first_line=0, filename=0x7f767651a330, program_name=0x7f76764fb730) at Modules/main.c:360
#101 pymain_run_file (config=0x7f7677236780 <_PyRuntime+59904>) at Modules/main.c:379
#102 pymain_run_python (exitcode=0x7fffdbffab24) at Modules/main.c:601
#103 Py_RunMain () at Modules/main.c:680
#104 0x00007f7676f90257 in pymain_main (args=0x7fffdbffac30) at Modules/main.c:710
#105 Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at Modules/main.c:734
#106 0x00007f76769ff5d0 in __libc_start_call_main () from /lib64/libc.so.6
#107 0x00007f76769ff680 in __libc_start_main_impl () from /lib64/libc.so.6
#108 0x0000000000401075 in _start ()
===========================================================


double free or corruption (!prev)

@andresailer
Collaborator

andresailer commented Feb 28, 2025

After fixing a typo in the second line in your reproducer and then running the rest of the commands I get

Apptainer> k4run reproduceError.py
Traceback (most recent call last):
  File "/opt/spack/opt/spack/linux-almalinux9-x86_64/gcc-11.5.0/k4fwcore-1.2-z6mprl6resh6siyzmlzoplhdulcyplmy/bin/k4run", line 258, in <module>
    main()
  File "/opt/spack/opt/spack/linux-almalinux9-x86_64/gcc-11.5.0/k4fwcore-1.2-z6mprl6resh6siyzmlzoplhdulcyplmy/bin/k4run", line 184, in main
    load_file(file)
  File "/opt/spack/opt/spack/linux-almalinux9-x86_64/gcc-11.5.0/k4fwcore-1.2-z6mprl6resh6siyzmlzoplhdulcyplmy/python/k4FWCore/utils.py", line 85, in load_file
    exec(code, globals())
  File "<string>", line 52, in <module>
ImportError: cannot import name 'ACTSDuplicateRemoval' from 'Gaudi.Configurables' (unknown location)
Apptainer> 

@tmadlener
Contributor

I think one bit of important information missing from Sam is the fact that this comes from a muon collider context, i.e. inside the http://ghcr.io/muoncollidersoft/mucoll-sim:master container.

@andresailer
Collaborator

I think one bit of important information missing from Sam is the fact that this comes from a muon collider context, i.e. inside the http://ghcr.io/muoncollidersoft/mucoll-sim:master container.

No, that part is there in the apptainer command, and I just waited 50 minutes while downloading that container.

If there is one thing I am good at, it is copying and pasting commands into the terminal :D

@andresailer
Collaborator

Now I added a bunch of $PWD to the environment exports.

And I get

Warning, the events category wasn't found in the input file
Traceback (most recent call last):
  File "/opt/spack/opt/spack/linux-almalinux9-x86_64/gcc-11.5.0/k4fwcore-1.2-z6mprl6resh6siyzmlzoplhdulcyplmy/bin/k4run", line 258, in <module>
    main()
  File "/opt/spack/opt/spack/linux-almalinux9-x86_64/gcc-11.5.0/k4fwcore-1.2-z6mprl6resh6siyzmlzoplhdulcyplmy/bin/k4run", line 237, in main
    ApplicationMgr().fix_properties()
  File "/opt/spack/opt/spack/linux-almalinux9-x86_64/gcc-11.5.0/k4fwcore-1.2-z6mprl6resh6siyzmlzoplhdulcyplmy/python/k4FWCore/ApplicationMgr.py", line 100, in fix_properties
    frame = podio_reader.get("events")[0]
            ~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
  File "/opt/spack/opt/spack/linux-almalinux9-x86_64/gcc-11.5.0/podio-1.2-quistl6hwrmp2oaqsktfau6no2irlay2/lib/python3.11/site-packages/podio/frame_iterator.py", line 69, in __getitem__
    raise IndexError
IndexError

@jmcarcell
Member

Empty input file?

@tmadlener
Contributor

No, that part is there in the apptainer command.

Mea culpa.


The IndexError looks as if there are no events in the file, which would also be consistent with:

Warning, the events category wasn't found in the input file

Not sure what goes wrong to get to that and not bail out earlier due to a missing file.
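
One way to bail out earlier is a pre-flight check in the shell before calling k4run. The following is only a minimal sketch: `require_input` is a hypothetical helper name and `$INPUT` is a placeholder for whatever file the steering file reads, neither of which comes from k4FWCore.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of k4FWCore):
# fail early if the input file is missing or has zero size.
require_input() {
  local f="$1"
  if [ ! -s "$f" ]; then
    echo "input file '$f' is missing or empty" >&2
    return 1
  fi
}

# Possible use in the reproducer, with INPUT set to the real input file:
#   require_input "$INPUT" && k4run reproduceError.py
```

This only catches a missing or zero-size file. A file that exists but has no events category would still pass, so running podio-dump on the file remains the more direct check.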


The ACTSDuplicateRemoval might be code that is not yet part of the image? @samf25

@andresailer
Collaborator

andresailer commented Feb 28, 2025

Empty input file?

No input file at all!(?)

(looked at the wrong python file)

Apptainer> podio-dump output_test.edm4hep.root
input file: output_test.edm4hep.root
            (written with podio version: 1.2.0)

datamodel model definitions stored in this file: 

Frame categories in this file:
Name                      Entries
----------------------  ---------
metadata                        1
configuration_metadata          1
ERROR: Cannot print category 'events' (not present in file)

The ACTSDuplicateRemoval might be code that is not yet part of the image? @samf25

The reproducer was not correctly exporting the location of the newly compiled code.


@andresailer
Collaborator

Hi @samf25

Please provide the input file and the log file from your own run of the reproducer.
Please attach the complete log file, not only a snippet.

There were so many seg faults that the first few were cleared as the terminal updated

In bash, you can redirect terminal output (stdout and stderr) into a file (see https://stackoverflow.com/a/6674348):

k4run reproduceError.py &> output.log
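
If the output should also stay visible in the terminal while it is being logged, a common alternative is to pipe through `tee`. This is a minimal sketch; `run_logged` is a hypothetical helper name, not something provided by k4run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command, write stdout and stderr to one log
# file, and still show the output live in the terminal.
run_logged() {
  local log="$1"
  shift
  # 2>&1 merges stderr into stdout before the pipe; tee writes the copy.
  "$@" 2>&1 | tee "$log"
  # PIPESTATUS[0] is the exit code of the command itself, not of tee.
  return "${PIPESTATUS[0]}"
}

# For the reproducer in this issue, the call would be:
#   run_logged output.log k4run reproduceError.py
```

Returning `PIPESTATUS[0]` keeps the exit code of the wrapped command, which a plain pipe into `tee` would otherwise hide.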

@samf25
Author

samf25 commented Feb 28, 2025

Shoot -- sorry! I'm glad you worked out those issues. I tried to reduce the input file to the most minimal example that would reproduce the error, but clearly that didn't work. The original file is too large to push to GitHub. I'll make another attempt once the server I was using is back up (should be sometime this morning (EST)).

@samf25
Author

samf25 commented Feb 28, 2025

Here's the log.

output.log

@andresailer
Collaborator

File can be obtained from here now: https://cernbox.cern.ch/s/rIYPzWXlbFxuSQQ

@andresailer
Collaborator

I have a feeling that the actual issue is this:

Deduper             ERROR DataObjectHandle<AnyDataWrapper<T>>::put : Error in put of DedupedTracks
Deduper             ERROR Maximum number of errors ( 'ErrorMax':1) reached.
k4FWCore__Algs      ERROR Maximum number of errors ( 'ErrorMax':1) reached.
k4FWCore__Seque...  ERROR Maximum number of errors ( 'ErrorMax':1) reached.

And this is independent of whether there is a MarlinWrapped processor or not. It is just that, for whatever reason, bad things happen if there is a MarlinWrapper?
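
Messages like these are easy to miss when the end of the log is dominated by stack traces. A small sketch for listing the first ERROR lines of a saved log with their line numbers follows; `first_errors` is a hypothetical helper name, and the pattern assumes the Gaudi-style layout seen above, where the level is surrounded by whitespace.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first N lines (default 5) of a log that
# contain the word ERROR surrounded by whitespace, with line numbers.
first_errors() {
  local log="$1" n="${2:-5}"
  grep -n -m "$n" -E '[[:space:]]ERROR[[:space:]]' "$log"
}

# For the log attached to this issue, the call would be:
#   first_errors output.log
```

The earliest ERROR is usually the most informative one, since later failures (including a crash at exit) can be consequences of it.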
