Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Properly limit the collections that are read #290

Open
wants to merge 16 commits into
base: main
Choose a base branch
from

Conversation

tmadlener
Copy link
Contributor

@tmadlener tmadlener commented Feb 19, 2025

BEGINRELEASENOTES

  • Make the PodioInput.collections, IOSvc.CollectionNames and Reader.InputCollections properties actually work like expected. They now properly limit the collections that are read and no other collections will be available. This requires building against podio > v1.2`, otherwiset the current behavior will be used, where collections that are not requested will still be available.

ENDRELEASENOTES

Fixes #150

Especially for the IOSvc + Reader side of things there are quite a few handles to change which collections should be read and they have to be kept somewhat consistent. I think we should try to find a way to harmonize that a bit, but I am not entirely sure which way is the one that should be kept. @jmcarcell any preference there, resp. do you see more ways this could be made more robust?

@tmadlener tmadlener force-pushed the limit-coll-reading branch 2 times, most recently from 3ab05db to c01ecd5 Compare February 19, 2025 14:19
@jmcarcell
Copy link
Member

Especially for the IOSvc + Reader side of things there are quite a few handles to change which collections should be read and they have to be kept somewhat consistent. I think we should try to find a way to harmonize that a bit, but I am not entirely sure which way is the one that should be kept. @jmcarcell any preference there, resp. do you see more ways this could be made more robust?

I think once the setReadingCollections is either removed (it has never been used) it simplifies since the reader gets the collections through Python and IOSvc gets its through its property, both values being obtained from the IOSvc's property.

@tmadlener
Copy link
Contributor Author

I have refactored the fix_properties slightly. Mainly I have lifted the setup concerning the reader into a _setup_reader method that effectively does the same thing as before. However, now there are a few more checks whether import podio is really necessary. That will only happen now if EvtMax == -1 or CollectionNames is empty.

The on thing that can become inconsistent now is when a user instantiates their own Reader in python and sets a different InputCollections on that than they set on CollectionNames for the IOSvc. I am not sure how likely that scenario is though (we don't even have an example with it AFAICT).

@tmadlener
Copy link
Contributor Author

The problems in the writer should also be fixed now. They have become necessary because the changes to the our python wrapper around the ApplicationMgr have started to fill the InputCollections of the Reader and that in turn means that not all of the collections from the Frame get pushed to the TES. However, the Writer tried to remove all of the Frame collections unconditionally. I have made the early returns simple continues in the loop and demoted the error to warning which makes everything work again. I have also added a bit more debug output to the writer since I needed it in any case to figure out what was going on. I think it could be useful for future debugging via logs, but I can also remove it again.

Comment on lines 252 to 253
warning() << "Failed to retrieve collection " << coll << endmsg;
continue;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these changes are not needed for this PR. I think originally I put them there and this code never runs so there was never the need to change it, but it should never happen. Throwing is probably the intended usage here. This first one should never fail because that would mean there are collections that are read in a Frame that are not in the store so something should have before in the Reader. The one below would happen if the unregisterObject call would fail which is probably only when the object is not in the store, already checked above. Maybe not checking this second one would also be fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked again and these are indeed not needed. I am not sure any longer why I thought they were necessary in the first place, but it looks like things are working as expected and we can change this in another PR if we want to.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Now I think I really remember why I did this in the first place. If podio doesn't yet support filling a frame with only a subset of all collections, the Writer crashes (see this CI run). Changing the early return to a continue fixes that issue and we would not need to bump the minimal podio version necessary for k4FWCore.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok now I see that is necessary. I don't think it is related to writing only a subset of collections, that has been there for a while and was working before this PR. I think the explanation is related to a mismatch in the collections that are returned with getAvailableCollections and the ones that are actually in the store, which is something that couldn't happen before because all the collections were being read. I think the messages should not be warnings then, because imagine for a frame with many collections of which only a single one is read, then there will be a warning for every collection that is not read for every event. Alternatively, if there is a way from obtaining only the collections that are being read from podio (taking into account this doesn't change for different events), then that could solve the issue.

Make the IOSvc only read the collections that are actually requested by
the user, or all of them if not set
Lift into its own function and only import podio if strictly necessary
The IOSvc machinery has enough functionality to emulate selective
reading and only reading a limited set of collections is effectively an
optimization at this point.
Co-authored-by: Juan Miguel Carceller <22276694+jmcarcell@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PodioInput does not limit the collections that are read
2 participants