Main interface is slow/unresponsive when multi-shot routines are running #17
Comments
Original comment by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington). Huh. I was going to say that it's slow because pickling is slow, and it's unresponsive because pickling holds the GIL. But... I just grabbed the dataframe from your running copy of lyse (which took 7 seconds), and pickling and unpickling it locally on my computer is almost instantaneous. And it's only 19 MB, and we're connected by gigabit Ethernet... I don't get it. Looks like it should be fast. Profiling is in order.
Original comment by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington). Progress! The dataframe in lyse is slow to pickle. After the processing lyse.data() does on it, the resulting dataframe is fast to pickle. The processing that appears to make pickling fast is df.convert_objects(). Before this call, one has a slow-pickling object; after this call, one has a fast-pickling object. My impression is that convert_objects is replacing big, hulking columns of Python objects with fast, nimble array-datatype columns, where possible. Since most of the data in the columns is numeric, it can replace them all with float/int/whatever arrays, and this is much faster for pickle than treating each item as a Python object of an arbitrary datatype. Committing a fix to the krb fork for testing.
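To illustrate the effect described in the comment above, here is a minimal sketch, not the lyse code. It assumes a recent pandas, where df.convert_objects() no longer exists, so DataFrame.infer_objects() is used as an approximate stand-in for the soft conversion. The dataframe shape and contents are made up for illustration, and the timings will vary by machine.

```python
import pickle
import time

import numpy as np
import pandas as pd

# Hypothetical stand-in for a lyse dataframe: numeric data that has ended up
# in object-dtype columns (every cell is a separate Python object).
n_rows, n_cols = 2000, 20
values = np.random.default_rng(0).random((n_rows, n_cols))
df_object = pd.DataFrame(values).astype(object)

# Assumption: infer_objects() stands in for the old df.convert_objects().
# It replaces object columns with typed array columns where possible.
df_typed = df_object.infer_objects()


def time_pickle_roundtrip(df):
    """Pickle and unpickle df, returning (elapsed seconds, restored copy)."""
    start = time.perf_counter()
    restored = pickle.loads(pickle.dumps(df, protocol=pickle.HIGHEST_PROTOCOL))
    return time.perf_counter() - start, restored


t_object, restored_object = time_pickle_roundtrip(df_object)
t_typed, restored_typed = time_pickle_roundtrip(df_typed)

print("dtypes before:", df_object.dtypes.unique())
print("dtypes after: ", df_typed.dtypes.unique())
print(f"object columns: {t_object:.4f} s, typed columns: {t_typed:.4f} s")
```

The typed dataframe holds the same values, but pickle can serialise each column as one contiguous buffer rather than visiting every cell as an individual Python object.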
Original comment by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington). Merged in monashkrb/lyse (pull request #3) → <<cset 8c50661>>
Original comment by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington). Fixes #17 by speeding up dataframe transfers. → <<cset c12b48e>>
Moved call to df.convert_objects to the lyse server, which makes dataframes more efficiently picklable by replacing columns containing only a single array datatype with an array of that datatype. Previously df.convert_objects was only called after the transfer, so the preceding pickling was slow.

For the large, mostly numeric dataframe tested, this speeds up pickling and unpickling by approximately a factor of 5.

Also set convert_numeric=False in the call to df.convert_objects. It was previously set to True, but I don't think this is what we want. With it False, all-numeric columns are still converted as desired. With it True, *strings* are converted to numbers even if only a single string is convertible to a numeric value, with all non-convertible values in the column becoming NaNs. If the user asked for a string, the data should remain a string. It is presumptuous to assume otherwise and I suspect a misunderstanding by whoever wrote this line originally.
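The difference between the two convert_numeric settings can be sketched as follows. This is an approximation rather than the original call: df.convert_objects was removed from later pandas releases, so the sketch assumes DataFrame.infer_objects() as a stand-in for the soft conversion (convert_numeric=False) and pd.to_numeric(..., errors="coerce") as a stand-in for the forced conversion (convert_numeric=True). The column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical dataframe with object-dtype columns, as before conversion.
df = pd.DataFrame({
    "label": pd.Series(["run_a", "42", "run_c"], dtype=object),  # user strings
    "count": pd.Series([1, 2, 3], dtype=object),                 # all numeric
})

# Soft conversion (stand-in for convert_numeric=False): all-numeric columns
# become typed arrays, string columns keep their string values.
soft = df.infer_objects()

# Forced conversion (stand-in for convert_numeric=True): strings that parse
# as numbers are converted, and everything else in the column becomes NaN.
coerced = df.apply(pd.to_numeric, errors="coerce")

print(soft["label"].tolist())     # strings preserved
print(soft["count"].dtype)        # integer dtype
print(coerced["label"].tolist())  # only the "42" entry survives as a number
```

With the forced conversion, a single numeric-looking string is enough to turn the rest of the user's labels into NaN, which is the behaviour the commit message argues against.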
…suite#17) Update Dataframe without Readoperations Approved-by: Chris Billington <chrisjbillington@gmail.com> Approved-by: Shaun Johnstone <shaun.johnstone@monash.edu>
Original report (archived issue) by Shaun Johnstone (Bitbucket: shjohnst, GitHub: shjohnst).
The main lyse window can be very slow and hard to use while multi-shot analysis scripts are running. I'm not sure if this gets worse when there are more shots in the dataframe.
This is especially bad due to issue #14, where the multi-shot scripts still run when the queue is paused if there is a problem. Sometimes it is easiest to pause the BLACS queue to stop new shots from being sent to lyse, because it is impossible to resolve whatever issue has paused the queue in lyse while the multi-shot scripts run again as each new shot comes in.