
Main interface is slow/unresponsive when multi-shot routines are running #17

Closed
philipstarkey opened this issue Jul 28, 2015 · 5 comments
Labels: bug, major

Comments

@philipstarkey
Member

Original report (archived issue) by Shaun Johnstone (Bitbucket: shjohnst, GitHub: shjohnst).


The main lyse window can be very slow and hard to use while multi-shot analysis scripts are running. I'm not sure if this gets worse when there are more shots in the dataframe.

This is especially bad due to issue #14, where the multi-shot scripts still run while the queue is paused if there is a problem. Sometimes the easiest workaround is to pause the BLACS queue to stop new shots from being sent to lyse, because while the multi-shot scripts re-run on each incoming shot it is impossible to resolve whatever issue paused the queue in lyse.

@philipstarkey
Member Author

Original comment by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington).


Huh. I was going to say that it's slow because pickling is slow, and it's unresponsive because pickling holds the GIL. But...I just grabbed the dataframe from your running copy of lyse (which took 7 seconds), and pickling and unpickling it locally on my computer is almost instantaneous. And it's only 19MB, and we're connected by gigabit Ethernet...I don't get it. Looks like it should be fast. Profiling is in order.
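For anyone wanting to reproduce this kind of timing locally, a minimal sketch of the measurement described above (the dataframe here is synthetic; the 19 MB dataframe and gigabit link from the report are not reproduced):

```python
import pickle
import time

import numpy as np
import pandas as pd

# Synthetic stand-in for the lyse dataframe; sizes are illustrative only.
df = pd.DataFrame({
    "x": np.random.rand(100_000),
    "y": np.random.rand(100_000),
})

start = time.perf_counter()
payload = pickle.dumps(df)
roundtrip = pickle.loads(payload)
elapsed = time.perf_counter() - start

print(f"pickle round trip of {len(payload) / 1e6:.1f} MB took {elapsed * 1000:.1f} ms")
```

With native float64 columns like these, the round trip should be near-instantaneous, matching the "looks like it should be fast" observation.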

@philipstarkey
Member Author

Original comment by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington).


Progress! The dataframe in lyse is slow to pickle. After the processing that lyse.data() does on it, the resulting dataframe is fast to pickle. The call that appears to make pickling fast is df.convert_objects(): before it, one has a slow-pickling object; after it, a fast-pickling one.

My impression is that convert_objects is, where possible, replacing big, hulking object-dtype columns (in which every element is a full Python object) with fast, nimble native-array-dtype columns. Since most of the data in the columns is numeric, it can replace them with float/int/whatever arrays, which is much faster for pickle than treating each item as a Python object of arbitrary datatype.
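df.convert_objects() has since been removed from pandas; a rough sketch of the same effect using its modern replacement, infer_objects() (the column name and sizes here are invented for illustration):

```python
import pickle

import numpy as np
import pandas as pd

n = 200_000
# An object-dtype column: every element is a separate Python float, so
# pickle must serialize each one individually.
slow = pd.DataFrame(
    {"v": pd.Series(list(map(float, np.random.rand(n))), dtype=object)}
)

# Soft-convert object columns that hold a single underlying type to a
# native array dtype (float64 here), as convert_objects() used to do.
fast = slow.infer_objects()

assert slow["v"].dtype == object
assert fast["v"].dtype == np.float64

# The float64 column pickles as one contiguous buffer, which is both
# smaller and much faster to serialize than the object column.
assert len(pickle.dumps(fast)) < len(pickle.dumps(slow))
```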

Committing a fix to the krb fork for testing.

@philipstarkey
Member Author

Original comment by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington).


  • changed state from "new" to "resolved"

Merged in monashkrb/lyse (pull request #3)

Fixes #16 and #17

→ <<cset 8c50661>>

@philipstarkey
Member Author

Original comment by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington).


Fixes #17 by speeding up dataframe transfers.

Moved call to df.convert_objects to the lyse server, which makes dataframes more
efficiently picklable by replacing columns containing only a single array datatype
with an array of that datatype. Previously df.convert_objects was only called
after the transfer, so the preceding pickling was slow.

For the large, mostly numeric dataframe tested, this speeds
up pickling and unpickling by approx a factor of 5.

Also set convert_numeric=False in the call to df.convert_objects. It was
previously set to True, but I don't think that is what we want. With it False,
all-numeric columns are still converted as desired. With it True, strings are
converted to numbers if even a single string in the column is convertible to a
numeric value, with all non-convertible values becoming NaNs. If the user asked
for a string, the data should remain a string; it is presumptuous to assume
otherwise, and I suspect a misunderstanding by whoever wrote this line
originally.

→ <<cset c12b48e>>
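The coercion behaviour described above can be seen with pd.to_numeric, which in current pandas plays the role that convert_numeric=True did (the column values here are invented for illustration):

```python
import pandas as pd

# A mostly-string column in which one value happens to look numeric.
col = pd.Series(["alice", "bob", "3.14"], dtype=object)

# Coercing, as convert_numeric=True effectively did: the non-convertible
# strings silently become NaN.
coerced = pd.to_numeric(col, errors="coerce")
print(coerced.tolist())  # [nan, nan, 3.14]

# Soft conversion (the convert_numeric=False behaviour) leaves the mixed
# column alone, so strings stay strings.
soft = col.infer_objects()
assert soft.dtype == object
```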


@philipstarkey philipstarkey added major bug Something isn't working labels Apr 5, 2020
Loki27182 pushed a commit to Loki27182/lyse that referenced this issue Oct 9, 2023
Loki27182 pushed a commit to Loki27182/lyse that referenced this issue Oct 9, 2023
…suite#17)

Update Dataframe without Readoperations

Approved-by: Chris Billington <chrisjbillington@gmail.com>
Approved-by: Shaun Johnstone <shaun.johnstone@monash.edu>