
Commit c12b48e

Fixes labscript-suite#17 by speeding up dataframe transfers.
Moved the call to df.convert_objects to the lyse server. convert_objects makes dataframes much more efficient to pickle: any object-dtype column whose values all share a single datatype is replaced with an array of that fixed datatype. Previously df.convert_objects was only called after the transfer, so the pickling that preceded it was slow. For the large, mostly numeric dataframe tested, this speeds up pickling and unpickling by approximately a factor of 5.

Also set convert_numeric=False in the call to df.convert_objects. It was previously True, but I don't think this is what we want. With it False, all-numeric columns are still converted as desired. With it True, a column of *strings* is coerced to numbers if even a single string in it is convertible to a numeric value, and every non-convertible value in the column becomes NaN. If the user asked for a string, the data should remain a string. It is presumptuous to assume otherwise, and I suspect a misunderstanding by whoever wrote this line originally.
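Why the conversion speeds up pickling can be illustrated without pandas. The sketch below is an analogy using only the standard library, not lyse or pandas code: a list of boxed Python floats stands in for an object-dtype column, and an `array.array('d')` stands in for a fixed float64 column. The helper `narrow_column` is hypothetical and deliberately simplified (it only handles the all-float case); it mirrors the conservative rule adopted in this commit by converting a column only when every value already has one type, and never parsing strings into numbers.

```python
import array
import pickle
import time


def narrow_column(values):
    """Hypothetical helper: return a typed array if every value is a float.

    Simplified analogue of convert_numeric=False: a column that is already
    homogeneous gets a fixed datatype; strings and mixed columns are left alone.
    """
    if values and all(type(v) is float for v in values):
        return array.array('d', values)
    return values


def pickle_cost(obj, repeats=5):
    """Return (pickled size in bytes, best round-trip time in seconds)."""
    best = float('inf')
    data = b''
    for _ in range(repeats):
        start = time.perf_counter()
        data = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
        pickle.loads(data)
        best = min(best, time.perf_counter() - start)
    return len(data), best


boxed = [float(i) * 0.5 for i in range(200_000)]   # like an object-dtype column
typed = narrow_column(boxed)                       # like a float64 column

strings = ['1.5', 'n/a', 'shot 3']
assert narrow_column(strings) is strings           # strings are never coerced

for label, column in [('boxed floats', boxed), ('typed array', typed)]:
    size, seconds = pickle_cost(column)
    print(f'{label:>12}: {size:>9} bytes, {seconds * 1e3:.1f} ms round trip')
```

On CPython the typed array is pickled as one contiguous block of bytes, whereas each boxed float is written as its own opcode and rebuilt as its own object on unpickling; that per-element work is where most of the round-trip time goes. Exact timings depend on the machine.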
1 parent 681fce3 commit c12b48e


2 files changed, +10 -2 lines changed


__init__.py

-1
```diff
@@ -47,7 +47,6 @@ def data(filepath=None, host='localhost', timeout=5):
     else:
         port = 42519
         df = zmq_get(port, host, 'get dataframe', timeout)
-        df = df.convert_objects(convert_numeric=True, convert_dates=False)
         try:
             padding = ('',)*(df.columns.nlevels - 1)
             df.set_index([('sequence',) + padding,('run time',) + padding], inplace=True, drop=False)
```

__main__.py

+10 -1
```diff
@@ -169,7 +169,16 @@ def handler(self, request_data):
         if request_data == 'hello':
             return 'hello'
         elif request_data == 'get dataframe':
-            return app.filebox.shots_model.dataframe
+            # convert_objects() picks fixed datatypes for columns that are
+            # compatible with fixed datatypes, dramatically speeding up
+            # pickling. But we don't impose fixed datatypes earlier than now
+            # because the user is free to use mixed datatypes in a column, and
+            # we won't want to prevent values of a different type being added
+            # in the future. All kwargs False because we don't want to coerce
+            # strings to numbers or anything - just choose the correct
+            # datatype for columns that are already a single datatype:
+            return app.filebox.shots_model.dataframe.convert_objects(
+                convert_dates=False, convert_numeric=False, convert_timedeltas=False)
         elif isinstance(request_data, dict):
             if 'filepath' in request_data:
                 h5_filepath = shared_drive.path_to_local(request_data['filepath'])
```
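`DataFrame.convert_objects` was later deprecated and is not available in current pandas. As a hedged sketch for newer pandas, assuming `DataFrame.infer_objects()` is an acceptable stand-in for the all-kwargs-False call (it narrows object columns whose values already share one type and does not parse strings into numbers), the server-side step could look like the following. `prepare_for_transfer` is a hypothetical name, not part of lyse; the handler in the diff simply returns the dataframe, so serialisation happens elsewhere.

```python
import pickle

import pandas as pd


def prepare_for_transfer(df: pd.DataFrame) -> bytes:
    """Hypothetical server-side helper: fix dtypes, then pickle.

    infer_objects() only narrows object columns whose values already share a
    single type; it never parses strings into numbers.
    """
    return pickle.dumps(df.infer_objects(), protocol=pickle.HIGHEST_PROTOCOL)


# An object-dtype frame, standing in for results accumulated shot by shot.
df = pd.DataFrame(
    {
        'amplitude': [0.1, 0.2, 0.3],          # all floats
        'label': ['1.5', 'n/a', 'shot 3'],     # strings, one numeric-looking
    },
    dtype=object,
)

received = pickle.loads(prepare_for_transfer(df))
print(received.dtypes)

# Roughly the old convert_numeric=True behaviour: strings are parsed and
# anything unparseable becomes NaN.
print(pd.to_numeric(df['label'], errors='coerce').tolist())
```

The `amplitude` column arrives as float64, while `label` keeps its strings. The `pd.to_numeric(..., errors='coerce')` line shows the kind of silent coercion the commit message argues against.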
