Fixes labscript-suite#17 by speeding up dataframe transfers.

chrisjbillington · chrisjbillington · commit c12b48e1bf92 · 2015-07-28T19:38:25.000+10:00
Moved the call to df.convert_objects to the lyse server. This makes dataframes
more efficiently picklable by replacing each column whose values all share a
single datatype with an array of that datatype. Previously df.convert_objects
was only called after the transfer, so the preceding pickling was slow.

For the large, mostly numeric dataframe tested, this speeds up pickling and
unpickling by approximately a factor of 5.
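
A rough way to reproduce this kind of measurement is sketched below. It is an
illustration under assumptions, not the test that was actually run:
convert_objects was later deprecated and removed from pandas, so
infer_objects() is used here as a stand-in for the all-kwargs-False
conversion, and the column size and variable names are hypothetical. Timings
depend on the machine and pandas version and need not show a factor of 5.

```python
import pickle
import time

import numpy as np
import pandas as pd

# Hypothetical stand-in for one column of a large, mostly numeric dataframe.
n = 200_000
as_objects = pd.Series(np.random.rand(n), dtype=object)  # one Python float per row
as_floats = as_objects.infer_objects()                   # a single float64 array


def time_round_trip(obj):
    """Return (seconds, unpickled copy) for one pickle/unpickle round trip."""
    start = time.perf_counter()
    restored = pickle.loads(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))
    return time.perf_counter() - start, restored


t_obj, restored_obj = time_round_trip(as_objects)
t_float, restored_float = time_round_trip(as_floats)

print(f"object dtype:  {t_obj:.4f} s")
print(f"float64 dtype: {t_float:.4f} s")
```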

Also set convert_numeric=False in the call to df.convert_objects. It was
previously set to True, but I don't think this is what we want. With it False,
all-numeric columns are still converted as desired. With it True, *strings* are
converted to numbers even if only a single string in the column is convertible
to a numeric value, with all non-convertible values in the column becoming NaNs.
If the user asked for a string, the data should remain a string. It is
presumptuous to assume otherwise and I suspect a misunderstanding by whoever
wrote this line originally.
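
The difference between the two settings can be sketched with current pandas.
This is an approximation under assumptions: convert_objects no longer exists
in recent pandas releases, so infer_objects() stands in for the
convert_numeric=False behaviour and pd.to_numeric(..., errors="coerce") stands
in for what convert_numeric=True did to string columns as described above. The
column names and values are hypothetical.

```python
import numpy as np
import pandas as pd

# Object-dtype columns, as they might look before any conversion.
df = pd.DataFrame({
    "detuning": pd.Series([1.5, 2.5, 3.5], dtype=object),  # every value is a float
    "label": pd.Series(["a", "7", "c"], dtype=object),     # strings, one looks numeric
})

# Soft conversion: only columns that already hold a single datatype change.
soft = df.infer_objects()
print(soft["detuning"].dtype)   # the all-float column gets a fixed float dtype
print(soft["label"].tolist())   # the strings are left as strings

# Forced coercion: the one numeric-looking string becomes a number and the
# other strings are replaced by NaN.
coerced = pd.to_numeric(df["label"], errors="coerce")
print(coerced.tolist())
```

With the soft conversion the user's strings survive the transfer unchanged,
which is the behaviour this commit is after.
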
diff --git a/__init__.py b/__init__.py
@@ -47,7 +47,6 @@ def data(filepath=None, host='localhost', timeout=5):
     else:
         port = 42519
         df = zmq_get(port, host, 'get dataframe', timeout)
-        df = df.convert_objects(convert_numeric=True, convert_dates=False)
         try:
             padding = ('',)*(df.columns.nlevels - 1)
             df.set_index([('sequence',) + padding,('run time',) + padding], inplace=True, drop=False)
diff --git a/__main__.py b/__main__.py
@@ -169,7 +169,16 @@ def handler(self, request_data):
         if request_data == 'hello':
             return 'hello'
         elif request_data == 'get dataframe':
-            return app.filebox.shots_model.dataframe
+            # convert_objects() picks fixed datatypes for columns that are
+            # compatible with fixed datatypes, dramatically speeding up
+            # pickling. But we don't impose fixed datatypes earlier than now
+            # because the user is free to use mixed datatypes in a column, and
+            # we won't want to prevent values of a different type being added
+            # in the future. All kwargs False because we don't want to coerce
+            # strings to numbers or anything - just choose the correct
+            # datatype for columns that are already a single datatype:
+            return app.filebox.shots_model.dataframe.convert_objects(
+                       convert_dates=False, convert_numeric=False, convert_timedeltas=False)
         elif isinstance(request_data, dict):
             if 'filepath' in request_data:
                 h5_filepath = shared_drive.path_to_local(request_data['filepath'])