Profiling of labkey.post #36

Open
juyeongkim opened this issue Aug 7, 2019 · 0 comments

I've noticed that fetching a large table from LabKey via Rlabkey is significantly slower than with other API clients, such as the JavaScript client, so I profiled the labkey.selectRows call for a large table (182,779 rows) from DataSpace.

profvis::profvis(Rlabkey::labkey.selectRows(
  baseUrl = "https://dataspace.cavd.org",
  folderPath = "/CAVD",
  schemaName = "study",
  queryName = "ICS" # 182779 rows
))

[profvis flame graph of the labkey.selectRows call]

As we can see, the actual fetching of data via POST takes only a fraction of the time in the labkey.selectRows call; the majority of the time is spent processing the response (processResponse) and creating a data.frame object (makeDF).

We can break it down into 5 steps (a rough code sketch follows the list):

  1. fetch raw data via POST
  2. parse the JSON (simplifying to a data.frame) into a list to check the status
  3. convert the raw response to text
  4. parse the JSON again (without simplifying to a data.frame) into a list from the text
  5. make a data.frame from the list via C++ code
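
To make the breakdown concrete, here is a minimal sketch of those five steps using plain httr and jsonlite. It is an approximation, not Rlabkey's actual code: the url and body values are hypothetical stand-ins for whatever labkey.selectRows builds internally, and the final do.call(rbind, ...) only stands in for the listToMatrix C++ routine.

library(httr)
library(jsonlite)

url  <- "https://dataspace.cavd.org/query/CAVD/selectRows.api"  # hypothetical endpoint
body <- list(schemaName = "study", queryName = "ICS")           # hypothetical payload

# 1. fetch raw data via POST
resp <- POST(url, body = body, encode = "json")
raw  <- content(resp, as = "raw")

# 2. parse JSON (simplified to a data.frame) just to check the status
status_check <- fromJSON(rawToChar(raw), simplifyDataFrame = TRUE)

# 3. convert the raw response to text
txt <- rawToChar(raw)

# 4. parse the same JSON again, this time without simplification
parsed <- fromJSON(txt, simplifyVector = FALSE)

# 5. build a data.frame from the list of rows
#    (Rlabkey does this step in C++; do.call(rbind, ...) is only a stand-in)
df <- do.call(rbind, lapply(parsed$rows, as.data.frame))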

We can see that there are redundancies in this process (a single-parse alternative is sketched below):

  • We are parsing the JSON twice (steps 2 and 4)
  • We are creating a data.frame twice (steps 2 and 5)
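
Here is a minimal sketch of how the redundancy could be collapsed, assuming the response is parsed only once with simplifyDataFrame = TRUE and that single parse is reused for both the status check and the returned data.frame. fetch_rows is a hypothetical helper and the exception check is illustrative, not Rlabkey's actual error handling.

library(httr)
library(jsonlite)

fetch_rows <- function(url, body) {
  resp   <- POST(url, body = body, encode = "json")
  parsed <- fromJSON(content(resp, as = "text", encoding = "UTF-8"),
                     simplifyDataFrame = TRUE)

  # one parse serves both the status check and the result
  if (!is.null(parsed$exception)) {
    stop(parsed$exception)
  }
  parsed$rows
}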

Another thing to note is that jsonlite::fromJSON(simplifyDataFrame = TRUE) is more efficient at creating a data.frame than Rlabkey:::listToMatrix.
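
A quick, self-contained illustration of that claim on synthetic data; the do.call(rbind, ...) step is only a stand-in for Rlabkey:::listToMatrix, whose C++ interface is not shown here.

library(jsonlite)

n   <- 1e4
txt <- toJSON(list(rows = data.frame(id    = seq_len(n),
                                     value = rnorm(n),
                                     group = sample(letters, n, replace = TRUE))))

# single parse, simplified directly to a data.frame
system.time(
  df1 <- fromJSON(txt, simplifyDataFrame = TRUE)$rows
)

# parse to a plain list, then assemble a data.frame row by row
system.time({
  lst <- fromJSON(txt, simplifyVector = FALSE)$rows
  df2 <- do.call(rbind, lapply(lst, as.data.frame))
})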

Could you please look into this and make changes accordingly?
