Different summariseResult() results with collected/uncollected CDM cohort table (v1.2.0) #2

Open
Akram-Mendez opened this issue Jan 2, 2025 · 2 comments

Comments

@Akram-Mendez

Dear development team,

I am writing to request your feedback on an issue we encountered when using summariseResult() from PatientProfiles v1.2.0. The count and percentage columns are incorrect when summarising a drug utilisation cohort (see the 'falls_cohort_substance' and 'summary' code below). The correct counts and percentages are obtained only after collecting the cohort table with dplyr::collect(). Other DPs running the same code on v1.2.0 observed the same issue.

This issue appears to be resolved in version 1.2.3, where the correct results are obtained without needing to apply the dplyr::collect() function. We would appreciate any insights you could provide on why the output in version 1.2.0 worked correctly after collecting the cohort table.

Thank you!
Akram


```r
cli::cli_alert("Getting ingredient/substance information from drug concepts")
# Drug utilisation cohort intersected with drug concepts for 395 ingredients:

falls_cohort_substance <- cdm$new_user_cohort_falls |>
  PatientProfiles::addConceptIntersectFlag(
    conceptSet = drug_codes,
    indexDate = "cohort_start_date",
    censorDate = "cohort_end_date",
    window = list(c(0, 0))
  ) |>
  PatientProfiles::addCohortName() |>
  # With collect(), PatientProfiles::summariseResult() v1.2.0 yields correct
  # results; collect() is not required with PatientProfiles v1.2.3.
  dplyr::collect()

cli::cli_alert("Summarising substances -- New users with recurrent falls")
# Summary table with counts and percentages per drug class and drug ingredient:

summary <- PatientProfiles::summariseResult(
  falls_cohort_substance,
  group = list("cohort_name"),
  estimates = c("count", "percentage")
)
```
@catalamarti
Collaborator

Hi @Akram-Mendez, thanks for reporting this issue. It was originally detected with duckdb (1.0.0), and we've seen that it only affected certain DBMSs. It was corrected in release 1.2.1 (although I've now seen that it is not mentioned explicitly in the release notes).
Basically, if you collect beforehand it will always work fine, but without collecting, strata levels may get interchanged in certain DBMSs if you use a version before 1.2.1.
It was fixed by adding a compute:

dplyr::compute()

Out of curiosity, which was the DBMS that you used?
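To make the collect/compute distinction concrete, here is a minimal sketch. It assumes an in-memory SQLite table created with dbplyr::memdb_frame() (which needs the RSQLite package) as a stand-in for a real CDM cohort table, and the column names are hypothetical. It does not reproduce the 1.2.0 bug or the actual PatientProfiles fix. It only shows that compute() materialises the lazy query as a temporary table inside the database, while collect() pulls the rows into R.

```r
library(dplyr)
library(dbplyr)

# Hypothetical toy cohort; memdb_frame() returns a lazy table backed by
# an in-memory SQLite database.
cohort <- memdb_frame(
  subject_id  = 1:6,
  cohort_name = c("a", "a", "a", "b", "b", "b"),
  flag        = c(1, 0, 1, 1, 1, 0)
)

# Still only a query description; nothing has been executed yet.
lazy <- cohort |> mutate(flag = as.integer(flag))

# compute() runs the query and stores the result as a temporary table
# in the database; the returned object is still a lazy (remote) table.
materialised <- lazy |> compute()

# collect() runs the query and returns an in-memory tibble in R.
local_copy <- lazy |> collect()

# Summarising the materialised table happens in the database; only the
# small aggregated result is brought into R.
counts <- materialised |>
  group_by(cohort_name) |>
  summarise(n = n(), n_flag = sum(flag, na.rm = TRUE)) |>
  collect()
```

Either route gives later steps a fixed set of rows to work from instead of a long query that is re-run each time; compute() does so without moving the whole table into R.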

@Akram-Mendez
Author

Hi @catalamarti, thank you so much for your response. We use the Snowflake DBMS, although other DPs using Microsoft SQL Server and Postgres observed similar behaviour.

Could the issue also be related to the size of the query when using release 1.2.0? When intersecting the drug utilization cohort with about 400 ingredients (as in ‘falls_cohort_substance’ in the example above), we received the following warning:

“Warning messages: 1: Your SQL query is over 20,000 characters which can cause issues on some database platforms! Try calling computeQuery earlier in your pipeline.”

It is good to know that a compute() call was added to summariseResult() in the 1.2.1 release, so that the data are computed beforehand, which solves the issue seen in 1.2.0. Thanks again!
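The query-size warning quoted above can be illustrated with a small sketch, under several assumptions: an in-memory SQLite table from dbplyr::memdb_frame() stands in for the cohort, a loop adding 50 hypothetical flag columns stands in for the roughly 400 ingredient flags, and dplyr::compute() stands in for the computeQuery call named in the warning. It only shows how the generated SQL grows with each lazy step and shrinks again once the intermediate result is materialised; whether query size contributed to the 1.2.0 behaviour is not established in this thread.

```r
library(dplyr)
library(dbplyr)

# Hypothetical toy cohort in an in-memory SQLite database.
cohort <- memdb_frame(subject_id = 1:3, days = c(10, 20, 30))

# Stand-in for adding one flag column per ingredient: each mutate()
# extends the SQL that will eventually be sent to the database.
lazy <- cohort
for (i in 1:50) {
  col <- paste0("flag_", i)
  lazy <- lazy |> mutate(!!col := if_else(days > !!i, 1L, 0L))
}

# Length, in characters, of the SQL generated for a lazy table.
query_length <- function(x) nchar(as.character(sql_render(x)))

len_before <- query_length(lazy)

# Materialising the intermediate result replaces the long query with a
# simple select from a temporary table.
materialised <- lazy |> compute()
len_after <- query_length(materialised)

message("SQL length before compute(): ", len_before)
message("SQL length after compute():  ", len_after)
```

Calling compute() partway through a long pipeline, rather than only at the end, keeps each generated query short.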
