Different summariseResult() results with collected/uncollected CDM cohort table (v1.2.0) #2

Akram-Mendez · 2025-01-02T15:16:26Z

Dear development team,

I am writing to request your feedback on an issue we encountered when using summariseResult() from PatientProfiles v1.2.0. We noticed that the count and percentage columns yield incorrect results when summarising a drug utilisation cohort (please see the 'falls_cohort_substance' and 'summary' code below). The correct counts and percentages are only obtained after collecting the cohort table with dplyr::collect(). Other DPs running the same code, also with v1.2.0, observed the same issue.

This issue appears to be resolved in version 1.2.3, where the correct results are obtained without needing to apply dplyr::collect(). We would appreciate any insights you could provide on why version 1.2.0 produced correct output only after collecting the cohort table.

Thank you!
Akram


```r
cli::cli_alert("Getting ingredient/substance information from drug concepts")
# Drug utilisation cohort intersected with drug concepts for 395 ingredients:

falls_cohort_substance <- cdm$new_user_cohort_falls |>
  PatientProfiles::addConceptIntersectFlag(
    conceptSet = drug_codes,
    indexDate = "cohort_start_date",
    censorDate = "cohort_end_date",
    window = list(c(0, 0))
  ) |>
  PatientProfiles::addCohortName() |>
  # With collect(), PatientProfiles::summariseResult() v1.2.0 yields correct
  # results; it is not required with the latest version, PatientProfiles v1.2.3.
  dplyr::collect()

cli::cli_alert("Summarising substances -- New users with recurrent falls")
# Summary table with counts and percentages per drug class and drug ingredient:

summary <- PatientProfiles::summariseResult(
  falls_cohort_substance,
  group = list("cohort_name"),
  estimates = c("count", "percentage")
)
```


catalamarti · 2025-01-07T10:23:18Z

Hi @Akram-Mendez, thanks for reporting this. The issue was originally detected with duckdb (1.0.0), and we've seen that it only affects certain DBMSs. It was corrected in release 1.2.1 (although I've now seen that this is not stated explicitly in the release notes).
Basically, if you collect beforehand it will always work fine, but without collecting, versions before 1.2.1 may have problems with strata levels being interchanged in certain DBMSs.
It was fixed by adding a compute:

PatientProfiles/R/summariseResult.R

Line 241 in ea69eac

dplyr::compute()

Out of curiosity, which was the DBMS that you used?
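
For readers less familiar with the distinction, here is a minimal sketch of how `dplyr::compute()` and `dplyr::collect()` differ on a database-backed table. It does not reproduce the 1.2.0 bug; it only illustrates the two calls. The in-memory DuckDB connection, the `toy_cohort` table, and its columns are assumptions made up for this example, not part of PatientProfiles.

```r
# Assumption: an in-memory DuckDB database stands in for a real CDM backend.
con <- DBI::dbConnect(duckdb::duckdb())

# Hypothetical toy cohort; names and values are illustrative only.
toy <- data.frame(
  subject_id  = 1:6,
  cohort_name = c("a", "a", "a", "b", "b", "b"),
  flag        = c(1L, 0L, 1L, 1L, 1L, 0L)
)
DBI::dbWriteTable(con, "toy_cohort", toy)

# A lazy query: nothing has been executed yet.
lazy <- dplyr::tbl(con, "toy_cohort") |>
  dplyr::filter(flag == 1L)

# compute() runs the query in the database and stores the result in a
# temporary table; the returned object is still a lazy database table.
computed <- dplyr::compute(lazy)

# collect() runs the query and pulls the rows into R as a local tibble.
collected <- dplyr::collect(lazy)

is_lazy_computed  <- inherits(computed, "tbl_lazy")
is_lazy_collected <- inherits(collected, "tbl_lazy")

# Counting per group on either object should agree.
counts_db <- computed |>
  dplyr::count(cohort_name) |>
  dplyr::collect() |>
  dplyr::arrange(cohort_name)
counts_local <- collected |>
  dplyr::count(cohort_name) |>
  dplyr::arrange(cohort_name)

DBI::dbDisconnect(con, shutdown = TRUE)
```

In this sketch both routes give the same per-group counts; the difference is only where the data lives afterwards (a temporary table in the database versus a tibble in R memory).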

Akram-Mendez · 2025-01-08T10:08:59Z

Hi @catalamarti, thank you so much for your response. We use the Snowflake DBMS, although we observed similar behaviour with other DPs using Microsoft SQL Server and Postgres.

Could the issue also be related to the size of the query when using release 1.2.0? When intersecting the drug utilization cohort with about 400 ingredients (as in ‘falls_cohort_substance’ in the example above), we received the following warning:

“Warning messages: 1: Your SQL query is over 20,000 characters which can cause issues on some database platforms! Try calling computeQuery earlier in your pipeline.”

It is good to know that a compute() call was added to summariseResult() in release 1.2.1 so that the intermediate result is computed beforehand, which resolves the issue from 1.2.0. Thanks again!
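
As a rough illustration of the query-size point above, the sketch below builds a lazy query with many flag columns and compares the length of the generated SQL before and after an intermediate `dplyr::compute()`. The in-memory DuckDB connection, the `toy_exposure` table, and the 50 flag columns are assumptions for illustration only; the real pipeline with about 400 ingredients and `addConceptIntersectFlag()` will generate different SQL.

```r
# Assumption: in-memory DuckDB with a hypothetical toy table.
con <- DBI::dbConnect(duckdb::duckdb())
DBI::dbWriteTable(
  con, "toy_exposure",
  data.frame(subject_id = 1:5, drug_concept_id = c(1L, 2L, 3L, 2L, 1L))
)

q <- dplyr::tbl(con, "toy_exposure")

# Add 50 flag columns one at a time; the generated SQL grows with each step.
for (i in 1:50) {
  col <- paste0("flag_", i)
  q <- q |>
    dplyr::mutate(!!col := dplyr::if_else(drug_concept_id == !!i, 1L, 0L))
}

# Length of the SQL text that would be sent to the database.
n_before <- nchar(as.character(dbplyr::remote_query(q)))

# Materialise the intermediate result into a temporary table.
q <- dplyr::compute(q)

# The query now only needs to select from that temporary table.
n_after <- nchar(as.character(dbplyr::remote_query(q)))

n_cols <- length(colnames(q))

DBI::dbDisconnect(con, shutdown = TRUE)
```

Under these assumptions, `n_after` should be much smaller than `n_before`, which is the effect the warning's suggestion to compute earlier in the pipeline is aiming for.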
