Test using the sample data from the Leicester OMOP extract #42

razekmh · 2024-11-11T12:25:28Z

I want to:

Download the files to the DSD
Download the concept files from the UCLH SAFEHR repo to the DSD
Open in them a version of R
Match the concepts in the drug_exposure table to the concepts table
Match the concepts to the ATC codes used by ramses in computing the DDD

The text was updated successfully, but these errors were encountered:

razekmh · 2024-11-11T14:11:56Z

Download the files to the DSD

This works fine. I needed to login and find the data on OneDrive.

razekmh · 2024-11-11T14:15:35Z

Download the concept files from the UCLH SAFEHR repo to the DSD

This is a little bit more complicated. I had to download the individual files as mentioned in #41

razekmh · 2024-11-11T14:17:51Z

Open in them a version of R

Parquet files can be parsed using arrow apache package. You will need to install it and load it and then run read_parquet

razekmh · 2024-11-12T12:29:57Z

I read the concept and drug_exposure tables

drug_exposure_df <- read_parquet("data/OMOP/DRUG_EXPOSURE.parquet")
concept_df <- read_parquet("data/OMOP/concept.parquet")

and then merge the two tables on the drug id to find which exposures from the drug_exposure table have a match in the concept table.

drug_names_exposure_df <- merge(x=drug_exposure_df, y=concept_df, by.x = "drug_concept_id", by.y = "concept_id", all.x = T)

razekmh · 2024-11-12T12:36:37Z

I wanted to count the number of matches per drug, so first I count the hits per concept name in the new merged table

drug_names_count <- drug_names_exposure_df |> count(concept_name)

and then sort and display the top three

 head(drug_names_count[order(drug_names_count$n, decreasing = TRUE), ],3 )

I got

                                           concept_name      n
712                                 No matching concept 323452
264 1000 ML Sodium Chloride 9 MG/ML Injectable Solution  82653
232   100 ML Acetaminophen 10 MG/ML Injectable Solution  27254

so we have 323,452 out of 1,190,373 with drugs that are not listed in the concept list. This is clearly a problem but we will try to address it later. Let's press forward.

Now I want to filter our the No matching concept so, we create a new table.

drug_expousre_matching <- drug_names_exposure_df[drug_names_exposure_df$concept_name != "No matching concept",]

razekmh · 2024-11-12T20:43:45Z

Now let's take a look at the match between the drug_strength table and the new drug_expousre_matching table.

First we read in the drug_strength table

drug_strength_df <- read_parquet("data/OMOP/drug_strength.parquet")

Then we merge the two tables together

drug_strength_names_exposure_df <- merge(x=drug_expousre_matching, y=drug_strength_df , by.x = "drug_concept_id", by.y = "drug_concept_id", all.x = T)

razekmh · 2024-11-12T21:23:33Z

Checking the resulting table I can see that the drug strengths are represented in two ways amount and numerator/ denominator . The following table describes the two sets of columns.

Field	Required	Type	Description
amount_value	No	float	The numeric value associated with the amount of active ingredient contained within the product.
amount_unit_concept_id	No	integer	A foreign key to the Concept in the CONCEPT table representing the identifier for the Unit for the absolute amount of active ingredient.
numerator_value	No	float	The numeric value associated with the concentration of the active ingredient contained in the product
numerator_unit_concept_id	No	integer	A foreign key to the Concept in the CONCEPT table representing the identifier for the numerator Unit for the concentration of active ingredient.
denominator_value (V5.0.1)	No	float	The amount of total liquid (or other divisible product, such as ointment, gel, spray, etc.).
denominator_unit_concept_id	No	integer	A foreign key to the Concept in the CONCEPT table representing the identifier for the denominator Unit for the concentration of active ingredient.

Some of the rows have none the fields and will be dropped for now

razekmh · 2024-11-12T21:29:46Z

Remember that our main target; compute_DDDs requires:

ATC_code: a character vector of ATC codes (can be easily obtained from the ab_atc() function)
ATC_administration: a character vector indicating the ATC route of administration (see Details)
dose: a numeric vector indicating the total dose for the drug serving as the ATC DDD basis of strength. For prescriptions, provide the total dose to be given in one day, rather than per administration.
unit: a character vector coercible to a units object with as_units().

We will need to convert the strength on each exposure to the dose in units accepted by as_units(). According to the validate article daily_dose = ( strength * daily_frequency )

razekmh · 2024-11-12T23:45:29Z

Dose and Unit seem to be possible to extract from either amount and numerator/ denominator combination.
For a test purpose, I am installing the units library and then setting the value of the active ingredient and then converting it to the unit

x = 22.5/2.5
units(x) <- as_units("mg/mL")

Now testing the value of x

> x
9 [mg/mL]

Next, I would like to try an example of compute_DDDs with a random row from the dataset. However extracting the ATC code is a bit complicated

razekmh · 2024-11-12T23:49:22Z

First step of finding the ATC codes is to examine the drug standard used in the dataset.

unique(drug_expousre_matching$vocabulary_id)
[1] "RxNorm Extension" "RxNorm"

Matching the RxNorm to ATC is the next task, then

razekmh · 2024-12-05T16:13:22Z

I am back on getting the ATC code out of the OMOP data that we have. The standards that we have are RxNorm Extension and RxNorm. I did some research into this conversion RxNorm -> ATC and found that this seems to be a problem that has not been fully resolved. I found a short tutorial which was useful in understanding the concept_ancestor table, and how it might be useful in solving the mapping.

razekmh · 2024-12-05T16:23:20Z

Let's take a random drug concept

  concept_id concept_name                                 domain_id vocabulary_id    concept_class_id    standard_concept concept_code valid_start_date valid_end_date invalid_reason
       <int> <chr>                                        <chr>     <chr>            <chr>               <chr>            <chr>        <date>           <date>         <chr>         
1   36809522 2.5 ML Albuterol 1 MG/ML Inhalation Solution Drug      RxNorm Extension Quant Clinical Drug S                OMOP4823558  2019-01-28       2099-12-31     NA

If we filter the concept_ancestor table by descendant_concept_id we will find that there is a list of concepts which are "ancestors" to the drug we selected.

> filter(concept_ancestor_df, concept_ancestor_df$descendant_concept_id == 36809522)
# A tibble: 12 × 4
   ancestor_concept_id descendant_concept_id min_levels_of_separation max_levels_of_separation
                 <int>                 <int>                    <int>                    <int>
 1            36812024              36809522                        0                        0
 2            21603248              36809522                        5                        6
 3            21603255              36809522                        3                        4
 4            21605007              36809522                        6                        7
 5             1154343              36809522                        2                        3
 6             1356108              36809522                        1                        1
 7            36217207              36809522                        3                        3
 8            19093833              36809522                        1                        1
 9            21603249              36809522                        4                        5
10            36215522              36809522                        2                        2
11            36809522              36809522                        0                        0
12            21603256              36809522                        2                        3

razekmh · 2024-12-05T16:33:57Z

Now let's get the info of each of these concepts ancestors from the concept table. We can see that ATC is an ancestor to RxNorm

> filter(concept_df, concept_df$concept_id %in% c(filter(concept_ancestor_df, concept_ancestor_df$descendant_concept_id == 36809522)$ancestor_concept_id) )
# A tibble: 12 × 10
   concept_id concept_name                                 domain_id vocabulary_id    concept_class_id    standard_concept concept_code valid_start_date valid_end_date invalid_reason
        <int> <chr>                                        <chr>     <chr>            <chr>               <chr>            <chr>        <date>           <date>         <chr>         
 1   21605007 RESPIRATORY SYSTEM                           Drug      ATC              ATC 1st             C                R            1970-01-01       2099-12-31     NA            
 2   21603248 DRUGS FOR OBSTRUCTIVE AIRWAY DISEASES        Drug      ATC              ATC 2nd             C                R03          1970-01-01       2099-12-31     NA            
 3   21603249 ADRENERGICS, INHALANTS                       Drug      ATC              ATC 3rd             C                R03A         1970-01-01       2099-12-31     NA            
 4   21603255 Selective beta-2-adrenoreceptor agonists     Drug      ATC              ATC 4th             C                R03AC        1970-01-01       2099-12-31     NA            
 5   21603256 salbutamol; inhalant                         Drug      ATC              ATC 5th             C                R03AC02      1970-01-01       2099-12-31     NA            
 6   36217207 Inhalant Product                             Drug      RxNorm           Dose Form Group     C                1151123      2016-08-01       2099-12-31     NA            
 7   36215522 albuterol Inhalant Product                   Drug      RxNorm           Clinical Dose Group C                1154602      2016-08-01       2099-12-31     NA            
 8    1356108 albuterol Inhalation Solution                Drug      RxNorm           Clinical Drug Form  S                2108226      2019-02-04       2099-12-31     NA            
 9   19093833 albuterol 1 MG/ML                            Drug      RxNorm           Clinical Drug Comp  S                340169       1970-01-01       2099-12-31     NA            
10    1154343 albuterol                                    Drug      RxNorm           Ingredient          S                435          1970-01-01       2099-12-31     NA            
11   36809522 2.5 ML Albuterol 1 MG/ML Inhalation Solution Drug      RxNorm Extension Quant Clinical Drug S                OMOP4823558  2019-01-28       2099-12-31     NA            
12   36812024 Albuterol 1 MG/ML Inhalation Solution        Drug      RxNorm Extension Clinical Drug       S                OMOP4826058  2019-01-28       2099-12-31     NA

We could get the ATC 5th ancestor of our RxNorm code using

> filter(concept_df, concept_df$concept_id %in% c(filter(concept_ancestor_df, concept_ancestor_df$descendant_concept_id == 36809522)$ancestor_concept_id), concept_df$concept_class_id == 'ATC 5th')
# A tibble: 1 × 10
  concept_id concept_name         domain_id vocabulary_id concept_class_id standard_concept concept_code valid_start_date valid_end_date invalid_reason
       <int> <chr>                <chr>     <chr>         <chr>            <chr>            <chr>        <date>           <date>         <chr>         
1   21603256 salbutamol; inhalant Drug      ATC           ATC 5th          C                R03AC02      1970-01-01       2099-12-31     NA

razekmh · 2024-12-05T16:38:11Z

Now this means we could do:

a function that finds the ATC code for each instance of RxNorm and go through the drug_exposure table per row.
merge the tables and filter to get the same result as the first solution.
build a mapping table of RxNorm -> ATC 5th

each solution has advantages and disadvantages. For now I will try to build the mapping.

razekmh · 2024-12-05T18:00:04Z

First I will collect all the unique drug_concept_id from the exposure table. The idea is not to spend time on mapping concept ids which are not used within our dataset. The clear downside is that we will need to check any new data for possible new concept ids which are not mapped. Quite similar to building an index in fact.

rxnorm_unique_drug_ids <- map(unique(drug_exposure_df$drug_concept_id), ~ tibble(rxnorm_id = .x)) %>%   bind_rows()

This result in a dataframe of one column with a length of 1,742. Next I will focus on the ATC 5th concepts and filter these out of the concept table.

atc_5th_concepts <- filter(concept_df, concept_df$concept_class_id == 'ATC 5th')

This gives us 5,452 concepts. Now, I will use these ATC 5th concepts to filter the concept_ancestors table to get the relationships which ATC 5th is an ancestor in.

atc_5th_as_ancestors <- filter(concept_ancestor_df, concept_ancestor_df$ancestor_concept_id %in% c(atc_5th_concepts$concept_id), concept_ancestor_df$max_levels_of_separation > 0)

This result in 2,101,373 relationships. Clearly there are a lot of branching. Now let's match the rxnorm concepts with the atc 5th concepts

rxnorm_to_atc_concept_code <- merge(x=rxnorm_unique_drug_ids, y=atc_5th_as_ancestors, by.x='rxnorm_id', by.y='descendant_concept_id', all.x = TRUE) %>% select(rxnorm_id, ancestor_concept_id)

This gives us 2,033 rows. This is coming from 1,742. Strange! Further if we check the most common merges, we find that NA is the most common with 90 hits

> count(rxnorm_to_atc_concept_code, ancestor_concept_id) %>% arrange(desc(n)) %>% head(5)
  ancestor_concept_id  n
1                  NA 90

let's keep going forward to the last step and then check these results after. I will now replace the concept id of the ATC 5th with the concept code from the ATC standard which could be used in the DDD function in Ramses

rxnorm_concept_id_to_atc_code <- merge(x=rxnorm_to_atc_concept_code , y=concept_df, by.x ='ancestor_concept_id', by.y='concept_id', all.x = TRUE) %>% select(rxnorm_id, concept_code)

We finally arrive to table we wanted

> head(rxnorm_concept_id_to_atc_code)
  rxnorm_id concept_code
1  21111050      S01AA28
2  42683461      J05AR24
3  36065445      B01AX07
4  36065920      J05AR25
5  36121030      L04AC18
6   1361296      L02BB06

razekmh · 2024-12-16T10:30:51Z

one note for the future is that the DDD of a drug could change based on the admission route. For example, morphine

ATC code	Name	DDD	U	Adm.R
N02AA01	morphine	0.1	g	O
		30	mg	P
		30	mg	R

razekmh · 2024-12-16T13:57:58Z

The relationship between RxNorm and ATC goes through the following steps. These steps are reflected in the concept_ancestors table

graph BT;
    A("RxNorm: Clinical Drug")-->B("RxNorm: Ingredient");
    B-->C(ATC 5th);
    C-->D(ATC 4th);
    D-->E(ATC 3th);
    E-->F(ATC 2th);
    F-->G(ATC 1th);

However the connection between RxNorm -> ATC is not perfect. There are three main problems with this link:

1- Missing links:

We ca see this clearly in the pervious step where 90 out of 1,742 of the used RxNorm have n equivalent ATC.

> count(rxnorm_to_atc_concept_code, ancestor_concept_id) %>% arrange(desc(n)) %>% head(5)
  ancestor_concept_id  n
1                  NA 90

2- Multiple ancestors Multiple descendants:

The relationship between RxNorm -> ATC is not one -> one but rather many -> many. This means some ATC 5th codes has many RxNorm descendants and some RxNorm codes has many ATC ancestors. For example:

graph BT;
    A("OMOP809928")-->B("L01BA01")
    A-->C("L04AX03")
    D("OMOP2603855")-->B
    D-->C

This is partially a side effect of the third point.

> filter(concept_df, concept_df$concept_id %in% c(43809444, 41405893))
# A tibble: 2 × 10
  concept_id concept_name                                     domain_id vocabulary_id    concept_class_id    standard_concept concept_code valid_start_date valid_end_date invalid_reason
       <int> <chr>                                            <chr>     <chr>            <chr>               <chr>            <chr>        <date>           <date>         <chr>         
1   41405893 0.3 ML Methotrexate 50 MG/ML Injectable Solution Drug      RxNorm Extension Quant Clinical Drug S                OMOP2603855  2017-08-24       2099-12-31     NA            
2   43809444 50 ML Methotrexate 100 MG/ML Injectable Solution Drug      RxNorm Extension Quant Clinical Drug S                OMOP809928   2017-08-24       2099-12-31     NA

3- Repeated Concepts:

ATC 5th concepts are not unique. Some medications are mentioned multiple times under different classification with different codes. For example: methotrexate is mentioned under
L01BA01 and L04AX03. However only L04AX03 has recorded DDD

razekmh · 2025-01-27T22:09:59Z

Here is a more serious example of the problem

> filter(filtered_drug_lookup, drug_concept_id == "46276153")
# A tibble: 3 × 7
  drug_concept_name       drug_concept_id drug_concept_class_id ATC_level ATC_concept_name                                      ATC_code ATC_concept_id
  <chr>                             <int> <chr>                 <chr>     <chr>                                                 <chr>             <int>
1 sodium chloride 9 MG/ML        46276153 Clinical Drug Comp    5         cefoperazone and beta-lactamase inhibitor; parenteral J01DD62        21602913
2 sodium chloride 9 MG/ML        46276153 Clinical Drug Comp    5         ceftriaxone, combinations; systemic                   J01DD54        21602912
3 sodium chloride 9 MG/ML        46276153 Clinical Drug Comp    5         imipenem and cilastatin; parenteral                   J01DH51        21602925

the code 46276153 of OMOP RxNorm/RxNorm Extension matches three ATC codes. Further two of the ATC codes have the same route code. Meaning they cannot be filtered based on the route code. The third ATC is missing data

ATC code	Name	DDD	U	Adm.R	Note
J01DH51	imipenem and cilastatin	2	g	P	Refers to imipenem
J01DD62	cefoperazone and beta-lactamase inhibitor	4	g	P	Refers to cefoperazone
J01DD54	ceftriaxone, combinations

razekmh · 2025-01-27T23:52:33Z

one more interesting problem is mapping the route of administration form the OMOP used standard to the ATC one.
OMOP data does not use RxNorm for this. I am not sure if there is RxNorm for the drug administration.

I found a route called body cavity which is not really clear. Let's try to understand what type of drugs uses this route. Please excuse the bad r code

> filter(collected_concept, collected_concept$concept_id %in% filter(omde, omde$route_concept_id == 4222254)$drug_concept_id)
# A tibble: 4 × 10
  concept_id concept_name                  domain_id vocabulary_id concept_class_id standard_concept concept_code valid_start_date valid_end_date invalid_reason
       <int> <chr>                         <chr>     <chr>         <chr>            <chr>            <chr>        <date>           <date>         <chr>         
1          0 No matching concept           Metadata  None          Undefined        NA               No matching… 1970-01-01       2099-12-31     NA            
2   21078067 5 ML Tranexamic Acid 100 MG/… Drug      RxNorm Exten… Quant Clinical … S                OMOP307613   2017-08-24       2099-12-31     NA            
3   42479436 1000 ML Sodium Chloride 9 MG… Drug      RxNorm Exten… Quant Clinical … S                OMOP416795   2017-08-24       2099-12-31     NA            
4   42939099 Sodium Chloride 9 MG/ML Irri… Drug      RxNorm Exten… Clinical Drug    S                OMOP4665764  2018-08-21       2099-12-31     NA            
> filter(collected_concept, collected_concept$concept_id %in% filter(omde, omde$route_concept_id == 4222254)$drug_concept_id) |> select(concept_class_id, concept_name)
# A tibble: 4 × 2
  concept_class_id    concept_name                                       
  <chr>               <chr>                                              
1 Undefined           No matching concept                                
2 Quant Clinical Drug 5 ML Tranexamic Acid 100 MG/ML Prefilled Syringe   
3 Quant Clinical Drug 1000 ML Sodium Chloride 9 MG/ML Injectable Solution
4 Clinical Drug       Sodium Chloride 9 MG/ML Irrigation Solution

razekmh added spike Self contained exploratory work ARC labels Nov 11, 2024

razekmh self-assigned this Nov 11, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Test using the sample data from the Leicester OMOP extract #42

Test using the sample data from the Leicester OMOP extract #42

razekmh commented Nov 11, 2024 •

edited

Loading

razekmh commented Nov 11, 2024

razekmh commented Nov 11, 2024

razekmh commented Nov 11, 2024

razekmh commented Nov 12, 2024

razekmh commented Nov 12, 2024

razekmh commented Nov 12, 2024

razekmh commented Nov 12, 2024

razekmh commented Nov 12, 2024 •

edited

Loading

razekmh commented Nov 12, 2024

razekmh commented Nov 12, 2024

razekmh commented Dec 5, 2024

razekmh commented Dec 5, 2024

razekmh commented Dec 5, 2024

razekmh commented Dec 5, 2024

razekmh commented Dec 5, 2024 •

edited

Loading

razekmh commented Dec 16, 2024

razekmh commented Dec 16, 2024 •

edited

Loading

razekmh commented Jan 27, 2025

razekmh commented Jan 27, 2025

Test using the sample data from the Leicester OMOP extract #42

Test using the sample data from the Leicester OMOP extract #42

Comments

razekmh commented Nov 11, 2024 • edited Loading

razekmh commented Nov 11, 2024

razekmh commented Nov 11, 2024

razekmh commented Nov 11, 2024

razekmh commented Nov 12, 2024

razekmh commented Nov 12, 2024

razekmh commented Nov 12, 2024

razekmh commented Nov 12, 2024

razekmh commented Nov 12, 2024 • edited Loading

razekmh commented Nov 12, 2024

razekmh commented Nov 12, 2024

razekmh commented Dec 5, 2024

razekmh commented Dec 5, 2024

razekmh commented Dec 5, 2024

razekmh commented Dec 5, 2024

razekmh commented Dec 5, 2024 • edited Loading

razekmh commented Dec 16, 2024

razekmh commented Dec 16, 2024 • edited Loading

1- Missing links:

2- Multiple ancestors Multiple descendants:

3- Repeated Concepts:

razekmh commented Jan 27, 2025

razekmh commented Jan 27, 2025

razekmh commented Nov 11, 2024 •

edited

Loading

razekmh commented Nov 12, 2024 •

edited

Loading

razekmh commented Dec 5, 2024 •

edited

Loading

razekmh commented Dec 16, 2024 •

edited

Loading