Dylan Jay edited this page Jun 3, 2021 · 55 revisions

Understanding Thailand's Covid Positive Rate

This is the rabbit hole I went down to answer the question: what is Thailand's positive rate, and how much testing is actually happening? The end result is a daily automated scrape of all the various sources of Covid data, combined and downloadable for your convenience.

My conclusions are

  • "Total number of laboratory tests" in Situation reports is mislabelled
  • Having more confirmed cases than positive PCR tests is strange and unexplained.
  • Only PCR testing data is available, yet cases have been confirmed in the past without PCR tests. It's unclear if this will continue to be the case. There is, however, an argument that proactive testing shouldn't be included in a positive rate as it's not random.
  • Positive rate should tell you if enough testing is happening, but if the sick aren't equally likely to get tested it becomes a less useful measure.
  • Thailand's 2nd wave occurred in part because of groups that were less likely to get tested (migrant workers).
  • Not all provinces have the same positive rate, especially over time.

"Total number of laboratory tests" in Situation reports is mislabelled

I'd long suspected that PUI counts are a good proxy for testing numbers. PUI stands for Person Under Investigation and represents someone MOPH has determined is at high risk of having Covid. There are formal published criteria, which are a mix of symptoms and who you have had contact with. Then I started seeing test numbers appear next to daily PUI numbers, so I began by trying to find an existing graph or data source for these daily reported numbers. The MOPH APIs don't report this, just cases, deaths, hospitalisations and recoveries. It turns out there is a daily situation report in PDF format with tables containing a cumulative PUI number, among other data. There are Thai versions and English translations (delayed by a few days). I'm parsing both, getting numbers back to April 2020.

It turns out the "Tests" number in the situation reports is unlikely to mean "tests performed" but is rather a measure of the number of people tested, including those who did not meet the PUI criteria. This number is not very useful as it seems to be just the daily PUI number plus a couple of dumps of additional numbers mid-year (possibly everyone privately tested up to that point).
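Since the situation reports carry a cumulative PUI number, a daily series has to be derived by differencing consecutive reports. The following is a minimal sketch of that step, not the scraper's actual code: the function name and the sample totals are made up for illustration, and it assumes a day should be skipped whenever the previous day's report is missing.

```python
from datetime import date


def daily_from_cumulative(cumulative):
    """Turn {date: cumulative_total} into {date: daily_increase}.

    A day is only emitted when the previous calendar day is also present,
    so a missing report does not show up as a fake spike.
    """
    days = sorted(cumulative)
    daily = {}
    for prev, cur in zip(days, days[1:]):
        if (cur - prev).days == 1:
            daily[cur] = cumulative[cur] - cumulative[prev]
    return daily


# Made-up cumulative PUI totals, for illustration only.
cumulative_pui = {
    date(2020, 4, 1): 1000,
    date(2020, 4, 2): 1250,
    date(2020, 4, 3): 1600,
    date(2020, 4, 5): 2100,  # the 4 April report is missing
}
daily_pui = daily_from_cumulative(cumulative_pui)
```

Skipping the day after a gap is only one choice; spreading the jump evenly over the missing days would be another.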

Finding a source of actual PCR test data

After this I discovered OurWorldInData was also graphing Thailand testing data. However at that time it was at least a month out of date.

Our World in Data Testing Graphs

When I asked OurWorldInData whether they could use the PUI numbers instead to get more up-to-date testing data, they said that MOPH had previously made an XLSX available on its site with actual testing data, and that this is what they preferred to use.

Note they are using the XLSX data, which it turns out is only the public testing data. Private tests would include quarantine hotels and anyone paying for a test, either for a fit-to-fly certificate or because they don't qualify as a PUI.

A few days later they let me know the XLSX data was available again and updated.

MOPH PCR Test reports

Next I parsed the XLSX with testing data from the MOPH shared folder. It contains a Pos number (presumably positive test results) and a Total number (presumably the total number of tests performed). It also has numbers for data that wasn't assigned a date, covering the period up until the 3rd of April. In the tests graph above I've included this data by distributing it in proportion to the existing dated data before the 3rd of April.

Next I discovered there was additional testing information in a series of PowerPoint presentations (in both PPTX and PDF formats). These break the tests down by health area and by private vs public. They also include information on which hospitals performed the tests (not yet parsed). Due to missing files I ended up having to parse all these different formats.
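One way to spread an undated total across dated days is in proportion to each day's known count. This is a sketch under that assumption; the function name and the numbers are hypothetical and are not taken from the XLSX.

```python
def distribute_undated(dated, undated_total):
    """Spread undated_total over the dated series, weighted by each day's count."""
    known_total = sum(dated.values())
    if known_total == 0:
        raise ValueError("no dated tests to use as weights")
    return {
        day: count + undated_total * count / known_total
        for day, count in dated.items()
    }


# Made-up daily test counts before 3 April, plus 200 tests with no date.
dated_tests = {"2020-04-01": 100, "2020-04-02": 300}
adjusted_tests = distribute_undated(dated_tests, 200)
```

The adjusted series keeps the shape of the dated data, and its sum equals the dated total plus the undated total.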

Understanding the testing data.

The PUIs follow the testing numbers reasonably well (except for April and January, which we will discuss later), but there are a lot more tests performed than PUIs. Even during the period between the waves, PUI numbers dropped but tests remained at about 8000 a day. In the graph above I've included the total tests (public + private) and just the private tests for comparison. The situation report "tests" number (blue) is mostly hidden as it's the same as the PUI number each day, except for a couple of "catch-up" periods.

How many confirmed cases and positive tests?

You'd think this number would be the easiest to get and understand, but there seem to be some big differences between positive tests and confirmed cases.

From the situation reports you can also get a breakdown of cases that helps us understand what is going on.

April: Why so many more positive results than confirmed cases?

In the first wave (April) there seemed to be a lot more positive test results than confirmed cases. Even the number of private positive results alone was greater than confirmed cases. A single case could be tested multiple times, so you would expect positive results to be larger than confirmed cases. However, in April positive results were up to 8 times higher (if including public and private results).

More cases than positive results?

During the second wave positive results and cases were closer up until early February. At this time there was a government initiative to do large amounts of proactive testing in factories with migrant workers. This resulted in a big jump in confirmed cases however it didn't seem to result in a similar jump in positive test results. For some reason these tests seem to be excluded from the testing data. At the same time there isn't a large jump in numbers of tests either.

From the test data by area we can see which areas have more cases than public positive test results. A positive value means more cases than positive results. NOTE: the district testing data is averaged over a week, so comparing it on a daily basis like this might not be accurate.

There are a few possibilities

  • Maybe some antibody tests were used without PCR test confirmation afterwards

    • Reports seem to indicate some cases are "historical" which could mean antibody tests were used which would be unusual.
    • The MOPH testing data seems to only be for PCR tests
    • Bangkok Post article on 26th Dec 2020 where the governor of Samut Sakhon refers to using blood tests.

      Samut Sakhon governor Veerasak Vijitsaengsri said on Friday that he has ordered the testing method to be changed to ensure quick results and cover all at-risk groups of Thais and migrants in the province. The current method of inserting a swab into the nose to get a fluid sample will be replaced by taking blood samples from people, the governor said, adding that results from the new method come back in 30 minutes and costs are cheaper.

    • WHO report 26 Jan 2020

      Rapid tests that require a blood sample and that can provide results within 30 minutes will be used in high risk groups (both Thai and migrants) in Samut Sakhon to supplement the nasopharyngeal swab

    • WHO report on 31 Jan 2020 seems to indicate antibody tests were used to confirm cases in migrant workers

      The high daily case count of around 700 new cases per day for the last 6 days reflects the surveillance strategy of targeted active surveillance amongst the workers in the 4,000 plus factories located in Samut Sakhon. These large counts reflect the testing methods (see Sitrep 123 for an explanation) and represent both current and historical infections, and therefore the large daily case counts are not necessarily all new infections.

  • Maybe antigen tests were used without PCR test confirmation

    • There is no evidence rapid tests were used to confirm cases, other than that they are available and are used by provinces to allow travel in some circumstances.
  • Maybe Combinational Group Testing was sometimes used

  • Maybe Simple Group Testing was used

    • This seems unlikely since a positive group result means further followup testing of each patient in the group.
  • Maybe Proactive PCR tests are excluded from Test reports completely

    • Positive rate is reported in these reports, and proactive testing gives a "false" positive rate compared with a more random walk-in approach, so perhaps the decision was made to exclude some test results? OWID, for example, try to exclude proactive testing from their positive rate calculation.

      "we don't think it is necessary to include tests or cases from proactive test finding in our calculation of the Positive Rate. This is because we use the Positive Rate as an estimator for testing capacity within a country, not the fraction of the population that is infected. Including tests from proactive case finding breaks the condition of random sampling, as they would obviously have a much higher probability of being positive, and so would lead to a biased estimator."

    • Both tests and positives don't seem to rise during times of proactive testing.
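The reasoning above about simple group testing can be checked with a small sketch. It assumes a perfect test, a two-stage scheme (one pooled test, then individual retests for every member of a positive pool), and that a positive pooled test is itself counted as a positive result. The function and the sample data are illustrative only.

```python
def simple_group_testing(samples, pool_size):
    """Two-stage pooling with a perfect test.

    samples is a list of booleans (True = infected).
    Returns (tests_run, positive_results, cases_confirmed).
    """
    tests_run = positive_results = cases_confirmed = 0
    for start in range(0, len(samples), pool_size):
        pool = samples[start:start + pool_size]
        tests_run += 1  # one test on the pooled sample
        if any(pool):
            positive_results += 1  # the pooled test itself is positive
            tests_run += len(pool)  # every member is retested individually
            infected = sum(pool)
            positive_results += infected
            cases_confirmed += infected
    return tests_run, positive_results, cases_confirmed


# 20 made-up samples with 3 infections, tested in pools of 5.
samples = [False] * 20
samples[3] = samples[4] = samples[17] = True
tests_run, positive_results, cases_confirmed = simple_group_testing(samples, 5)
```

Under these assumptions every confirmed case still produces its own positive result, so pooling lowers the number of tests but cannot by itself produce fewer positive results than cases.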

At the moment my conclusion is that antibody tests are the most likely explanation, since there is some evidence of this in Samut Sakhon, but overall there is little evidence for or against any of these possibilities.

This means a few things

  • If antibody tests were used, it would be unusual to confirm a case without an additional PCR test.
  • If antibody tests were used, the confirmed case dates are not correct. Historical cases could have occurred months earlier.
  • If antigen tests were used and not verified using PCR, this could have been done to save money. The positive rate is still inaccurate.
  • There is an argument that proactive testing should be excluded from a positive rate, as a positive rate is meant to represent a random sampling that shows the likelihood of finding more cases if you tested more. Proactive testing isn't random. It's generally done when you know there is a cluster and are expecting to find lots of cases in a specific location.
  • Our World in Data exclude proactive testing for this reason.

Is enough testing being done (Positive Rate)?

One way to work out if enough testing is being done is to measure the positive rate, i.e. the share of tests performed that come back positive. Since we also have an idea of the share of people found positive compared to the people who were tested (at least for free/public testing), we can also compare this rate. This should answer the question "if we test more will we find more?", because if we are currently testing and only finding 1 in 100 positive then testing more might not find that much more.

WHO recommends a positivity rate of under 3% saying this is a sign the country is doing enough testing.
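The two rates being compared here can be written down directly. This is a minimal sketch with made-up numbers for a single day; the dictionary keys are hypothetical names, not columns from the published dataset.

```python
def positive_rate(positives, tests):
    """Share of tests that came back positive, or None when there were no tests."""
    return positives / tests if tests else None


# Made-up numbers for one day, for illustration only.
day = {
    "pos_public": 40, "pos_private": 10,
    "tests_public": 6000, "tests_private": 4000,
    "cases": 60, "pui": 3000,
}

# Positive results / tests performed (public + private).
rate_by_tests = positive_rate(
    day["pos_public"] + day["pos_private"],
    day["tests_public"] + day["tests_private"],
)
# Confirmed cases / people under investigation.
rate_by_pui = positive_rate(day["cases"], day["pui"])
```

With these sample numbers the two measures differ by a factor of four, which is the kind of gap that makes it hard to know which one to trust.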

Since we aren't sure about the confirmed cases in April or the positive test data in February, it is hard to know which positivity metric is more correct. However

  • The April rate is similar whether you use confirmed cases/PUI or positive results/tests (which is what you are supposed to use?). It shows not enough testing was being done. This is not surprising given test capacity was in the process of being ramped up, as in most other countries.
  • The mid-year positive rate is good. Testing was happening despite there being no cases. This holds even if you take out the private test data (which might include more ASQ tests?).
  • In mid-December we see a worse positive rate as the Samut Sakhon (SS) cluster emerged. It is a lot worse if looking at confirmed cases/PUI, so possibly antibody tests were being used here too? February saw an even larger difference due to the use of antibody tests. What this means is you can't really rely on the positive rate from December onwards to show what's going on.
  • Positive rate doesn't tell the whole story. It assumes people are equally likely to be tested, or that the most at risk are likely to be tested. Is testing equally spread out across the country? Migrant workers perhaps had disincentives to get tested (lack of insurance, illegal immigration status, not much money or time to go to the doctor, fear of losing income by being quarantined, etc). There could be other groups who also have a disincentive.

Note OurWorldInData seems to be using both the public and private testing data to determine their positive rate for Thailand.

Is it suspicious cases aren't going down/up?

TLDR; No. This happens in many countries when the interventions are not enough to bring the cases down but are enough to stop increasing.

R0 is how many people each infected person spreads the virus to if no one changes their behaviour. Interventions reduce that, so the R number becomes R(eff), or R effective. That can be 1 (or close to it), which means each infected person will on average infect 1 other person, leading to stable case numbers. Of course this virus is very clustered, so 1 person might give it to 20 and 19 more give it to no one, and we would still see flat daily cases when averaged out.

Also, as of June 2021, walk-ins are going down, proactive testing is going up and overall cases are going up. Nothing is staying very flat. What look like the same numbers from day to day can hide a large trend when viewed over time.
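The clustering example can be worked through numerically. This sketch is a deliberately crude deterministic projection (each generation of cases is the previous one multiplied by the effective R), not an epidemiological model, and the starting count of 100 is arbitrary.

```python
def project_cases(initial_cases, r_eff, generations):
    """Expected cases per generation when each case infects r_eff others on average."""
    cases = [float(initial_cases)]
    for _ in range(generations):
        cases.append(cases[-1] * r_eff)
    return cases


# Clustered spread: 1 person infects 20 others, the other 19 infect no one.
secondary_infections = [20] + [0] * 19
r_eff = sum(secondary_infections) / len(secondary_infections)

flat = project_cases(100, r_eff, 5)
growing = project_cases(100, 1.2, 5)
shrinking = project_cases(100, 0.8, 5)
```

The 20 people in the example average exactly one onward infection each, so the projected series stays flat, while values slightly above or below 1 drift steadily up or down.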

Cases by where tested

Is there a testing capacity problem?

TLDR; Maybe but it's hard to tell

If there was a testing capacity problem you would expect to see

  • Positive rate increase
    • Assumes those more at risk are more likely to be prioritised
    • it did increase but stayed below 4% which still seems a reasonable number
  • deaths not matching cases
    • TODO
  • a greater proportion of symptomatic cases relative to asymptomatic cases than at other times
    • TODO
  • There are 314 labs (and rising).
    • but we don't know estimated capacity

Are there a lot more infections than confirmed cases?

TLDR; yes but maybe not as many as people think.

Every country with covid cases has infections greater than confirmed cases, but the amount varies and can only be estimated.

A low positive rate is a good indication that infections are not that much greater than cases. It does assume that the way in which people get tested is random, i.e. two people with an equal chance of being sick will each have the same opportunity and inclination to get tested. If high-risk people are skipped you can have undetected clusters.

A very simple model can be used to get a lower estimate. Take global research on the virus's infection fatality rate (IFR) for different ages, apply it to the demographics of each province in Thailand, and then apply the result to the known Covid deaths. This shows estimated infections with a similar pattern to the confirmed cases.

Simple Infections estimate
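The simple model described above amounts to dividing known deaths by a population-weighted IFR. The sketch below shows that arithmetic only. The IFR values and the age shares are placeholders chosen for illustration, NOT the research estimates or census data behind the graph, and the function names are hypothetical.

```python
# Placeholder IFR per age band (fraction of infections that are fatal).
# These are NOT the published estimates used for the graph.
ASSUMED_IFR = {"0-39": 0.0002, "40-59": 0.002, "60-79": 0.02, "80+": 0.1}


def population_ifr(age_shares, ifr_by_band=ASSUMED_IFR):
    """Average IFR if infections were spread across ages like the population."""
    return sum(share * ifr_by_band[band] for band, share in age_shares.items())


def estimate_infections(deaths, age_shares, ifr_by_band=ASSUMED_IFR):
    """Infections implied by a death count: deaths / population-weighted IFR."""
    return deaths / population_ifr(age_shares, ifr_by_band)


# Made-up age structure for one province (shares sum to 1).
province_ages = {"0-39": 0.5, "40-59": 0.3, "60-79": 0.18, "80+": 0.02}
province_ifr = population_ifr(province_ages)
infections = estimate_infections(deaths=10, age_shares=province_ages)
```

Because the estimate is deaths divided by IFR, anything that makes the true IFR lower than assumed, or the true death count higher than reported, pushes the estimate up.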

However the following would all push up the infections estimate

  • many untested covid deaths
  • the elderly in Thailand being less exposed than the global average.
  • lower prevalence of diseases that increase the risk of dying
  • better health care and quicker detection than global average
  • less deadly virus variants being dominant
  • Bangkok's real mean age being younger than the 2019 census data

Other more in depth models estimate infections to be higher

  • IHME Model estimates around 6x.
    • uses estimated total deaths
  • ICL Model estimates around 4x.
    • uses reported deaths

It's unclear how much, if at all, the specifics of the way Thailand does its testing and reports its numbers are taken into account. It's also not clear which extra factors result in the much higher estimates than the simpler IFR model above.

Testing wastewater is another way to estimate infections.

  • TODO: use case ages to refine the model (assumes case ages more closely represent infected ages than the population does).
  • TODO: use an extrapolated IFR model rather than steps. So far this almost doubles the IFR but I'm not sure it's correct yet. It depends a lot on how much the rate increases over 85. Some estimates have it doubling every 8 years.
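The difference between a stepped IFR and an extrapolated one can be sketched as follows. The band values, the reference point and the function names are placeholders for illustration only; the one figure taken from the text is the "doubling every 8 years" rate.

```python
def ifr_step(age, bands):
    """Step model: use the rate of the highest band whose lower bound is <= age."""
    rate = 0.0
    for lower_bound, band_rate in sorted(bands):
        if age >= lower_bound:
            rate = band_rate
    return rate


def ifr_doubling(age, ref_age, ifr_at_ref, doubling_years=8.0):
    """Smooth model: IFR doubles every doubling_years, capped at 100%."""
    return min(1.0, ifr_at_ref * 2 ** ((age - ref_age) / doubling_years))


# Placeholder bands as (lower age bound, IFR), NOT published estimates.
ASSUMED_BANDS = [(0, 0.0002), (40, 0.002), (60, 0.02), (80, 0.1)]

for age in (80, 88, 96):
    print(age, ifr_step(age, ASSUMED_BANDS), round(ifr_doubling(age, 80, 0.1), 3))
```

The step model keeps everyone over 80 at the same rate, while the doubling model keeps climbing, so the oldest ages dominate the difference between the two.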

Why is the median age so low?

Estimates of IFR applied to Thailand's population result in a predicted median age of death of around 80. The median age of deaths during the 3rd wave seems to be 65-70. Why?
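A predicted median age of death can be computed as a weighted median, where each age is weighted by the number of people at that age times the IFR for that age. The sketch below uses made-up populations, case counts and IFR values. It assumes everyone in the chosen group is equally likely to be infected.

```python
def median_age_of_deaths(people_by_age, ifr_by_age):
    """Age at which cumulative expected deaths first reach half of the total."""
    weights = {age: people_by_age[age] * ifr_by_age[age] for age in people_by_age}
    half = sum(weights.values()) / 2
    running = 0.0
    for age in sorted(weights):
        running += weights[age]
        if running >= half:
            return age
    raise ValueError("no ages given")


# Placeholder IFR by representative age, NOT published estimates.
assumed_ifr = {30: 0.0002, 50: 0.002, 70: 0.02, 90: 0.15}

# Made-up whole-population age mix vs. a younger mix of confirmed cases.
population = {30: 1000, 50: 800, 70: 300, 90: 50}
confirmed_cases = {30: 2000, 50: 600, 70: 40, 90: 1}

median_from_population = median_age_of_deaths(population, assumed_ifr)
median_from_cases = median_age_of_deaths(confirmed_cases, assumed_ifr)
```

With these sample numbers, weighting by a younger case mix instead of the whole population pulls the predicted median age of death down, which is one direction the real data could be explored in.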

Age of covid related deaths

Not really sure why this is yet.

  • TODO: Use the age IFR to estimate deaths from cases and compare to actual deaths. This should show how factory and prison clusters might reduce the deaths. It would also result in a different estimate of median age as it will be based on case ages, not the population.

There is some interesting info in situation-no515-010664.pdf showing CFR for different age groups across different periods. The ratio for the elderly was lower in wave 2 than in wave 1 or 3, which perhaps reflects the worker-related cases found. The time to treatment was also shorter. Given the small sample sizes and potential issues with testing, it probably just shows that the latest wave has more useful data.

Are covid deaths undercounted?

TLDR; yes but maybe not by much

Is there a reluctance to get tested?

TODO

Is testing concentrated only in some areas?

During the second wave there was a worry from some that testing was only happening in SS and known clusters. I discovered a source for that data so I've aggregated this over time.

This data is taken from the weekly summary of testing across the various Thailand health districts. The data is aggregated in date ranges, so for this graph I've averaged the value across those dates. There is also one period of missing data in November. The data seems to match the public testing data totals, so it seems likely private tests are not included. For the labelling of the Thailand health districts, I am unsure about District 13 found in the data as I couldn't find a definition for it. I've assumed it's Bangkok but this could be wrong.
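Averaging a date-range total across its days can be done by giving every day in the range an equal share. This is a sketch of that idea with a made-up weekly total; the function name is hypothetical and the real data may need different handling for overlapping or missing ranges.

```python
from datetime import date, timedelta


def spread_over_range(start, end, total):
    """Give each day in [start, end] (inclusive) an equal share of a range total."""
    n_days = (end - start).days + 1
    if n_days < 1:
        raise ValueError("end must not be before start")
    return {start + timedelta(days=i): total / n_days for i in range(n_days)}


# A made-up weekly total of 7000 tests for one health district.
daily_tests = spread_over_range(date(2021, 1, 4), date(2021, 1, 10), 7000)
```

The daily values are flat within each range, so any day-level comparison against case counts only makes sense at roughly weekly resolution.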

Where are the cases/positive results

This graph comes from the public MOPH testing data. It's not clear if this is where the people who tested positive live or just where the labs that did the testing are located. It's possible some tests were sent to labs in different areas to be processed. The high number of positives in Bangkok during Jan/Feb suggests that some tests might have related to cases in other provinces?

As previously noted, this seems to be missing positive results and tests from February, apparently because antibody test data is not included, so this time period is inaccurate.

Is enough testing being done in each part of Thailand?

Positive rate is calculated as pos/tests for each area and then scaled according to the total positive rate for that time period.
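One reading of "scaled according to the total positive rate" is that each area's rate is divided by the overall rate for the same period, so a value above 1 means the area is finding positives more often than the country as a whole. The sketch below implements that interpretation only; it is an assumption about the scaling, and the area names and counts are made up.

```python
def relative_positive_rates(pos_by_area, tests_by_area):
    """Each area's pos/tests divided by the overall pos/tests for the same period."""
    total_rate = sum(pos_by_area.values()) / sum(tests_by_area.values())
    return {
        area: (pos_by_area[area] / tests_by_area[area]) / total_rate
        for area in pos_by_area
        if tests_by_area[area] > 0  # areas with no tests have no defined rate
    }


# Made-up counts for two health areas over one period.
pos = {"area_1": 10, "area_2": 30}
tests = {"area_1": 1000, "area_2": 1000}
relative = relative_positive_rates(pos, tests)
```

Scaling this way makes periods with very different overall rates comparable on one chart, at the cost of hiding the absolute level.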

As previously noted, this seems to be missing positive results and tests from February, apparently because antibody test data is not included, so this time period is inaccurate.