Atlas Donor Matching Algorithm Technical Validation

This README is intended to give an overview of the technical validation framework at a non-technical level.

This validation suite only covers "matching" and "scoring" logic performed by the Atlas matching algorithm - it does not cover match prediction capabilities.

Note for developers: This file should remain in the Features folder, so that all non-technical aspects of the framework are co-located.

Technical documentation should not be added to this file - please add it to the overall README, or to the code itself.

Feature Files

Test cases are written in feature files - i.e. any file with the .feature extension. Files ending in .feature.cs are automatically generated by the test runner, and can be ignored.
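As a rough illustration for developers, the rule above (edit .feature files, ignore generated .feature.cs files) can be expressed as a short Python sketch. The function name find_feature_files is hypothetical and not part of the framework; it relies on the fact that the glob pattern "*.feature" only matches names ending in ".feature".

```python
from pathlib import Path


def find_feature_files(root: str) -> list[Path]:
    """Return hand-written .feature files under root (hypothetical helper).

    Path.rglob("*.feature") only matches names ending in ".feature", so
    generated "*.feature.cs" files are excluded automatically.
    """
    return sorted(p for p in Path(root).rglob("*.feature") if p.is_file())
```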

Each feature file should contain tests pertaining to one feature of the algorithm. They are split primarily by search type and, within each search type, by shared areas of functionality (e.g. different typing resolutions).

The language used in the feature files is Gherkin, the syntax used by Cucumber (https://cucumber.io/).

The Structure of a feature file

Feature

All feature files will start with a block in this format:

Feature: Name of the feature
    As a user
    I want the feature to do what I want

Name

The first line of the file must start with Feature:
Following this is the name of the feature itself. It can be anything, though aim to keep it short.
The selected name will show up in test reports.

Description

The description should be phrased in the following format:
(1) As a <role>
(2) I want to <action>
(3) So that <reason>

e.g.

As a member of the search team
I want to be able to run a 10/10 search
So that I can match donors

Scenarios

The rest of a feature file is made up of scenarios. Each scenario represents a single test case.

They are written in the following format:

Scenario: Name of the scenario
    Given some setup
    And some additional setup
    And some further additional setup
    When I do a thing
    Then the desired outcome should occur

Name

The first line of the scenario must start with Scenario:
Following this is the name of the scenario.
The name will show up in test reports.
Scenario names must be unique within a feature file.

Steps

IMPORTANT: Unlike the names of features/scenarios, the phrasing used in steps matters. All steps are mapped to code written by developers - if a mapping has not been written, the test will not run. If you think the phrasing should be tweaked, or would like to add new steps that do not exist yet, talk to a developer.

In general, tweaking the phrasing of a step is a trivial code change, but changing its meaning or adding new steps will require some development effort.
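The step-to-code mapping described above can be pictured with a minimal Python sketch. It is only an analogy: the generated .feature.cs files suggest the real bindings are written in C#, and the registry, the step decorator, the execute function and the step phrasing "I run a 10/10 search" are all hypothetical, not real Atlas steps.

```python
import re
from typing import Callable

# Hypothetical registry of step bindings: (compiled pattern, implementing function).
STEP_BINDINGS: list = []


def step(pattern: str) -> Callable:
    """Return a decorator that registers a function as the code behind matching steps."""
    def register(func: Callable) -> Callable:
        STEP_BINDINGS.append((re.compile(pattern), func))
        return func
    return register


@step(r"I run a (\d+)/(\d+) search")  # hypothetical phrasing, not a real Atlas step
def run_search(matched: str, total: str) -> tuple:
    # A real binding would trigger a search; here we only return the parsed values.
    return int(matched), int(total)


def execute(step_text: str):
    """Run the binding whose pattern matches the whole step text, or fail if none exists."""
    for pattern, func in STEP_BINDINGS:
        match = pattern.fullmatch(step_text)
        if match:
            return func(*match.groups())
    raise LookupError(f"No step definition matches: {step_text!r}")


print(execute("I run a 10/10 search"))  # (10, 10)
```

Rephrasing the step, e.g. to "I run a ten out of ten search", would raise LookupError in this sketch, which is the same reason a phrasing change needs a developer to update the mapping.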

Steps will start with one of four keywords, in the following order:

(1) Given
- This is the first step of any scenario
- Given describes the setup necessary before we run a search - this will include all information needed to create the search criteria
- It will generally describe whether we're dealing with one or more patients
(2) And
- This is functionally the same as Given, used to make the tests read more fluently
- A scenario will generally have multiple And lines
(3) When
- This triggers the functionality under test
- In almost all cases, this will be running a search
(4) Then
- This is where we assert the test has passed.
- These will generally involve asserting that the expected donor(s) have been returned from the search
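The step order described above can be sketched as a small Python check, assuming a scenario reads Given, then any number of And steps, then When, then Then. The names ALLOWED_NEXT and follows_step_order are hypothetical. Note that this encodes only the convention used in this suite; Gherkin itself is more permissive (for example, And may also follow When or Then).

```python
# Allowed next keyword after each keyword, following the order described in this README.
ALLOWED_NEXT = {
    None: {"Given"},
    "Given": {"And", "When"},
    "And": {"And", "When"},
    "When": {"Then"},
    "Then": set(),
}


def follows_step_order(steps: list[str]) -> bool:
    """Check that steps read: Given, And..., When, Then."""
    previous = None
    for text in steps:
        words = text.split(maxsplit=1)
        keyword = words[0] if words else ""
        if keyword not in ALLOWED_NEXT[previous]:
            return False
        previous = keyword
    # A complete scenario must end with its Then step.
    return previous == "Then"
```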

An overview of our test data

Our primary test data source is a set of alleles taken from TGS typed donors in the SOLAR database. They are separated by allele and by number of fields: 2-field, 3-field and 4-field alleles are included, and we can specify how many fields are required if necessary.

Unless otherwise specified, test donors will be created with a genotype consisting of single alleles across all 6 loci, each randomly selected from the aforementioned set of alleles and having 2, 3 or 4 fields.

e.g.

|A_1       |A_2          |B_1       |B_2          |C_1    |C_2          |DPB1_1       |DPB1_2 |DQB1_1    |DQB1_2       |DRB1_1    |DRB1_2       |
|*02:17:02 |*03:01:01:01 |*15:25:01 |*39:06:02:01 |*05:53 |*02:10:01:01 |*01:01:01:04 |*01:01 |*03:05:01 |*03:01:01:01 |*11:01:08 |*15:01:01:01 |
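The default genotype generation can be sketched in Python as follows, using a toy allele pool built from the example genotype above (the real pool is the SOLAR-derived dataset). The names ALLELE_POOL, field_count and random_genotype are hypothetical, and drawing each of the two positions at a locus independently is an assumption, not something this README confirms.

```python
import random
from typing import Optional

# Toy per-locus pool taken from the example genotype above (assumption: real pool is much larger).
ALLELE_POOL = {
    "A": ["*02:17:02", "*03:01:01:01"],
    "B": ["*15:25:01", "*39:06:02:01"],
    "C": ["*05:53", "*02:10:01:01"],
    "DPB1": ["*01:01:01:04", "*01:01"],
    "DQB1": ["*03:05:01", "*03:01:01:01"],
    "DRB1": ["*11:01:08", "*15:01:01:01"],
}


def field_count(allele: str) -> int:
    """Number of colon-separated fields, e.g. '*02:17:02' has 3."""
    return allele.count(":") + 1


def random_genotype(pool: dict, rng: random.Random, fields: Optional[int] = None) -> dict:
    """Pick a random allele for both positions at every locus, optionally with a fixed field count."""
    genotype = {}
    for locus, alleles in pool.items():
        candidates = [a for a in alleles if fields is None or field_count(a) == fields]
        if not candidates:
            raise ValueError(f"No {fields}-field alleles available at locus {locus}")
        for position in (1, 2):
            genotype[f"{locus}_{position}"] = rng.choice(candidates)
    return genotype


# Example: a genotype restricted to 4-field alleles, with a seeded generator for repeatability.
print(random_genotype(ALLELE_POOL, random.Random(42), fields=4))
```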

If lower typing resolutions are specified, this TGS typed dataset will be 'dumbed down' to get the required typing resolution.

The typing resolutions available are:

TGS derived data
- As above
TGS derived data at four-field resolution
- As above, with all alleles required to be 4-field
TGS derived data at three-field resolution
- As above, with all alleles required to be 3-field
TGS derived data at two-field resolution
- As above, with all alleles required to be 2-field
Three field truncated
- A four field allele will be truncated to give a 3-field one
Two field truncated
- A three or four field allele will be truncated to give a 2-field one
XX code
- The XX code corresponding to the first field of the TGS derived allele
NMDP code
- An NMDP code corresponding to the TGS derived allele
- A single NMDP code has been selected for each TGS derived allele, from the SOLAR database
Serology
- The serology value corresponding to the selected TGS derived allele
- (No serology data has been found for DPB1 alleles)
Untyped
- No data will be set for the corresponding locus

Multiple typing resolutions may be specified for each genotype.
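The two truncation resolutions and the XX code resolution can be derived mechanically from a TGS derived allele; a minimal Python sketch of that "dumbing down" is shown below. The function names are hypothetical, and the sketch assumes plain colon-separated allele names: it ignores expression suffixes (such as N or Q) and does not cover NMDP codes or serology, which require lookup data rather than string manipulation.

```python
def truncate(allele: str, fields: int) -> str:
    """Keep the first `fields` fields of an allele name, e.g. ('*03:01:01:01', 2) -> '*03:01'."""
    parts = allele.split(":")
    if len(parts) < fields:
        raise ValueError(f"{allele} has fewer than {fields} fields")
    return ":".join(parts[:fields])


def xx_code(allele: str) -> str:
    """XX code built from the allele's first field, e.g. '*03:01:01:01' -> '*03:XX'."""
    return allele.split(":")[0] + ":XX"


# "Three field truncated", "Two field truncated" and "XX code" from the list above.
print(truncate("*39:06:02:01", 3))  # *39:06:02
print(truncate("*39:06:02:01", 2))  # *39:06
print(xx_code("*39:06:02:01"))      # *39:XX
```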

Exceptions

In some specific cases, the test data described above is not sufficient, so specific test data has been curated for those cases.

e.g. matches at the P group level, but not at the G group level.

In these cases the test data will be selected from the relevant (smaller) test dataset.

Generating Test Data

When adding new tests, relevant test data for that scenario may not yet exist. If this is the case, a developer will need to add the appropriate test donors.

Specific test data scenarios

In some cases, we may want to test specific HLA values, rather than allowing the system to choose values for us based on criteria. For such cases, step definitions have been created to allow HLA to be specified at the scenario level.

e.g.

And the matching donor has the following HLA:
       |A_1    |A_2    |B_1    |B_2    |DRB1_1 |DRB1_2 |
       |*02:09 |*01:01 |*15:01 |*15:11 |*15:03 |*03:01 | 
And the patient has the following HLA:
       |A_1          |A_2    |B_1    |B_2    |DRB1_1 |DRB1_2 |
       |*02:09:01:01 |*01:01 |*15:01 |*15:11 |*15:03 |*03:01 |
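For developers, here is a rough Python picture of how a two-row table like the ones above could be turned into per-position HLA values. It is a sketch only: parse_hla_table and split_row are hypothetical, and the framework's real step definitions are not shown. Loci omitted from the table (here C, DPB1 and DQB1) are simply absent from the result; how the real framework fills them is outside the scope of this sketch.

```python
def split_row(row: str) -> list[str]:
    """Split a Gherkin table row such as '|A_1 |A_2 |' into trimmed cell values."""
    return [cell.strip() for cell in row.strip().strip("|").split("|")]


def parse_hla_table(table: str) -> dict[str, str]:
    """Turn a header row plus a value row into a {position: allele} dict."""
    rows = [line for line in table.splitlines() if line.strip()]
    if len(rows) != 2:
        raise ValueError("Expected exactly one header row and one value row")
    header, values = split_row(rows[0]), split_row(rows[1])
    if len(header) != len(values):
        raise ValueError("Header and value rows have different lengths")
    return dict(zip(header, values))


donor_hla = parse_hla_table("""
       |A_1    |A_2    |B_1    |B_2    |DRB1_1 |DRB1_2 |
       |*02:09 |*01:01 |*15:01 |*15:11 |*15:03 |*03:01 |
""")
print(donor_hla["A_1"])  # *02:09
```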

Default Values

To keep the test cases as short and readable as possible, certain properties of the test data have default values, which are used unless otherwise specified.

These defaults include:

Donor Type = Adult
- Both the expected matching donor and the search type default to Adult
Typing Resolution = TGS derived data
- For both donor and patient HLA values
- The number of fields of the TGS derived data will be arbitrarily chosen by default
Match Level = Allele
- By default, exact allele matches will be selected when a match is required.
Match Count = 10/10 (or equivalent best possible match)
- If a match is required, the expected donor will have no mismatches by default
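The defaults listed above behave like default field values that a scenario overrides only where it needs to. A minimal Python sketch of that idea follows; ScenarioSettings and its field names are hypothetical, and the string values simply mirror this README rather than the framework's actual types.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ScenarioSettings:
    """Hypothetical container mirroring the defaults listed above."""
    donor_type: str = "Adult"
    search_type: str = "Adult"
    donor_typing_resolution: str = "TGS derived data"
    patient_typing_resolution: str = "TGS derived data"
    match_level: str = "Allele"
    mismatch_count: int = 0  # no mismatches, i.e. a 10/10 (or best possible) match


defaults = ScenarioSettings()
# A scenario only states what differs from the defaults; everything else is inherited.
one_mismatch = replace(defaults, mismatch_count=1)
print(one_mismatch.match_level)  # Allele
```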
