This README is intended to give an overview of the technical validation framework at a non-technical level.
This validation suite only covers "matching" and "scoring" logic performed by the Atlas matching algorithm - it does not cover match prediction capabilities.
Note for developers: It should remain situated in the `Features` folder, so that all non-technical aspects of the framework are co-located.
Any technical documentation additions should not go in this file - please add them to the overall README, or the code itself.
Test cases are written in feature files - i.e. any file with a type of `.feature`. Files ending in `.feature.cs` are automatically generated by the test runner, and can be ignored.
Each feature file should contain tests pertaining to one feature of the algorithm. They are split primarily by search type, and within each search type by shared areas of functionality e.g. different typing resolutions, etc.
The language used in the feature files is called Gherkin, the plain-text language used by Cucumber. (https://cucumber.io/)
All feature files will start with a block in this format:
Feature: Name of the feature
As a user
I want the feature to do what I want
Name
- The first line of the file must start with `Feature:`
- Following this is the name of the feature itself. It can be named anything, though aim to keep this quite short
- The name selected will show up in test reports
Description
- The description should be phrased in the following format:
  - (1) As a <role>
  - (2) I want to <action>
  - (3) So that <reason>
e.g.
As a member of the search team
I want to be able to run a 10/10 search
So that I can match donors
The rest of a feature file is made up of scenarios. Each scenario represents a single test case.
They are written in the following format:
Scenario: Name of the scenario
Given some setup
And some additional setup
And some further additional setup
When I do a thing
Then the desired outcome should occur
Name
- The first line of the scenario must start with `Scenario:`
- Following this is the name of the scenario
- The name will show up in test reports
- All scenario names must be unique within a feature file
Steps
IMPORTANT: Unlike the names of features/scenarios, the phrasing used in steps is important. All steps are mapped to code written by developers - if a mapping has not been written, the test will not run. If you think the phrasing should be tweaked, or would like to add new steps that do not exist, talk to a developer.
In general, tweaking phrasing will be a trivial code change, but changing meaning / adding more steps will require some development effort.
Steps will start with one of four keywords, in the following order:
- (1) `Given`
  - This is the first step of any scenario
  - `Given` describes the setup necessary before we run a search - this will include all information needed to create the search criteria
  - It will generally describe whether we're dealing with one or more patients
- (2) `And`
  - This is functionally the same as `Given`, used to make the tests read more fluently
  - A scenario will generally have multiple `And` lines
- (3) `When`
  - This triggers the functionality under test
  - In almost all cases, this will be running a search
- (4) `Then`
  - This is where we assert the test has passed
  - These will generally involve asserting that the expected donor(s) have been returned from the search
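For readers curious why step phrasing matters, the mapping from steps to code can be pictured roughly as below. This is only an illustrative sketch in Python, not the framework's actual code: the registry, the `step` and `run_step` helpers, and both step phrasings are hypothetical. It shows a step whose wording matches a registered pattern being run, and a step with no mapping failing.

```python
import re

# Hypothetical registry: each entry pairs a phrasing pattern with the code behind it.
STEP_DEFINITIONS = []


def step(pattern):
    """Register a function as the code behind a step phrasing (illustrative only)."""
    def register(func):
        STEP_DEFINITIONS.append((re.compile(pattern), func))
        return func
    return register


# Both phrasings below are made up for this sketch.
@step(r"a patient has a match")
def patient_has_match(context):
    context["has_match"] = True


@step(r"I run a (\d+)/(\d+) search")
def run_search(context, matched, total):
    context["search"] = f"{matched}/{total}"


def run_step(context, line):
    """Strip the leading keyword, then run the mapped code; fail if no mapping exists."""
    keyword, _, text = line.partition(" ")
    if keyword not in ("Given", "And", "When", "Then"):
        raise ValueError(f"Unknown keyword: {keyword}")
    for pattern, func in STEP_DEFINITIONS:
        found = pattern.fullmatch(text)
        if found:
            func(context, *found.groups())
            return
    raise LookupError(f"No step definition for: {text}")
```

In this sketch, rewording a step changes only the pattern, while a step with a new meaning needs a new function. That difference is why the second kind of change requires development effort.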
Our primary test data source is a set of alleles taken from TGS typed donors from the SOLAR database. They are separated by allele, and by number of fields: 2-field, 3-field, and 4-field alleles are included, and we can specify how many fields are required if necessary.
Unless otherwise specified, test donors will be created with a genotype consisting of randomly selected (from the aforementioned set of alleles) 2, 3, or 4 field single alleles across all 6 loci.
e.g.
A_1 | A_2 | B_1 | B_2 | C_1 | C_2 | DPB1_1 | DPB1_2 | DQB1_1 | DQB1_2 | DRB1_1 | DRB1_2 |
---|---|---|---|---|---|---|---|---|---|---|---|
*02:17:02 | *03:01:01:01 | *15:25:01 | *39:06:02:01 | *05:53 | *02:10:01:01 | *01:01:01:04 | *01:01 | *03:05:01 | *03:01:01:01 | *11:01:08 | *15:01:01:01 |
If lower typing resolutions are specified, this TGS typed dataset will be 'dumbed down' to get the required typing resolution.
The typing resolutions available are:
- TGS derived data
  - As above
- TGS derived data at four-field resolution
  - As above, with all alleles required to be 4-field
- TGS derived data at three-field resolution
  - As above, with all alleles required to be 3-field
- TGS derived data at two-field resolution
  - As above, with all alleles required to be 2-field
- Three field truncated
  - A four field allele will be truncated to give a 3-field one
- Two field truncated
  - A three or four field allele will be truncated to give a 2-field one
- XX code
  - The XX code corresponding to the TGS derived allele's first field
- NMDP code
  - An NMDP code corresponding to the TGS derived allele
  - A single NMDP code has been selected for each TGS derived allele, from the SOLAR database
- Serology
  - The serology value corresponding to the selected TGS derived allele
  - (No serology data has been found for DPB1 alleles)
- Untyped
  - No data will be set for the corresponding locus
Multiple typing resolutions may be specified for each genotype
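As a rough illustration of the truncated and XX code resolutions listed above, the Python sketch below derives them from a single TGS derived allele. The function names are hypothetical and not taken from the framework. Two further assumptions are made: an XX code is written as the first field followed by `:XX`, and truncation is refused when the allele does not have more fields than requested. NMDP codes and serology are not shown, since they need lookup data rather than string handling.

```python
def split_fields(allele):
    """Split an allele such as '*02:17:02' into its fields: ['02', '17', '02']."""
    return allele.lstrip("*").split(":")


def truncate(allele, field_count):
    """Keep only the first `field_count` fields of an allele.

    Assumption: the allele must start with more fields than requested,
    mirroring "a four field allele will be truncated to give a 3-field one".
    """
    fields = split_fields(allele)
    if len(fields) <= field_count:
        raise ValueError(f"{allele} has too few fields to truncate to {field_count}")
    return "*" + ":".join(fields[:field_count])


def xx_code(allele):
    """Build the XX code from the allele's first field (assumed format: '*02:XX')."""
    return "*" + split_fields(allele)[0] + ":XX"
```

For example, `truncate("*03:01:01:01", 3)` gives `*03:01:01`, and `xx_code("*15:25:01")` gives `*15:XX`.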
Exceptions
In some specific cases, the test data described above is not sufficient, so specific test data has been curated for those cases.
e.g. Matches at a P-group, but not G-group, level.
In these cases the test data will be selected from the relevant (smaller) test dataset.
Generating Test Data
When adding new tests, relevant test data for that scenario may not yet exist. If this is the case, a developer will need to add the appropriate test donors.
Specific test data scenarios
In some cases, we may want to test specific HLA values, rather than allowing the system to choose values for us based on criteria. For such cases, step definitions have been created that allow HLA to be specified at a scenario level.
e.g.
And the matching donor has the following HLA:
|A_1 |A_2 |B_1 |B_2 |DRB1_1 |DRB1_2 |
|*02:09 |*01:01 |*15:01 |*15:11 |*15:03 |*03:01 |
And the patient has the following HLA:
|A_1 |A_2 |B_1 |B_2 |DRB1_1 |DRB1_2 |
|*02:09:01:01 |*01:01 |*15:01 |*15:11 |*15:03 |*03:01 |
To keep the test cases as short and readable as possible, certain test data values have defaults, which will be used if nothing else is specified.
These defaults include:
- Donor Type = `Adult`
  - Both the expected matching donor, and search type default to `Adult`
- Typing Resolution = `TGS derived data`
  - For both donor and patient HLA values
  - The number of fields of the TGS derived data will be arbitrarily chosen by default
- Match Level = `Allele`
  - By default, exact allele matches will be selected when a match is required.
- Match Count = `10/10` (or equivalent best possible match)
  - If a match is required, the expected donor will have no mismatches by default
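
One way to picture how these defaults behave is as a set of values that each scenario starts from and selectively overrides. The Python sketch below is illustrative only: the `ScenarioDefaults` class, its field names, and the `resolve` helper are hypothetical, and only the default values themselves come from the list above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ScenarioDefaults:
    """Values assumed when a scenario does not say otherwise (hypothetical names)."""
    donor_type: str = "Adult"
    typing_resolution: str = "TGS derived data"
    match_level: str = "Allele"
    match_count: str = "10/10"


def resolve(**overrides):
    """Start from the defaults and apply only what the scenario specifies."""
    return replace(ScenarioDefaults(), **overrides)
```

A scenario that only mentions an XX code typing resolution would, in this sketch, resolve to `resolve(typing_resolution="XX code")`: the donor type stays `Adult`, the match level stays `Allele`, and the match count stays `10/10`.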