Simplify DNA shotgun sequencing to remove fragment reversal
jdtournier committed Jan 3, 2025
1 parent a00bf77 commit c1935eb
Showing 14 changed files with 584 additions and 1,169 deletions.
6 changes: 3 additions & 3 deletions projects/DNA_shotgun_sequencing/assignment.md
@@ -12,7 +12,7 @@ There has been a huge increase in the use of genetic information for both resear

---

-The task in this project is to implement a simple 'shotgun sequencing' algorithm capable of reconstructing a complete DNA sequence based on the measured fragments. For simplicity, we assume there were **no errors** in the sequencing of the fragments (in a real-world situations, some form of error detection and correction would be necessary). You will also need to take into account the possibility that **fragments may be reversed** – in other words, their sequence may have been measured in the reverse order relative to the complete sequence.
+The task in this project is to implement a simple 'shotgun sequencing' algorithm capable of reconstructing a complete DNA sequence based on the measured fragments. For simplicity, we assume there were **no errors** in the sequencing of the fragments (in a real-world situations, some form of error detection and correction would be necessary).

Your algorithm will need to perform the following steps:

@@ -30,7 +30,7 @@ Further details for each step are provided below.

Your task in this coursework is to write a C++ program which meets the requirements described above. Three example datasets are provided, each comprising of an input file with all the measured fragments (`fragments-N.txt`), along with the expected solution for each case (`solution-N.txt`). Your program should be written to handle any data file provided in the expected format (see description below).

-To help with your initial implementation, you are also provided with three equivalent datasets where none of the fragments have been reversed (`fragments-no-reverse-N.txt` along with the corresponding solutions `solution-no-reverse-N.txt`). These data are provided only to allow you to test your program at an earlier stage in development than otherwise. A complete implementation should be able to process *all* datasets provided.
+To help with your initial implementation, you are also provided with three equivalent datasets (`fragments-N.txt` along with the corresponding solutions `solution-N.txt`). These data are provided only to allow you to test your program at an earlier stage in development than otherwise. A complete implementation should be able to process *all* datasets provided, and any other unseen datasets provided in the same format.


### Fragment data
@@ -39,7 +39,7 @@ The fragment data come in the form a simple text file, which each fragment as a

### Detecting overlap

-The overlap between two fragments is computed by shifting one fragment relative to the other, and finding the offset that provides the longest run of identical bases between the two fragments without mismatch. Briefly, the process consists of (also illustrated in the figure below):
+The overlap between two fragments is computed by shifting one fragment relative to the other, and finding the offset that provides the longest run of identical bases between the two fragments without mismatch. The process can be conceptualised as outlined below, and illustrated in the figure below:

1. set the longer fragment as the reference, and set the offset of the shorter fragment to its lowest possible value (single character overlap on the left of the reference).
2. check whether all bases in the overlap match between fragments; if they do, then if this is largest overlap observed so far, record the size of the overlap and its corresponding offset.
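
The two steps visible in this hunk can be sketched in C++ roughly as follows. This is only an illustrative sketch, not the reference solution for the assignment: the names `Overlap` and `find_overlap` are hypothetical, the offset convention (position of the shorter fragment's first base relative to the reference's first base) is an assumption, and ties between equally large overlaps are resolved in favour of the lowest offset, which the text shown here does not specify.

```cpp
#include <algorithm>
#include <string>

// Hypothetical result type: not part of the assignment text.
struct Overlap {
    int size = 0;    // number of matching bases in the best overlap (0 if none found)
    int offset = 0;  // index in the reference aligned with the shorter fragment's first base
};

// Slide the shorter fragment across the longer one (the reference) and
// return the largest overlap in which every overlapping base matches.
Overlap find_overlap(const std::string& a, const std::string& b)
{
    const std::string& ref  = (a.size() >= b.size()) ? a : b;
    const std::string& frag = (a.size() >= b.size()) ? b : a;
    const int nref  = static_cast<int>(ref.size());
    const int nfrag = static_cast<int>(frag.size());

    Overlap best;
    // Lowest offset: only the last base of frag overlaps the first base of ref.
    // Highest offset: only the first base of frag overlaps the last base of ref.
    for (int offset = 1 - nfrag; offset < nref; ++offset) {
        const int begin = std::max(0, -offset);            // first overlapping index in frag
        const int end   = std::min(nfrag, nref - offset);  // one past the last overlapping index
        const int size  = end - begin;
        if (size <= best.size)
            continue;  // cannot improve on the best overlap recorded so far

        bool match = true;
        for (int i = begin; i < end; ++i) {
            if (frag[i] != ref[offset + i]) {
                match = false;
                break;
            }
        }
        if (match) {
            best.size = size;
            best.offset = offset;
        }
    }
    return best;
}
```

For example, under these assumptions `find_overlap("ACGTACGT", "CGTTT")` reports an overlap of 3 bases at offset 5, since `CGT` at the start of the shorter fragment matches the last three bases of the reference.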
390 changes: 190 additions & 200 deletions projects/DNA_shotgun_sequencing/data/fragments-1.txt

Large diffs are not rendered by default.

375 changes: 187 additions & 188 deletions projects/DNA_shotgun_sequencing/data/fragments-2.txt

Large diffs are not rendered by default.

389 changes: 195 additions & 194 deletions projects/DNA_shotgun_sequencing/data/fragments-3.txt

Large diffs are not rendered by default.

190 changes: 0 additions & 190 deletions projects/DNA_shotgun_sequencing/data/fragments-no-reverse-1.txt

This file was deleted.

187 changes: 0 additions & 187 deletions projects/DNA_shotgun_sequencing/data/fragments-no-reverse-2.txt

This file was deleted.

195 changes: 0 additions & 195 deletions projects/DNA_shotgun_sequencing/data/fragments-no-reverse-3.txt

This file was deleted.

2 changes: 1 addition & 1 deletion projects/DNA_shotgun_sequencing/data/solution-1.txt

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion projects/DNA_shotgun_sequencing/data/solution-2.txt

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion projects/DNA_shotgun_sequencing/data/solution-3.txt

Large diffs are not rendered by default.

This file was deleted.

This file was deleted.

This file was deleted.

12 changes: 6 additions & 6 deletions projects/DNA_shotgun_sequencing/gen_data.m
@@ -54,12 +54,12 @@

% flip sequences at random:
flip=zeros(numel(split),1);
-for n = 1:numel(split)
-if randi(2,1,1) == 1
-split{n} = fliplr (split{n});
-flip(n) = 1;
-end
-end
+%for n = 1:numel(split)
+% if randi(2,1,1) == 1
+% split{n} = fliplr (split{n});
+% flip(n) = 1;
+% end
+%end

k = randperm(numel(split));
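
For readers following along in C++ (the language the assignment targets), the effect of this hunk on data generation can be sketched as below. It is an illustrative analogue of the MATLAB lines shown here, not a port of `gen_data.m`: the function name `scramble_fragments` and the `allow_reversal` flag are hypothetical. With the flag left at `false`, fragments are only shuffled, which corresponds to the behaviour after this commit; setting it to `true` mimics the previous behaviour, where each fragment was reversed with probability one half before shuffling.

```cpp
#include <algorithm>
#include <random>
#include <string>
#include <vector>

// Hypothetical helper mirroring the flip-then-shuffle logic in the hunk above.
std::vector<std::string> scramble_fragments(std::vector<std::string> fragments,
                                            std::mt19937& rng,
                                            bool allow_reversal = false)
{
    if (allow_reversal) {
        // Pre-commit behaviour: reverse each fragment with probability 0.5,
        // analogous to `fliplr` guarded by `randi(2,1,1) == 1`.
        std::bernoulli_distribution coin(0.5);
        for (auto& fragment : fragments) {
            if (coin(rng))
                std::reverse(fragment.begin(), fragment.end());
        }
    }
    // Both before and after the commit: present the fragments in random order,
    // analogous to indexing with `randperm`.
    std::shuffle(fragments.begin(), fragments.end(), rng);
    return fragments;
}
```

With `allow_reversal` left at its default, the output always contains exactly the input fragments in a different order, so a solver no longer needs to test both orientations of each fragment.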
