CFD updates #3

Open
wants to merge 6 commits into
base: main
1 change: 1 addition & 0 deletions Cargo.lock

1 change: 1 addition & 0 deletions Cargo.toml
@@ -13,3 +13,4 @@ lib_wfa2 = { git = "https://github.com/AndreaGuarracino/lib_wfa2", rev = "c608c4
rand = { version = "0.9.0", features = ["small_rng"] }
rayon = "1.10.0"
flate2 = "1.1.0"
lazy_static = "1.4"
26 changes: 21 additions & 5 deletions README.md
@@ -13,7 +13,7 @@ CRISPRapido is a reference-free tool for comprehensive detection of CRISPR off-t
- Automatic reverse complement scanning
- PAF-format output compatible with downstream analysis tools
- Multi-threaded processing for improved performance
- CFD (Cutting Frequency Determination) scoring for off-targets

## Installation

You need to build `WFA2-lib` first, which is a submodule of this repository. To do so, run:
@@ -50,14 +50,14 @@ env -i bash -c 'WFA2LIB_PATH="./WFA2-lib" PATH=/usr/local/bin:/usr/bin:/bin ~/.c
## Usage

```bash
-crisprapido -r <reference.fa> -g <guide_sequence> [OPTIONS]
+crisprapido -r <reference.fa> -g <guide_sequence> -p <pam_sequence> [OPTIONS]
```

### Required Arguments

- `-r, --reference <FILE>`: Input reference FASTA file (supports .fa and .fa.gz)
- `-g, --guide <SEQUENCE>`: Guide RNA sequence (without PAM)
- `-p, --pam <SEQUENCE>`: PAM sequence used for CFD scoring

### Optional Arguments

- `-m, --max-mismatches <NUM>`: Maximum number of mismatches allowed (default: 4)
@@ -95,11 +95,26 @@ Additionally, CRISPRapido includes these custom tags:
| `ng:i` | Number of gaps (indels) |
| `bs:i` | Biggest gap size in bases |
| `cg:Z` | CIGAR string representing alignment details |
| `cf:f` | CFD score |


### CFD Score

The Cutting Frequency Determination (CFD) score estimates how likely a guide RNA is to cut at an off-target site.
It ranges from 0.0 to 1.0 and takes into account:

- Position-specific mismatch penalties
- PAM sequence efficiency
- Bulge and gap effects

This implementation requires two data files:

- `mismatch_scores.txt`: Position-specific mismatch penalties
- `pam_scores.txt`: Efficiency scores for different PAM sequences
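
As a rough illustration of how the factors above might combine, the sketch below multiplies a PAM efficiency score by one penalty per mismatched position, which is how CFD-style scores are commonly described. It is not taken from this PR's code: the function name `cfd_score`, the `MismatchKey` layout, and the in-memory tables are assumptions made for the example, the real on-disk format of `mismatch_scores.txt` and `pam_scores.txt` is not shown here, and bulges and gaps are ignored.

```rust
use std::collections::HashMap;

/// Hypothetical lookup key: 1-based guide position, guide base, target base.
type MismatchKey = (usize, u8, u8);

/// Sketch of a CFD-style score: start from the PAM score and multiply in
/// one penalty for every position where guide and target differ.
/// Returns `None` if the sequences differ in length (no gap handling here)
/// or if a required table entry is missing.
fn cfd_score(
    guide: &[u8],
    target: &[u8],
    pam: &str,
    mismatch_scores: &HashMap<MismatchKey, f64>,
    pam_scores: &HashMap<String, f64>,
) -> Option<f64> {
    if guide.len() != target.len() {
        return None;
    }
    // The PAM efficiency is the starting value of the product.
    let mut score = *pam_scores.get(pam)?;
    for (i, (&g, &t)) in guide.iter().zip(target.iter()).enumerate() {
        if g != t {
            // Matching positions contribute a factor of 1.0 and are skipped.
            score *= *mismatch_scores.get(&(i + 1, g, t))?;
        }
    }
    Some(score)
}
```

With a table that holds a single made-up penalty of 0.5 for a G-to-T mismatch at position 3 and a PAM score of 1.0 for `GG`, comparing `ACGT` against `ACTT` would give 0.5, and a perfect match would give 1.0.
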
### Example Output

```
-Guide 20 0 20 + chr1 248956422 10050 10070 19 21 255 as:i:6 nm:i:1 ng:i:0 bs:i:0 cg:Z:19=1X
+Guide 20 0 20 + chr1 248956422 10050 10070 19 21 255 as:i:6 nm:i:1 ng:i:0 bs:i:0 cg:Z:19=1X cf:f:0.0549
```

This indicates:
@@ -117,7 +132,7 @@ For more details on the PAF format, see the [official specification](https://git
## Example

```bash
-crisprapido -r genome.fa -g ATCGATCGATCG -m 3 -b 1 -z 2
+crisprapido -r genome.fa -g ATCGATCGATCG -p GG -m 3 -b 1 -z 2
```
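
Downstream of a run like this, the `cf:f` tag can be read back from each PAF line, for instance to filter hits by CFD score. The helper below is a minimal sketch, not part of CRISPRapido: `paf_float_tag` is a hypothetical name, and it assumes tab-separated PAF with the usual 12 mandatory columns followed by `name:type:value` tags.

```rust
/// Return the value of a float-typed optional tag (for example "cf")
/// from one tab-separated PAF line, or `None` if the tag is absent
/// or is not of type `f`.
fn paf_float_tag(line: &str, tag: &str) -> Option<f64> {
    line.split('\t')
        .skip(12) // the first 12 PAF columns are mandatory fields
        .find_map(|field| {
            // Optional fields look like NAME:TYPE:VALUE; the value may contain ':'.
            let mut parts = field.splitn(3, ':');
            match (parts.next(), parts.next(), parts.next()) {
                (Some(name), Some("f"), Some(value)) if name == tag => {
                    value.parse::<f64>().ok()
                }
                _ => None,
            }
        })
}
```

Calling `paf_float_tag(line, "cf")` on the example output line from this README (with tabs between fields) would yield `Some(0.0549)`, while an integer tag such as `nm` returns `None` because its type is `i`.
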

## Testing
@@ -144,3 +159,4 @@ See LICENSE file
## Citation

Stay tuned!
