Home


The R data.table package provides the same in-memory columnar structure as base R's data.frame (a structure that is ideal and has been unchanged since 1997), with the following enhancements:

fast and friendly file reader: fread, which also accepts system commands directly, such as grep, gunzip, etc.
fast and parallelised file writer: fwrite, from v1.9.8
parallelised row subsets, from v1.9.8 (see this benchmark for timings)
fast aggregation of large data; e.g. 100GB in RAM (see benchmarks on up to two billion rows)
fast add/update/delete of columns by reference, by group, using no copies at all
fast ordered joins; e.g. rolling forwards, backwards, nearest and limited staleness
fast overlapping range joins; similar to findOverlaps function from IRanges/GenomicRanges Bioconductor packages, but not limited to genomic (integer) intervals.
fast non-equi (or conditional) joins, i.e., joins that also use the operators >, >=, < and <=, available from v1.9.8
a fast primary ordered index; e.g. setkey(DT,col1,col2)
automatic secondary indexing; e.g. DT[col==val,] and DT[col %in% vals,]
fast and memory efficient combined join and group by; by=.EACHI
fast reshape2 methods (dcast and melt) without needing reshape2 and its dependency chain installed or loaded
group summary results may span many rows (e.g. first and last row by group), and each cell value may itself be a vector/object/function (e.g. unique ids by group as a list column of varying-length vectors; these are pretty printed with commas)
automatic row numbers built in and exposed via symbol .I
convenience symbol .N for the number of rows (usually by group) without the overhead of a function call
any R function from any R package can be used in queries, not just the subset of functions made available by a database backend
has no dependencies at all other than base R itself, for simpler production/maintenance
the R dependency is kept as old as possible for as long as possible, and we test against that version; e.g. the next release, v1.9.8, will bump the dependency up from the 4.5-year-old R 2.14.0 to the 3-year-old R 3.0.0.
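Several of the features listed above can be tried on a toy table. What follows is a minimal sketch, not taken from the package's own documentation. The table `DT`, its columns `id`, `t` and `v`, and the helper tables `lookup` and `ranges` are all made-up illustrations. It assumes data.table v1.9.8 or later is installed, since fwrite and non-equi joins need that version.

```r
# Minimal sketch of several listed features on a small, made-up table.
# Assumption: data.table >= 1.9.8 is installed (fwrite, non-equi joins).
library(data.table)

# Hypothetical example data: two groups observed at integer times
DT <- data.table(id = c("a", "a", "b", "b", "b"),
                 t  = c(1L, 5L, 2L, 4L, 9L),
                 v  = c(10, 20, 30, 40, 50))

# fwrite / fread: round-trip through a temporary CSV file
f <- tempfile(fileext = ".csv")
fwrite(DT, f)
DT2 <- fread(f)

# .N: number of rows per group
counts <- DT[, .N, by = id]

# := adds a column by reference, by group, without copying DT
DT[, v_mean := mean(v), by = id]

# .I: row number (within DT) of the last row of each group
last_rows <- DT[, .I[.N], by = id]$V1

# setkey: primary ordered index on (id, t); sorts DT by reference
setkey(DT, id, t)

# Rolling join: for each lookup row, take the latest row at or before t
lookup <- data.table(id = c("a", "b"), t = c(3L, 8L))
rolled <- DT[lookup, roll = TRUE]

# Non-equi join: rows of DT whose t lies within [lo, hi] for the same id
ranges <- data.table(id = c("a", "b"), lo = c(0L, 3L), hi = c(4L, 10L))
in_range <- DT[ranges, on = .(id, t >= lo, t <= hi), nomatch = 0L]

# by = .EACHI: join and aggregate in one step, one result per row of ranges
sums <- DT[ranges, .(total = sum(v)), on = .(id, t >= lo, t <= hi), by = .EACHI]

# melt / dcast: wide to long and back again, without reshape2
long <- melt(DT, id.vars = c("id", "t"), measure.vars = c("v", "v_mean"))
wide <- dcast(long, id + t ~ variable, value.var = "value")
```

With this data, `counts$N` is `c(2L, 3L)`. The rolling join picks `v` values 10 and 40, and `sums$total` is 10 for `a` and 90 for `b`. The `by = .EACHI` form computes each group's sum during the join, without first materialising the full joined table.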

It has a natural syntax:
DT[where, select|update|do, by]
These queries can be chained together just by adding another one on the end:
DT[...][...].
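The `DT[where, select|update|do, by]` form and chaining can be illustrated with a tiny query. This is a minimal sketch assuming data.table is installed. The table and its columns `grp`, `qty` and `doubled` are hypothetical.

```r
# Minimal sketch of DT[where, select|update|do, by] and chaining.
# Assumption: data.table is installed; the data below is made up.
library(data.table)

DT <- data.table(grp = c("x", "x", "y", "y", "y"),
                 qty = c(3L, 8L, 1L, 6L, 9L))

# where: qty > 2   select/do: sum(qty)   by: grp
totals <- DT[qty > 2, .(total = sum(qty)), by = grp]

# update: add a column by reference (the where and by parts are optional)
DT[, doubled := qty * 2L]

# chaining: the second [...] runs on the result of the first,
# here keeping only the groups whose total exceeds 12
big <- DT[qty > 2, .(total = sum(qty)), by = grp][total > 12]
```

Here `totals` holds 11 for `x` and 15 for `y`, because the row with `qty` 1 is excluded by the `where` part. The chained query therefore keeps only group `y`.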
See data.table compared to dplyr on Stack Overflow and Quora.

NB: We moved from R-Forge to GitHub in June 2014. Commit and issue history was imported.
Guidelines for filing issues / pull requests: Contribution Guidelines.

As of 11 Mar 2016, data.table continues to be the 2nd largest tag about an R package and the 7th most starred R package on GitHub. It has over 180 CRAN and Bioconductor packages depending on or importing it.

