A review of data.table (1.2)


Fast Splitting/Sorting Operations in Frames

No other package has saved me as much time as data.table.

***Whom does it help? Anyone who has large data frames or lists that need splitting or column operations. It is self-explanatory for experienced SQL users. SQL knowledge is not necessary, but it helps.

***What are the biggest benefits? It is much quicker than any split / subset, and it lets you perform any calculation on a column in one line. This is all simple: you just have to decide by which column you want to split. Furthermore, it allows operations "within subsets of a subset" in one line (.SD), which saves a lot of coding and thinking. This feature is hidden in the examples. Finally, something that is not obvious and is also a bit hidden in one of the functions: timeline operations are really quick.
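To illustrate the "within subsets of a subset" idea, here is a minimal sketch using .SD. It is written in the syntax of recent data.table releases, which may differ from version 1.2 reviewed here, and the table and column names (group, day, value) are made up for the example.

```r
library(data.table)

# Hypothetical example data: three rows for group "a", two for group "b"
dt <- data.table(
  group = c("a", "a", "a", "b", "b"),
  day   = c(1, 2, 3, 1, 2),
  value = c(5, 7, 6, 2, 9)
)

# .SD is the subset of the data for the current group.
# Within each group, keep only the row with the largest value.
top_per_group <- dt[, .SD[which.max(value)], by = group]

# Apply the same function to every non-grouping column, per group.
col_means <- dt[, lapply(.SD, mean), by = group]

print(top_per_group)
print(col_means)
```

Both results come from a single line each, without an explicit loop over the groups.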

***Examples
*subset If you want a subset of a big list that you have converted into a data table, you just need dt[row == rowvalue, ], and there are at least two other ways to do it, depending on what you want to do later.
*split If you want the average of one column depending on the values of another: dt[, mean(some_column), by = "other_column"]. The list will be split, or grouped, by the individual values of the other column.
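The two examples can be sketched as runnable code. This assumes a recent data.table release rather than version 1.2, and the names group and value are placeholders. The keyed lookup is one of the alternative ways to subset; whether it is one of the ways the review has in mind is an assumption.

```r
library(data.table)

# Hypothetical example data
dt <- data.table(
  group = c("a", "a", "b", "b", "b"),
  value = c(1, 3, 10, 20, 30)
)

# subset: all rows where group equals "b"
sub_b <- dt[group == "b"]

# An alternative subset: set a key, then look the value up by key
setkey(dt, group)
sub_b_keyed <- dt[.("b")]

# split: the mean of value for each distinct value of group
means <- dt[, .(avg = mean(value)), by = group]

print(sub_b)
print(means)
```

The keyed form tends to pay off when the same table is subset many times, since the table is sorted once by setkey.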

***Downside Right now, the documentation favours those who know SQL very well. However, basic operations are simple if you just follow the examples and understand that i means row and j means column. For anything beyond that, progress might be a bit slow in the beginning, but it pays off.
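A small sketch of how i and j fit together may help readers coming from SQL. The mapping in the comments (i as WHERE, j as SELECT, by as GROUP BY) is a common analogy, not an exact equivalence, and the data and column names are invented. The syntax is that of recent data.table releases.

```r
library(data.table)

# Hypothetical example data
dt <- data.table(
  group = c("a", "a", "b", "b", "b"),
  value = c(1, 3, 10, 20, 30)
)

# General form: dt[i, j, by]
#   i  -> which rows            (roughly SQL WHERE)
#   j  -> what to compute       (roughly SQL SELECT)
#   by -> grouping column(s)    (roughly SQL GROUP BY)
totals <- dt[value > 2, .(total = sum(value)), by = group]

print(totals)
```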

***Personal Experience In code that used to rely on big loops and splitting, I could reduce the computation time from 62 minutes to six, and my code now has only three nesting levels instead of six.