ff (2.0.0)

memory-efficient storage of large atomic vectors and arrays on disk and fast access functions.


The ff package provides atomic data structures that are stored on disk but behave (almost) as if they were in RAM by transparently mapping only a section (pagesize) in main memory - the effective virtual memory consumption per ff object. ff supports atomic data types 'double', 'logical', 'raw' and 'integer' and close-to-atomic types 'factor', 'POSIXct' and custom close-to-atomic types. ff now has native C-support for vectors, matrices and arrays with flexible dimorder (major column-order, major row-order and generalizations for arrays). ff objects store raw data in binary flat files in native encoding, and complement this with metadata stored in R as physical and virtual attributes. ff objects have well-defined hybrid copying semantics, which gives rise to certain performance improvements through virtualization. ff objects can be stored and reopened across R sessions. ff files can be shared by multiple ff R objects (using different data en/de-coding schemes) in the same process or from multiple R processes to exploit parallelism. A wide choice of finalizer options allows to work with 'permanent' files as well as creating/removing 'temporary' ff files completely transparent to the user. On certain OS/Filesystem combinations, creating the ff files works without notable delay thanks to using sparse file allocation. Several access optimization techniques such as Hybrid Index Preprocessing and Virtualization are implemented to achieve good performance even with large datasets, for example virtual matrix transpose without touching a single byte on disk. Further, to reduce disk I/O, the atomic data gets stored native and compact on binary flat files i.e. logicals take up exactly 2 bits to represent TRUE, FALSE and NA. Beyond basic access functions, the ff package also provides compatibility functions that facilitate writing code for ff and ram objects and support for batch processing on ff objects (e.g. as.ram, as.ff, ffapply). NOTE: A professional extension is available from the authors, which integrates additional high-performance features neatly into the ff package. The extension allows efficient handling of symmetric matrices and supports more packed data types: boolean (1 bit), quad (2 bit unsigned), nibble (4 bit unsigned), byte (1 byte signed with NAs), ubyte (1 byte unsigned), short (2 byte signed with NAs), ushort (2 byte unsigned), single (4 byte float with NAs). For example 'quad' allows efficient storage of genomic data as an 'A','T','G','C' factor. The unsigned types support 'circular' arithmetic.

Author(s): Daniel Adler <dadler@uni-goettingen.de>, Christian Gläser <christian_glaeser@gmx.de>, Oleg Nenadic <onenadi@uni-goettingen.de>, Jens Oehlschlägel <Jens.Oehlschlaegel@truecluster.com>, Walter Zucchini <wzucchi@uni-goettingen.de>

License: file LICENSE

Uses: Does not use any package
Reverse depends: biglars, darch, ETLUtils, ffbase, geoTS, IQMNMR, IsoGene, PopGenome, propagate, RecordLinkage, sprint, tabplot
Reverse suggests: bamlss, blackbox, cMonkey, credsubs, Hmisc, kza, LinkedMatrix, nat.nblast, PerformanceAnalytics, RMETAR, RMOA, SpaDES.tools, spaMM, sprint, Stack, symDMatrix, tabplot, tabplotGTK
Reverse enhances: prediction

Released almost 12 years ago.