Pervasive DataRush Core Libraries
Pervasive provides two core libraries, one for data preparation and one for analytics, that help developers deliver applications quickly and take advantage of the Pervasive DataRush Parallel Dataflow Engine.
These libraries are often sufficient on their own to build an application. When they are not, the Pervasive DataRush Java SDK lets developers build custom operators. The Pervasive DataRush Core Libraries are themselves built with this same SDK.
DataRush Product Architecture
Pervasive DataRush Core Library Capabilities
For Data Preparation
- A full array of data preparation operators covering standard processing tasks such as sort, join, aggregation (data grouping), and transformation.
- Operators support connectivity to delimited text files, fixed-width text files, databases (JDBC), and proprietary Pervasive DataRush data-staging files.
- An efficient on-disk staging format that supports parallel writing and reading. Staged data is useful for passing results between phases of execution and for exchanging large datasets between software components.
- A full data profiling library of operators, including the means to define a complex set of metrics to compute over input data.
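Aggregation (data grouping) and profiling metrics both come down to computing statistics per group of rows. The DataRush operator API is not described in this document, so the following is a plain-JDK sketch, not DataRush code: it groups rows by a key and computes per-group count, sum, min, max, and mean, using a parallel stream so the work is spread across threads. The `Sale` row type and the `summarizeByRegion` method are hypothetical names chosen for illustration.

```java
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Plain-JDK sketch of a parallel group-by aggregation; not the DataRush API.
public class GroupByAggregationSketch {

    // Hypothetical input row: a grouping key and a numeric field to aggregate.
    static final class Sale {
        private final String region;
        private final double amount;

        Sale(String region, double amount) {
            this.region = region;
            this.amount = amount;
        }

        String region() { return region; }
        double amount() { return amount; }
    }

    // Group rows by region and compute count, sum, min, max, and mean per group.
    // parallelStream() lets the JDK split the rows across worker threads, and
    // groupingByConcurrent has those threads accumulate into one shared ConcurrentMap.
    static Map<String, DoubleSummaryStatistics> summarizeByRegion(List<Sale> sales) {
        return sales.parallelStream()
                .collect(Collectors.groupingByConcurrent(
                        Sale::region,
                        Collectors.summarizingDouble(Sale::amount)));
    }

    public static void main(String[] args) {
        List<Sale> sales = List.of(
                new Sale("east", 10.0), new Sale("west", 5.0), new Sale("east", 30.0),
                new Sale("west", 15.0), new Sale("north", 7.0));
        // Groups are printed in no particular order (the map is unordered).
        summarizeByRegion(sales).forEach((region, stats) ->
                System.out.printf("%s: count=%d sum=%.1f mean=%.1f%n",
                        region, stats.getCount(), stats.getSum(), stats.getAverage()));
    }
}
```

With this sample input, the `east` group has a count of 2 and a sum of 40.0, and `west` has a mean of 10.0. A concurrent, unordered collector is used because the row order does not matter for these statistics.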
For Analytics
- Core set of parallelized data mining algorithms built on the Pervasive DataRush engine.
- The algorithms scale with data size and work with datasets ranging from a few thousand rows to many billions. Input data does not need to fit in memory.
- Several classifiers, including Naïve Bayes, k-nearest neighbors (KNN), and decision trees.
- Clustering using the k-means algorithm.
- Association rule mining (ARM), implemented with the FP-Growth algorithm, which requires only two passes over the input data (one to count item frequencies and one to build the FP-tree) and therefore remains efficient on large datasets.
- Linear, logistic, multiple, and polynomial regression.
- Support for PMML (Predictive Model Markup Language) models.
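The FP-Growth entry states that the algorithm needs only two passes over the input. The sketch below is plain Java, not the DataRush implementation, and shows what those two passes do: the first counts each item's support, and the second inserts each transaction's frequent items, sorted by descending support, into a shared prefix tree (the FP-tree). Frequent itemsets are then mined from the tree without rereading the input; that recursive mining step is omitted here. `FpTreeSketch`, `countSupport`, and `buildTree` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of FP-Growth's two data passes; not the DataRush API.
// Only tree construction is shown; mining the tree is omitted.
public class FpTreeSketch {

    // One FP-tree node: an item, how many transactions share this prefix, and its children.
    static final class Node {
        final String item;
        int count;
        final Map<String, Node> children = new HashMap<>();

        Node(String item) {
            this.item = item;
        }
    }

    // Pass 1: count the number of transactions that contain each item (its support).
    static Map<String, Integer> countSupport(List<List<String>> transactions) {
        Map<String, Integer> support = new HashMap<>();
        for (List<String> t : transactions) {
            for (String item : new HashSet<>(t)) { // de-duplicate items within a transaction
                support.merge(item, 1, Integer::sum);
            }
        }
        return support;
    }

    // Pass 2: drop infrequent items, sort the rest by descending support,
    // and insert each transaction as a path into a shared prefix tree.
    static Node buildTree(List<List<String>> transactions, Map<String, Integer> support, int minSupport) {
        Node root = new Node(null);
        for (List<String> t : transactions) {
            List<String> items = new ArrayList<>(new HashSet<>(t));
            items.removeIf(item -> support.get(item) < minSupport);
            items.sort((a, b) -> {
                int bySupport = Integer.compare(support.get(b), support.get(a)); // most frequent first
                return bySupport != 0 ? bySupport : a.compareTo(b); // deterministic tie-break
            });
            Node node = root;
            for (String item : items) {
                node = node.children.computeIfAbsent(item, Node::new);
                node.count++;
            }
        }
        return root;
    }

    public static void main(String[] args) {
        List<List<String>> transactions = List.of(
                List.of("a", "b", "c"),
                List.of("a", "b"),
                List.of("a", "c"),
                List.of("b", "d"));
        Map<String, Integer> support = countSupport(transactions); // pass 1
        Node root = buildTree(transactions, support, 2);          // pass 2
        Node a = root.children.get("a");
        System.out.println("a=" + a.count + ", a->b=" + a.children.get("b").count
                + ", a->c=" + a.children.get("c").count + ", b=" + root.children.get("b").count);
    }
}
```

With a minimum support of 2, item `d` is pruned, and `main` prints `a=3, a->b=2, a->c=1, b=1`. Transactions that share a frequent prefix, such as {a, b, c} and {a, b}, share nodes in the tree, which is what keeps the structure compact enough to mine without another scan of the data.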