We welcome contributions to Apache DataFu. If you're interested, please read the following guide:
Common tasks for working in the DataFu code can be found below. For information on how to contribute patches, please follow the wiki link above.
If you haven't done so already, clone the repository and change into it:
git clone https://git-wip-us.apache.org/repos/asf/datafu.git
cd datafu
The following command generates the necessary files to load the project in Eclipse:
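A minimal sketch, assuming the build applies Gradle's standard Eclipse plugin. That plugin's "eclipse" task generates the project files, so this step amounts to running ./gradlew eclipse from the repository root. The gen_eclipse function and the GRADLEW variable are hypothetical conveniences, not part of DataFu:

```shell
# Hypothetical helper: generate Eclipse project files for every subproject.
# Assumes Gradle's standard Eclipse plugin (which provides the "eclipse" task).
# GRADLEW defaults to the wrapper in the repository root; set GRADLEW=echo for a dry run.
gen_eclipse() {
  "${GRADLEW:-./gradlew}" eclipse
}
```

Call gen_eclipse from the repository root. With GRADLEW=echo it only prints the task it would run.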
To clean up the Eclipse files:
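A minimal sketch, again assuming Gradle's standard Eclipse plugin, which pairs the "eclipse" task with a "cleanEclipse" task that deletes the generated files. The clean_eclipse function and the GRADLEW variable are hypothetical:

```shell
# Hypothetical helper: delete the Eclipse files created by the "eclipse" task.
# Assumes Gradle's standard Eclipse plugin (which provides "cleanEclipse").
# GRADLEW defaults to the wrapper in the repository root; set GRADLEW=echo for a dry run.
clean_eclipse() {
  "${GRADLEW:-./gradlew}" cleanEclipse
}
```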
Note that you may run out of heap when executing tests in Eclipse. To fix this, increase the heap size for the TestNG plugin: go to Eclipse->Preferences (Window->Preferences on Windows and Linux), select TestNG->Run/Debug, and add "-Xmx1G" to the JVM args.
All the JARs for the project can be built with the following command:
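A minimal sketch, assuming the standard Gradle "assemble" lifecycle task builds every subproject's JAR, i.e. ./gradlew assemble from the repository root. The build_all_jars function and the GRADLEW variable are hypothetical; any extra arguments are placed before "assemble" so that, for example, "clean" runs first:

```shell
# Hypothetical helper: build the JARs for all subprojects.
# Assumes the standard Gradle "assemble" task. Extra arguments go first,
# so "build_all_jars clean" runs "clean assemble" (clean, then build).
# GRADLEW defaults to the wrapper in the repository root; set GRADLEW=echo for a dry run.
build_all_jars() {
  "${GRADLEW:-./gradlew}" "$@" assemble
}
```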
This builds SNAPSHOT versions of the JARs for DataFu Pig, Spark and Hourglass. The built JARs can be found under
A single project, for example DataFu Pig, can be built on its own with the following command:
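A minimal sketch, assuming Gradle's project-path syntax and the datafu-<name> subproject naming used by the test commands later in this guide (":datafu-pig", ":datafu-hourglass"). Under those assumptions, building only DataFu Pig is ./gradlew :datafu-pig:assemble. The build_project_jar function and the GRADLEW variable are hypothetical:

```shell
# Hypothetical helper: build the JAR for one subproject, e.g. "pig".
# Assumes subprojects are named datafu-<name> and use the standard "assemble" task.
# GRADLEW defaults to the wrapper in the repository root; set GRADLEW=echo for a dry run.
build_project_jar() {
  "${GRADLEW:-./gradlew}" ":datafu-$1:assemble"
}
```

For example, build_project_jar pig builds only the DataFu Pig JAR.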
Tests can be run with the following command:
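A minimal sketch, assuming the standard Gradle "test" task that the ":datafu-pig:test" examples later in this guide rely on. Running it without a project prefix, as ./gradlew test, runs it in every subproject. The run_all_tests function and the GRADLEW variable are hypothetical:

```shell
# Hypothetical helper: run the test suites of all subprojects.
# Assumes the standard Gradle "test" task; without a project path,
# Gradle runs it in every subproject that defines it.
# GRADLEW defaults to the wrapper in the repository root; set GRADLEW=echo for a dry run.
run_all_tests() {
  "${GRADLEW:-./gradlew}" test
}
```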
All the tests can also be run from within Eclipse.
To run the tests for a single project, for example DataFu Pig only:
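A minimal sketch based on the ":datafu-pig:test" task path that appears in the test.single examples later in this guide. Dropping the -Dtest.single property should run the whole suite for that project, i.e. ./gradlew :datafu-pig:test. The run_project_tests function and the GRADLEW variable are hypothetical:

```shell
# Hypothetical helper: run every test in one subproject, e.g. "pig" or "hourglass".
# Assumes subprojects are named datafu-<name>, as in ":datafu-pig:test".
# GRADLEW defaults to the wrapper in the repository root; set GRADLEW=echo for a dry run.
run_project_tests() {
  "${GRADLEW:-./gradlew}" ":datafu-$1:test"
}
```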
To run a specific set of tests from the command line, you can define the test.single system property with a value matching the test class you want to run. For example, to run all tests defined in the QuantileTests test class for DataFu Pig:
./gradlew :datafu-pig:test -Dtest.single=QuantileTests
You can similarly run a specific Hourglass test like so:
./gradlew :datafu-hourglass:test -Dtest.single=PartitionCollapsingTests
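The two commands above differ only in the subproject and the test class name. A small wrapper can capture that pattern. This is a hypothetical convenience, not part of DataFu, and it assumes the same datafu-<name> project paths and test.single property shown above:

```shell
# Hypothetical helper: run one test class in one subproject via test.single.
# GRADLEW defaults to the wrapper in the repository root; set GRADLEW=echo for a dry run.
run_single_test() {
  local project="$1"     # subproject suffix, e.g. "pig" or "hourglass"
  local test_class="$2"  # test class name, e.g. "QuantileTests"
  "${GRADLEW:-./gradlew}" ":datafu-${project}:test" "-Dtest.single=${test_class}"
}
```

For example, run_single_test hourglass PartitionCollapsingTests expands to the Hourglass command above.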