Apache DataFu™


Getting Started


Apache DataFu is available for download as a source release and as compiled artifacts stored in a Maven repository.

Source Releases

The latest source release can be found here:

Previous releases:


It is important to validate the release using either the PGP signature (.asc file) or hashes (.md5 or .sha512 files). For more information on verification of Apache releases, see here. The KEYS file can be found here.

Once fetched, the KEYS file can be imported and the .asc file can be used to verify the release.

gpg --import KEYS
gpg --verify apache-datafu-incubating-sources-1.3.3.tgz.asc apache-datafu-incubating-sources-1.3.3.tgz

The md5sum tool can be used to compute an MD5 hash that can be compared against the .md5 file:

md5sum apache-datafu-incubating-sources-1.3.3.tgz
cat apache-datafu-incubating-sources-1.3.3.tgz.md5

The sha512sum tool can be used to compute a SHA-512 hash that can be compared against the .sha512 file:

sha512sum apache-datafu-incubating-sources-1.3.3.tgz
cat apache-datafu-incubating-sources-1.3.3.tgz.sha512

Note that the hashes are only intended to check that the file has been downloaded correctly. They do not provide guarantees on the authenticity of the file. The signature should instead be used for this purpose. For more information see here.
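The manual SHA-512 comparison above can also be scripted. The following is a minimal sketch, not part of DataFu: the function name verify_sha512 is hypothetical, and it assumes only that the .sha512 file contains the hex digest somewhere in its content, possibly in uppercase or split by whitespace.

```shell
#!/bin/sh
# Hypothetical helper: compare a file's SHA-512 digest with a published .sha512 file.
# Assumption: the .sha512 file contains the hex digest, possibly uppercase or
# split by whitespace; its exact layout is not specified in this guide.
verify_sha512() {
  file="$1"
  hash_file="$2"
  # Digest of the downloaded file: first field of the sha512sum output.
  actual=$(sha512sum "$file" | awk '{print $1}')
  # An empty digest means sha512sum failed; never report that as a match.
  if [ -z "$actual" ]; then
    echo "ERROR: could not hash $file" >&2
    return 1
  fi
  # Normalize the published digest: drop whitespace, lowercase the hex letters.
  published=$(tr -d ' \t\r\n' < "$hash_file" | tr 'A-F' 'a-f')
  case "$published" in
    *"$actual"*) echo "OK: $file"; return 0 ;;
    *) echo "MISMATCH: $file" >&2; return 1 ;;
  esac
}
```

For the release used in this guide, the call would be verify_sha512 apache-datafu-incubating-sources-1.3.3.tgz apache-datafu-incubating-sources-1.3.3.tgz.sha512, which returns a non-zero status on a mismatch.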


Make sure you have Gradle installed. Extract the source and bootstrap the gradlew script that's used for building. The gradlew script uses the specific version of Gradle that DataFu is intended to be built with.

tar xvf apache-datafu-incubating-sources-1.3.3.tgz
cd apache-datafu-incubating-sources-1.3.3
gradle -b bootstrap.gradle

To build the JARs, run:

./gradlew assemble

This will produce JARs in the following directories:

  • datafu-pig/build/libs
  • datafu-hourglass/build/libs

Local Maven Install

DataFu artifacts can be installed to your local Maven repository like so:

./gradlew install

Assuming your local Maven repository is at ~/.m2, you should see the DataFu artifacts under ~/.m2/repository/org/apache/datafu/. You should now be able to declare a dependency on DataFu artifacts as shown below.
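To confirm that the install step populated the local repository, the directory can be inspected with a small script. This is a sketch under the same assumption as above (a repository rooted at ~/.m2/repository); the function name list_datafu_artifacts is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list DataFu JARs installed in a local Maven repository.
# Assumption: the repository root defaults to ~/.m2/repository.
list_datafu_artifacts() {
  repo_root="${1:-$HOME/.m2/repository}"
  datafu_dir="$repo_root/org/apache/datafu"
  if [ ! -d "$datafu_dir" ]; then
    echo "No DataFu artifacts found under $datafu_dir" >&2
    return 1
  fi
  # Print every installed JAR, sorted for stable output.
  find "$datafu_dir" -type f -name '*.jar' | sort
}
```

Calling list_datafu_artifacts with no argument checks the default location; a different repository root can be passed as the first argument.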


The latest release can be found in Apache's Maven Repository for DataFu:

You can also use a dependency management system to download the DataFu artifacts and all their dependencies.


Gradle:

compile "org.apache.datafu:datafu-pig-incubating:1.3.3"
compile "org.apache.datafu:datafu-hourglass-incubating:1.3.3"


Ivy:

<dependency org="org.apache.datafu" name="datafu-pig-incubating" rev="1.3.3"/>
<dependency org="org.apache.datafu" name="datafu-hourglass-incubating" rev="1.3.3"/>
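The same coordinates can also be resolved from the command line. The sketch below is an illustration rather than a documented DataFu workflow: the helper name datafu_coordinate is hypothetical, and the commented-out example relies on the dependency:get goal of the Maven Dependency Plugin, which requires Maven and network access.

```shell
#!/bin/sh
# Hypothetical helper: build the Maven coordinate for a DataFu module,
# following the naming used in the declarations above.
datafu_coordinate() {
  module="$1"   # e.g. pig or hourglass
  version="$2"  # e.g. 1.3.3
  echo "org.apache.datafu:datafu-${module}-incubating:${version}"
}

# Example (requires Maven and network access, so it is left commented out):
# mvn dependency:get -Dartifact="$(datafu_coordinate pig 1.3.3)"
```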



Next Steps

See the following guides for next steps: