datafu.hourglass.fs
Class PathUtils

java.lang.Object
  extended by datafu.hourglass.fs.PathUtils

public class PathUtils
extends java.lang.Object

A collection of utility methods for dealing with files and paths.

Author:
Matthew Hayes

Field Summary
static java.text.SimpleDateFormat datedPathFormat
           
static java.text.SimpleDateFormat nestedDatedPathFormat
           
static org.apache.hadoop.fs.PathFilter nonHiddenPathFilter
          Filters out paths starting with "." or "_".
static java.util.TimeZone timeZone
           
 
Constructor Summary
PathUtils()
           
 
Method Summary
static long countBytes(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path)
          Sums the size of all files listed under a given path.
static java.util.List<DatePath> findDatedPaths(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path)
          Lists all paths matching the "yyyyMMdd" format under a given path.
static java.util.List<DatePath> findNestedDatedPaths(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path input)
          Lists all paths matching the "yyyy/MM/dd" format under a given path.
static java.util.Date getDateForDatedPath(org.apache.hadoop.fs.Path path)
          Gets the date for a path in the "yyyyMMdd" format.
static java.util.Date getDateForNestedDatedPath(org.apache.hadoop.fs.Path path)
          Gets the date for a path in the "yyyy/MM/dd" format.
static org.apache.hadoop.fs.Path getNestedPathRoot(org.apache.hadoop.fs.Path path)
          Gets the root path for a path in the "yyyy/MM/dd" format.
static org.apache.avro.Schema getSchemaFromFile(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path)
          Gets the schema from a given Avro data file.
static org.apache.avro.Schema getSchemaFromPath(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path)
          Gets the schema for the first Avro file under the given path.
static void keepLatestDatedPaths(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path, int retentionCount)
          Deletes all but the last N days of paths matching the "yyyyMMdd" format, where N is retentionCount.
static void keepLatestNestedDatedPaths(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path, int retentionCount)
          Deletes all but the last N days of paths matching the "yyyy/MM/dd" format, where N is retentionCount.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

timeZone

public static final java.util.TimeZone timeZone

datedPathFormat

public static final java.text.SimpleDateFormat datedPathFormat

nestedDatedPathFormat

public static final java.text.SimpleDateFormat nestedDatedPathFormat

nonHiddenPathFilter

public static final org.apache.hadoop.fs.PathFilter nonHiddenPathFilter
Filters out paths starting with "." or "_".
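
The filtering rule can be illustrated without Hadoop on the classpath. The following is a minimal sketch that assumes the check applies to a plain name string; the class `NonHiddenNameSketch` and its members are hypothetical and are not part of PathUtils.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical stand-in for a path filter, operating on plain name strings.
class NonHiddenNameSketch {
    // Accepts a name unless it starts with "." or "_".
    static final Predicate<String> NON_HIDDEN =
        name -> !name.startsWith(".") && !name.startsWith("_");

    // Returns only the names the filter accepts, in their original order.
    static List<String> visible(List<String> names) {
        return names.stream().filter(NON_HIDDEN).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // prints [part-00000.avro, 20130501]
        System.out.println(visible(
            Arrays.asList("part-00000.avro", "_SUCCESS", ".staging", "20130501")));
    }
}
```

Names such as `_SUCCESS` are typical job markers that a filter of this kind skips.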

Constructor Detail

PathUtils

public PathUtils()
Method Detail

keepLatestDatedPaths

public static void keepLatestDatedPaths(org.apache.hadoop.fs.FileSystem fs,
                                        org.apache.hadoop.fs.Path path,
                                        int retentionCount)
                                 throws java.io.IOException
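Deletes all but the last N days of paths matching the "yyyyMMdd" format, where N is retentionCount.

Parameters:
fs - file system
path - path to search under for dated paths
retentionCount - number of days of paths to keep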
Throws:
java.io.IOException
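
The retention rule can be sketched in plain Java. This sketch assumes that "last N days" means the N most recent dated names and that every name is already in the "yyyyMMdd" format; the class `RetentionSketch` and the method `namesToDelete` are hypothetical, and nothing here touches a file system.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper that decides which dated names fall outside the retention window.
class RetentionSketch {
    // Returns the names that would be deleted if only the latest
    // retentionCount names were kept. For "yyyyMMdd" names, lexicographic
    // order is the same as chronological order, so a plain sort suffices.
    static List<String> namesToDelete(List<String> datedNames, int retentionCount) {
        if (retentionCount < 0) {
            throw new IllegalArgumentException("retentionCount must not be negative");
        }
        List<String> sorted = new ArrayList<>(datedNames);
        Collections.sort(sorted);
        int deleteCount = Math.max(0, sorted.size() - retentionCount);
        return new ArrayList<>(sorted.subList(0, deleteCount));
    }

    public static void main(String[] args) {
        // prints [20130501]
        System.out.println(namesToDelete(
            Arrays.asList("20130503", "20130501", "20130502"), 2));
    }
}
```

When fewer than retentionCount names exist, the sketch returns an empty list, so nothing would be deleted.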

keepLatestNestedDatedPaths

public static void keepLatestNestedDatedPaths(org.apache.hadoop.fs.FileSystem fs,
                                              org.apache.hadoop.fs.Path path,
                                              int retentionCount)
                                       throws java.io.IOException
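Deletes all but the last N days of paths matching the "yyyy/MM/dd" format, where N is retentionCount.

Parameters:
fs - file system
path - path to search under for dated paths
retentionCount - number of days of paths to keep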
Throws:
java.io.IOException

findNestedDatedPaths

public static java.util.List<DatePath> findNestedDatedPaths(org.apache.hadoop.fs.FileSystem fs,
                                                            org.apache.hadoop.fs.Path input)
                                                     throws java.io.IOException
Lists all paths matching the "yyyy/MM/dd" format under a given path.

Parameters:
fs - file system
input - path to search under
Returns:
paths
Throws:
java.io.IOException

findDatedPaths

public static java.util.List<DatePath> findDatedPaths(org.apache.hadoop.fs.FileSystem fs,
                                                      org.apache.hadoop.fs.Path path)
                                               throws java.io.IOException
Lists all paths matching the "yyyyMMdd" format under a given path.

Parameters:
fs - file system
path - path to search under
Returns:
paths
Throws:
java.io.IOException
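
Matching names against the "yyyyMMdd" format can be sketched as follows. This is an illustration over plain strings, not the PathUtils implementation; the class `DatedNameSketch`, its methods, and the choice of UTC are assumptions made for the example.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.TimeZone;

// Hypothetical helper that picks out child names shaped like "yyyyMMdd".
class DatedNameSketch {
    // True if the name is exactly eight digits and denotes a real calendar date.
    static boolean isDatedName(String name) {
        if (!name.matches("\\d{8}")) {
            return false;
        }
        SimpleDateFormat format = new SimpleDateFormat("yyyyMMdd");
        format.setTimeZone(TimeZone.getTimeZone("UTC")); // assumption: UTC
        format.setLenient(false); // reject values such as month 13
        return format.parse(name, new ParsePosition(0)) != null;
    }

    // Returns the matching names in ascending (chronological) order.
    static List<String> datedNames(List<String> childNames) {
        List<String> result = new ArrayList<>();
        for (String name : childNames) {
            if (isDatedName(name)) {
                result.add(name);
            }
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        // prints [20130501, 20130502]
        System.out.println(datedNames(
            Arrays.asList("20130502", "_SUCCESS", "20131301", "20130501", "logs")));
    }
}
```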

getSchemaFromFile

public static org.apache.avro.Schema getSchemaFromFile(org.apache.hadoop.fs.FileSystem fs,
                                                       org.apache.hadoop.fs.Path path)
                                                throws java.io.IOException
Gets the schema from a given Avro data file.

Parameters:
fs - file system
path - path to the Avro data file
Returns:
The schema read from the data file's metadata.
Throws:
java.io.IOException

getSchemaFromPath

public static org.apache.avro.Schema getSchemaFromPath(org.apache.hadoop.fs.FileSystem fs,
                                                       org.apache.hadoop.fs.Path path)
                                                throws java.io.IOException
Gets the schema for the first Avro file under the given path.

Parameters:
fs - file system
path - path to fetch schema for
Returns:
Avro schema
Throws:
java.io.IOException

countBytes

public static long countBytes(org.apache.hadoop.fs.FileSystem fs,
                              org.apache.hadoop.fs.Path path)
                       throws java.io.IOException
Sums the size of all files listed under a given path.

Parameters:
fs - file system
path - path to count bytes for
Returns:
total bytes under path
Throws:
java.io.IOException

getDateForDatedPath

public static java.util.Date getDateForDatedPath(org.apache.hadoop.fs.Path path)
Gets the date for a path in the "yyyyMMdd" format.

Parameters:
path - path to check
Returns:
date for path
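
Extracting a date from a "yyyyMMdd" path can be sketched by parsing the final path component. The class `DatedPathDateSketch`, the use of a plain string instead of a Hadoop Path, and the UTC time zone are assumptions for illustration only.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical helper that reads the date from the last component of a path string.
class DatedPathDateSketch {
    // Parses the last path component as "yyyyMMdd".
    // Assumes the path has no trailing slash.
    static Date dateForDatedPath(String path) {
        String name = path.substring(path.lastIndexOf('/') + 1);
        SimpleDateFormat format = new SimpleDateFormat("yyyyMMdd");
        format.setTimeZone(TimeZone.getTimeZone("UTC")); // assumption: UTC
        format.setLenient(false);
        try {
            return format.parse(name);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a dated path: " + path, e);
        }
    }

    public static void main(String[] args) {
        // Midnight UTC on 2013-05-01, in epoch milliseconds: prints 1367366400000
        System.out.println(dateForDatedPath("/data/events/20130501").getTime());
    }
}
```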

getDateForNestedDatedPath

public static java.util.Date getDateForNestedDatedPath(org.apache.hadoop.fs.Path path)
Gets the date for a path in the "yyyy/MM/dd" format.

Parameters:
path - path to check
Returns:
date
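
For the nested layout, the date spans the last three path components. The following sketch joins them and parses the result as "yyyy/MM/dd"; the class `NestedPathDateSketch`, the string-based path, and the UTC time zone are assumptions rather than details taken from PathUtils.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical helper that reads the date from the last three components of a path string.
class NestedPathDateSketch {
    // Joins the last three components and parses them as "yyyy/MM/dd".
    static Date dateForNestedDatedPath(String path) {
        String[] parts = path.split("/");
        int n = parts.length;
        if (n < 3) {
            throw new IllegalArgumentException("Not a nested dated path: " + path);
        }
        String nested = parts[n - 3] + "/" + parts[n - 2] + "/" + parts[n - 1];
        SimpleDateFormat format = new SimpleDateFormat("yyyy/MM/dd");
        format.setTimeZone(TimeZone.getTimeZone("UTC")); // assumption: UTC
        format.setLenient(false);
        try {
            return format.parse(nested);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a nested dated path: " + path, e);
        }
    }

    public static void main(String[] args) {
        // Midnight UTC on 2013-05-01, in epoch milliseconds: prints 1367366400000
        System.out.println(dateForNestedDatedPath("/data/events/2013/05/01").getTime());
    }
}
```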

getNestedPathRoot

public static org.apache.hadoop.fs.Path getNestedPathRoot(org.apache.hadoop.fs.Path path)
Gets the root path for a path in the "yyyy/MM/dd" format. This is the part of the path preceding the "yyyy/MM/dd" portion.
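
As an illustration, the root can be found by dropping the last three components of the path. This sketch works on plain strings and does not verify that the dropped components are digits; the class `NestedPathRootSketch` and its method are hypothetical.

```java
// Hypothetical helper that strips the trailing "yyyy/MM/dd" portion from a path string.
class NestedPathRootSketch {
    // Walks back over three '/' separators and returns everything before them.
    // Assumes the path has no trailing slash.
    static String nestedPathRoot(String path) {
        int end = path.length();
        for (int i = 0; i < 3; i++) {
            end = path.lastIndexOf('/', end - 1);
            if (end < 0) {
                throw new IllegalArgumentException("Not a nested dated path: " + path);
            }
        }
        // A date directly under the file system root leaves "/" as the root.
        return end == 0 ? "/" : path.substring(0, end);
    }

    public static void main(String[] args) {
        // prints /data/events
        System.out.println(nestedPathRoot("/data/events/2013/05/01"));
    }
}
```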

Parameters:
path - path in the "yyyy/MM/dd" format
Returns:
root path, the part preceding the "yyyy/MM/dd" portion
