- CachedFile - Class in datafu.pig.text.opennlp
-
- CachedFile() - Constructor for class datafu.pig.text.opennlp.CachedFile
-
- call(DataBag, Tuple) - Method in class datafu.pig.bags.AppendToBag
-
- call(DataBag, Tuple) - Method in class datafu.pig.bags.FirstTupleFromBag
-
- call(DataBag, Tuple) - Method in class datafu.pig.bags.PrependToBag
-
- call(DataBag) - Method in class datafu.pig.bags.ReverseEnumerate
-
- call(Double, Double, Double, Double) - Method in class datafu.pig.geo.HaversineDistInMiles
-
- call(String) - Method in class datafu.pig.hash.Hasher
-
- call(String) - Method in class datafu.pig.hash.HasherRand
-
Generates the hash for a string value.
- call(String) - Method in class datafu.pig.hash.MD5
-
- call(String) - Method in class datafu.pig.hash.SHA
-
- call(Integer, Integer) - Method in class datafu.pig.random.RandInt
-
- call(DataBag) - Method in class datafu.pig.stats.Quantile
-
- call(Number, Number) - Method in class datafu.pig.stats.WilsonBinConf
-
- call(String) - Method in class datafu.pig.urls.UserAgentClassify
-
- call(String) - Method in class datafu.pig.util.Base64Decode
-
- call(String) - Method in class datafu.pig.util.Base64Encode
-
- call(Boolean) - Method in class datafu.pig.util.BoolToInt
-
- call(Integer) - Method in class datafu.pig.util.IntToBool
-
- CANDIDATE_FIELD_NAME - Static variable in class datafu.pig.sampling.SimpleRandomSampleWithReplacementVote
-
- changed - Variable in class datafu.pig.util.TupleDiff
-
- CHAOSHEN_ESTIMATOR - Static variable in class datafu.pig.stats.entropy.EntropyEstimator
-
- cleanup() - Method in class datafu.org.apache.pig.piggybank.evaluation.ExtremalTupleByNthField
-
- cleanup() - Method in class datafu.pig.bags.CountDistinctUpTo
-
- cleanup() - Method in class datafu.pig.bags.CountEach
-
- cleanup() - Method in class datafu.pig.bags.DistinctBy
-
- cleanup() - Method in class datafu.pig.bags.Enumerate
-
- cleanup() - Method in class datafu.pig.bags.FirstTupleFromBag
-
- cleanup() - Method in class datafu.pig.bags.TupleFromBag
-
- cleanup() - Method in class datafu.pig.linkanalysis.PageRank
-
- cleanup() - Method in class datafu.pig.sampling.ReservoirSample
-
- cleanup() - Method in class datafu.pig.sessions.SessionCount
-
- cleanup() - Method in class datafu.pig.sessions.Sessionize
-
- cleanup() - Method in class datafu.pig.stats.DoubleVAR
-
- cleanup() - Method in class datafu.pig.stats.entropy.CondEntropy
-
- cleanup() - Method in class datafu.pig.stats.entropy.EmpiricalCountEntropy
-
- cleanup() - Method in class datafu.pig.stats.entropy.Entropy
-
- cleanup() - Method in class datafu.pig.stats.FloatVAR
-
- cleanup() - Method in class datafu.pig.stats.IntVAR
-
- cleanup() - Method in class datafu.pig.stats.LongVAR
-
- cleanup() - Method in class datafu.pig.stats.StreamingQuantile
-
- cleanup() - Method in class datafu.pig.stats.VAR
-
- clear() - Method in class datafu.pig.linkanalysis.PageRankImpl
-
- Coalesce - Class in datafu.pig.util
-
Returns the first non-null value from a tuple, just like
COALESCE in SQL.
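The first-non-null behaviour can be illustrated with a minimal standalone Java sketch. `CoalesceSketch` and `coalesce` are hypothetical names, not part of DataFu, and the sketch works on a plain list rather than a Pig tuple.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the DataFu implementation.
final class CoalesceSketch {
    // Returns the first non-null element, or null if every element is null.
    static <T> T coalesce(List<T> values) {
        for (T value : values) {
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // The leading null is skipped and the next value is returned.
        System.out.println(coalesce(Arrays.asList(null, "b", "c")));
    }
}
```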
- Coalesce() - Constructor for class datafu.pig.util.Coalesce
-
- Coalesce(String) - Constructor for class datafu.pig.util.Coalesce
-
- combine(DataBag) - Static method in class datafu.pig.stats.DoubleVAR
-
- combine(DataBag) - Static method in class datafu.pig.stats.entropy.EmpiricalCountEntropy
-
- combine(DataBag) - Static method in class datafu.pig.stats.FloatVAR
-
- combine(DataBag) - Static method in class datafu.pig.stats.IntVAR
-
- combine(DataBag) - Static method in class datafu.pig.stats.LongVAR
-
- combine(DataBag) - Static method in class datafu.pig.stats.VAR
-
- commit(ProgressIndicator) - Method in class datafu.pig.linkanalysis.PageRankImpl
-
- CondEntropy - Class in datafu.pig.stats.entropy
-
Calculates the conditional entropy H(Y|X) of random variables X and Y, following the Wikipedia definition of conditional entropy.
X is the conditioning variable and Y is the variable conditioned on X.
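A minimal sketch of the standard definition, H(Y|X) = -Σ p(x,y) log p(y|x), estimated from observed (x, y) pairs. The class name, the string-array inputs, and the natural-logarithm base are assumptions made for illustration; this is not the DataFu implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not the DataFu implementation.
final class CondEntropySketch {
    // xs[i] and ys[i] form the i-th observed (x, y) pair.
    // Returns the empirical H(Y|X) in nats (natural logarithm, an assumed base).
    static double condEntropy(String[] xs, String[] ys) {
        if (xs.length != ys.length) {
            throw new IllegalArgumentException("xs and ys must have the same length");
        }
        // counts.get(x).get(y) = number of times the pair (x, y) was observed
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (int i = 0; i < xs.length; i++) {
            counts.computeIfAbsent(xs[i], k -> new HashMap<>()).merge(ys[i], 1, Integer::sum);
        }
        double n = xs.length;
        double entropy = 0.0;
        for (Map<String, Integer> yCounts : counts.values()) {
            int xCount = 0;
            for (int c : yCounts.values()) {
                xCount += c;
            }
            for (int c : yCounts.values()) {
                double pxy = c / n;                     // empirical p(x, y)
                double pyGivenX = (double) c / xCount;  // empirical p(y | x)
                entropy -= pxy * Math.log(pyGivenX);
            }
        }
        return entropy;
    }
}
```

When Y is fully determined by X, every p(y|x) is 1 and the result is 0.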
- CondEntropy() - Constructor for class datafu.pig.stats.entropy.CondEntropy
-
- CondEntropy(String) - Constructor for class datafu.pig.stats.entropy.CondEntropy
-
- CondEntropy(String, String) - Constructor for class datafu.pig.stats.entropy.CondEntropy
-
- constructFamily(RandomGenerator) - Method in class datafu.pig.hash.lsh.interfaces.LSHCreator
-
- constructLSH(RandomGenerator) - Method in class datafu.pig.hash.lsh.interfaces.LSHCreator
-
- ContextualEvalFunc<T> - Class in datafu.pig.util
-
An abstract class which enables UDFs to store instance properties
on the front end which will be available on the back end.
- ContextualEvalFunc() - Constructor for class datafu.pig.util.ContextualEvalFunc
-
- convert(Tuple, int) - Method in enum datafu.pig.hash.lsh.util.DataTypeUtil
-
Convert a tuple t into a RealVector of dimension dim.
- Cosine - Class in datafu.pig.hash.lsh.metric
-
A UDF used to find a vector v in a bag such that, for query point q, metric m, and threshold t, m(v,q) < t.
- Cosine(String) - Constructor for class datafu.pig.hash.lsh.metric.Cosine
-
Create a new Cosine Metric UDF with a given dimension.
- CosineDistanceHash - Class in datafu.pig.hash.lsh
-
- CosineDistanceHash(String, String, String, String) - Constructor for class datafu.pig.hash.lsh.CosineDistanceHash
-
Locality sensitive hash that maps vectors onto {0, 1} in such a way that colliding
vectors are "near" one another according to cosine similarity with high probability.
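One standard construction with this property is the random-hyperplane (sign projection) hash, sketched below. Whether DataFu uses exactly this construction is an assumption. The class name and the `java.util.Random` source of randomness are hypothetical stand-ins.

```java
import java.util.Random;

// Hypothetical illustration of a random-hyperplane hash; not the DataFu implementation.
final class SignProjectionHashSketch {
    private final double[] hyperplane;

    // Draws the normal vector of a random hyperplane through the origin.
    SignProjectionHashSketch(int dim, long seed) {
        Random rng = new Random(seed);
        hyperplane = new double[dim];
        for (int i = 0; i < dim; i++) {
            hyperplane[i] = rng.nextGaussian();
        }
    }

    // Returns 1 if v lies on the non-negative side of the hyperplane, otherwise 0.
    int hash(double[] v) {
        if (v.length != hyperplane.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0.0;
        for (int i = 0; i < v.length; i++) {
            dot += hyperplane[i] * v[i];
        }
        return dot >= 0 ? 1 : 0;
    }
}
```

The hash depends only on the direction of the vector, so rescaling a vector by a positive factor never changes its bit. The smaller the angle between two vectors, the more likely they are to receive the same bit.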
- CosineDistanceHash(String, String, String) - Constructor for class datafu.pig.hash.lsh.CosineDistanceHash
-
- count(Tuple) - Static method in class datafu.pig.stats.DoubleVAR
-
- count(Tuple) - Static method in class datafu.pig.stats.FloatVAR
-
- count(Tuple) - Static method in class datafu.pig.stats.IntVAR
-
- count(Tuple) - Static method in class datafu.pig.stats.LongVAR
-
- count(Tuple) - Static method in class datafu.pig.stats.VAR
-
- countDisctinct(Tuple, int) - Static method in class datafu.pig.stats.HyperLogLogPlusPlus
-
Deprecated.
- CountDistinctUpTo - Class in datafu.pig.bags
-
Generates a count of the number of distinct tuples in a bag, up to a preset limit.
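The capped count can be sketched in plain Java as follows. `CountDistinctUpToSketch` is a hypothetical name, and the sketch works on any iterable rather than a Pig bag. Stopping early once the limit is reached is the point of the cap, since the set of seen items never grows beyond the limit.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; not the DataFu implementation.
final class CountDistinctUpToSketch {
    // Counts distinct items, stopping as soon as the limit is reached.
    static <T> int countDistinctUpTo(Iterable<T> items, int limit) {
        Set<T> seen = new HashSet<>();
        for (T item : items) {
            if (seen.size() >= limit) {
                break; // limit reached, no need to look at further items
            }
            seen.add(item);
        }
        return seen.size();
    }
}
```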
- CountDistinctUpTo(String) - Constructor for class datafu.pig.bags.CountDistinctUpTo
-
- CountDistinctUpTo.Final - Class in datafu.pig.bags
-
Receives output from either the initial or the intermediate results.
Outputs an integer with the number of distinct tuples, up to the maximum desired.
- CountDistinctUpTo.Initial - Class in datafu.pig.bags
-
Outputs a tuple containing a DataBag that holds either a single tuple T (the original schema) or nothing (an empty bag).
- CountDistinctUpTo.Intermediate - Class in datafu.pig.bags
-
Receives a bag of bags, each containing a single tuple with the original input schema T.
Outputs a bag of distinct tuples, each with the original schema T.
- CountEach - Class in datafu.pig.bags
-
Generates a count of the number of times each distinct tuple appears in a bag.
- CountEach() - Constructor for class datafu.pig.bags.CountEach
-
- CountEach(String) - Constructor for class datafu.pig.bags.CountEach
-
- countMatches(PriorityQueue<SetDifference.Pair>) - Method in class datafu.pig.sets.SetDifference
-
Counts how many elements in the priority queue match the
element at the front of the queue, which should be from the first bag.
- createEstimator(String, String) - Static method in class datafu.pig.stats.entropy.EntropyEstimator
-
- createGenerator() - Method in class datafu.pig.hash.lsh.interfaces.LSHCreator
-
- createLSHCreator() - Method in class datafu.pig.hash.lsh.CosineDistanceHash
-
- createLSHCreator() - Method in class datafu.pig.hash.lsh.L1PStableHash
-
- createLSHCreator() - Method in class datafu.pig.hash.lsh.L2PStableHash
-
- createLSHCreator() - Method in class datafu.pig.hash.lsh.LSHFunc
-
- datafu.org.apache.pig.piggybank.evaluation - package datafu.org.apache.pig.piggybank.evaluation
-
- datafu.pig.bags - package datafu.pig.bags
-
A collection of general purpose UDFs for operating on bags.
- datafu.pig.geo - package datafu.pig.geo
-
UDFs for geographic computations.
- datafu.pig.hash - package datafu.pig.hash
-
UDFs for computing hashes from data.
- datafu.pig.hash.lsh - package datafu.pig.hash.lsh
-
- datafu.pig.hash.lsh.cosine - package datafu.pig.hash.lsh.cosine
-
- datafu.pig.hash.lsh.interfaces - package datafu.pig.hash.lsh.interfaces
-
- datafu.pig.hash.lsh.metric - package datafu.pig.hash.lsh.metric
-
UDFs for different distance functions (and some similarity functions) used with Locality Sensitive Hashing.
- datafu.pig.hash.lsh.p_stable - package datafu.pig.hash.lsh.p_stable
-
- datafu.pig.hash.lsh.util - package datafu.pig.hash.lsh.util
-
Utility functions for locality sensitive hashes.
- datafu.pig.linkanalysis - package datafu.pig.linkanalysis
-
UDFs for performing link analysis, such as PageRank.
- datafu.pig.random - package datafu.pig.random
-
UDFs dealing with randomness.
- datafu.pig.sampling - package datafu.pig.sampling
-
Sampling UDFs, including weighted sample, reservoir sampling, sampling by key, etc.
- datafu.pig.sessions - package datafu.pig.sessions
-
UDFs for sessionizing data.
- datafu.pig.sets - package datafu.pig.sets
-
UDFs for set operations such as intersect and union.
- datafu.pig.stats - package datafu.pig.stats
-
Statistics UDFs for computing median, quantiles, variance, confidence intervals, etc.
- datafu.pig.stats.entropy - package datafu.pig.stats.entropy
-
- datafu.pig.text.opennlp - package datafu.pig.text.opennlp
-
- datafu.pig.urls - package datafu.pig.urls
-
UDFs for processing URLs.
- datafu.pig.util - package datafu.pig.util
-
Other useful utilities.
- DataFuException - Exception in datafu.pig.util
-
- DataFuException() - Constructor for exception datafu.pig.util.DataFuException
-
- DataFuException(String) - Constructor for exception datafu.pig.util.DataFuException
-
- DataFuException(String, Throwable) - Constructor for exception datafu.pig.util.DataFuException
-
- DataFuException(Throwable) - Constructor for exception datafu.pig.util.DataFuException
-
- DataTypeUtil - Enum in datafu.pig.hash.lsh.util
-
A utility function to translate between Pig types and vectors.
- dim - Variable in class datafu.pig.hash.lsh.interfaces.LSH
-
- dim - Variable in class datafu.pig.hash.lsh.metric.MetricUDF
-
- disableDanglingNodeHandling() - Method in class datafu.pig.linkanalysis.PageRankImpl
-
Disables dangling node handling (disabled by default).
- disableEdgeDiskCaching() - Method in class datafu.pig.linkanalysis.PageRankImpl
-
Disables disk caching of edges once there are too many (disabled by default).
- disableNodeBiasing() - Method in class datafu.pig.linkanalysis.PageRankImpl
-
- dist(RealVector, RealVector) - Method in class datafu.pig.hash.lsh.metric.Cosine
-
Cosine similarity.
- dist(RealVector, RealVector) - Method in class datafu.pig.hash.lsh.metric.L1
-
- dist(RealVector, RealVector) - Method in class datafu.pig.hash.lsh.metric.L2
-
- dist(RealVector, RealVector) - Method in class datafu.pig.hash.lsh.metric.MetricUDF
-
The distance metric used.
- distance(RealVector, RealVector) - Static method in class datafu.pig.hash.lsh.metric.Cosine
-
Cosine similarity.
- distance(RealVector, RealVector) - Static method in class datafu.pig.hash.lsh.metric.L1
-
- distance(RealVector, RealVector) - Static method in class datafu.pig.hash.lsh.metric.L2
-
- DistinctBy - Class in datafu.pig.bags
-
Get distinct elements in a bag by a given set of field positions.
- DistinctBy(String...) - Constructor for class datafu.pig.bags.DistinctBy
-
- distribute(ProgressIndicator) - Method in class datafu.pig.linkanalysis.PageRankImpl
-
- DoubleVAR - Class in datafu.pig.stats
-
- DoubleVAR() - Constructor for class datafu.pig.stats.DoubleVAR
-
- DoubleVAR.Final - Class in datafu.pig.stats
-
- DoubleVAR.Initial - Class in datafu.pig.stats
-
- DoubleVAR.Intermediate - Class in datafu.pig.stats
-
- L1 - Class in datafu.pig.hash.lsh.metric
-
A UDF used to find a vector v in a bag such that, for query point q, metric m, and threshold t, m(v,q) < t.
- L1(String) - Constructor for class datafu.pig.hash.lsh.metric.L1
-
Create a new L1 Metric UDF with a given dimension.
- L1LSH - Class in datafu.pig.hash.lsh.p_stable
-
A locality sensitive hash associated with the L1 metric.
- L1LSH(int, double, RandomGenerator) - Constructor for class datafu.pig.hash.lsh.p_stable.L1LSH
-
Constructs a new instance.
- L1PStableHash - Class in datafu.pig.hash.lsh
-
- L1PStableHash(String, String, String, String, String) - Constructor for class datafu.pig.hash.lsh.L1PStableHash
-
Locality sensitive hash that maps vectors onto a long in such a way that colliding
vectors are "near" one another according to L1 distance with high probability.
- L1PStableHash(String, String, String, String) - Constructor for class datafu.pig.hash.lsh.L1PStableHash
-
- L2 - Class in datafu.pig.hash.lsh.metric
-
A UDF used to find a vector v in a bag such that, for query point q, metric m, and threshold t, m(v,q) < t.
- L2(String) - Constructor for class datafu.pig.hash.lsh.metric.L2
-
Create a new L2 Metric UDF with a given dimension.
- L2LSH - Class in datafu.pig.hash.lsh.p_stable
-
A locality sensitive hash associated with the L2 metric.
- L2LSH(int, double, RandomGenerator) - Constructor for class datafu.pig.hash.lsh.p_stable.L2LSH
-
Constructs a new instance.
- L2PStableHash - Class in datafu.pig.hash.lsh
-
- L2PStableHash(String, String, String, String, String) - Constructor for class datafu.pig.hash.lsh.L2PStableHash
-
Locality sensitive hash that maps vectors onto a long in such a way that colliding
vectors are "near" one another according to L2 distance with high probability.
- L2PStableHash(String, String, String, String) - Constructor for class datafu.pig.hash.lsh.L2PStableHash
-
- LOG - Static variable in class datafu.pig.stats.entropy.EntropyUtil
-
- LOG10 - Static variable in class datafu.pig.stats.entropy.EntropyUtil
-
- LOG2 - Static variable in class datafu.pig.stats.entropy.EntropyUtil
-
- logTransform(double, String) - Static method in class datafu.pig.stats.entropy.EntropyUtil
-
- longFromHex(String) - Static method in class datafu.pig.hash.Hasher
-
- LongVAR - Class in datafu.pig.stats
-
- LongVAR() - Constructor for class datafu.pig.stats.LongVAR
-
- LongVAR.Final - Class in datafu.pig.stats
-
- LongVAR.Initial - Class in datafu.pig.stats
-
- LongVAR.Intermediate - Class in datafu.pig.stats
-
- LSH - Class in datafu.pig.hash.lsh.interfaces
-
An abstract class representing a locality sensitive hash.
- LSH(int, RandomGenerator) - Constructor for class datafu.pig.hash.lsh.interfaces.LSH
-
Construct a locality sensitive hash.
- lsh - Variable in class datafu.pig.hash.lsh.LSHFunc
-
- LSHCreator - Class in datafu.pig.hash.lsh.interfaces
-
Creates a locality sensitive hash.
- LSHCreator(int, int, int, long) - Constructor for class datafu.pig.hash.lsh.interfaces.LSHCreator
-
Creates an LSHCreator.
- LSHFamily - Class in datafu.pig.hash.lsh
-
A family of k locality sensitive hashes.
- LSHFamily(List<LSH>) - Constructor for class datafu.pig.hash.lsh.LSHFamily
-
Constructs a family of hashes.
- LSHFunc - Class in datafu.pig.hash.lsh
-
The base UDF for locality sensitive hashing.
- LSHFunc(String) - Constructor for class datafu.pig.hash.lsh.LSHFunc
-
- sample(RandomDataImpl) - Method in interface datafu.pig.hash.lsh.interfaces.Sampler
-
Generates a sample.
- sample(RandomDataImpl) - Method in class datafu.pig.hash.lsh.p_stable.L1LSH
-
Draw a sample s ~ Cauchy(0,1), which is 1-stable.
- sample(RandomDataImpl) - Method in class datafu.pig.hash.lsh.p_stable.L2LSH
-
Draw a sample s ~ Gaussian(0,1), which is 2-stable.
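The two draws can be sketched with the JDK alone. This is a hedged illustration: the index lists `RandomDataImpl` as the parameter type, whereas the sketch substitutes `java.util.Random`, and the class and method names are hypothetical. The Cauchy draw uses the inverse-CDF transform tan(π(u − 0.5)) of a uniform u.

```java
import java.util.Random;

// Hypothetical illustration only; uses java.util.Random instead of RandomDataImpl.
final class StableSamplerSketch {
    // Inverse-CDF transform: maps a uniform u in (0, 1) to a standard Cauchy value.
    static double cauchyFromUniform(double u) {
        return Math.tan(Math.PI * (u - 0.5));
    }

    // Draws s ~ Cauchy(0, 1), the 1-stable distribution paired with the L1 metric.
    static double sampleCauchy(Random rng) {
        return cauchyFromUniform(rng.nextDouble());
    }

    // Draws s ~ Gaussian(0, 1), the 2-stable distribution paired with the L2 metric.
    static double sampleGaussian(Random rng) {
        return rng.nextGaussian();
    }
}
```

A distribution is p-stable when a linear combination of independent draws, weighted by the coordinates of a vector, is distributed like a single draw scaled by that vector's Lp norm.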
- SampleByKey - Class in datafu.pig.sampling
-
Provides a way of sampling tuples based on certain fields.
- SampleByKey(String) - Constructor for class datafu.pig.sampling.SampleByKey
-
- SampleByKey(String, String) - Constructor for class datafu.pig.sampling.SampleByKey
-
- Sampler - Interface in datafu.pig.hash.lsh.interfaces
-
A helper interface to sample from a distribution specified by a RandomDataImpl.
- SCORE_FIELD_NAME - Static variable in class datafu.pig.sampling.SimpleRandomSampleWithReplacementVote
-
- scoreGen - Variable in class datafu.pig.sampling.ReservoirSample.Initial
-
- scoreGen - Variable in class datafu.pig.sampling.ReservoirSample
-
- seed - Variable in class datafu.pig.hash.lsh.LSHFunc
-
- SEEDED_HASH_NAMES - Static variable in class datafu.pig.hash.Hasher
-
- SelectStringFieldByName - Class in datafu.pig.util
-
Selects the value for a field within a tuple using that field's name.
- SelectStringFieldByName() - Constructor for class datafu.pig.util.SelectStringFieldByName
-
- SentenceDetect - Class in datafu.pig.text.opennlp
-
The OpenNLP SentenceDetectors segment an input paragraph into sentences.
- SentenceDetect(String) - Constructor for class datafu.pig.text.opennlp.SentenceDetect
-
- separator - Variable in class datafu.pig.util.TupleDiff
-
- SessionCount - Class in datafu.pig.sessions
-
Performs a count of events, ignoring events which occur within the
same time window.
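A minimal sketch of one plausible reading of this rule: an event is counted only if it falls outside the time window measured from the previous event. That interpretation, the class name, and the millisecond timestamps are all assumptions made for illustration, not a statement of how DataFu implements it.

```java
// Hypothetical illustration only; the window semantics are an assumption.
final class SessionCountSketch {
    // Timestamps must be sorted in ascending order (milliseconds).
    // An event is counted when it is the first event, or when it occurs
    // more than windowMillis after the event immediately before it.
    static int countSessions(long[] sortedTimestampsMillis, long windowMillis) {
        int count = 0;
        for (int i = 0; i < sortedTimestampsMillis.length; i++) {
            if (i == 0 || sortedTimestampsMillis[i] - sortedTimestampsMillis[i - 1] > windowMillis) {
                count++;
            }
        }
        return count;
    }
}
```

With a ten-minute window, three events a second apart followed by one event eleven minutes later would count as two.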
- SessionCount(String) - Constructor for class datafu.pig.sessions.SessionCount
-
- Sessionize - Class in datafu.pig.sessions
-
Sessionizes an input stream, appending a session ID to each tuple.
- Sessionize(String) - Constructor for class datafu.pig.sessions.Sessionize
-
- setAlpha(float) - Method in class datafu.pig.linkanalysis.PageRankImpl
-
Sets the PageRank alpha value (default is 0.85).
- setData(Object) - Method in exception datafu.pig.util.DataFuException
-
Sets data relevant to this exception.
- SetDifference - Class in datafu.pig.sets
-
Computes the set difference of two or more bags.
- SetDifference() - Constructor for class datafu.pig.sets.SetDifference
-
- setEdgeCachingThreshold(long) - Method in class datafu.pig.linkanalysis.PageRankImpl
-
Set the number of edges past which they will be cached on disk instead of in memory.
- setFieldAliases(Map<String, Integer>) - Method in exception datafu.pig.util.DataFuException
-
Sets field aliases for a UDF which may be relevant to this exception.
- SetIntersect - Class in datafu.pig.sets
-
Computes the set intersection of two or more bags.
- SetIntersect() - Constructor for class datafu.pig.sets.SetIntersect
-
- setNodeBias(int, float) - Method in class datafu.pig.linkanalysis.PageRankImpl
-
- SetOperationsBase - Class in datafu.pig.sets
-
Base class for set operations.
- SetOperationsBase() - Constructor for class datafu.pig.sets.SetOperationsBase
-
- setUDFContextSignature(String) - Method in class datafu.pig.sampling.SampleByKey
-
- setUDFContextSignature(String) - Method in class datafu.pig.util.ContextualEvalFunc
-
- SetUnion - Class in datafu.pig.sets
-
Computes the set union of two or more bags.
- SetUnion() - Constructor for class datafu.pig.sets.SetUnion
-
- SHA - Class in datafu.pig.hash
-
- SHA() - Constructor for class datafu.pig.hash.SHA
-
- SHA(String) - Constructor for class datafu.pig.hash.SHA
-
- SimpleEvalFunc<T> - Class in datafu.pig.util
-
Uses reflection to make writing simple wrapper Pig UDFs easier.
- SimpleEvalFunc() - Constructor for class datafu.pig.util.SimpleEvalFunc
-
- SimpleRandomSample - Class in datafu.pig.sampling
-
Scalable simple random sampling (ScaSRS).
- SimpleRandomSample() - Constructor for class datafu.pig.sampling.SimpleRandomSample
-
- SimpleRandomSample(String) - Constructor for class datafu.pig.sampling.SimpleRandomSample
-
- SimpleRandomSample.Final - Class in datafu.pig.sampling
-
- SimpleRandomSample.Initial - Class in datafu.pig.sampling
-
- SimpleRandomSample.Intermediate - Class in datafu.pig.sampling
-
- SimpleRandomSampleWithReplacementElect - Class in datafu.pig.sampling
-
- SimpleRandomSampleWithReplacementElect() - Constructor for class datafu.pig.sampling.SimpleRandomSampleWithReplacementElect
-
- SimpleRandomSampleWithReplacementElect.Final - Class in datafu.pig.sampling
-
- SimpleRandomSampleWithReplacementElect.Initial - Class in datafu.pig.sampling
-
- SimpleRandomSampleWithReplacementElect.Intermediate - Class in datafu.pig.sampling
-
- SimpleRandomSampleWithReplacementVote - Class in datafu.pig.sampling
-
Scalable simple random sampling with replacement (ScaSRSWR).
- SimpleRandomSampleWithReplacementVote() - Constructor for class datafu.pig.sampling.SimpleRandomSampleWithReplacementVote
-
- StreamingMedian - Class in datafu.pig.stats
-
Computes the approximate median for a (not necessarily sorted) input bag, using the Munro-Paterson algorithm.
- StreamingMedian() - Constructor for class datafu.pig.stats.StreamingMedian
-
- StreamingQuantile - Class in datafu.pig.stats
-
Computes approximate quantiles for a (not necessarily sorted) input bag, using the Munro-Paterson algorithm.
- StreamingQuantile(String...) - Constructor for class datafu.pig.stats.StreamingQuantile
-
- sum(Tuple) - Static method in class datafu.pig.stats.DoubleVAR
-
- sum(Tuple) - Static method in class datafu.pig.stats.FloatVAR
-
- sum(Tuple) - Static method in class datafu.pig.stats.IntVAR
-
- sum(Tuple) - Static method in class datafu.pig.stats.LongVAR
-
- sum(Tuple) - Static method in class datafu.pig.stats.VAR
-
- sumSquare(Tuple) - Static method in class datafu.pig.stats.DoubleVAR
-
- sumSquare(Tuple) - Static method in class datafu.pig.stats.FloatVAR
-
- sumSquare(Tuple) - Static method in class datafu.pig.stats.IntVAR
-
- sumSquare(Tuple) - Static method in class datafu.pig.stats.LongVAR
-
- sumSquare(Tuple) - Static method in class datafu.pig.stats.VAR
-