Apache Pig - Combining/separating Pig UDF returns
Suppose a Pig UDF creates 2 different types of data records.
How can a Pig script process the returned list of combined tuples from the UDF in 2 separate ways?
For example:
    public Tuple exec(Tuple input) // input ignored in the UDF for simplicity
    {
        Tuple t = TupleFactory.getInstance().newTuple();
        if (Math.random() < 0.5)
            t.append("less than half");
        else
            t.append(new Date());
        return t;
    }
The Pig script should look like:
    register ...
    define myudf ...
    data = load ...;
    combinedlist = foreach data generate myudf (data);
    stringlist = filter combinedlist by $0 instanceof java.lang.String; -- ??
    datelist = filter combinedlist by $0 instanceof java.util.Date; -- ??
    store stringlist ... ;
    store datelist ... ;
Thank you.
There are 2 issues here.
- Under no circumstances should you ever return different data types from a UDF. It goes against the principle of least surprise, and a couple of other things. If you want to indicate an invalid value, returning null or an invalid constant is more appropriate.
- What you're trying to do is not done with multiple filters; there is a split operation for that. Although your example of using instanceof within Pig is wrong, the basic usage is:
    split combinedlist into stringlist if $0 instanceof String, datelist if $0 instanceof Date;
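One way to reconcile the two points above is to have the UDF always emit the same shape of record, with a tag field saying which kind of value it carries. The following is only a minimal sketch in plain Java, with no Pig dependency so it can be run on its own. The class name TaggedRecord, the tag values "string" and "date", and the ISO-style date format are all assumptions made for illustration and appear nowhere in the original question or answer. In a real UDF the two fields would be appended to the output tuple instead of being returned as a list.

```java
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.Date;
import java.util.List;
import java.util.TimeZone;

// Hypothetical helper (not part of Pig): builds the two fields a UDF could
// append to its output tuple so that every record has the same types.
class TaggedRecord {
    // Assumed tag values; any pair of distinct constants would do.
    static final String KIND_STRING = "string";
    static final String KIND_DATE = "date";

    // Returns [kind, value]. Both fields are always strings, so the
    // record's schema never changes from one call to the next.
    static List<String> make(double r, Date now) {
        if (r < 0.5) {
            return Arrays.asList(KIND_STRING, "less than half");
        }
        // Dates are serialized to text in UTC (an assumed format).
        SimpleDateFormat iso = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
        iso.setTimeZone(TimeZone.getTimeZone("UTC"));
        return Arrays.asList(KIND_DATE, iso.format(now));
    }

    public static void main(String[] args) {
        // Mirrors the random choice made in the question's exec method.
        System.out.println(make(Math.random(), new Date()));
    }
}
```

With records shaped like this, and assuming the UDF's tuple is flattened so that the tag ends up in field $0, the split condition could compare ordinary chararray values, for example: split combinedlist into stringlist if $0 == 'string', datelist if $0 == 'date';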