Apache Pig - Combining/separating Pig UDF returns


Suppose a Pig UDF creates two different types of data records.

How can a Pig script process the combined list of tuples returned by the UDF in two separate ways?

For example:

    public Tuple exec(Tuple input) // input is ignored for simplicity
    {
        Tuple t = TupleFactory.getInstance().newTuple();
        if (Math.random() < 0.5)
            t.append("less than half");
        else
            t.append(new Date());
        return t;
    }

The Pig script should look something like this:

    REGISTER ...;
    DEFINE myUdf ...;
    data = LOAD ...;
    combinedList = FOREACH data GENERATE myUdf(data);
    stringList = FILTER combinedList BY $0 instanceof java.lang.String; -- ??
    dateList = FILTER combinedList BY $0 instanceof java.util.Date; -- ??
    STORE stringList ...;
    STORE dateList ...;

Thank you.

There are two issues here.

  1. Under no circumstances should you ever return different data types from a UDF. It goes against the principle of least surprise, among a couple of other things. If you want to indicate an invalid value, returning null or an invalid constant is more appropriate.
  2. What you're trying to do is not done with multiple filters; there is a SPLIT operation for that. Although your example of using instanceof within Pig is wrong, the basic usage would be: SPLIT combinedList INTO stringList IF $0 instanceof String, dateList IF $0 instanceof Date;
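The two points above can be combined: give the UDF one fixed output schema in which exactly one field is set and the other is null, then route records on which field is non-null. The sketch below is a minimal plain-Java analogy under that assumption, not Pig code: it uses no Pig classes, and the Result record, exec and split are hypothetical names standing in for a Pig tuple, the UDF's exec method and the SPLIT operator. The timestamp is stored as epoch milliseconds (an assumption) so that both fields have simple, stable types. In a Pig script the routing step would correspond to something like SPLIT combinedList INTO stringList IF $0 IS NOT NULL, dateList IF $1 IS NOT NULL;

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java analogy (no Pig dependency): every "UDF" result has the same
// two-field schema (label, timestampMillis). Exactly one field is set and
// the other is null, so the return type never varies between calls.
public class SplitSketch {

    // Hypothetical fixed-schema record standing in for a Pig tuple
    // with fields (chararray label, long timestampMillis).
    record Result(String label, Long timestampMillis) {}

    // Stand-in for the UDF body. 'r' replaces Math.random() so the
    // behaviour is deterministic and testable.
    static Result exec(double r, long nowMillis) {
        if (r < 0.5) {
            return new Result("less than half", null);
        }
        return new Result(null, nowMillis);
    }

    // One pass over the combined list, routing each record by which field
    // is non-null. Like SPLIT, each condition is checked independently, so
    // a record matching both conditions would land in both outputs.
    static void split(List<Result> combined, List<Result> strings, List<Result> dates) {
        for (Result res : combined) {
            if (res.label() != null) {
                strings.add(res);
            }
            if (res.timestampMillis() != null) {
                dates.add(res);
            }
        }
    }

    public static void main(String[] args) {
        double[] draws = {0.1, 0.7, 0.4, 0.9, 0.2};
        List<Result> combined = new ArrayList<>();
        for (double r : draws) {
            combined.add(exec(r, 1_700_000_000_000L));
        }
        List<Result> strings = new ArrayList<>();
        List<Result> dates = new ArrayList<>();
        split(combined, strings, dates);
        // With the draws above this prints "3 strings, 2 dates".
        System.out.println(strings.size() + " strings, " + dates.size() + " dates");
    }
}
```

Because the schema is fixed, downstream consumers never need a type check such as instanceof; a null test on the relevant field is enough.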
