Filter rows by comparing dates with Pig

I'm trying to filter rows with the tPigFilterRow component. My file has a "last_modified" field that is read as a chararray, and I want to compare it with a date that is stored in my context. I think I need to convert both values to datetime objects with the ToDate() function provided by Pig, but I'm not sure whether that is the only or the best way.
Is it possible with the tPigMap or the tPigCode component?
Can you help me?

Re: Filter rows by comparing dates with Pig

OK, since nobody replied to this thread, I did some research, tried it myself, and found a solution.
I'm sorry that I still can't upload an image, but I'll try to describe how my solution works.
Before you can filter records by comparing dates with >= or <=, the date values have to be converted. Place a tPigMap component right after the tPigLoad component that loads the data, and use it to turn the date field into something Pig can compare.
My expression looks like this (for the last_modified field in the output schema):
ToDate(row1.last_modified, 'yyyy-MM-dd HH:mm:ss.SSS') 

This tells Pig to convert the string into a datetime. (The .SSS part is needed because the data was produced by Sqoop, which I used to import RDBMS data from a MySQL table; it matches the milliseconds part of the value.)
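If you want to sanity-check the pattern outside of Pig, a small Java sketch like the following may help. It is only an illustration: it assumes that java.time interprets these pattern letters the same way Pig's ToDate does, and the class name PatternCheck and the sample value are made up for this example.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class PatternCheck {
    // Same pattern string as in the tPigMap expression above.
    static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");

    // Parses a timestamp string; throws DateTimeParseException if it does not match.
    static LocalDateTime parse(String value) {
        return LocalDateTime.parse(value, FMT);
    }

    public static void main(String[] args) {
        // Hypothetical sample value with a three-digit fraction.
        LocalDateTime t = parse("2015-03-01 12:30:45.123");
        // The SSS part ends up as the millisecond fraction of the second.
        System.out.println(t.getNano() / 1_000_000);
    }
}
```

A value without the fractional part, such as "2015-03-01 12:30:45", is rejected by this formatter, which is why the pattern has to match the data exactly.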
The output of this component can then be connected to a tPigFilterRow component. In my case, I wanted to get all records that have been modified since a date stored in a context variable, so my component settings look as follows:
Logical: AND
Column: last_modified
Operator: greater than
Value: "ToDate('" + TalendDate.formatDate("yyyy-MM-dd HH:mm:ss", context.latest_update) + "', 'yyyy-MM-dd HH:mm:ss')"

This converts the "latest_update" date variable in my context into a string, which Pig can then convert into a datetime object.
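To see what string the filter value actually evaluates to, here is a minimal standalone Java sketch. It uses java.text.SimpleDateFormat as a stand-in for TalendDate.formatDate (an assumption, so that it runs outside of Talend), and the class name, method name, and sample date are hypothetical.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

public class FilterValueSketch {
    static final String PATTERN = "yyyy-MM-dd HH:mm:ss";

    // Builds the Pig expression that the filter value resolves to.
    // SimpleDateFormat stands in for TalendDate.formatDate here.
    static String buildFilterValue(Date latestUpdate) {
        String formatted = new SimpleDateFormat(PATTERN, Locale.ROOT).format(latestUpdate);
        return "ToDate('" + formatted + "', '" + PATTERN + "')";
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for context.latest_update.
        Date latestUpdate =
                new GregorianCalendar(2015, Calendar.MARCH, 1, 12, 30, 45).getTime();
        System.out.println(buildFilterValue(latestUpdate));
    }
}
```

The printed text is the expression Pig evaluates on the right-hand side of the comparison, so the pattern passed to ToDate has to be the same one used to format the date.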
I hope you can follow what I did!

