One Star

MapReduce example

Hi =)
Is there a working example of running a MapReduce job with Talend Open Studio for Big Data?
Thank you =)

12 REPLIES
Community Manager

Re: MapReduce example

Hi,
We are currently working on adding more Big Data examples, and MR examples in particular, to our resources (most likely for our next release, 5.3), but there is already one Big Data example in the Studio User Guide: https://help.talend.com/display/TALENDOPENSTUDIOFORBIGDATAUSERGUIDE52EN/B.3.2+Translating+the+scenar...
You can also find some interesting webinars which may help: http://www.talend.com/resources/webinars
I hope this helps a bit.
If you have specific MR questions, don't hesitate to post again here.
Elisa
(Doc team)
One Star

Re: MapReduce example

Thank you for answering, Elisa.
I'm a Hadoop newbie and I have been trying Talend Open Studio for Big Data for a few days, looking for how to run MapReduce jobs. OK, so I'll wait for the next tutorials. I'm looking forward to trying them!
Yes, I am looking at the example "B.3. Finding out who visit your website most often". There are some errors to fix (I hope it will work, I liked the example's topic!).
Community Manager

Re: MapReduce example

Note that we assume your Hadoop cluster is already set up and correctly configured (which is not always that easy). The Talend documentation does not aim to explain how to set up Hadoop itself; it focuses on how to set up Jobs using the Talend Hadoop connectors.
I'm just mentioning that because I have had the question before.
If you have a particular MR use case in mind based on your own needs, feel free to describe it here.
Elisa
Employee

Re: MapReduce example

Hi
You should also check out the YouTube videos on big data (http://www.youtube.com/user/TalendChannel).
Enjoy!
One Star

Re: MapReduce example

Elisa,
I am happy to see Talend getting into the MapReduce world. I have a specific use case that I would like some assistance with. I have web log files in a flat format that I want to unpack. I would then like to aggregate this data into a new table.
Here is an example of the header and one row from the flat file. Each dimension is delimited by ^. Thank you!!!
Time^UserId^AdvertiserId^OrderId^LineItemId^CreativeId^CreativeVersion^CreativeSize^AdUnitId^CustomTargeting^Domain^CountryId^Country^RegionId^Region^MetroId^Metro^CityId^City^PostalCodeId^PostalCode^BrowserId^Browser^OSId^OS^OSVersion^BandWidth^TimeUsec^AudienceSegmentIds^Product^RequestedAdUnitSizes^BandwidthGroupId^MobileDevice^MobileCapability^MobileCarrier^GfpContentId^IsCompanion
2013-06-03-15:44:00^tEyYz5wJNfF-iCl3IKWT8A^12690422^136588262^26445782^26194754342^1^160x600^55707782^location=bottomleft;login=no;ptype=search;search=zebra_decor;visitorid=38949244799;wm_visit_id=38949244799^bellsouth.net^2840^United States^21158^Mississippi^200647^Greenwood-Greenville MS^1020740^Greenville^0^^500118^Microsoft Internet Explorer 10.Any^501026^Microsoft Windows 8^^cable^1370288640^9483370|9619690|9686410|9686530|10620730|10621210^Ad Server^160x600^4^^^^0^false
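For readers who want to see the shape of this use case outside Talend, here is a minimal sketch in plain Python of the "unpack, then aggregate" logic written in a map/reduce style. It is not a Talend Job and does not use Hadoop; the function names (`parse_row`, `parse_custom_targeting`, `map_line`, `reduce_counts`, `aggregate`) and the choice to count rows per (Country, Region) are assumptions made for illustration only. The header and sample row are the ones posted above.

```python
from collections import defaultdict

# Column names copied from the header row posted above.
HEADER = (
    "Time^UserId^AdvertiserId^OrderId^LineItemId^CreativeId^CreativeVersion^"
    "CreativeSize^AdUnitId^CustomTargeting^Domain^CountryId^Country^RegionId^"
    "Region^MetroId^Metro^CityId^City^PostalCodeId^PostalCode^BrowserId^Browser^"
    "OSId^OS^OSVersion^BandWidth^TimeUsec^AudienceSegmentIds^Product^"
    "RequestedAdUnitSizes^BandwidthGroupId^MobileDevice^MobileCapability^"
    "MobileCarrier^GfpContentId^IsCompanion"
).split("^")

# Sample data row copied from the post above.
SAMPLE_ROW = (
    "2013-06-03-15:44:00^tEyYz5wJNfF-iCl3IKWT8A^12690422^136588262^26445782^"
    "26194754342^1^160x600^55707782^"
    "location=bottomleft;login=no;ptype=search;search=zebra_decor;"
    "visitorid=38949244799;wm_visit_id=38949244799^"
    "bellsouth.net^2840^United States^21158^Mississippi^200647^"
    "Greenwood-Greenville MS^1020740^Greenville^0^^500118^"
    "Microsoft Internet Explorer 10.Any^501026^Microsoft Windows 8^^cable^"
    "1370288640^9483370|9619690|9686410|9686530|10620730|10621210^"
    "Ad Server^160x600^4^^^^0^false"
)


def parse_row(line):
    """Split one caret-delimited line into a dict; return None for the
    header row or for rows with an unexpected number of fields."""
    fields = line.rstrip("\n").split("^")
    if len(fields) != len(HEADER) or fields == HEADER:
        return None
    return dict(zip(HEADER, fields))


def parse_custom_targeting(value):
    """Unpack 'key=value;key=value' pairs from the CustomTargeting field."""
    return dict(item.split("=", 1) for item in value.split(";") if "=" in item)


def map_line(line):
    """Map step: emit ((Country, Region), 1) for each valid row."""
    record = parse_row(line)
    if record is None:
        return []
    return [((record["Country"], record["Region"]), 1)]


def reduce_counts(pairs):
    """Reduce step: sum the emitted values per key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def aggregate(lines):
    """Run the map step over all lines, then reduce the results."""
    pairs = []
    for line in lines:
        pairs.extend(map_line(line))
    return reduce_counts(pairs)
```

Calling `aggregate` on an iterable of lines (for example an open file) returns a dictionary keyed by (Country, Region). On a cluster the same two steps would be expressed in Pig or Hive rather than in local Python.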
One Star

Re: MapReduce example

Hi,
I have a very basic question related to this topic.
In Open Studio for Big Data 5.3.0, does Talend use the capability and processing power of Hadoop only with the Pig (MR) components?
I assume that if I use a tHiveInput component to extract around 10 GB of data and then use a tMap component, I am not using the processing power of Hadoop while doing the mapping. Is this correct?
Thanks.
Employee

Re: MapReduce example

Hello Ganesh,
You're almost right.
In Talend Open Studio for Big Data, Talend uses the power of MapReduce with these components:
- Pig (all the components)
- Sqoop (the 3 components)
- The ELT components for Hive (tELTHiveInput, tELTHiveMap and tELTHiveOutput)
You are right: when you use a tHiveInput followed by a tMap, MapReduce is used only to execute the query you have written in the tHiveInput, while the processing within the tMap is done in Java, locally.
Additionally, in 5.3, Talend Platform (the enterprise version) brings the ability to design and execute an M/R transformation using the usual ETL components you were used to in previous versions. That means you can design anything in M/R using the classical ETL components.
One Star

Re: MapReduce example

Hi rdubois,
Thanks for the detailed reply! It helps with the design and saved me time :)
One Star

Re: MapReduce example

I am currently evaluating Talend to possibly replace our current internally developed ETL framework. We want to move to Hadoop and incorporate MapReduce in our ETL process. Because I am new to both Talend and Hadoop, I wanted to see if anyone could guide me for a particular use case I will use as the basis for a proof of concept.
We have many input formats as part of our ETL (.xml, .csv, .dmp), and they are often archived in some form of zip. We would like the raw input to be initially placed in Hadoop using a specific structure, processed in Hadoop using MapReduce, and finally stored in Hadoop using a specific structure.
I am able to connect to my Hadoop instance and understand the Put, Get, Delete, and Copy features which Talend provides. What I don't understand is how to look for any zip file, copy it to a staging area, unpack it, process it, and then save it back to Hadoop.
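To make the sequence of steps concrete (find zip files, copy them to a staging area, unpack, process, save the result), here is a minimal sketch in plain Python. It is only an illustration of the flow, under loud assumptions: local directories stand in for the Hadoop locations, the "processing" is a placeholder that counts lines, and all names (`stage_and_unpack`, `process_file`, `run_pipeline`, `line_counts.csv`) are hypothetical. It is not a Talend Job and does not call any Talend or Hadoop API.

```python
import shutil
import zipfile
from pathlib import Path


def stage_and_unpack(landing_dir, staging_dir):
    """Copy every .zip found in landing_dir into staging_dir and extract it.

    Returns the list of extracted files. Local folders are used here as a
    stand-in for the raw and staging areas in Hadoop (an assumption)."""
    landing, staging = Path(landing_dir), Path(staging_dir)
    staging.mkdir(parents=True, exist_ok=True)
    extracted = []
    for archive in sorted(landing.glob("*.zip")):
        staged = staging / archive.name
        shutil.copy2(archive, staged)      # copy to the staging area
        target = staging / archive.stem    # one folder per archive
        with zipfile.ZipFile(staged) as zf:
            zf.extractall(target)          # unpack
        extracted.extend(sorted(p for p in target.rglob("*") if p.is_file()))
    return extracted


def process_file(path):
    """Placeholder processing step: count the lines in one file.
    A real job would parse the .xml/.csv/.dmp content instead."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        return sum(1 for _ in handle)


def run_pipeline(landing_dir, staging_dir, output_dir):
    """Stage, unpack and process all archives, then save a small report."""
    output = Path(output_dir)
    output.mkdir(parents=True, exist_ok=True)
    results = {}
    for path in stage_and_unpack(landing_dir, staging_dir):
        results[path.name] = process_file(path)
    with open(output / "line_counts.csv", "w", encoding="utf-8") as handle:
        for name, count in sorted(results.items()):
            handle.write(f"{name},{count}\n")
    return results
```

Keeping staging, processing and saving as separate functions mirrors the way the steps are described in the post, so each one could be swapped for the matching Hadoop-side operation.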
Thanks.
Employee

Re: MapReduce example

Hi Jerry
I'm not sure how far you've gotten with your use case, but it could be one where Talend Presales might need to get involved. With TOS4BigData you need to use Pig and Pig UDFs, or one of the existing components. With Talend Enterprise Big Data, however, we have a number of extensions (through MapReduce and custom functions) which might be required for your immediate need.
Cheers
One Star

Re: MapReduce example

We have done some POCs using Hadoop/Talend with Hive and MR using Talend. Some of the use cases are:
- Slowly Changing Dimensions (SCD)
- Change Data Capture
- Merge statements
- Lookups
- Draining data from queues and parsing complex XML, etc.
- Log files and error handling
- Data pattern matching across historical transactions
- Aggregations
etc.
In most cases Hive is used. For some of them MapReduce code is used. In general, Hive is found to be easier to support.
One Star

Re: MapReduce example

Hi sdhurjati, could you please share an example of parsing XML using a MapReduce Talend Job?