One Star

Count Occurrence Word From Social Media

Hi,
I need everyone's help with this. I am required to count word occurrences in social media content such as blogs, Facebook, etc., but I'm not sure whether there is any freeware that can be integrated with Talend to count the occurrences.
I don't think an ETL job can count the occurrences quickly and in real time.
Please advise.

Regards,
Kal
11 REPLIES
Moderator

Re: Count Occurrence Word From Social Media

Hi,
The most important thing is to first extract the information from Facebook or other social media with Talend, and then do the counting. So I think Forum 28483 will be useful for you.
Best regards
Sabrina
--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.
One Star

Re: Count Occurrence Word From Social Media

Hi,
Thanks for the information. After I extract the information from social media/Facebook, how do I count it?
Rgds,
Kal
Moderator

Re: Count Occurrence Word From Social Media

Hi,
There is a component, tFileRowCount, which counts the number of rows in a file.
The workflow may be: source file-->tFileInputxx-->tFileRowCount-->tFileOutputxx
Best regards
Sabrina
One Star

Re: Count Occurrence Word From Social Media

Hi,
My source is SQL Server. How do I connect it to tFileRowCount? Also, I want to count the occurrences of each word. Is that possible?
Thanks,
Kal
Community Manager

Re: Count Occurrence Word From Social Media

Hi,
Yes, you can count each word of a string. Use tNormalize to split the data into multiple rows with the separator " ". For example, if you have data like:
"this is an example for tNormalize component"
to:
this
is
an
example
for
tNormalize
component
Then link tNormalize to tAggregateRow to count the occurrences of each word with the 'count' operator.
tMSSQLInput--main--tNormalize--main--tAggregateRow---tLogRow
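
For readers who want to see the logic outside the Talend designer, here is a minimal plain-Java sketch of what the split-then-count step amounts to. It is not how Talend implements these components; the class name WordCountSketch and the method countWords are hypothetical, and skipping empty tokens is an assumption made for the sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-alone sketch; in a real job tNormalize and
// tAggregateRow do this work inside the Talend flow.
class WordCountSketch {

    // Splits a line on single spaces (mirroring the " " separator)
    // and counts how often each token occurs (mirroring 'count').
    static Map<String, Integer> countWords(String line) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String word : line.split(" ")) {
            if (word.isEmpty()) {
                continue; // assumption: empty tokens are not counted
            }
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints each distinct word with its count, in first-seen order.
        System.out.println(countWords("this is an example for tNormalize component"));
    }
}
```

Note that this sketch is case-sensitive, so "I" and "i" would be counted as different words unless the input is lower-cased first.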
Shong
----------------------------------------------------------
Talend | Data Agility for Modern Business
One Star

Re: Count Occurrence Word From Social Media

Hi,
I followed your suggestion and it worked, but I ran into a small issue: a few words are not separated. I noticed it happens with the first word of a sentence after a full stop ".".
For example:
"i like to watch movie. I like eat too"
Expected output:
-------------------
i
like
to
watch
movie
i
like
eat
too
Current output:
-----------------
i
like
to
watch
movie. I \\this is the issue
like
eat
too

Could you figure out the issue?
Community Manager

Re: Count Occurrence Word From Social Media

Hi
Remove special characters such as "," and "." before normalizing the string, for example (the dot is escaped because replaceAll takes a regular expression, in which an unescaped "." matches any character):
row1.line.replaceAll("\\.","")
If the string may contain more types of special characters, it is better to define a function in a routine to handle them: build a list of all the characters that may occur in the string, then loop over each character and remove it from the string. Then call the routine to remove all special characters, for example in a tMap before tNormalize:
tMSSQLInput--main--tMap--main-->tNormalize--main--tAggregateRow---tLogRow
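
A routine along those lines might look like the sketch below. The class name StringCleaner, the method removeSpecialChars, and the particular list of characters are all hypothetical; adjust the list to whatever actually appears in your data. In Talend Studio the routine would normally be a public class under the routines package, which is omitted here to keep the sketch self-contained.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical routine sketch; names and character list are assumptions.
class StringCleaner {

    // Characters to strip before normalizing; extend as needed.
    private static final List<String> SPECIAL_CHARS =
            Arrays.asList(".", ",", "!", "?", ";", ":");

    // Removes every listed character from the input string.
    static String removeSpecialChars(String input) {
        if (input == null) {
            return null;
        }
        String result = input;
        for (String ch : SPECIAL_CHARS) {
            // String.replace treats its argument literally (no regex),
            // so "." needs no escaping here.
            result = result.replace(ch, "");
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(removeSpecialChars("i like to watch movie. I like eat too"));
    }
}
```

In a tMap expression this would be called as something like StringCleaner.removeSpecialChars(row1.line), assuming the routine is saved under that name.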

Shong
One Star

Re: Count Occurrence Word From Social Media

Hi Shong,
Actually, I did remove the special characters, including ".", but it returned this:
Current output:
-----------------
i
like
to
watch
movie I \\this is the issue
like
eat
too
Please refer to my job design.
Community Manager

Re: Count Occurrence Word From Social Media

Hi
In principle, there should be a space after a punctuation character in English; however, there is no space after "." in your case. To avoid this situation, you can always replace the character with a space, for example:
row1.line.replaceAll("\\."," ")
Then use a tFilterRow to remove the empty lines.
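
To make the effect of this concrete, here is a small plain-Java sketch that replaces each "." with a space, splits on spaces, and drops the empty tokens that result from doubled spaces. The class TokenizeSketch and the method tokenize are hypothetical names; in the job itself the split is done by tNormalize and the filtering by tFilterRow.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of "replace with a space, split, drop empties".
class TokenizeSketch {

    static List<String> tokenize(String line) {
        // "\\." is a regex for a literal dot; each dot becomes a space.
        String spaced = line.replaceAll("\\.", " ");
        List<String> words = new ArrayList<>();
        for (String token : spaced.split(" ")) {
            // "movie. I" becomes "movie  I", which yields an empty token
            // between the two spaces; skip it.
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("i like to watch movie.I like eat too"));
    }
}
```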
Shong
One Star

Re: Count Occurrence Word From Social Media

Hi,
I want to replace the spaces with a ; sign. For example, I have a sentence like this:
"i like eat. i like drink"
expected output
------------------
"i;like;eat.;i;like;drink"
current output
----------------
"i;like;eat. i;like;drink"
How can I apply a function to replace the gap between the end of one sentence and the first word of the next?
Please help me.
Community Manager

Re: Count Occurrence Word From Social Media

Hi
Please make sure there is a space after "." in your string. If I use the expression
row1.c.replaceAll(" ",";")
it outputs the right result:
 connecting to socket on port 3480
connected
i;like;eat.;i;like;drink
disconnected
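
If the input cannot be guaranteed to have a space after every ".", one possible approach is to insert the missing space first and then replace spaces with ";". The sketch below is only an illustration: the class SemicolonJoin and the method joinWithSemicolons are hypothetical, and the lookahead rule would also split decimal numbers such as 3.14, so it may not suit every data set.

```java
// Hypothetical sketch: normalize the spacing after ".", then swap spaces for ";".
class SemicolonJoin {

    static String joinWithSemicolons(String line) {
        // Insert a space after any "." that is directly followed by a
        // non-whitespace character (assumption: dots only end sentences).
        String spaced = line.replaceAll("\\.(?=\\S)", ". ");
        // Same expression as in the reply above, applied to the fixed string.
        return spaced.replaceAll(" ", ";");
    }

    public static void main(String[] args) {
        System.out.println(joinWithSemicolons("i like eat.i like drink"));
    }
}
```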

Shong