I have two inputs a and b I need to do a lookup as the following OR condition a.a1 = b.b1 or a.a1=b.b2 or a.a1=b.b3, if no match in all these 3 fields, go to reject file Can I simply use one TMAP with one lookup to specify this condition to get the matched and reject? Much appreiated
Yes but see this bug. You will either need a field in your lookup that can be set as your join to access the inner join rejects (if you have to, add a dummy field with a fixed value using tJavaRow so that value can be specified as your join in tMap) or you will need an expression filter for your matched output flow and output rejects for the remainder.
Alevy, Thanks so much for your information. I am new to Talend. Could you explain more about the second option? My request is to create an account list where a.a1 != b.b1 and a.a1!=b.b2 and a.a1!=b.b3. Peter
See the attached screen-print. Note that with this data structure Talend will output the "a" row for each "b" row that does not match i.e. multiplying it. This can't be avoided so to have only one instance of each "a" row that does not match any "b" row, you will need to follow the tMap with a tUniqueRow. That requires that each "a" row can be uniquely identified. Alternatively, you would have to have three left-join lookups to "b" with your output condition being: b1.b1==null && b2.b1==null && b3.b1==null
I should add that .equals is case-sensitive. Use .equalsIgnoreCase if you want case-insensitive matching. Also, unless trailing spaces are important (in which case you need to be careful about the length of your strings), you should trim the incoming data (advanced option in tDBInput).
Thanks so much for your inforation. Alevy. I tried, it is working. Please see the image I uploaded. Just a little modification. I have 'a' file 1,11 2,22 3,33 4,44 5,55 b file 11 ,88 ,99 66 , 22,33 77 , ,33 the output is 4;44;0;;;; 5;55;0;;;; This is what I want. for the value 44 and 55, they can't be found in all the 3 fields in the b file. In order to do this, I need to use your expression, but just get the records which be found in any of the 3 fileds in the b file, that is 11, 22, 33, later I use another tMap componet to get the 'a' records excluding 11,22,33. If using your way, for 11 value, it doesn't exist in some record, but it should not go the the output. Becaue it doesn't exist in another record. 1) I think this is the best way. If you can better way, such as do it in only one tMap component, please let me know....... 2) I am wondering the first way you mentioned, how do use tJavaRow to do the same thing? Forgive me asking so many questions. I am just a beginner.
I don't quite understand what you want now? The tMap example I posted should give you the records from a that don't match any field in b in one step... Ignore my previous comment about tJavaRow; it was incorrect. A couple of other comments about your image: - The tReplicate multiplies your flow so it can go to more than one component. It is pointless as you've used it. - You do not need to sort your flow to use tUniqueRow.
Alevy, Thanks for your information. Sorry to try so many times to upload, there is some problem with my screen resolution. Please see the above two screen, originally I did the same set up as you The expression is as the following. row1.a1==null || !(row1.a1.equals(row2.b1) || row1.a1.equals(row2.b2) || row1.a1.equals(row2.b3)) I have 'a' file 1,11 2,22 3,33 4,44 5,55 b file 11 ,88 ,99 66 , 22,33 77 , ,33 The output is as followings 1;11;66;22;33 1;11;77;;33 2;22;11;88;99 2;22;77;;33 3;33;11;88;99 4;44;11;88;99 4;44;66;22;33 4;44;77;;33 5;55;11;88;99 5;55;66;22;33 5;55;77;;33 11, 22, 33 should not be in the output, becasue these value can be found in one of the 3 fields b1, b2, b3 in the b file. 2) Another question regarding tReplicate component in the original screen I sent to you, I tried to drag two rows from tUnique and tReplicate into tMap2, but somehow, the system doesn't allow. If I first drag the row tUnique output into tMap2, then I can't add the output from tReplicate. The same problem if I drag the ouput of tReplicate first. That is why I have to use another delimiter input file ponting to the same a file. Do you know why is that? Hopefully this time I explain my situation clearly and you can help me. Thanks.
OK, I understand what you want (not sure what I was thinking): you want row a in your output once if a1 does not match b1, b2 or b3 in any row b. You'll have to do it with three left-join lookups to "b" with your output condition being: a.a1==null || (b1.b1==null && b2.b1==null && b3.b1==null) Note that if the joining fields are both null, tMap will treat that as a successful join, which is why I've included the test on a.a1==null. If your b file is very large, you can speed up the job (and reduce the memory required) by first reading it into a tHashOutput and then using tHashInputs for your three tMap lookups.
Sorry to try so many times to upload
You can edit any previous post you've made to re-attempt the attachment.
Another question regarding tReplicate component
Talend does not allow flows to be split and rejoined within the same subjob; search the forum for "cycle flow" for many posts regarding this.
Thanks a lot for your information. I got my result using your way now. 1) I found it will only work when the 3 look ups are left outer join, but in your screen, it is showing inner join, is that a typo? 2) I found when using tHash* component as the lookup, we no need to sort before lookup. If you are using a tFileDelimited, you have to sort first. For this example, if we use tFileDelimited, before lookup, I have to sort b1 file by the filed b1, b2 file by b2, b3file by b3. But for tHash, we no need to do that, it is more convenient, can I know it is not required to sort by diffent keys? 3) I look at the cycle flow topic in the forum and understand when Talend doesn't allow this. But I think probably the other tool is better than Talend in this way. I have used Ab Initio for 5 years, in Ab Intio, you can use the replicate componet to do the same thing. In Ab Initio, it automaticly lands the flows temporarily to disk and waits for all data to become available, then proceeds with the merge or look up in the secondary phase. It can even explictly specify differnt phases, what is your opinion for this?
1) The screen-shot doesn't show them as inner-join; the box is not checked? 2) You don't need to sort first when using tFileDelimited (or any other source) for the lookup? 3) I have not used any other products (and only used Talend for less than a year) so cannot compare but bear in mind that Talend is designed to be far more flexible than other products by generating Java code. Talend reads all the data for the lookup so it's available for the join before starting the main flow, which is generally processed one row at a time. So it's not possible to use one input component for both the lookup and main flows. tHash is the way to achieve this. Talend only uses the disk for lookup data if explicitly told to because of the processing hit.
Less than one year to use Talend, but I feel now you are alreday an expert. 1) Yes, I didn't realized the inner join is not checked. 2) Just want to understand more how tHash component works, For my understanding, usually when we hash the file, we will have a hash key, that is why the lookup will be more efficient. In this case, we have 3 different keys for look up. So when we use tHash, which field will the system use to hash by key?
Thanks but my knowledge is limited to a very small part of Talend's potential I think tHash hashes the key fields in the schema but tMap hashes any lookup flow based on the joined fields. This means that there will be additional memory used when tHashInput is a lookup flow rather than a main flow. Essentially, I think you should just view tHash as a temporary table in memory but I'm just speculating; a real Talend person (or someone prepared to wade through the Java) would have to confirm.
Peter and alevy, when you are in Talend click on the component on the canvas and press F1. A quite handy guide should pop up from the right hand side that in most cases has a pretty good and detailed description of the component as well as use cases, examples with step by step guidance. You may also want to consider downloading the user guide(s). However, since this particular component is not in there: the tHashOutput and tHashInput components are temporary tables as alevy figured. They can be used to avoid having to run the same query multiple times for example by storing the results in the Hash table. See one scenario in http://www.talendforge.org/forum/viewtopic.php?id=11239 .
Sorry Alevy. The last question for this topic, just want to confirm, for my understanding, we are not required to specify the hash key in the thashoutput and tHashinput compoent, the key will be specify in the tMap component. Is my understanding right?