Seven Stars

Tuning the query in tSQL in talend BigData Platform

I am using Talend bigdata Platform 6.3 Studio. I am using tSql component to execute a hive query in cluster. Please find below query

 SELECT DISTINCT inVistaNonVistaComb.*,inLeasingProxy.LogicalFeedType as LPLogicalFeedType
FROM inVistaNonVistaComb
LEFT OUTER JOIN inLeasingProxy ON
UPPER(inLeasingProxy.LogicalFeedType) = UPPER(inVistaNonVistaComb.Logical_Feed_Type) AND
UPPER(inLeasingProxy.SalesOrgRegion) = UPPER(inVistaNonVistaComb.Sales_Org_Region) AND
(UPPER(inVistaNonVistaComb.Sales_Org_Sub_Region) = UPPER(inLeasingProxy.SalesOrgSubRegion) OR UPPER(inLeasingProxy.SalesOrgSubRegion)='ALL') AND
(
inLeasingProxy.LeasingProxyValue = inVistaNonVistaComb.Sales_Rep_Code_Mis OR 
inLeasingProxy.LeasingProxyValue = inVistaNonVistaComb.CBN OR 
inLeasingProxy.LeasingProxyValue = inVistaNonVistaComb.Transaction_Subtype OR
inLeasingProxy.LeasingProxyValue = inVistaNonVistaComb.Ice_Holder5
) AND
(inVistaNonVistaComb.Adjusted_Primary_Transaction_Date BETWEEN inLeasingProxy.EffectiveStartDate AND inLeasingProxy.EffectiveEndDate)

I am doing a left outer join with a reference table called

inLeasingProxy

This is a small file with only 2MB size. However the transaction file called

inVistaNonVistaComb will have a size 1 GB - 2GB

The problem here is that, for some transaction files(1GB), This query will execute very fast (10 minutes). But for some trasaction files(1GB) , this query will run for 2 hours.

How can i tune this query? Please help

1 REPLY
Moderator

Re: Tuning the query in tSQL in talend BigData Platform

Hello,

We have redirected your issue to our Bigdata expert and then come back to you as soon as we can.

Thanks for your time.

Best regards

Sabrina

--
Don't forget to give kudos when a reply is helpful and click Accept the solution when you think you're good with it.