
Weird problem with Hive Input node

Hi,
 
I have been experiencing a weird problem. On a tHiveInput I have a simple select query (select A, B, C from Table). Suppose the table has 1000 records, 4 of which have A=1. Running this query, I would expect all 1000 rows of the input table, with A=1 appearing 4 times. However, that does not happen: one of the records with A=1 is missing, so A=1 appears only 3 times. It gets even weirder: the missing record does appear when I add a where clause to the query (where A = 1). The original query has no filters at all, so I do not understand why a record is missing, and especially why it is no longer missing once I add the where clause. I double-checked the schema and simplified it as much as possible, but the behavior is the same.

Thanks in advance


Re: Weird problem with Hive Input node

Hello,

Would you mind posting a screenshot of your job design to the forum? It will help us address your issue. It would also help if you could describe your case with an example of the input values and the expected output values.

Best regards

Sabrina


Re: Weird problem with Hive Input node

Hi,

 

Thanks for answering. The job is quite simple (I am only showing the part that produces the error). The problem occurs in the first node, which has a query like the following:

SELECT 
  a.A, 
  a.B, 
  a.C, 
  a.D

FROM Table_1 a

The query above returns only 3 rows with A = '1'.

However, with the following query:

SELECT 
  a.A, 
  a.B, 
  a.C, 
  a.D

FROM Table_1 a
WHERE a.A = '1'

the output contains all 4 rows.
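For anyone trying to reproduce this, the two counts described above can be produced side by side in one run. The following is a minimal sketch, not part of the original job: it uses Python's DB-API with an in-memory SQLite table standing in for Table_1, the sample values are made up, and `count_matches` is a hypothetical helper. Pointing the same function at Hive through a DB-API driver is an assumption and has not been tested here.

```python
import sqlite3


def count_matches(conn, table, column, value):
    """Count rows where `column` equals `value` in two ways and return both.

    Hypothetical diagnostic helper; `table`, `column` and `value` are assumed
    to be trusted strings because they are formatted directly into the SQL,
    and `value` is compared as a string, as in the queries above.
    """
    cur = conn.cursor()

    # 1) Unfiltered SELECT: fetch every row and count the matches client-side,
    #    which mirrors reading the whole table through an input component.
    cur.execute(f"SELECT {column} FROM {table}")
    client_side = sum(1 for (v,) in cur.fetchall() if v == value)

    # 2) Filtered SELECT: let the database apply the WHERE clause and count.
    cur.execute(f"SELECT COUNT(*) FROM {table} WHERE {column} = '{value}'")
    server_side = cur.fetchone()[0]

    return client_side, server_side


# Stand-in data: 1000 rows in which A = '1' appears exactly 4 times.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table_1 (A TEXT, B TEXT, C TEXT, D TEXT)")
rows = [(str(i % 250), "b", "c", "d") for i in range(1000)]
conn.executemany("INSERT INTO Table_1 VALUES (?, ?, ?, ?)", rows)

print(count_matches(conn, "Table_1", "A", "1"))  # (4, 4)
```

On a correctly behaving source both numbers are equal. If, on the real cluster, the first number came out lower than the second, that would suggest rows are being lost while the full result set is fetched rather than missing from the table itself, which is consistent with the symptom described in this thread.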

 

Thanks in advance