How can a select count(1) from tablename return a null?

Eight Stars

How can a select count(1) from tablename return a null?

I have a job that is generating and executing SQL on Hive, including generating table names.

 

It populates those tables, and then attempts to get a count from them.

 

The count statement, as shown in the job log, is this:

[tLogRow_3] content: select 'phil', 'iter', '20190920', count(1) from staging.traits_phil_iter_20190920_orc

So it's selecting the elements of the generated table name, and the row count from that table.

 

If I run that in a separate Hive SQL client, it works fine and I get something like this:

phil, iter, 20190920, 2094

If I run it in a standalone job that just runs that hard coded SQL, I get this:

.-----------+----+--------+-----.
|           tLogRow_1           |
|=----------+----+--------+----=|
|region_name|type|date    |count|
|=----------+----+--------+----=|
|phil       |iter|20190920|2094 |
'-----------+----+--------+-----'

However, in the actual job that generates the SQL to create a view, create the table if it does not exist, insert data into it, and then run the count, I get this:

.-----------+----+--------+-----.
|           tLogRow_7           |
|=----------+----+--------+----=|
|region_name|type|date    |count|
|=----------+----+--------+----=|
|phil       |iter|20190920|null |
'-----------+----+--------+-----'

So everything works just fine apart from the count(1) returning null! Select count(1) should never ever return null! If the table does not exist, it should fail. If the table exists but is empty, it should return 0. null should be impossible!

 

Any suggestions?

 

The job structure is that I have three iterate links, which in order do three things: create external tables and views and insert, then the select count(1) written into a tHashOutput, and thirdly reads the results of the select count(1) form the tHashOutput and creates a view if there is any data. 

 

From this screenshot you can see the tLogRow_3 and tLogRow_7 are either side of the tHiveRow_2 component that runs the SQL, which is just running row6.content.

selectcountnull.png

As you can see, the first iterate executes 4 statements, which are the table and view creation and insert. The second iterate executes 1 statement, which is the count from the table that was inserted into. The final iterate creates a view, but only if the count from the table inserted into is greater than zero. The problem is: I get a null from the count!

Tags (1)

Accepted Solutions
Eight Stars

Re: How can a select count(1) from tablename return a null?

I found the problem! It IS running the query, but a tHiveRow component does not return the results of the statement executed, it just runs it.

 

In order to get the results, you need to drop in a tFlowToIterate and connect that to a tHiveInput.

View solution in original post


All Replies
Eight Stars

Re: How can a select count(1) from tablename return a null?

So here are the relevant componets in more detail:

selectcountnull.pngselectcountnull2.png

And the log for the two tLogRow components:

.------------------------------------------------------------------------------------------+-----------+----+--------.
|                                                     tLogRow_3                                                      |
|=-----------------------------------------------------------------------------------------+-----------+----+-------=|
|content                                                                                   |region_name|type|date    |
|=-----------------------------------------------------------------------------------------+-----------+----+-------=|
|select 'phil', 'full', '20190920', count(1) from `staging`.`traits_phil_full_20190920_orc`|phil       |full|20190920|
'------------------------------------------------------------------------------------------+-----------+----+--------'
.-----------+----+--------+-----.
|           tLogRow_7           |
|=----------+----+--------+----=|
|region_name|type|date    |count|
|=----------+----+--------+----=|
|phil       |full|20190920|null |
'-----------+----+--------+-----'
Eight Stars

Re: How can a select count(1) from tablename return a null?

Ok, breaking news.. it isn't actually running the query. The output does not depend on the row6.content at all, the tHiveRow_6 component is just passing the variables through from row6 to row8 and setting the count output column to null.

Eight Stars

Re: How can a select count(1) from tablename return a null?

I found the problem! It IS running the query, but a tHiveRow component does not return the results of the statement executed, it just runs it.

 

In order to get the results, you need to drop in a tFlowToIterate and connect that to a tHiveInput.

View solution in original post

2019 GARTNER MAGIC QUADRANT FOR DATA INTEGRATION TOOL

Talend named a Leader.

Get your copy

OPEN STUDIO FOR DATA INTEGRATION

Kickstart your first data integration and ETL projects.

Download now

Put Massive Amounts of Data to Work

Learn how to make your data more available, reduce costs and cut your build time

Watch Now

How OTTO Utilizes Big Data to Deliver Personalized Experiences

Read about OTTO's experiences with Big Data and Personalized Experiences

Blog

Talend Integration with Databricks

Take a look at this video about Talend Integration with Databricks

Watch Now