A Job using a tSqoopImport component hangs during the Job run

Problem Description

A Job is designed with a tSqoopImport component to import data from Oracle to HDFS. The Job hangs without errors or warnings.

 

Root Cause

Running the jstack utility on the Job process to collect the stack trace shows that the main thread stays in an Oracle read call the whole time:

 

"main" prio=6 tid=0x0000000001375000 nid=0x9ac4 runnable [0x00000000012ce000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:152)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at oracle.net.ns.Packet.receive(Packet.java:311)
at oracle.net.ns.DataPacket.receive(DataPacket.java:105)
at oracle.net.ns.NetInputStream.getNextPacket(NetInputStream.java:305)
at oracle.net.ns.NetInputStream.read(NetInputStream.java:249)
at oracle.net.ns.NetInputStream.read(NetInputStream.java:171)
at oracle.net.ns.NetInputStream.read(NetInputStream.java:89)
at oracle.jdbc.driver.T4CSocketInputStreamWrapper.readNextPacket(T4CSocketInputStreamWrapper.java:123)
at oracle.jdbc.driver.T4CSocketInputStreamWrapper.read(T4CSocketInputStreamWrapper.java:79)
at oracle.jdbc.driver.T4CMAREngineStream.unmarshalUB1(T4CMAREngineStream.java:426)
at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:390)
at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:249)
at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:566)
at oracle.jdbc.driver.T4CStatement.doOall8(T4CStatement.java:202)
at oracle.jdbc.driver.T4CStatement.doOall8(T4CStatement.java:45)
at oracle.jdbc.driver.T4CStatement.executeForDescribe(T4CStatement.java:766)
at oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:897)
at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1034)
at oracle.jdbc.driver.OracleStatement.executeQuery(OracleStatement.java:1244)
- locked <0x00000000ef836ca0> (a oracle.jdbc.driver.T4CConnection)
at oracle.jdbc.driver.OracleStatementWrapper.executeQuery(OracleStatementWrapper.java:420)
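
When a dump contains many threads, the pattern seen in this trace can be searched for mechanically. The following is a minimal sketch, not a Talend or Oracle utility. It assumes the dump was saved to a text file with jstack, and that a blocked read shows `java.net.SocketInputStream.socketRead0` as the top frame, as in the trace in this article (other JVM versions may show a different frame). The class and method names are hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: scans a jstack thread dump for threads that are
// blocked in a socket read underneath Oracle JDBC frames.
class SocketReadScanner {
    // Top frame seen in this article's trace; assumed to mark a blocking read.
    private static final String SOCKET_READ_FRAME =
            "java.net.SocketInputStream.socketRead0";

    // Returns the names of threads whose top frame is the socket read
    // and whose stack also contains at least one oracle.* frame.
    static List<String> findThreadsWaitingOnOracle(String dump) {
        List<String> result = new ArrayList<>();
        String currentThread = null;
        boolean topFrameSeen = false;
        boolean topIsSocketRead = false;
        boolean added = false;
        for (String rawLine : dump.split("\\R")) {
            String line = rawLine.trim();
            if (line.startsWith("\"")) {
                // A thread header looks like: "main" prio=6 tid=... runnable
                int end = line.indexOf('"', 1);
                currentThread = end > 0 ? line.substring(1, end) : line;
                topFrameSeen = false;
                topIsSocketRead = false;
                added = false;
            } else if (currentThread != null && line.startsWith("at ")) {
                if (!topFrameSeen) {
                    // The first "at" line after the header is the top frame.
                    topFrameSeen = true;
                    topIsSocketRead = line.contains(SOCKET_READ_FRAME);
                } else if (topIsSocketRead && !added
                        && line.startsWith("at oracle.")) {
                    result.add(currentThread);
                    added = true;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to a file containing the jstack output
        String dump = new String(Files.readAllBytes(Paths.get(args[0])));
        for (String name : findThreadsWaitingOnOracle(dump)) {
            System.out.println("Waiting on Oracle socket read: " + name);
        }
    }
}
```

Taking several dumps a few seconds apart and running the check on each helps confirm that the same thread stays in the read call rather than passing through it briefly.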

 

Solution

The Job is waiting on an Oracle read call. In this case, the cause is an Oracle database performance issue that is not related to Talend. Contact your Database Administrator for assistance.

 

Note:

To further isolate the issue between the database and the Talend server components, perform the following tests:

  1. Run the telnet, ping, and traceroute commands from the Talend Job server against the database server to verify that communication with the database is healthy and to rule out latency issues.

  2. Verify that there are no firewall issues. Idle sockets established by JDBC connections to the database can be affected by a firewall, which can leave the socket used by the JDBC driver open and never closed.

  3. Check for blocking sessions at the database level by querying the v$session view. The following query returns the active blocking sessions and the sessions that they are blocking:

    select blocking_session, sid, serial#, wait_class, seconds_in_wait from v$session where blocking_session is not NULL order by blocking_session;

  4. Reproduce the issue outside of Talend by running the SQL queries from the Job server machine. You can run the queries using the Oracle sqlplus utility.
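
The connectivity check in step 1 can also be run from Java on the Job server, which is useful when telnet is not installed. The sketch below only covers the TCP connect part (roughly what telnet verifies) and does not replace ping or traceroute. The class name, the 5-second timeout, and the default port 1521 (the usual Oracle listener port) are assumptions; substitute your own host and port.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: measures how long a plain TCP connect to the
// database listener takes from the machine that runs the Job.
class DbPortProbe {
    // Returns the connect time in milliseconds, or -1 if the connection
    // could not be established within timeoutMillis.
    static long connectMillis(String host, int port, int timeoutMillis) {
        long start = System.nanoTime();
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return (System.nanoTime() - start) / 1_000_000;
        } catch (IOException e) {
            // Covers refused connections, timeouts, and unresolved hosts.
            return -1;
        }
    }

    public static void main(String[] args) {
        // Assumed defaults; pass the real database host and listener port.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 1521;
        long ms = connectMillis(host, port, 5000);
        if (ms < 0) {
            System.out.println("Could not connect to " + host + ":" + port);
        } else {
            System.out.println("Connected to " + host + ":" + port
                    + " in " + ms + " ms");
        }
    }
}
```

A fast, successful connect only shows that the listener is reachable. It does not rule out a firewall dropping long-idle connections or a slow query on the database side, so steps 2 to 4 still apply.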

Version history
Revision #: 6 of 6
Last update: 02-24-2019 11:17 PM
Comments
Employee

The stack indicates that the problem is at the connection level, where the main thread is waiting to establish a socket connection. This is not an Oracle DB performance issue but more of a network issue, where the packets are probably being delayed while invoking an execute query on the DB. Can you please describe the solution clearly, and explain what measures one must take to resolve such hogging or stuck thread situations?

Employee

Hi @rpatel,

 

    Thanks for reviewing this article. 

 

    A thread hanging on socketRead() can be caused by either a network slowdown or slow DB responses. Please refer to the following thread.

 

    https://www.ibm.com/developerworks/community/blogs/aimsupport/entry/threads_hung_in_socketread_waiti...

 

    In this use case, the customer did not have any issues in their network. Hence, we isolated the issue to DB performance. I can add a note to the article stating that network issues can also show the same stack trace.

 

Thanks,

Karthick