Hi,

I'm having an issue with a Spark batch job in Talend. The job is pretty simple: it reads a file from HDFS, performs a left outer join with another file on HDFS (using a tMap) on a single key, and finally writes the result to HDFS.

What I have noticed is weird: the resulting Spark job performs a cogroup at one point and tries to gather the whole dataset in a single task before writing it to HDFS! Thus, if the dataset is big enough, the job fails with an OutOfMemoryError: Java heap space.

Why does Talend handle joins that way? Is it possible to optimise it?

Walid.