Spark partition before join

Hi, quick question about Spark in Talend.

To join data, Spark needs the rows that are to be joined (i.e., the rows sharing each key) to live on the same partition.
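To make that concrete, here is a minimal pure-Python sketch of the idea. It is not Spark or Talend code; the function names `hash_partition` and `partitioned_join` and the sample rows are made up for illustration. Both flows are hash-partitioned on the join key with the same partition count, so matching keys land in the same partition and each partition can be joined on its own.

```python
from collections import defaultdict

def hash_partition(rows, num_partitions):
    """Assign each (key, value) row to a partition by hashing its key."""
    partitions = defaultdict(list)
    for key, value in rows:
        partitions[hash(key) % num_partitions].append((key, value))
    return partitions

def partitioned_join(main_rows, lookup_rows, num_partitions=4):
    """Inner join done partition by partition (illustration only)."""
    # Same hash function and same partition count on both sides,
    # so rows with equal keys end up under the same partition id.
    main_parts = hash_partition(main_rows, num_partitions)
    lookup_parts = hash_partition(lookup_rows, num_partitions)

    joined = []
    for pid in range(num_partitions):
        # Build a small index of the lookup rows held by this partition.
        lookup_index = defaultdict(list)
        for key, value in lookup_parts.get(pid, []):
            lookup_index[key].append(value)
        # Probe it with the main-flow rows of the same partition only.
        for key, value in main_parts.get(pid, []):
            for lookup_value in lookup_index.get(key, []):
                joined.append((key, value, lookup_value))
    return joined

# Hypothetical sample data: (join key, payload)
main_flow = [(1, "order-a"), (2, "order-b"), (1, "order-c"), (3, "order-d")]
lookup_flow = [(1, "alice"), (2, "bob"), (4, "dave")]

result = sorted(partitioned_join(main_flow, lookup_flow))
```

No partition ever needs to look at another partition's rows, which is why the data has to be shuffled by key before the join whenever it is not already laid out that way.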

If we use key-based components such as tMap or a Join in tsql, is it wise to use them without explicit partitioning for small files and rely on Spark repartitioning the lookup flow based on the main flow?

Is there a guideline on when we must partition explicitly versus when we can rely on the Spark framework's repartitioning, especially when the lookup data is too big to broadcast but not very heavy either, e.g. > 1 GB?