Using .CSV files with relative paths (and making schemas for them)

I'm currently working on a Talend job which will be used for ETL operations between databases.
This job imports values from a .CSV file into a database. The .CSV files are supplied by another Talend job, which exports the data from a remote server. I've managed to build a working job, but the way the file paths are handled has been causing problems for me.
If you want to generate a schema based on the .CSV files, you have to add them as metadata and then use that metadata in the tFileInputDelimited components. When you do this, Talend refers to the metadata files by absolute file paths. The problem is that the job will be deployed in many different environments, and these paths might not be correct in every environment.
For this reason I want to use relative paths for these files. But when I use relative file paths (by assigning the working directory path to a local variable), I can't generate a schema for each .CSV as described above.
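In a Java process, which is what a Talend job runs as, a relative path is resolved against the process's working directory (the `user.dir` system property), so the same relative path can point to different places depending on where the job is launched from. The plain-Java sketch below (not Talend-generated code) contrasts that with resolving against an explicit base directory; `baseDir` stands in for a hypothetical per-environment value, such as a context variable or a launch argument.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class RelativePaths {

    // Resolves a file name against the JVM's current working directory.
    // The result depends on the directory the process was started from.
    static Path againstWorkingDir(String fileName) {
        return Paths.get(fileName).toAbsolutePath().normalize();
    }

    // Resolves a file name against an explicit base directory. Only baseDir
    // (hypothetical per-environment setting) changes between deployments.
    static Path againstBaseDir(String baseDir, String fileName) {
        return Paths.get(baseDir).resolve(fileName).normalize();
    }

    public static void main(String[] args) {
        String fileName = "export/customers.csv"; // hypothetical file name
        System.out.println("working dir: " + System.getProperty("user.dir"));
        System.out.println("relative:    " + againstWorkingDir(fileName));
        // Base directory from the first argument, else fall back to "data".
        String baseDir = args.length > 0 ? args[0] : "data";
        System.out.println("anchored:    " + againstBaseDir(baseDir, fileName));
    }
}
```

Supplying the base directory at launch, or from per-environment configuration, keeps the file names in the job fixed while only the base changes between deployments.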
Since the .CSV files can differ, I would like to be able to generate these schemas. Typing them by hand for each deployment would be too much of a chore (each deployment might need its own edits).
Right now I use a workaround. I first manually add the .CSV file as metadata to the project and use it in tFileInputDelimited to generate the schema for the input and the output (database output). Afterwards I remove the .CSV from the metadata and change the path used by tFileInputDelimited to a relative path. The component keeps the schema it was last given.
This workaround works, but it feels too convoluted. Is there a way to read .CSV files, generate a schema and import them into a database (creating tables if they don't exist) using a relative path instead of an absolute one?
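The sketch below is not how Talend builds a schema; it is a hypothetical plain-Java illustration of what "generate a schema and create the table if it doesn't exist" involves: read the header line, guess a type per column from a few sample rows, and emit a `CREATE TABLE IF NOT EXISTS` statement. It assumes a `;` delimiter, a header row, no quoted fields, and a simple type mapping (`INTEGER`, `DOUBLE PRECISION`, `VARCHAR(255)`). `CREATE TABLE IF NOT EXISTS` is accepted by databases such as PostgreSQL, MySQL and SQLite, but not by every database.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class CsvSchemaSketch {

    // A column name plus the SQL type guessed for it.
    record Column(String name, String sqlType) {}

    private static final Pattern INTEGER = Pattern.compile("-?\\d+");
    private static final Pattern DECIMAL = Pattern.compile("-?\\d*\\.\\d+");

    // Guesses a schema from the header line plus up to sampleRows data lines.
    // Assumes a header row, no quoted fields, and no delimiter inside values.
    static List<Column> inferSchema(List<String> lines, String delimiter, int sampleRows) {
        if (lines.isEmpty()) throw new IllegalArgumentException("no header line");
        String sep = Pattern.quote(delimiter);
        String[] names = lines.get(0).split(sep, -1);
        // Per column: -1 = no value seen yet, 0 = integer, 1 = decimal, 2 = text.
        // A column's level only ever widens as more rows are sampled.
        int[] level = new int[names.length];
        Arrays.fill(level, -1);
        int last = (int) Math.min(lines.size(), sampleRows + 1L);
        for (String line : lines.subList(1, last)) {
            String[] values = line.split(sep, -1);
            for (int i = 0; i < names.length && i < values.length; i++) {
                String v = values[i].trim();
                if (v.isEmpty()) continue; // empty values say nothing about the type
                int kind = INTEGER.matcher(v).matches() ? 0 : DECIMAL.matcher(v).matches() ? 1 : 2;
                level[i] = Math.max(level[i], kind);
            }
        }
        List<Column> columns = new ArrayList<>();
        for (int i = 0; i < names.length; i++) {
            String type = switch (level[i]) {
                case 0 -> "INTEGER";
                case 1 -> "DOUBLE PRECISION";
                default -> "VARCHAR(255)"; // text, or no non-empty sample value
            };
            columns.add(new Column(names[i].trim(), type));
        }
        return columns;
    }

    // Builds DDL; assumes the target database accepts CREATE TABLE IF NOT EXISTS.
    static String createTableDdl(String table, List<Column> columns) {
        return columns.stream()
                .map(c -> c.name() + " " + c.sqlType())
                .collect(Collectors.joining(", ", "CREATE TABLE IF NOT EXISTS " + table + " (", ")"));
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8)
                : List.of("id;name;price", "1;Widget;9.99", "2;Gadget;12.50");
        System.out.println(createTableDdl("products", inferSchema(lines, ";", 100)));
    }
}
```

Run without arguments, it prints `CREATE TABLE IF NOT EXISTS products (id INTEGER, name VARCHAR(255), price DOUBLE PRECISION)`. Widening a column's type as more rows are sampled, rather than trusting the first row, avoids declaring a column as `INTEGER` when a later value is `12.50` or text.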
Does anyone understand my problem? Does someone have some tips for me?

Re: Using .CSV files with relative paths (and making schemas for them)

I'm not quite sure what you mean by "When doing this Talend refers to these metadata files with absolute file paths".
When you create the schema of a file or database table, Talend stores that metadata in the repository, and from that point on it is independent of the source file used to create it. Sure, if you run through the wizard, it refers to the file you used to create the schema, but what is stored in Talend's repository is the resulting schema: locally in XML files if you're using Open Studio, or in SVN if you're using the Unified Platform. You can also change the schema once it is created, so it can end up different from the original file used to create it.
When you use the schema, as long as the schema in the repository is what you want for each CSV file, you're good: you select the schema from the 'Repository'. As shown in the attached screenshot, the repository schema is used, and the file path of the sample file used to create it is irrelevant. The path of the file I actually want to read is specified in the File Path field, for example using a context variable.
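To make that separation concrete: the schema is a fixed description of the columns, and only the file location varies per environment. The plain-Java sketch below is hypothetical, not Talend-generated code; it mimics a context variable with a `java.util.Properties` entry named `inputDir` (an assumed name), and reads a `;`-delimited file with a header row using a column list defined once, independent of any sample file path.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

final class RepositorySchemaSketch {

    // Defined once, like a schema stored in the repository: it records column
    // names only, not the path of the sample file it was created from.
    static final List<String> CUSTOMER_SCHEMA = List.of("id", "name", "country");

    // Builds the input path from a per-environment setting ("inputDir" is hypothetical).
    static Path inputPath(Properties context, String fileName) {
        return Paths.get(context.getProperty("inputDir", ".")).resolve(fileName);
    }

    // Skips the header row and maps each data row onto the schema by position.
    static List<Map<String, String>> read(Path file, List<String> schema) throws IOException {
        List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
        List<Map<String, String>> rows = new ArrayList<>();
        for (String line : lines.subList(Math.min(1, lines.size()), lines.size())) {
            String[] values = line.split(";", -1);
            Map<String, String> row = new LinkedHashMap<>();
            for (int i = 0; i < schema.size(); i++) {
                row.put(schema.get(i), i < values.length ? values[i] : null);
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // In practice this value would be loaded per environment, e.g. from a file.
        Properties context = new Properties();
        context.setProperty("inputDir", args.length > 0 ? args[0] : ".");
        Path file = inputPath(context, "customers.csv");
        System.out.println("reading " + file.toAbsolutePath());
        for (Map<String, String> row : read(file, CUSTOMER_SCHEMA)) {
            System.out.println(row);
        }
    }
}
```

Because the column list is defined once and only `inputDir` changes, moving the job to another environment means changing a single setting rather than regenerating the schema.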
Otherwise, if I've misunderstood the problem, could it be that you're looking for a way to generate the schema dynamically, end to end? For that, check out https://help.talend.com/pages/viewpage.action?pageId=5671283 (for this to work, though, you have to be using Talend Enterprise).

Re: Using .CSV files with relative paths (and making schemas for them)

What type of database are you reading from?
What type of database are you writing to?
Could you share some of the values in a .CSV file so we can see the structure and what type of data you are trying to work with?
I have built jobs that scanned a shared network drive, made a list of the current file names, and loaded each file name into a context variable, which was then used in different parts of the job.
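In a Talend job this kind of loop is commonly built with tFileList, which iterates over the files in a directory. As a plain-Java sketch of the same idea (not Talend-generated code), listing the `*.csv` files in a directory and handing each path to a per-file step could look like this; the directory argument and the `processFile` callback are hypothetical stand-ins for the rest of the job.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

final class CsvDirectoryLoop {

    // Returns the regular .csv files directly inside dir, sorted by path.
    static List<Path> listCsvFiles(Path dir) throws IOException {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.csv")) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) files.add(p);
            }
        }
        Collections.sort(files);
        return files;
    }

    // Runs the per-file step once per file, the way a per-file value would be
    // set before the rest of the job runs (processFile is hypothetical).
    static void forEachCsv(Path dir, Consumer<Path> processFile) throws IOException {
        for (Path file : listCsvFiles(dir)) {
            processFile.accept(file);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        forEachCsv(dir, file -> System.out.println("would load " + file.getFileName()));
    }
}
```

Sorting the listing gives a predictable load order, since `Files.newDirectoryStream` does not guarantee any particular iteration order.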

Re: Using .CSV files with relative paths (and making schemas for them)

"When you go to use the schema, so long as the schema in the repository is what you want for each CSV file, you're good. You'd be selecting the schema from the 'repository'. As shown in the attached screenshot, the repository schema is used, but the file path of the sample file used to create it is irrelevant. However, the path of the file I want to read in now is specified in the File Path using a context variable etc..."

This helped me a lot. I didn't realize I could use schemas from the repository in tFileOutputDelimited components; I thought the component always had to point directly to a delimited file by an absolute path to use it as a schema.
Thanks for the help!