Problem Definition
==================

The ETL job must meet the following requirements.

1. Configuration values such as the database credentials and the log file name and location must be kept in an XML file.
   a. The name and location of the log file can be changed without modifying the ETL job.
   b. By editing the configuration file, the user can change the database credentials for the source and target databases.
2. The ETL job must support both the Windows and Unix families of operating systems.
3. The configuration file must be validated.
   a. The log file must tell the user whether the database credentials given in the configuration file are correct. If the credentials are correct but the connection still fails, for example because the database is down, the ETL job must log that as well.
   b. The console must tell the user whether the log file path given in the configuration file is correct.
4. The configuration file must be passed on the command line, because more than one instance of the job is expected to run at the same time. Several target database instances share the same structure, so several instances of the same job, each with its own configuration file, can migrate data from the source database when all target databases need to be brought to the same values at the same time. Values such as the target database name and IP address must differ across the configuration files.
5. The ETL job must check the configuration file name and location given on the command line. If they are wrong, the job must inform the user and exit. The job can use the console to report the wrong configuration file name.
6. The configuration file must not be reloaded from disk every time an ETL sub job needs a value from it. It must be loaded only once, and its values must be kept in memory for use by all sub jobs.
7. There must be a log file that records the execution of the main job and the sub jobs.
   a. The start and end of the main job and of each sub job, with status and time information, must be recorded in the log file.
   b. Any record rejected while inserting data must be recorded in the log file with the date, the time, and an error message.
   c. The number of records fetched from the source database and the number of records processed and inserted into the target database must be recorded in the log file.
   d. Each instance of the job must have its own log file, and the user must be advised of this. Instances running at the same time must not share a log file; otherwise the log file will contain garbage.
   e. The ETL job must create log files by date: it appends the date to the log file name given in the configuration file.
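
Requirements 4, 5, 6, and 7e can be illustrated with a minimal Python sketch. It is not the job's actual implementation: the XML layout (a `log` section with `dir` and `name` elements) and the function names `load_config`, `dated_log_path`, and `main` are assumptions made for illustration. Database connection checks (requirement 3a) are left out.

```python
import xml.etree.ElementTree as ET
from datetime import date
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=None)
def load_config(path: str) -> dict:
    """Parse the XML file once; later calls with the same path reuse the cached dict."""
    root = ET.parse(path).getroot()
    # Flatten two-level elements into "section/item" keys (hypothetical layout).
    return {
        f"{section.tag}/{item.tag}": (item.text or "").strip()
        for section in root
        for item in section
    }


def dated_log_path(log_dir: str, log_name: str, day: date) -> Path:
    """Append the date to the configured name: etl.log becomes etl_2024-01-31.log."""
    name = Path(log_name)
    return Path(log_dir) / f"{name.stem}_{day.isoformat()}{name.suffix}"


def main(argv: list) -> int:
    """Return 0 on success, non-zero if the command line or configuration is wrong."""
    if len(argv) != 2:
        print("usage: etl_job <config.xml>")
        return 2

    config_path = Path(argv[1])
    if not config_path.is_file():
        # Requirement 5: report a wrong configuration file on the console and exit.
        print(f"Configuration file not found: {config_path}")
        return 1

    try:
        config = load_config(str(config_path))
    except ET.ParseError as exc:
        print(f"Configuration file is not valid XML: {exc}")
        return 1

    log_dir = config.get("log/dir", "")
    log_name = config.get("log/name", "")
    if not log_dir or not log_name or not Path(log_dir).is_dir():
        # Requirement 3b: a wrong log file path is reported on the console.
        print(f"Invalid log file settings in {config_path}")
        return 1

    print(f"Logging to {dated_log_path(log_dir, log_name, date.today())}")
    return 0
```

To run the sketch as a script, pass `sys.argv` to `main` and hand the result to `sys.exit`. Sub jobs would call `load_config` with the same path and receive the cached dictionary instead of re-reading the file. Because that dictionary is shared, sub jobs should treat it as read-only.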