
How Can I Change the Batch Size

Hello,
Is there a way to change the "row batch size" of the records processed when I run a DQ Analysis Report? During processing, TOS DQ seems to handle roughly 5,000 records at a time.
We find it difficult to run a column analysis on a table with 500,000 rows.
Any ideas on how to improve our experience?
Thanks.
PM
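For reference, this is what a fixed "row batch size" means in code: a minimal Java sketch (Talend jobs are generated as Java) that splits a list of rows into consecutive chunks. The `RowBatches` class, the `batches` helper and the sample data are hypothetical. This is not how TOS DQ reads rows internally, and nothing in this thread confirms that TOS DQ exposes such a setting. At roughly 5,000 rows per batch, a 500,000-row table takes 100 batches; the batch size changes how the rows are chunked, not how many rows must be processed in total.

```java
import java.util.ArrayList;
import java.util.List;

public class RowBatches {

    // Hypothetical helper: splits rows into consecutive batches of at most batchSize rows.
    static <T> List<List<T>> batches(List<T> rows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> out = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += batchSize) {
            int end = Math.min(start + batchSize, rows.size());
            out.add(rows.subList(start, end)); // a view on the original list, no copy
        }
        return out;
    }

    public static void main(String[] args) {
        // 500,000 placeholder rows, as in the table size mentioned above.
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 500_000; i++) {
            rows.add(i);
        }
        List<List<Integer>> b = batches(rows, 5_000);
        System.out.println(b.size() + " batches of " + b.get(0).size() + " rows");
    }
}
```

Running `main` prints `100 batches of 5000 rows`. The last batch is simply shorter when the row count is not a multiple of the batch size.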

Re: How Can I Change the Batch Size

If anyone knows the answer, please let me know.
1) Does Talend use more disk space while a job runs the ETL process?
2) Does Talend create files somewhere that increase disk usage over time?
3) Do you have any suggestions or best practices for reserving disk space?
-------------------------------------------------------------------------
The job uses only the following components:
tFileList
tFileInputDelimited
tMap
tAggregateRow
tOracleInput
tUnite

Thanks,
Girish

Re: How Can I Change the Batch Size

Hi everyone,
Please see the query below. I only want the column names. Can you please help me create a regular expression job?
Regular expression query:
^({7,15})\s # client_ip
-\s # unused IDENT field
-\s # unused USER field
\\d{4}))\]
# request time :HH:mm:ss -0800
\s"(GET|POST)\s # HTTP verb
(*) # HTTP URI
\sHTTP/1\."\s # HTTP version
(\d{3})\s # HTTP status code
(\d+)\s # bytes returned
"(+)"\s # referrer field
" # User agent parsing, always quoted.
"? # Sometimes if the user spoofs the user_agent, they incorrectly quote it.
( # The UA string
*? # Uninteresting bits
(?:
(?:
rv: # Beginning of the gecko engine version token
(?={3,15}) # ensure version string size
( # Whole gecko version
(\d{1,2}) # version_component_major
\.(\d{1,2}{0,8}) # version_component_minor
(?:\.(\d{1,2}{0,8}))? # version_component_a
(?:\.(\d{1,2}{0,8}))? # version_component_b
)
* # More uninteresting bits
)
|
* # More uninteresting bits
)
) # End of UA string
"?
"
=================================

Capture group fields (add all of the following with type String):
i. client_ip
ii. full_request_date
iii. day
iv. month
v. year
vi. hour
vii. minute
viii. second
ix. timezone
x. http_verb
xi. uri
xii. http_status_code
xiii. bytes_returned
xiv. referrer
xv. user_agent
xvi. firefox_gecko_version
xvii. firefox_gecko_version_major
xviii. firefox_gecko_version_minor
xix. firefox_gecko_version_a
xx. firefox_gecko_version_b
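The forum software appears to have stripped the `[...]` character classes from the query above (for example, `^({7,15})` was most likely `^([\d.]{7,15})`). As a hedged sketch of how such a pattern maps capture groups to the listed columns, the Java class below uses a reconstructed, simplified version of the first fifteen groups. The character classes, the `AccessLogRegex` class, the `parse` helper and the sample line are all assumptions, and the five gecko-version fields are left out. Compiling with `Pattern.COMMENTS` keeps the whitespace and `# ...` comments from the original layout; because Java group names cannot contain underscores, columns are mapped by group number in list order. In a Talend job, a pattern like this would presumably go into a regex-extraction component such as tExtractRegexFields with one String column per group; check that mapping against the component documentation.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AccessLogRegex {

    // Assumed reconstruction of the stripped character classes (not the poster's exact pattern).
    // COMMENTS mode ignores whitespace and treats "# ..." up to the newline as a comment.
    static final Pattern LOG_LINE = Pattern.compile(
        "^([\\d.]{7,15})\\s            # client_ip\n"
        + "-\\s                        # unused IDENT field\n"
        + "-\\s                        # unused USER field\n"
        + "\\[((\\d{2})/(\\w{3})/(\\d{4}):(\\d{2}):(\\d{2}):(\\d{2})\\s([-+]\\d{4}))\\]  # request time\n"
        + "\\s\"(GET|POST)\\s          # HTTP verb\n"
        + "([^\\s\"]*)                 # HTTP URI\n"
        + "\\sHTTP/1\\.[01]\"\\s       # HTTP version\n"
        + "(\\d{3})\\s                 # HTTP status code\n"
        + "(\\d+)\\s                   # bytes returned\n"
        + "\"([^\"]*)\"\\s             # referrer field\n"
        + "\"([^\"]*)\"$               # user agent (gecko sub-fields omitted)\n",
        Pattern.COMMENTS);

    // Column names from the post, in capture-group order (groups 1 to 15).
    static final List<String> FIELDS = List.of(
        "client_ip", "full_request_date", "day", "month", "year",
        "hour", "minute", "second", "timezone", "http_verb", "uri",
        "http_status_code", "bytes_returned", "referrer", "user_agent");

    // Returns column name -> captured value, or an empty map when the line does not match.
    static Map<String, String> parse(String line) {
        Map<String, String> row = new LinkedHashMap<>();
        Matcher m = LOG_LINE.matcher(line);
        if (m.matches()) {
            for (int i = 0; i < FIELDS.size(); i++) {
                row.put(FIELDS.get(i), m.group(i + 1));
            }
        }
        return row;
    }

    public static void main(String[] args) {
        // Hypothetical sample line in Apache combined log style.
        String sample = "127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "
            + "\"GET /apache_pb.gif HTTP/1.0\" 200 2326 "
            + "\"http://www.example.com/start.html\" \"Mozilla/4.08 [en] (Win98; I ;Nav)\"";
        parse(sample).forEach((column, value) -> System.out.println(column + " = " + value));
    }
}
```

For the sample line, `parse` yields `client_ip = 127.0.0.1`, `full_request_date = 10/Oct/2000:13:55:36 -0700`, `month = Oct`, `timezone = -0700` and so on; a line that does not match yields an empty map.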
Employee

Re: How Can I Change the Batch Size

@ptremblay, column set analysis does indeed require a lot of memory when there are too many distinct rows.
One way to avoid memory issues is to not store the data in the analysis file (there is an option in the analysis editor for that).
Another way to avoid crashes is to fine-tune the memory settings in the "Profiling > Analysis tuning" preference page.
If you are running column analyses (not column set analyses), some indicators also require memory, either on the DBMS server side or in the Studio if you are using the Java engine.
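To illustrate why the number of distinct rows drives memory use: a column set analysis presumably has to track each distinct combination of values it has seen. The sketch below is a hypothetical Java illustration, not Talend's implementation; the `DistinctRowCounter` class and `countDistinct` helper are made up for this example. It keeps one map entry per distinct row, so its memory grows with the number of distinct rows rather than with the batch size or the total row count.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DistinctRowCounter {

    // Counts occurrences of each distinct row (a row = list of column values).
    // The map holds one entry per distinct row, so memory scales with distinct rows.
    static Map<List<String>, Integer> countDistinct(Iterable<List<String>> rows) {
        Map<List<String>, Integer> counts = new HashMap<>();
        for (List<String> row : rows) {
            counts.merge(row, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<List<String>> rows = List.of(
            List.of("FR", "Paris"),
            List.of("FR", "Paris"),
            List.of("US", "Boston"));
        Map<List<String>, Integer> counts = countDistinct(rows);
        System.out.println(counts.size() + " distinct rows");
        System.out.println(counts.get(List.of("FR", "Paris")));
    }
}
```

`main` prints `2 distinct rows` followed by `2`. With 500,000 rows that are mostly distinct, the same map would hold close to 500,000 entries, which is why not storing the data and tuning memory both help.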
Please provide more details about what you are doing.
@Grirish_Shiva, please open another thread in our TOS for DQ forum for your question, as it is not related to the data profiling product.

Re: How Can I Change the Batch Size

Hi everyone,
I want to create a regex job using the regular expression query posted above. I only need the column names.
Please help me create the regex job and tell me which transformations I have to use. If possible, please share a screenshot of the job.
Thanks a lot.
Moderator

Re: How Can I Change the Batch Size

Hi Grirish_Shiva,
Please create a new topic for your regex job so that more experts in the forum will see your requirement and can help you as soon as possible. Thanks a lot.
Best regards
Sabrina