
How big a file?

Posted: Wed Apr 27, 2011 2:07 pm
by sheridany
I have a daily file of 1MM rows (22 columns, CSV format) that needs to be cleaned extensively. How many rows can TP handle on a single-CPU workstation, or is that even an option? What's the best way to use TP for a job this big?

Re: How big a file?

Posted: Thu Apr 28, 2011 12:21 pm
by DataMystic Support
TP can handle billions of rows of CSV data. Just point it at the file with a list of filters.

What cleansing does it need?

Re: How big a file?

Posted: Thu Apr 28, 2011 10:55 pm
by sheridany
The usual cleanup: some search and replace, removing blanks, trimming leading and trailing whitespace, etc. The usual TP stuff. From a deployment standpoint and an ETL perspective, we would like to load the clean data into a database after TP has processed the file. How might we do that?

Re: How big a file?

Posted: Fri Apr 29, 2011 10:14 am
by DataMystic Support
I assume you want to trim blanks on each field in turn rather than on entire lines, so use Filters\Restrict\Delimited fields (CSV, Tab, Pipe, etc) to restrict to each field in turn, and add the trim filters inside this filter.
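
For readers who want to see what per-field trimming amounts to outside TP, here is a minimal Python sketch. It is not how TextPipe works internally; it only illustrates the same idea (restrict to each delimited field, then trim) on a stream, one row at a time, so memory use stays flat however many rows the file has. The function name trim_fields and the sample data are made up for the illustration.

```python
import csv
import io

def trim_fields(src, dst):
    """Stream CSV rows from src to dst, trimming whitespace on every field.

    Hypothetical helper for illustration; returns the number of rows written.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    count = 0
    for row in reader:
        # Trim each field in turn, not the line as a whole.
        writer.writerow([field.strip() for field in row])
        count += 1
    return count

# Small in-memory sample standing in for the daily file.
sample = io.StringIO("  a ,b  ,c\n 1, 2 ,  3  \n")
out = io.StringIO()
rows = trim_fields(sample, out)
print(out.getvalue(), end="")
```

With a real file, the two StringIO objects would be replaced by files opened with newline="" as the csv module documentation recommends.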

You will then need to modify the CSV to add a Filters\Add\Left margin of

    insert into tablename () values (

and a Filters\Add\Right margin of

    );

Then add a Filters\Special\Database connection as the last step.
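
To make the effect of the two margins concrete, here is a rough Python sketch. wrap_row mimics what the left and right margins do to one cleaned line; the empty () from the post is kept as-is, since it is presumably where your column names would go. Note that plain wrapping only yields valid SQL if text values are already quoted. The second function, load_rows, is an alternative and not something from this thread: it uses parameterised inserts against sqlite3 as a stand-in database, with a three-column table invented for the demo (the real file has 22 columns).

```python
import csv
import io
import sqlite3

LEFT = "insert into tablename () values ("   # left margin from the post
RIGHT = ");"                                  # right margin from the post

def wrap_row(line):
    """Mimic the left/right margin filters on one cleaned CSV line."""
    return LEFT + line + RIGHT

def load_rows(conn, src):
    """Assumed alternative: parameterised inserts, so no manual quoting.

    Rows are streamed through a generator rather than held in memory.
    """
    cleaned = ([f.strip() for f in row] for row in csv.reader(src))
    # Three placeholders match the demo table; a 22-column file needs 22.
    cur = conn.executemany("insert into tablename values (?, ?, ?)", cleaned)
    return cur.rowcount

print(wrap_row("1,2,3"))

# sqlite3 in memory stands in for whatever database you actually load into.
conn = sqlite3.connect(":memory:")
conn.execute("create table tablename (a text, b text, c text)")
loaded = load_rows(conn, io.StringIO(" 1,x ,3\n4, y,6\n"))
conn.commit()
```

The parameterised form avoids having to escape quotes inside text fields, which the plain margin approach leaves to you.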