How big a file?

Get help with installation and running here.

Moderators: DataMystic Support, Moderators

sheridany
Posts: 36
Joined: Thu Nov 15, 2007 4:20 am

How big a file?

Post by sheridany »

I have a daily file of 1MM rows (22 columns, CSV format) that needs to be cleaned extensively. How many rows can TP handle on a single-CPU workstation, or is that even an option? What's the best way to use TP for a job this big?
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: How big a file?

Post by DataMystic Support »

TP can handle billions of rows of CSV data. Just point it at the file with a list of filters.

What cleansing does it need?
sheridany
Posts: 36
Joined: Thu Nov 15, 2007 4:20 am

Re: How big a file?

Post by sheridany »

The usual cleanup: some search and replace, removing blanks, trimming leading and trailing whitespace, etc. The usual TP stuff. From a deployment and ETL perspective, we would like to load the clean data into a database after TP has processed the file. How might we do that?
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: How big a file?

Post by DataMystic Support »

I assume you want to trim blanks on each field in turn rather than on entire lines, so use Filters\Restrict\Delimited fields (CSV, Tab, Pipe, etc) to restrict to each field in turn, and add the trim filters inside this filter.
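
For anyone who wants to see the per-field idea outside TextPipe, here is a minimal Python sketch. It is not how TextPipe works internally; it only illustrates trimming each field rather than the whole line, streaming one row at a time so file size is not limited by memory. The function name trim_fields and the sample data are made up for illustration.

```python
import csv
import io

def trim_fields(src, dst):
    """Stream CSV rows from src to dst, stripping whitespace from every field."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    for row in reader:
        # Trim each field on its own, so blanks inside a row are removed too,
        # not just those at the start and end of the line.
        writer.writerow([field.strip() for field in row])

# Small in-memory sample; a real job would pass open file handles instead.
sample = io.StringIO(' a ,"b, c ", d \n1,  2  ,3 \n')
out = io.StringIO()
trim_fields(sample, out)
print(out.getvalue(), end="")
```

Because only one row is held at a time, the same loop applies unchanged to a file with millions of rows.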

You will then need to turn each CSV line into an SQL insert statement. Add a Filters\Add\Left margin of

insert into tablename () values (
and a Filters\Add\Right margin of

);
Then add a Filters\Special\Database connection as the last step.
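
As a rough illustration of the same load step outside TextPipe, the sketch below inserts trimmed CSV rows into a database from Python. Note the swapped-in technique: instead of wrapping each line in insert statement text, it uses a parameterized insert, which avoids having to quote the values by hand. SQLite, the load_csv function, and the three-column table are assumptions for the example; only the table name tablename comes from the post above.

```python
import csv
import io
import sqlite3

def load_csv(conn, src, table="tablename", ncols=3):
    """Insert every CSV row from src into table, trimming each field first."""
    placeholders = ", ".join("?" * ncols)
    # The table name is interpolated directly, so it must come from trusted code.
    sql = f"INSERT INTO {table} VALUES ({placeholders})"
    rows = ([field.strip() for field in row] for row in csv.reader(src))
    with conn:  # commits on success, rolls back if an insert fails
        conn.executemany(sql, rows)

# Hypothetical three-column table in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (a TEXT, b TEXT, c TEXT)")
load_csv(conn, io.StringIO(" x , y ,z\n1, 2 ,3\n"))
print(conn.execute("SELECT * FROM tablename").fetchall())
```

The rows are fed to executemany as a generator, so the file is never loaded into memory all at once.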