slow running program

Posted: Thu May 21, 2009 9:16 am
by sheridany
Does anyone have a suggestion for making this go faster? It is abysmally slow. The input is a CSV file of about 400,000 records, and we have to remove a lot of columns individually.

Filter List
-----------
Filter options
| [ ] Log to file
| [X] Append to logfile
| Log filename: textpipe.log
| Threshold 500
|
|--Input from file(s)
| [ ] Confirm before processing each file
| [ ] Confirm before processing read/only files
| [ ] Delete input files after processing
| Skip binary files
| Sample size 4 characters
|
|--Remove fields:Comma-delimited field 185 .. field 185
| Delimiter Type: 0
| Custom delimiter:
| [X] Has Header
|
|--Remove fields:Comma-delimited field 162 .. field 162
| Delimiter Type: 0
| Custom delimiter:
| [X] Has Header
|
|--Remove fields:Comma-delimited field 139 .. field 139
| Delimiter Type: 0
| Custom delimiter:
| [X] Has Header
|
|--Remove fields:Comma-delimited field 115 .. field 115
| Delimiter Type: 0
| Custom delimiter:
| [X] Has Header
|
|--Remove fields:Comma-delimited field 101 .. field 101
| Delimiter Type: 0
| Custom delimiter:
| [X] Has Header
|
|--Remove fields:Comma-delimited field 82 .. field 82
| Delimiter Type: 0
| Custom delimiter:
| [X] Has Header
|
|--Remove fields:Comma-delimited field 72 .. field 72
| Delimiter Type: 0
| Custom delimiter:
| [X] Has Header
|
|--Remove fields:Comma-delimited field 62 .. field 62
| Delimiter Type: 0
| Custom delimiter:
| [X] Has Header
|
|--Remove fields:Comma-delimited field 58 .. field 58
| Delimiter Type: 0
| Custom delimiter:
| [X] Has Header
|
|--Remove fields:Comma-delimited field 30 .. field 30
| Delimiter Type: 0
| Custom delimiter:
| [X] Has Header
|
|--Remove fields:Comma-delimited field 26 .. field 26
| Delimiter Type: 0
| Custom delimiter:
| [X] Has Header
|
|--Remove fields:Comma-delimited field 18 .. field 18
| Delimiter Type: 0
| Custom delimiter:
| [X] Has Header
|
|--Remove fields:Comma-delimited field 15 .. field 16
| Delimiter Type: 0
| Custom delimiter:
| [X] Has Header
|
|--Remove fields:Comma-delimited field 13 .. field 13
| Delimiter Type: 0
| Custom delimiter:
| [X] Has Header
|
|--Remove fields:Comma-delimited field 11 .. field 11
| Delimiter Type: 0
| Custom delimiter:
| [X] Has Header
|
|--Remove fields:Comma-delimited field 7 .. field 9
| Delimiter Type: 0
| Custom delimiter:
| [X] Has Header
|
|--Remove fields:Comma-delimited field 4 .. field 4
| Delimiter Type: 0
| Custom delimiter:
| [X] Has Header
|
|--Remove fields:Comma-delimited field 1 .. field 2
| Delimiter Type: 0
| Custom delimiter:
| [X] Has Header
|
|--Replace list: C:\Documents and Settings\youngs\Desktop\email analysis 2008\cleanuptaxonomy.xls EasyPattern
| [ ] Match case
| [ ] Whole words only
| [ ] Case sensitive replace
| [ ] Prompt on replace
| [ ] Skip prompt if identical
| [ ] First only
| [ ] Extract matches
| Maximum text buffer size 4096
|
|--EasyPattern [[1 or more char'/.']] with [_]
| [ ] Match case
| [ ] Whole words only
| [ ] Case sensitive replace
| [ ] Prompt on replace
| [ ] Skip prompt if identical
| [ ] First only
| [ ] Extract matches
| Maximum text buffer size 4096
|
|--EasyPattern ["] with []
| [ ] Match case
| [ ] Whole words only
| [ ] Case sensitive replace
| [ ] Prompt on replace
| [ ] Skip prompt if identical
| [ ] First only
| [ ] Extract matches
| Maximum text buffer size 4096
|
|--EasyPattern [ 12:00,] with [,]
| [ ] Match case
| [ ] Whole words only
| [ ] Case sensitive replace
| [ ] Prompt on replace
| [ ] Skip prompt if identical
| [ ] First only
| [ ] Extract matches
| Maximum text buffer size 4096
|
+--Merge output to file C:\Documents and Settings\youngs\Desktop\finalirtaxonomybp.csv

Re: slow running program

Posted: Tue May 26, 2009 11:29 am
by DataMystic Support
How many entries are in your replace list C:\Documents and Settings\youngs\Desktop\email analysis 2008\cleanuptaxonomy.xls ?

It will be much faster if you load this as a .csv or .tab file instead of .xls.

Re: slow running program

Posted: Tue May 26, 2009 11:02 pm
by sheridany
There are approximately 190 entries in the replace list. I will try the csv format.

Re: slow running program

Posted: Wed May 27, 2009 1:52 am
by sheridany
It is still running very slowly, even after converting the search/replace list to a CSV file.

Re: slow running program

Posted: Wed May 27, 2009 7:45 am
by sheridany
The issue seems to be in the search and replace list. It is exact in nature, so perhaps I am using the wrong filter? There are 190 search and replace rows that look like this:

Sales & Service Taxonomy./Cashback,Cashback
Sales & Service Taxonomy./Address Change,Addr_Chg
Sales & Service Taxonomy./Legal Ref,Legal_Ref
Sales & Service Taxonomy./Rewards,Rewards
Sales & Service Taxonomy./Cancellations/Savings Account,Cancel_SAV_Acct
Sales & Service Taxonomy./Cancellations/Checking Account,Cancel_CHK_Acct

Re: slow running program

Posted: Wed May 27, 2009 8:25 am
by DataMystic Support
I'd use a Perl search/replace or an EasyPattern search/replace. Exact matches can be slower.
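
For readers who want to see the idea outside TextPipe, here is a minimal Python sketch of applying an exact-match replace list in a single pass per line. It is not how TextPipe implements its filters. The sample rows are copied from the post above, and the names `load_pairs` and `build_replacer` are hypothetical. The sketch compiles all search terms into one alternation, longest first, so each line is scanned once instead of once per entry.

```python
import csv
import io
import re

# Sample of the replace list, copied from the rows quoted in the thread.
REPLACE_LIST_CSV = """\
Sales & Service Taxonomy./Cashback,Cashback
Sales & Service Taxonomy./Address Change,Addr_Chg
Sales & Service Taxonomy./Cancellations/Savings Account,Cancel_SAV_Acct
"""

def load_pairs(text):
    """Read 'search,replace' rows into a dict (hypothetical helper)."""
    return {row[0]: row[1] for row in csv.reader(io.StringIO(text)) if len(row) >= 2}

def build_replacer(pairs):
    """Return a function that applies every exact replacement in one scan."""
    # Longest keys first, so a longer search term wins over a shorter prefix.
    keys = sorted(pairs, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return lambda line: pattern.sub(lambda m: pairs[m.group(0)], line)

replace_all = build_replacer(load_pairs(REPLACE_LIST_CSV))
print(replace_all("x,Sales & Service Taxonomy./Address Change,y"))
```

Escaping each term with `re.escape` keeps characters such as `.` and `/` literal, which matches the "exact" nature of the list described above.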

Re: slow running program

Posted: Wed May 27, 2009 8:48 am
by sheridany
The only way I can see to do it currently is to break it up into two processes. Step 1 removes the unwanted columns and outputs to a CSV file. Step 2 is a separate fll that does the search and replace using EP, plus some other EP cleanups. It runs much faster this way.

Re: slow running program

Posted: Wed May 27, 2009 10:47 am
by DataMystic Support
Strange that it should run faster like that.

Is it possible to send us your filter, the test file and the search/replace list so we can try and optimize it?

Re: slow running program

Posted: Fri May 29, 2009 8:21 am
by sheridany
I finally got it to run much better all in one program, but I am not seeing the results I expected when removing columns. I read in the help guide to remove columns from right to left, but it is still leaving in columns that I wanted to remove. Do I have to shift the position by -1 every time I remove a column to get the correct next column?

Filter List
-----------
Filter options
| [ ] Log to file
| [X] Append to logfile
| Log filename: textpipe.log
| Threshold 500
|
|--Input from file(s)
| [ ] Confirm before processing each file
| [ ] Confirm before processing read/only files
| [ ] Delete input files after processing
| Process binary files
|
|--Remove fields:Comma-delimited field 190 .. field 190
| Delimiter Type: 0
| Custom delimiter:
| [X] Has Header
|
|--Remove fields:Comma-delimited field 187 .. field 187
| Delimiter Type: 0
| Custom delimiter:
| [X] Has Header
|
|--Remove fields:Comma-delimited field 185 .. field 185
| Delimiter Type: 0
| Custom delimiter:
| [X] Has Header
|
|--Remove fields:Comma-delimited field 162 .. field 162
| Delimiter Type: 0
| Custom delimiter:
| [X] Has Header
|
|--Remove fields:Comma-delimited field 146 .. field 146
| Delimiter Type: 0
| Custom delimiter:
| [X] Has Header
|
|--Remove fields:Comma-delimited field 139 .. field 140
| Delimiter Type: 0
| Custom delimiter:
| [X] Has Header
|
|--Remove fields:Comma-delimited field 134 .. field 134
| Delimiter Type: 0
| Custom delimiter:
| [X] Has Header
|
|--Remove fields:Comma-delimited field 130 .. field 130
| Delimiter Type: 0
| Custom delimiter:
| [X] Has Header
|
|--Remove fields:Comma-delimited field 127 .. field 127
| Delimiter Type: 0
| Custom delimiter:
| [X] Has Header
|
|--Remove fields:Comma-delimited field 115 .. field 116
| Delimiter Type: 0
| Custom delimiter:
| [X] Has Header
|
|--Remove fields:Comma-delimited field 110 .. field 110
| Delimiter Type: 0
| Custom delimiter:
| [X] Has Header
|
|--Remove fields:Comma-delimited field 101 .. field 101
| Delimiter Type: 0
| Custom delimiter:
| [X] Has Header
|
|--Remove fields:Comma-delimited field 89 .. field 89
| Delimiter Type: 0
| Custom delimiter:
| [X] Has Header
|
|--Remove fields:Comma-delimited field 43 .. field 43
| Delimiter Type: 0
| Custom delimiter:
| [X] Has Header
|
|--Remove fields:Comma-delimited field 82 .. field 82
| Delimiter Type: 0
| Custom delimiter:
| [X] Has Header
|
|--Remove fields:Comma-delimited field 75 .. field 75
| Delimiter Type: 0
| Custom delimiter:
| [X] Has Header
|
|--Remove fields:Comma-delimited field 72 .. field 72
| Delimiter Type: 0
| Custom delimiter:
| [X] Has Header
|
|--Remove fields:Comma-delimited field 62 .. field 63
| Delimiter Type: 0
| Custom delimiter:
| [X] Has Header
|
|--Remove fields:Comma-delimited field 58 .. field 58
| Delimiter Type: 0
| Custom delimiter:
| [X] Has Header
|
|--Remove fields:Comma-delimited field 43 .. field 43
| Delimiter Type: 0
| Custom delimiter:
| [X] Has Header
|
|--Remove fields:Comma-delimited field 36 .. field 36
| Delimiter Type: 0
| Custom delimiter:
| [X] Has Header
|
|--Remove fields:Comma-delimited field 30 .. field 30
| Delimiter Type: 0
| Custom delimiter:
| [X] Has Header
|
|--Remove fields:Comma-delimited field 26 .. field 26
| Delimiter Type: 0
| Custom delimiter:
| [X] Has Header
|
|--Remove fields:Comma-delimited field 18 .. field 18
| Delimiter Type: 0
| Custom delimiter:
| [X] Has Header
|
|--Remove fields:Comma-delimited field 15 .. field 16
| Delimiter Type: 0
| Custom delimiter:
| [ ] Has Header
|
|--Remove fields:Comma-delimited field 13 .. field 13
| Delimiter Type: 0
| Custom delimiter:
| [ ] Has Header
|
|--Remove fields:Comma-delimited field 11 .. field 11
| Delimiter Type: 0
| Custom delimiter:
| [ ] Has Header
|
|--Remove fields:Comma-delimited field 7 .. field 9
| Delimiter Type: 0
| Custom delimiter:
| [ ] Has Header
|
|--Remove fields:Comma-delimited field 4 .. field 4
| Delimiter Type: 0
| Custom delimiter:
| [ ] Has Header
|
|--Remove fields:Comma-delimited field 1 .. field 2
| Delimiter Type: 0
| Custom delimiter:
| [ ] Has Header
|
|--Replace list: C:\Documents and Settings\youngs\Desktop\email analysis 2008\cleanuptaxonomy.csv EasyPattern
| [ ] Match case
| [ ] Whole words only
| [ ] Case sensitive replace
| [ ] Prompt on replace
| [ ] Skip prompt if identical
| [ ] First only
| [ ] Extract matches
| Maximum text buffer size 4096
|
|--EasyPattern ["] with []
| [ ] Match case
| [ ] Whole words only
| [ ] Case sensitive replace
| [ ] Prompt on replace
| [ ] Skip prompt if identical
| [ ] First only
| [ ] Extract matches
| Maximum text buffer size 4096
|
|--EasyPattern [ 12:00,] with [,]
| [ ] Match case
| [ ] Whole words only
| [ ] Case sensitive replace
| [ ] Prompt on replace
| [ ] Skip prompt if identical
| [ ] First only
| [ ] Extract matches
| Maximum text buffer size 4096
|

Re: slow running program

Posted: Fri May 29, 2009 9:08 am
by DataMystic Support
If you process columns from right to left, then no, you don't have to adjust column numbers. If you went from left to right, you would have to take your earlier deletions into account when choosing column numbers, which is much trickier.
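
The right-to-left rule can be illustrated with a small Python sketch. It assumes that each Remove fields step works on the output of the previous step, which is how the thread describes it but is not confirmed TextPipe internals. The function names are hypothetical. Note that the second filter list posted above contains field 43 twice, once between fields 89 and 82, so it is not strictly right to left. Under the assumption above, an out-of-order entry like that shifts later removals.

```python
def remove_fields_sequentially(row, ranges):
    """Apply each 1-based inclusive (first, last) range to the result of the previous removal."""
    out = list(row)
    for first, last in ranges:
        del out[first - 1:last]
    return out

def remove_fields_once(row, ranges):
    """Remove all ranges in one pass, numbering fields against the original row."""
    drop = set()
    for first, last in ranges:
        drop.update(range(first, last + 1))
    return [v for i, v in enumerate(row, start=1) if i not in drop]

row = [f"c{i}" for i in range(1, 11)]  # c1 .. c10

# Strictly descending ranges: sequential removal matches the original numbering.
descending = [(9, 9), (4, 5), (1, 2)]
print(remove_fields_sequentially(row, descending) == remove_fields_once(row, descending))

# A lower field listed early and repeated (like field 43 above) shifts later removals.
out_of_order = [(9, 9), (3, 3), (6, 6), (3, 3)]
print(remove_fields_sequentially(row, out_of_order))
print(remove_fields_once(row, out_of_order))
```

In the out-of-order case the sequential version removes c7 and c4 instead of c6, because the earlier removal of field 3 has already shifted everything to its right.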