remove lines from list
Posted: Wed Jan 23, 2008 11:29 am
by sheridany
I am cleaning up lots of junk email lines that come in to our support center. There are many different patterns to look for, and every line that matches one of them has to be removed.
I started by building one filter per pattern, and each of those works great.
When I tried the "remove lines from a list" filter instead, it would not match any of the lines that the individual remove filters had matched.
Why does it not work?
I tried both EasyPattern and Perl patterns, and neither works.
I was under the impression that it would work like the pattern list in "search and replace from a list".
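For comparison, this is roughly what a list-driven line remover does, sketched in Python rather than TextPipe. It assumes the list is plain text with one regular expression per line (Python `re` syntax, which is close to but not identical to Perl's); the list entries and sample lines are hypothetical. Each entry is stripped of surrounding whitespace, since a stray trailing space is a common reason a list entry silently never matches.

```python
import re
from typing import Iterable, List


def load_patterns(list_text: str) -> List[re.Pattern]:
    """Compile one pattern per non-blank line of a pattern list."""
    patterns = []
    for entry in list_text.splitlines():
        entry = entry.strip()  # a trailing space would otherwise become part of the pattern
        if entry:
            # Case-insensitive, like the filters above with "Match case" unticked.
            patterns.append(re.compile(entry, re.IGNORECASE))
    return patterns


def remove_matching_lines(lines: Iterable[str], patterns: List[re.Pattern]) -> List[str]:
    """Keep only the lines that match none of the patterns."""
    return [line for line in lines if not any(p.search(line) for p in patterns)]


# Hypothetical pattern list, one regex per line.
PATTERN_LIST = """
SpamArrest
Reply to Reply
PlanetOut
^Content-Transfer-Encoding: [78]bit
"""

if __name__ == "__main__":
    sample = [
        "Customer cannot log in",
        "Protected by SpamArrest",
        "Content-Transfer-Encoding: 8bit",
        "Please reset my password",
    ]
    for line in remove_matching_lines(sample, load_patterns(PATTERN_LIST)):
        print(line)
```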
Posting filter clipboard output
Posted: Thu Jan 24, 2008 5:55 am
by sheridany
I thought I would post the filter clipboard output for reference. I would like to use a single list of search patterns to remove lines, rather than having to set up a separate filter for each one. I still cannot get "remove lines from list" to work.
Filter List
-----------
Filter options
| [ ] Log to file
| [X] Append to logfile
| Log filename: textpipe.log
| Threshold 500
|
|--Input from file(s)
| [ ] Confirm before processing each file
| [ ] Confirm before processing read/only files
| [ ] Delete input files after processing
| Process binary files
|
|--Convert End of Lines - Auto to DOS
| [X] Remove bad EOL
|
|--Remove blanks from Start of Line
|
|--Remove blanks from End of Line
|
|--Remove matching lines [SpamArrest]
| [ ] Include line numbers
| [ ] Include filename
| [ ] Match case
| [ ] Count matches
| Pattern type: 4
| Context before: 0
| Context after: 0
|
|--Remove matching lines [Reply to Reply]
| [ ] Include line numbers
| [ ] Include filename
| [ ] Match case
| [ ] Count matches
| Pattern type: 4
| Context before: 0
| Context after: 0
|
|--Remove matching lines [PlanetOut]
| [ ] Include line numbers
| [ ] Include filename
| [ ] Match case
| [ ] Count matches
| Pattern type: 4
| Context before: 0
| Context after: 0
|
|--Restrict fields:Pipe-delimited field 1 .. field 1
| | [X] Process fields individually
| | [ ] Exclude delimiter
| | [ ] Exclude quotes (if present)
| | Delimiter Type: 3
| | Custom delimiter:
| | [ ] Has Header
| |
| +--EasyPattern [[(longest digit or character or punctuation) 'quoted-printable']] with [' ']
| [ ] Match case
| [ ] Whole words only
| [ ] Case sensitive replace
| [ ] Prompt on replace
| [ ] Skip prompt if identical
| [ ] First only
| [ ] Extract matches
| Maximum text buffer size 4096
EasyPattern [[linestart(longest 1 to 2 digits),'/',(longest 1 to 2 digits),'/',(longest 1 or more digits)]] with [' ']
| [ ] Match case
| [ ] Whole words only
| [ ] Case sensitive replace
| [ ] Prompt on replace
| [ ] Skip prompt if identical
| [ ] First only
| [ ] Extract matches
| Maximum text buffer size 4096
EasyPattern [[(longest digits or character or punctuation) 'Transfer-Encoding: 8bit']] with [' ']
| [ ] Match case
| [ ] Whole words only
| [ ] Case sensitive replace
| [ ] Prompt on replace
| [ ] Skip prompt if identical
| [ ] First only
| [ ] Extract matches
| Maximum text buffer size 4096
EasyPattern [[(longest digits or character or punctuation) 'Transfer-Encoding: 7bit']] with [' ']
| [ ] Match case
| [ ] Whole words only
| [ ] Case sensitive replace
| [ ] Prompt on replace
| [ ] Skip prompt if identical
| [ ] First only
| [ ] Extract matches
| Maximum text buffer size 4096
EasyPattern [[(longest 1 to 3 digits,'-',longest 1 to 2 digits,'-',longest 1 to 4 digits)]] with [xxx-xx-xxxx]
| [ ] Match case
| [ ] Whole words only
| [ ] Case sensitive replace
| [ ] Prompt on replace
| [ ] Skip prompt if identical
| [ ] First only
| [ ] Extract matches
| Maximum text buffer size 4096
EasyPattern [[('('3 digits')','-'3 digits,'-',4 digits)]] with [(xxx)-xxx-xxxx]
| [ ] Match case
| [ ] Whole words only
| [ ] Case sensitive replace
| [ ] Prompt on replace
| [ ] Skip prompt if identical
| [ ] First only
| [ ] Extract matches
| Maximum text buffer size 4096
EasyPattern [[( 3 digits, '-', 3 digits,'-',4 digits)]] with [xxx-xxx-xxxx]
| [ ] Match case
| [ ] Whole words only
| [ ] Case sensitive replace
| [ ] Prompt on replace
| [ ] Skip prompt if identical
| [ ] First only
| [ ] Extract matches
| Maximum text buffer size 4096
|
+--Output to file(s)
[ ] Only update date on changed files
[ ] Append mode
[ ] Change extension to: .txt
[ ] Open output file
Only output modified files Backup mode
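The three masking EasyPatterns at the end of this list can be approximated with ordinary regular expressions. The Python sketch below assumes these translations (they are not TextPipe's own conversion) and applies them in the same order as the filter list. It works on whole strings, whereas in TextPipe these filters sit under "Restrict fields" and only touch pipe-delimited field 1.

```python
import re

# Assumed regex equivalents of the masking EasyPatterns, in filter-list order.
MASKS = [
    # (longest 1 to 3 digits,'-',longest 1 to 2 digits,'-',longest 1 to 4 digits)
    (re.compile(r"\d{1,3}-\d{1,2}-\d{1,4}"), "xxx-xx-xxxx"),
    # ('(' 3 digits ')','-' 3 digits,'-',4 digits)
    (re.compile(r"\(\d{3}\)-\d{3}-\d{4}"), "(xxx)-xxx-xxxx"),
    # (3 digits,'-',3 digits,'-',4 digits)
    (re.compile(r"\d{3}-\d{3}-\d{4}"), "xxx-xxx-xxxx"),
]


def mask_numbers(text: str) -> str:
    """Apply each mask in turn, replacing every match."""
    for pattern, replacement in MASKS:
        text = pattern.sub(replacement, text)
    return text
```

With this translation the first pattern also matches dates written like 12-31-2007, so it is worth checking against real mail before relying on it.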
out of memory error
Posted: Thu Jan 24, 2008 3:27 pm
by sheridany
I am also getting frequent out-of-memory errors after running the trial several times, whether testing a single filter or the entire filter list. I have opened a support ticket, but they have not gotten back to me yet. I am using a 1.8 GHz Intel processor with 1 GB of memory on Windows XP. Shouldn't that be enough for this software?
Probably very inefficient filter
Posted: Thu Jan 24, 2008 9:57 pm
by dfhtextpipe
Very inefficient filters can slow TextPipe almost to a halt, or cause out-of-memory errors.
Inefficient filters
Posted: Sat Jan 26, 2008 4:41 am
by sheridany
How can I tell whether a filter is inefficient? I have been tweaking the pattern matching and remove-lines filters, and this version runs. I am not sure how to improve it, but I am open to suggestions. Thanks in advance for any advice.
Filter List
-----------
Filter options
| [ ] Log to file
| [X] Append to logfile
| Log filename: textpipe.log
| Threshold 500
|
|--Input from file(s)
| [ ] Confirm before processing each file
| [ ] Confirm before processing read/only files
| [ ] Delete input files after processing
| Process binary files
|
|--Convert End of Lines - Auto to DOS
| [X] Remove bad EOL
|
|--Remove blanks from Start of Line
|
|--Remove blanks from End of Line
|
|--Remove matching lines [['PlanetOut' or 'SpamArrest' or 'Reply to Reply' or '7bit' or '8bit' or 'whitelist' or 'mailguard' or '-=_Part_' or LineStart('DOCTYPE') or Linestart('<') 'On-Line Drugstore']]
| [ ] Include line numbers
| [ ] Include filename
| [ ] Match case
| [X] Count matches
| Pattern type: 4
| Context before: 0
| Context after: 0
|
|--Remove matching lines [[lineStart('"<html>') or lineStart('"<!DOCTYPE') or lineStart('C A N A D A ') or lineStart('"C A N A D A On-Line Pharmacy') or lineStart('"C.a.n.a.d.a On-Line Pharmacy') or lineStart('CA On-Line Pharmacy is C.a.n.a.d.a’s most treliable')or lineStart('table border') or lineStart('DEC2007' or 'OCT2007' or'NOV2007')]]
| [ ] Include line numbers
| [ ] Include filename
| [ ] Match case
| [ ] Count matches
| Pattern type: 4
| Context before: 0
| Context after: 0
|
|--Restrict fields:Pipe-delimited field 1 .. field 1
| | [X] Process fields individually
| | [ ] Exclude delimiter
| | [ ] Exclude quotes (if present)
| | Delimiter Type: 3
| | Custom delimiter:
| | [ ] Has Header
| |
| +--EasyPattern [[(longest digit or character or punctuation) 'quoted-printable']] with []
| [ ] Match case
| [ ] Whole words only
| [ ] Case sensitive replace
| [ ] Prompt on replace
| [ ] Skip prompt if identical
| [ ] First only
| [ ] Extract matches
| Maximum text buffer size 4096
EasyPattern [[linestart(longest 1 to 2 digits),'/',(longest 1 to 2 digits),'/',(longest 1 or more digits)]] with []
| [ ] Match case
| [ ] Whole words only
| [ ] Case sensitive replace
| [ ] Prompt on replace
| [ ] Skip prompt if identical
| [ ] First only
| [ ] Extract matches
| Maximum text buffer size 4096
EasyPattern [[(longest digits or character or punctuation) 'Transfer-Encoding: 8bit']] with []
| [ ] Match case
| [ ] Whole words only
| [ ] Case sensitive replace
| [ ] Prompt on replace
| [ ] Skip prompt if identical
| [ ] First only
| [ ] Extract matches
| Maximum text buffer size 4096
EasyPattern [[(longest digits or character or punctuation) 'Transfer-Encoding: 7bit']] with []
| [ ] Match case
| [ ] Whole words only
| [ ] Case sensitive replace
| [ ] Prompt on replace
| [ ] Skip prompt if identical
| [ ] First only
| [ ] Extract matches
| Maximum text buffer size 4096
EasyPattern [[(longest 1 to 3 digits,'-',longest 1 to 2 digits,'-',longest 1 to 4 digits)]] with [xxx-xx-xxxx]
| [ ] Match case
| [ ] Whole words only
| [ ] Case sensitive replace
| [ ] Prompt on replace
| [ ] Skip prompt if identical
| [ ] First only
| [ ] Extract matches
| Maximum text buffer size 4096
EasyPattern [[('('3 digits')','-'3 digits,'-',4 digits)]] with [(xxx)-xxx-xxxx]
| [ ] Match case
| [ ] Whole words only
| [ ] Case sensitive replace
| [ ] Prompt on replace
| [ ] Skip prompt if identical
| [ ] First only
| [ ] Extract matches
| Maximum text buffer size 4096
EasyPattern [[( 3 digits, '-', 3 digits,'-',4 digits)]] with [xxx-xxx-xxxx]
| [ ] Match case
| [ ] Whole words only
| [ ] Case sensitive replace
| [ ] Prompt on replace
| [ ] Skip prompt if identical
| [ ] First only
| [ ] Extract matches
| Maximum text buffer size 4096
EasyPattern [[lineStart('"' 1 to 2 digits,'/', 1 to 2 digits,'/' 1 or more digits \t )]] with []
| [ ] Match case
| [ ] Whole words only
| [ ] Case sensitive replace
| [ ] Prompt on replace
| [ ] Skip prompt if identical
| [ ] First only
| [ ] Extract matches
| Maximum text buffer size 4096
EasyPattern [[lineStart(1digit \t )]] with []
| [ ] Match case
| [ ] Whole words only
| [ ] Case sensitive replace
| [ ] Prompt on replace
| [ ] Skip prompt if identical
| [ ] First only
| [ ] Extract matches
| Maximum text buffer size 4096
EasyPattern [[lineStart(longest 1 to 5 spaces or tab or punctuation)]] with []
| [ ] Match case
| [ ] Whole words only
| [ ] Case sensitive replace
| [ ] Prompt on replace
| [ ] Skip prompt if identical
| [ ] First only
| [ ] Extract matches
| Maximum text buffer size 4096
EasyPattern [[lineStart( digits, '-', longest 1 or more digits, '-', longest 1 or more digits, '=:', longest 1 to 5 digits )]] with []
| [ ] Match case
| [ ] Whole words only
| [ ] Case sensitive replace
| [ ] Prompt on replace
| [ ] Skip prompt if identical
| [ ] First only
| [ ] Extract matches
| Maximum text buffer size 4096
EasyPattern [[lineStart(longest 1 or more digits or letters or <!"#$%&'()*+,-./\:;=?@[]^_`{}~|>, '_Content-Transfer-Encoding', longest 1 to 2 digits or <!"#$%&'()*+,-./\:;=?@[]^_`{}~|> )]] with []
| [ ] Match case
| [ ] Whole words only
| [ ] Case sensitive replace
| [ ] Prompt on replace
| [ ] Skip prompt if identical
| [ ] First only
| [ ] Extract matches
| Maximum text buffer size 4096
|
|--Remove multiple whitespace
|
|--Remove duplicate lines
| [ ] Ignore case
| Start column 1
| Length 5500
| [ ] Include One
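One practical way to tell which filter is inefficient is to time each pattern on its own against a sample of real text and see which ones dominate. A rough Python sketch follows. The regexes stand in for the EasyPatterns above, and mapping `(longest digits or character or punctuation)` to `[\w\W]+` is an assumption; the sample data is made up.

```python
import re
import time
from typing import Dict, List


def time_patterns(patterns: List[str], lines: List[str], repeats: int = 3) -> Dict[str, float]:
    """Best-of-`repeats` wall-clock seconds for each pattern to search every line once."""
    results: Dict[str, float] = {}
    for source in patterns:
        compiled = re.compile(source, re.IGNORECASE)
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            for line in lines:
                compiled.search(line)
            best = min(best, time.perf_counter() - start)
        results[source] = best
    return results


if __name__ == "__main__":
    # Hypothetical sample: thousands of copies of two typical lines.
    sample = ["Content-Transfer-Encoding: 8bit", "Hello, I cannot log in to my account"] * 5000
    candidates = [
        r"[\w\W]+Transfer-Encoding: 8bit",  # assumed stand-in for the greedy-prefix EasyPattern
        r"Transfer-Encoding: 8bit",         # the same literal without the greedy prefix
    ]
    timings = time_patterns(candidates, sample)
    # Slowest first; the numbers depend entirely on the machine and the data.
    for pattern, seconds in sorted(timings.items(), key=lambda item: item[1], reverse=True):
        print(f"{seconds:.4f} s  {pattern}")
```

Note that the two candidates are not strictly equivalent (the first needs at least one character before the literal), so a faster pattern still has to be checked for matching the same lines.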
Posted: Tue Feb 05, 2008 8:06 pm
by DataMystic Support
If you're just removing lines, you can combine those filters into one big 'mother of all filters'. That will be a lot faster.
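In regular-expression terms, a "mother of all filters" amounts to joining the individual patterns into one alternation, so each line is tested by a single filter instead of once per filter. The Python sketch below illustrates the idea only; it is not how TextPipe builds filters internally, and the pattern list is a hypothetical subset loosely based on the filters posted above.

```python
import re
from typing import List


def combine_patterns(patterns: List[str]) -> re.Pattern:
    """Join many patterns into one case-insensitive alternation."""
    if not patterns:
        # An empty alternation would match every line and remove everything.
        raise ValueError("need at least one pattern")
    # Each entry goes in a non-capturing group so its own '|' stays local.
    return re.compile("|".join(f"(?:{p})" for p in patterns), re.IGNORECASE)


def remove_lines(lines: List[str], combined: re.Pattern) -> List[str]:
    """Keep only the lines the combined pattern does not match."""
    return [line for line in lines if not combined.search(line)]


# Hypothetical subset of the remove-lines patterns above.
PATTERNS = ["PlanetOut", "SpamArrest", "Reply to Reply", r"^<", r"^DOCTYPE"]

if __name__ == "__main__":
    lines = ["PlanetOut news", "hello", "<html>", "x < y"]
    print(remove_lines(lines, combine_patterns(PATTERNS)))  # ['hello', 'x < y']
```

Because each line is searched separately, `^` still means "start of line" inside the combined pattern.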
How big is the Trial Run that you are processing?
TextPipe is optimized for disk throughput, so try putting the text in a file and processing that.
Let us know how you go.
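As an illustration of processing from a file, the same line removal can read and write one line at a time, so memory use stays flat however large the input is. This Python sketch is not TextPipe's implementation. It opens both files as Latin-1 with `newline=""`, so every byte, including the original line endings, passes through unchanged for the lines that are kept.

```python
import re
from typing import List


def filter_file(in_path: str, out_path: str, patterns: List[str]) -> int:
    """Copy in_path to out_path, dropping lines that match any pattern; return the number dropped."""
    if not patterns:
        raise ValueError("need at least one pattern")
    # One combined, case-insensitive pattern so each line is searched once.
    combined = re.compile("|".join(f"(?:{p})" for p in patterns), re.IGNORECASE)
    removed = 0
    # Latin-1 maps every byte to one character, and newline="" disables
    # end-of-line translation, so kept lines are written back byte for byte.
    with open(in_path, encoding="latin-1", newline="") as src, \
            open(out_path, "w", encoding="latin-1", newline="") as dst:
        for line in src:  # only the current line is held in memory
            if combined.search(line):
                removed += 1
            else:
                dst.write(line)
    return removed
```

Usage would be along the lines of `filter_file("junk.txt", "clean.txt", ["PlanetOut", "SpamArrest"])`, where both file names are hypothetical.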