remove lines from list

Get help with installation and running here.

Moderators: DataMystic Support, Moderators

sheridany
Posts: 36
Joined: Thu Nov 15, 2007 4:20 am

remove lines from list

Post by sheridany »

I am cleaning up junk email lines that come in to our support center. There are many different patterns to look for, and each matching line needs to be removed.
I started by building one Remove filter per pattern, and each one works great.
I then tried the "remove lines from a list" filter instead, and it would not match any of the lines that the individual Remove filters had matched.

Why does it not work?

I tried both the EasyPattern and Perl pattern types, and neither works.

I was under the impression it would work like the pattern-matching list in search and replace from a list.
sheridany
Posts: 36
Joined: Thu Nov 15, 2007 4:20 am

Posting filter clipboard output

Post by sheridany »

I thought I would post the filter clipboard output for reference. I would like one filter that refers to multiple search patterns to remove lines, rather than a separate filter for each pattern. I still cannot get remove lines from list to work.

Filter List
-----------
Filter options
| [ ] Log to file
| [X] Append to logfile
| Log filename: textpipe.log
| Threshold 500
|
|--Input from file(s)
| [ ] Confirm before processing each file
| [ ] Confirm before processing read/only files
| [ ] Delete input files after processing
| Process binary files
|
|--Convert End of Lines - Auto to DOS
| [X] Remove bad EOL
|
|--Remove blanks from Start of Line
|
|--Remove blanks from End of Line
|
|--Remove matching lines [SpamArrest]
| [ ] Include line numbers
| [ ] Include filename
| [ ] Match case
| [ ] Count matches
| Pattern type: 4
| Context before: 0
| Context after: 0
|
|--Remove matching lines [Reply to Reply]
| [ ] Include line numbers
| [ ] Include filename
| [ ] Match case
| [ ] Count matches
| Pattern type: 4
| Context before: 0
| Context after: 0
|
|--Remove matching lines [PlanetOut]
| [ ] Include line numbers
| [ ] Include filename
| [ ] Match case
| [ ] Count matches
| Pattern type: 4
| Context before: 0
| Context after: 0
|
|--Restrict fields:Pipe-delimited field 1 .. field 1
| | [X] Process fields individually
| | [ ] Exclude delimiter
| | [ ] Exclude quotes (if present)
| | Delimiter Type: 3
| | Custom delimiter:
| | [ ] Has Header
| |
| +--EasyPattern [[(longest digit or character or punctuation) 'quoted-printable']] with [' ']
| [ ] Match case
| [ ] Whole words only
| [ ] Case sensitive replace
| [ ] Prompt on replace
| [ ] Skip prompt if identical
| [ ] First only
| [ ] Extract matches
| Maximum text buffer size 4096
EasyPattern [[linestart(longest 1 to 2 digits),'/',(longest 1 to 2 digits),'/',(longest 1 or more digits)]] with [' ']
| [ ] Match case
| [ ] Whole words only
| [ ] Case sensitive replace
| [ ] Prompt on replace
| [ ] Skip prompt if identical
| [ ] First only
| [ ] Extract matches
| Maximum text buffer size 4096
EasyPattern [[(longest digits or character or punctuation) 'Transfer-Encoding: 8bit']] with [' ']
| [ ] Match case
| [ ] Whole words only
| [ ] Case sensitive replace
| [ ] Prompt on replace
| [ ] Skip prompt if identical
| [ ] First only
| [ ] Extract matches
| Maximum text buffer size 4096
EasyPattern [[(longest digits or character or punctuation) 'Transfer-Encoding: 7bit']] with [' ']
| [ ] Match case
| [ ] Whole words only
| [ ] Case sensitive replace
| [ ] Prompt on replace
| [ ] Skip prompt if identical
| [ ] First only
| [ ] Extract matches
| Maximum text buffer size 4096
EasyPattern [[(longest 1 to 3 digits,'-',longest 1 to 2 digits,'-',longest 1 to 4 digits)]] with [xxx-xx-xxxx]
| [ ] Match case
| [ ] Whole words only
| [ ] Case sensitive replace
| [ ] Prompt on replace
| [ ] Skip prompt if identical
| [ ] First only
| [ ] Extract matches
| Maximum text buffer size 4096
EasyPattern [[('('3 digits')','-'3 digits,'-',4 digits)]] with [(xxx)-xxx-xxxx]
| [ ] Match case
| [ ] Whole words only
| [ ] Case sensitive replace
| [ ] Prompt on replace
| [ ] Skip prompt if identical
| [ ] First only
| [ ] Extract matches
| Maximum text buffer size 4096
EasyPattern [[( 3 digits, '-', 3 digits,'-',4 digits)]] with [xxx-xxx-xxxx]
| [ ] Match case
| [ ] Whole words only
| [ ] Case sensitive replace
| [ ] Prompt on replace
| [ ] Skip prompt if identical
| [ ] First only
| [ ] Extract matches
| Maximum text buffer size 4096
|
+--Output to file(s)
[ ] Only update date on changed files
[ ] Append mode
[ ] Change extension to: .txt
[ ] Open output file
Only output modified files Backup mode
sheridany
Posts: 36
Joined: Thu Nov 15, 2007 4:20 am

out of memory error

Post by sheridany »

I am also getting frequent out-of-memory errors after running the trial several times, whether testing a single filter or the entire filter list. I have opened a support ticket but have not heard back yet. I am using a 1.8 GHz Intel processor with 1 GB of memory on Windows XP. I would expect that to suffice for this software.
dfhtextpipe
Posts: 988
Joined: Sun Dec 09, 2007 2:49 am
Location: UK

Probably very inefficient filter

Post by dfhtextpipe »

Very inefficient filters can cause TextPipe to slow down to almost a stop, or give out of memory errors.
sheridany
Posts: 36
Joined: Thu Nov 15, 2007 4:20 am

Inefficient filters

Post by sheridany »

How can I tell if it is inefficient? I have been tweaking the pattern matching and remove-lines filters, and this version runs. I am not sure how to improve it, but I am open to suggestions. Thanks in advance for any advice.

Filter List
-----------
Filter options
| [ ] Log to file
| [X] Append to logfile
| Log filename: textpipe.log
| Threshold 500
|
|--Input from file(s)
| [ ] Confirm before processing each file
| [ ] Confirm before processing read/only files
| [ ] Delete input files after processing
| Process binary files
|
|--Convert End of Lines - Auto to DOS
| [X] Remove bad EOL
|
|--Remove blanks from Start of Line
|
|--Remove blanks from End of Line
|
|--Remove matching lines [['PlanetOut' or 'SpamArrest' or 'Reply to Reply' or '7bit' or '8bit' or 'whitelist' or 'mailguard' or '-=_Part_' or LineStart('DOCTYPE') or Linestart('<') 'On-Line Drugstore']]
| [ ] Include line numbers
| [ ] Include filename
| [ ] Match case
| [X] Count matches
| Pattern type: 4
| Context before: 0
| Context after: 0
|
|--Remove matching lines [[lineStart('"<html>') or lineStart('"<!DOCTYPE') or lineStart('C A N A D A ') or lineStart('"C A N A D A On-Line Pharmacy') or lineStart('"C.a.n.a.d.a On-Line Pharmacy') or lineStart('CA On-Line Pharmacy is C.a.n.a.d.a’s most treliable')or lineStart('table border') or lineStart('DEC2007' or 'OCT2007' or'NOV2007')]]
| [ ] Include line numbers
| [ ] Include filename
| [ ] Match case
| [ ] Count matches
| Pattern type: 4
| Context before: 0
| Context after: 0
|
|--Restrict fields:Pipe-delimited field 1 .. field 1
| | [X] Process fields individually
| | [ ] Exclude delimiter
| | [ ] Exclude quotes (if present)
| | Delimiter Type: 3
| | Custom delimiter:
| | [ ] Has Header
| |
| +--EasyPattern [[(longest digit or character or punctuation) 'quoted-printable']] with []
| [ ] Match case
| [ ] Whole words only
| [ ] Case sensitive replace
| [ ] Prompt on replace
| [ ] Skip prompt if identical
| [ ] First only
| [ ] Extract matches
| Maximum text buffer size 4096
EasyPattern [[linestart(longest 1 to 2 digits),'/',(longest 1 to 2 digits),'/',(longest 1 or more digits)]] with []
| [ ] Match case
| [ ] Whole words only
| [ ] Case sensitive replace
| [ ] Prompt on replace
| [ ] Skip prompt if identical
| [ ] First only
| [ ] Extract matches
| Maximum text buffer size 4096
EasyPattern [[(longest digits or character or punctuation) 'Transfer-Encoding: 8bit']] with []
| [ ] Match case
| [ ] Whole words only
| [ ] Case sensitive replace
| [ ] Prompt on replace
| [ ] Skip prompt if identical
| [ ] First only
| [ ] Extract matches
| Maximum text buffer size 4096
EasyPattern [[(longest digits or character or punctuation) 'Transfer-Encoding: 7bit']] with []
| [ ] Match case
| [ ] Whole words only
| [ ] Case sensitive replace
| [ ] Prompt on replace
| [ ] Skip prompt if identical
| [ ] First only
| [ ] Extract matches
| Maximum text buffer size 4096
EasyPattern [[(longest 1 to 3 digits,'-',longest 1 to 2 digits,'-',longest 1 to 4 digits)]] with [xxx-xx-xxxx]
| [ ] Match case
| [ ] Whole words only
| [ ] Case sensitive replace
| [ ] Prompt on replace
| [ ] Skip prompt if identical
| [ ] First only
| [ ] Extract matches
| Maximum text buffer size 4096
EasyPattern [[('('3 digits')','-'3 digits,'-',4 digits)]] with [(xxx)-xxx-xxxx]
| [ ] Match case
| [ ] Whole words only
| [ ] Case sensitive replace
| [ ] Prompt on replace
| [ ] Skip prompt if identical
| [ ] First only
| [ ] Extract matches
| Maximum text buffer size 4096
EasyPattern [[( 3 digits, '-', 3 digits,'-',4 digits)]] with [xxx-xxx-xxxx]
| [ ] Match case
| [ ] Whole words only
| [ ] Case sensitive replace
| [ ] Prompt on replace
| [ ] Skip prompt if identical
| [ ] First only
| [ ] Extract matches
| Maximum text buffer size 4096
EasyPattern [[lineStart('"' 1 to 2 digits,'/', 1 to 2 digits,'/' 1 or more digits \t )]] with []
| [ ] Match case
| [ ] Whole words only
| [ ] Case sensitive replace
| [ ] Prompt on replace
| [ ] Skip prompt if identical
| [ ] First only
| [ ] Extract matches
| Maximum text buffer size 4096
EasyPattern [[lineStart(1digit \t )]] with []
| [ ] Match case
| [ ] Whole words only
| [ ] Case sensitive replace
| [ ] Prompt on replace
| [ ] Skip prompt if identical
| [ ] First only
| [ ] Extract matches
| Maximum text buffer size 4096
EasyPattern [[lineStart(longest 1 to 5 spaces or tab or punctuation)]] with []
| [ ] Match case
| [ ] Whole words only
| [ ] Case sensitive replace
| [ ] Prompt on replace
| [ ] Skip prompt if identical
| [ ] First only
| [ ] Extract matches
| Maximum text buffer size 4096
EasyPattern [[lineStart( digits, '-', longest 1 or more digits, '-', longest 1 or more digits, '=:', longest 1 to 5 digits )]] with []
| [ ] Match case
| [ ] Whole words only
| [ ] Case sensitive replace
| [ ] Prompt on replace
| [ ] Skip prompt if identical
| [ ] First only
| [ ] Extract matches
| Maximum text buffer size 4096
EasyPattern [[lineStart(longest 1 or more digits or letters or <!"#$%&'()*+,-./\:;=?@[]^_`{}~|>, '_Content-Transfer-Encoding', longest 1 to 2 digits or <!"#$%&'()*+,-./\:;=?@[]^_`{}~|> )]] with []
| [ ] Match case
| [ ] Whole words only
| [ ] Case sensitive replace
| [ ] Prompt on replace
| [ ] Skip prompt if identical
| [ ] First only
| [ ] Extract matches
| Maximum text buffer size 4096
|
|--Remove multiple whitespace
|
|--Remove duplicate lines
| [ ] Ignore case
| Start column 1
| Length 5500
| [ ] Include One
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Post by DataMystic Support »

If you're just removing lines, you can combine those filters into one big 'mother of all filters'. That will be a lot faster.
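
To illustrate the idea outside TextPipe, here is a minimal Python sketch of folding several remove-line patterns into one alternation so each line is scanned once. It is not TextPipe's implementation. The literal strings are copied from the filter list posted earlier in this thread; the names `build_junk_pattern` and `remove_junk_lines` are made up for the example, and both input lists are assumed to be non-empty.

```python
import re

# Strings copied from the "Remove matching lines" filter posted above.
LITERALS = ["PlanetOut", "SpamArrest", "Reply to Reply", "7bit", "8bit",
            "whitelist", "mailguard", "-=_Part_"]
# Strings that should only count when they begin a line.
PREFIXES = ["DOCTYPE", "<"]


def build_junk_pattern(literals, prefixes):
    """Combine all patterns into one case-insensitive regex (hypothetical helper)."""
    anywhere = "|".join(re.escape(s) for s in literals)
    at_start = "|".join(re.escape(s) for s in prefixes)
    # One alternation: a literal anywhere in the line, or a prefix at line start.
    return re.compile(rf"(?:{anywhere})|^(?:{at_start})", re.IGNORECASE)


JUNK = build_junk_pattern(LITERALS, PREFIXES)


def remove_junk_lines(lines):
    """Return only the lines that match none of the junk patterns."""
    return [line for line in lines if not JUNK.search(line)]
```

Case is ignored here because "Match case" is unchecked in the posted filters. A line such as "a < b is fine" is kept, since "<" only counts at the start of a line.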

How big is the Trial Run that you are processing?

TextPipe is optimized for disk throughput, so try putting the text in a file and processing that.
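
As a rough picture of why file-based processing keeps memory use low, the Python sketch below streams a file one line at a time instead of holding the whole text in memory. It only illustrates the general approach and says nothing about how TextPipe works internally. The function name `filter_file`, its parameters, and the UTF-8 encoding are assumptions made for the example.

```python
import re


def filter_file(src_path, dst_path, junk):
    """Copy src_path to dst_path, skipping lines that match the compiled
    regex `junk`. Returns the number of lines removed."""
    removed = 0
    # newline="" leaves the original line endings (e.g. CRLF) untouched.
    with open(src_path, "r", encoding="utf-8", errors="replace", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:  # only one line is held in memory at a time
            if junk.search(line):
                removed += 1
            else:
                dst.write(line)
    return removed
```

A call might look like `filter_file("in.txt", "out.txt", re.compile("spamarrest", re.IGNORECASE))`, where both file names are placeholders.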

Let us know how you go.