Hi,
I need to extract a segment from a number of files, so I'm trying to find out whether TextPipe will meet my needs, rather than writing custom Perl. The segment begins with either "underwriters" or "underwriting" alone on a line, and ends with a line beginning with the word "total". I've tried several filters in TextPipe using EasyPattern and Perl, and I'm not getting the results I expect.
Before I use any other filters, I remove blanks from the beginning and end of lines, and remove blank lines. I also remove ANSI codes and binary characters, just in case.
The EasyPattern filter I'm trying looks like this:
[lineStart, ('underwriting' or 'underwriters'), lineEnd, capture(1+ (1+ paragraph, optional paragraphDelimiter)), lineStart, 'total']
not case sensitive.
Then I replace with:
*** underwriters section: $1
This pattern never matches; I don't get a "*** underwriters section" line in my output.
I can get it to return something this way:
Find:
[lineStart, ('underwriting' or 'underwriters'), lineEnd]
[capture(0+ (0+ paragraphchar, optional paragraphdelimiter))]
[capture(0+ (0+ paragraphchar, optional paragraphdelimiter))]
Replace:
*** begin underwriters section: $0
*** underwriter next line text: $1
*** underwriter next line 2 text: $2
I only get 2 lines following the section heading, though, and there may be an arbitrary number of lines I need to capture. So the problem seems to be that it won't keep matching after a line break. I'm testing this on content pasted into the trial area, in case that matters.
Then I tried using Perl regular expressions instead. My Perl pattern looks like this:
^(underwriting|underwriters)$(.*)^Total
(not case sensitive)
Replace with:
*** new underwriter section: $1 $2
I get this result:
*** new underwriter section: UNDERWRITING
but I don't get any of the additional content between that text and the word "Total", which I should, shouldn't I?
But if I do it this way:
^(underwriting|underwriters)$
(.*)^Total
(note the line break between the first and second captures)
Then I get
*** new underwriter section: UNDERWRITING General
where "General" is the contents of the next line. I can add more (.*) on additional lines in the find section and get more lines, but again, I need to capture an arbitrary number of lines until it hits the word "Total" at the beginning of a line. How can I do this?
Thanks,
Elizabeth Dalton
Match to bounded area
Moderators: DataMystic Support, Moderators
- DataMystic Support
- Site Admin
- Posts: 2227
- Joined: Mon Jun 30, 2003 12:32 pm
- Location: Melbourne, Australia
Hi Elizabeth,
1. Both the EasyPattern [lineEnd] and the Perl pattern $ match a position ('just before the end of the line') rather than a character (a [crlf] or \r\n).
You're matching a position but no characters, so you need to rewrite your pattern as
Code:
[lineStart, ('underwriting' or 'underwriters'), lineEnd, cr, lf, capture(0+ paragraph, optional paragraphDelimiter), lineStart, 'total']
You may need to adjust this to suit the text - if you email us an example we can help.
2. Yes, I think you should be getting something in $2. Could you please email us your filter?
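To illustrate the position-versus-character point outside TextPipe, here is a minimal sketch using Python's re module as a stand-in for the Perl-style pattern. The sample text is invented, and TextPipe's own engine and options may behave differently, so treat this as an illustration of the idea rather than a ready-made filter. It assumes the usual regex rules: $ consumes nothing, and . does not cross a line break unless a "dot matches newline" option is on.

```python
import re

# Hypothetical sample text, invented for illustration only.
SAMPLE = (
    "Preamble line\n"
    "UNDERWRITING\n"
    "General\n"
    "Acme Securities 100\n"
    "Beta Capital 200\n"
    "Total 300\n"
    "Trailing line\n"
)

# Case-insensitive, and ^ / $ anchor at every line rather than the whole text.
FLAGS = re.IGNORECASE | re.MULTILINE

# 1. The pattern from the question: `$` is only a position, so nothing
#    consumes the line break, and `.` cannot cross it either.
naive = re.search(r"^(underwriting|underwriters)$(.*)^Total", SAMPLE, FLAGS)

# 2. Consume the line break explicitly (\r?\n), let `.` span lines (DOTALL),
#    and use a lazy quantifier so the capture stops at the first line
#    that begins with "total".
fixed = re.search(
    r"^(underwriting|underwriters)\r?\n(.*?)^total",
    SAMPLE,
    FLAGS | re.DOTALL,
)

print("naive match:", naive)
if fixed:
    print("*** new underwriter section:", fixed.group(1))
    print(fixed.group(2), end="")
```

In this sketch the first pattern finds nothing at all, while the second captures every line between the heading and the "Total" line, however many there are. The lazy `.*?` matters when a file contains more than one "Total" line, since a greedy `.*` would run on to the last one.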