Hello, all!
I'm evaluating TextPipe Pro for mining data from a web site. The site uses nested tables, but I've found a way to reach the data with an extract and a regular expression instead of manually removing the unneeded tables. First, I convert the UNIX EOL characters to DOS and remove all leading and trailing whitespace. Then I apply the following regular expression:
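A minimal Python sketch of that preprocessing, assuming "remove leading and trailing whitespace" means trimming each line. `preprocess` is a hypothetical helper for illustration, not a TextPipe filter, and the sample text is made up:

```python
def preprocess(text: str) -> str:
    """Hypothetical stand-in for the two TextPipe filters described above:
    trim each line, then terminate every line with DOS line endings (CRLF)."""
    # splitlines() recognises \n, \r\n and \r and drops the terminators,
    # so stripping here cannot eat a line ending
    lines = [line.strip() for line in text.splitlines()]
    # Re-terminate every line with CRLF, as a UNIX-to-DOS conversion would
    return "".join(line + "\r\n" for line in lines)


sample = "  <p align=left>Row one<br>  \n<td>skip</td>\n<p align=left>Row two<br>\n"
print(repr(preprocess(sample)))
```

After this step every line ends in "\r\n", which matters for how "$" behaves in the regular expressions that follow.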
^(<p align=left>).*.$
The problem is that it doesn't work. This regular expression should match every line that starts with "<p align=left>" (the whole line), but when I run the filter, TextPipe finds the first "<p align=left>" and returns everything from there to the end of the file.
It looks as if the ".*" is matching the EOL characters instead of stopping at the ".$". Is that the problem? If so, is my regular expression poorly formed?
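The suspected behaviour can be reproduced with Python's `re` module as a stand-in for TextPipe's engine (an assumption: TextPipe's own option names and defaults may differ). With `re.DOTALL`, "." also matches "\n", so the greedy ".*" runs to the end of the input and then backs off just far enough for ".$" to match; without it, each match stays on one line. The sample text is made up for illustration.

```python
import re

# Made-up sample input, already converted to DOS (CRLF) line endings
text = (
    "<html><body>\r\n"
    "<p align=left>first<br>\r\n"
    "<table><tr><td>noise</td></tr></table>\r\n"
    "<p align=left>second<br>\r\n"
)

# The original pattern, minus the capture group so findall returns whole matches
pattern = r"^<p align=left>.*.$"

# MULTILINE: ^ and $ also match at line boundaries.
# DOTALL: . also matches \n, so .* can run across lines.
greedy = re.findall(pattern, text, re.MULTILINE | re.DOTALL)

# Without DOTALL, . stops at \n and each match is confined to one line
per_line = re.findall(pattern, text, re.MULTILINE)

print(greedy)    # one match: everything from the first <p align=left> to the end
print(per_line)  # two matches, one per line (each still ending in "\r")
```

Even the per-line matches end in "\r": the final "." in ".$" consumes the carriage return left by the DOS conversion.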
What am I doing wrong?
Thanks!
Extract Question/Problem
Extract Question/Problem Follow-up
I've changed my regular expression to:
^(<p align=left>).*(<br>)$
(that is, match any entire line that begins with "<p align=left>" and ends with "<br>")
This matches 0 items in my input file, even though the file contains 4 such lines.
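One likely cause, again using Python's `re` as a stand-in: after the DOS conversion each line ends in "\r\n", and in multi-line mode Python's "$" (like that of many Perl-style engines) matches just before "\n", not before "\r\n". The "\r" therefore sits between "<br>" and the only position where "$" can match, so the pattern finds nothing. Whether TextPipe's "$" treats "\r\n" the same way is an assumption worth checking. The made-up sample below shows the effect and one workaround, an optional "\r" inside a lookahead:

```python
import re

# Made-up sample input with DOS (CRLF) line endings
text = "<p align=left>first<br>\r\n<p align=left>second<br>\r\n"

# The follow-up pattern, minus the capture groups
strict = r"^<p align=left>.*<br>$"

# $ matches only before \n (or at the very end), so the \r after <br> blocks it,
# with or without DOTALL
print(re.findall(strict, text, re.MULTILINE))              # []
print(re.findall(strict, text, re.MULTILINE | re.DOTALL))  # []

# Workaround: allow an optional \r before the line end. The lookahead keeps
# the \r out of the returned match.
tolerant = r"^<p align=left>.*<br>(?=\r?$)"
print(re.findall(tolerant, text, re.MULTILINE))
# ['<p align=left>first<br>', '<p align=left>second<br>']
```

Alternatively, skipping the UNIX-to-DOS conversion (so lines end in a bare "\n") should let the original "<br>$" match, as long as "." is not also allowed to match newlines.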
Any ideas?
Thanks!