Extract Question/Problem
Posted: Fri Dec 19, 2003 4:14 am
Hello, all!
I'm evaluating TextPipe Pro to data mine a web site. The site has nested tables, but I've found a way to get to the data with an extract and a regular expression rather than having to manually remove the unneeded tables. First, I convert the UNIX EOL characters to DOS and remove all leading and trailing whitespace. Then, I try to use the following:
^(<p align=left>).*.$
The problem is that it doesn't work. This regular expression should match every line (and the entire line) that starts with "<p align=left>". Instead, when I run the filter, TextPipe finds the first "<p align=left>" and returns it together with the entire remainder of the file.
It looks like TextPipe might be seeing the ".*" and matching the EOL characters rather than stopping at the ".$". Is that the problem? If so, is it a poorly formed regular expression?
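To show what I suspect is happening, here is a minimal sketch in Python. Python's `re` module is not TextPipe's regex engine, so this only mimics the behavior; the sample text and the variable names are made up for illustration. With a "dot matches newline" style option switched on, the greedy ".*" runs through the EOL characters to the end of the input, while a pattern that excludes CR and LF stays on one line.

```python
import re

# Made-up sample with DOS line endings, standing in for the cleaned-up page.
sample = (
    "<p align=left>first row\r\n"
    "<td>skip me</td>\r\n"
    "<p align=left>second row\r\n"
)

# The pattern from the post, unchanged.
pattern = r"^(<p align=left>).*.$"

# Assumption: the engine lets "." match line breaks (re.DOTALL in Python).
# The greedy ".*" then swallows everything up to the end of the input.
greedy = re.search(pattern, sample, re.MULTILINE | re.DOTALL)

# Excluding CR and LF explicitly keeps each match on a single line.
per_line = re.findall(r"^<p align=left>[^\r\n]*", sample, re.MULTILINE)

print(repr(greedy.group(0)))
print(per_line)
```

If TextPipe behaves like the first case, replacing ".*" with a character class that excludes the EOL characters might be a workaround, but I have not confirmed that.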
What am I doing wrong?
Thanks!