Extract Question/Problem

Post by **Fodor** » Fri Dec 19, 2003 4:14 am

Hello, all!

I'm evaluating TextPipe Pro to data-mine a web site. The site uses nested tables, but I've found a way to get at the data with an extract filter and a regular expression instead of manually removing the unneeded tables. First, I convert the UNIX EOL characters to DOS and remove all leading and trailing whitespace. Then I try the following pattern:
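For readers without TextPipe, the two pre-processing steps described above can be approximated in Python. This is only a sketch of equivalent behaviour, not how TextPipe implements its filters, and `normalize_lines` is a hypothetical helper name:

```python
def normalize_lines(text: str) -> str:
    """Convert line endings to DOS (CRLF) and trim whitespace on every line."""
    # splitlines() splits on \n, \r\n and \r (plus a few other Unicode line
    # boundaries), so text that already has CRLF endings is not doubled up.
    return "".join(line.strip() + "\r\n" for line in text.splitlines())


print(repr(normalize_lines("  abc  \n def\t\n")))  # 'abc\r\ndef\r\n'
```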

^().*.$

The problem is that it doesn't work. This regular expression should match any entire line that starts with "", but when I run the filter, TextPipe finds the first "" and returns it together with the rest of the file that follows it.

It looks like TextPipe might be letting ".*" match the EOL characters rather than stopping at ".$". Is that the problem? If so, is my regular expression poorly formed?

What am I doing wrong?

Thanks!

Post by **Fodor** » Fri Dec 19, 2003 4:26 am

I've changed my regular expression to:

^().*( )$

(i.e., matching any entire line that begins with "" and ends with " ")

This matches 0 items in my input file, even though the file contains 4 such lines.

Any ideas?

Thanks!

Post by **DataMystic Support** » Fri Dec 19, 2003 8:20 am

By default, '.' matches newlines; check the pattern settings.

You could use [^\r\n] instead of '.' to prevent this.
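To illustrate this reply, here is a minimal Python sketch. It assumes TextPipe's "dot matches newline" setting behaves like Python's `re.DOTALL` flag, which may not hold exactly. The tag in the original patterns was lost from the post, so `<row>` is a hypothetical stand-in, and `sample` and `matches` are illustrative names:

```python
import re

# Hypothetical CRLF input; "<row>" stands in for the tag lost from the post.
sample = "<row>alpha\r\nfiller\r\n<row>beta\r\nmore filler\r\n"


def matches(pattern: str, flags: int) -> list[str]:
    """Return every full match of pattern in sample."""
    return [m.group(0) for m in re.finditer(pattern, sample, flags)]


# With DOTALL, '.' also matches \r and \n, so the greedy '.*' runs from the
# first "<row>" to the end of the input: one match containing everything.
greedy = matches(r"^<row>.*$", re.MULTILINE | re.DOTALL)

# A negated class cannot cross a line break, so each match stays on one line.
# '$' is left out: [^\r\n]* already stops at the break, and Python's '$'
# matches before '\n' but not before '\r'.
per_line = matches(r"^<row>[^\r\n]*", re.MULTILINE)

print(len(greedy))  # 1
print(per_line)     # ['<row>alpha', '<row>beta']
```

Because `[^\r\n]` never matches a line-break character, this version stays on one line whether or not the dot-matches-newline option is turned on.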
