Extract Question/Problem

Post by Fodor »

Hello, all!

I'm evaluating TextPipe Pro to data mine a web site. The site has nested tables, but I've found a way to get to the data with an extract and a regular expression rather than having to manually remove the unneeded tables. First, I convert the UNIX EOL characters to DOS and remove all leading and trailing whitespace. Then, I try to use the following:

^(<p align=left>).*.$

The problem is that it doesn't work. This regular expression should match any entire line starting with "<p align=left>". Instead, when I run the filter, TextPipe finds the first "<p align=left>" and returns everything from that point to the end of the file.

It looks like TextPipe might be seeing the ".*" and matching the EOL characters rather than stopping at the ".$". Is that the problem? If so, is it a poorly formed regular expression?

What am I doing wrong?


Extract Question/Problem Follow-up

Post by Fodor »

I've changed my regular expression to:

^(<p align=left>).*(<br>)$

(This should match any entire line beginning with "<p align=left>" and ending with "<br>".)

This matches 0 items in my input file, although there are 4 such lines in the file.

Any ideas?

Post by DataMystic Support »

By default, '.' matches newlines, so ".*" can run past the end of the line. Check the pattern settings.

You could use [^\r\n] instead of '.' to prevent this.
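
To illustrate the difference, here is a minimal sketch in Python rather than TextPipe. It assumes that Python's re module with the re.DOTALL flag is a fair stand-in for an engine where '.' matches newlines; TextPipe's own engine and settings may behave differently. The sample input is made up, and the capturing parentheses are dropped for brevity.

```python
import re

# Hypothetical sample input with DOS (CRLF) line endings.
text = (
    "<table>\r\n"
    "<p align=left>first item<br>\r\n"
    "<td>noise</td>\r\n"
    "<p align=left>second item<br>\r\n"
    "</table>\r\n"
)

# re.DOTALL stands in for "'.' matches newlines";
# re.MULTILINE lets ^ and $ anchor at line boundaries.
flags = re.MULTILINE | re.DOTALL

# Original pattern: the greedy ".*" crosses line breaks, so a single match
# runs from the first "<p align=left>" to the end of the input.
greedy = re.findall(r"^<p align=left>.*.$", text, flags)

# Suggested fix: [^\r\n] can never consume an EOL character,
# so each match stops at the end of its own line.
per_line = re.findall(r"^<p align=left>[^\r\n]*", text, flags)

print(len(greedy), "match with '.'")
print(len(per_line), "matches with [^\\r\\n]")
for line in per_line:
    print(line)
```

With the negated character class, each matching line comes back on its own, whatever the dot setting happens to be.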