Extract several parts of a line

Get help with installation and running here.

Moderators: DataMystic Support, Moderators

gerd
Posts: 39
Joined: Wed Mar 12, 2008 10:52 pm

Post by gerd »

Hi,
I have tried a couple of things with Perl patterns to extract some parts from the following lines of sample text. In detail: I would like to extract 3 parts of each line and combine the extracted parts ($1, $2, $3) in a new line. The parts are:
- text between href="..."
- text between title="..."
- and, if it is even possible, the text between the first >...<span>
Sample Text:
<div class="tab"><a target="_top" href="viewforum.php?f=17&start=100" title="Page 3 TEST">Linktext Example John<span></span></a></div>
<div class="tab"><a title="Page 2 text" target="_top" href="viewforum.php?f=17&start=50" >Linktext Example Bob<span></span></a></div>

Required results:
<a title="Page 3 TEST" class="" href="viewforum.php?f=17&start=100">Linktext Example John</a>
<a title="Page 2 text" class="" href="viewforum.php?f=17&start=50">Linktext Example Bob</a>

I assume that such a "Replace" will not work.
Replace with <a title="$1" class="" href="$2">$3</a>

So far, I have only managed to use separate filters like
title="(.*)"
href="(.*)"
output to the clipboard, then copy the results into two different columns in an Excel file and combine them in a third column. I guess it would be safer and quicker with TextPipe, but I do not know how to go further. Could you show me the filter?
thanks gerd
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: Extract several parts of a line

Post by DataMystic Support »

Why don't you use

Code:

href=".*" title=".*">
And replace with

Code:

$1,$2
gerd
Posts: 39
Joined: Wed Mar 12, 2008 10:52 pm

Re: Extract several parts of a line

Post by gerd »

Hi Simon,
I tried href="(.*)" title="(.*)" and href="(.*)" title="(.*)">, but neither gives the requested results. Maybe my version, TextPipe Lite 8.4.8, does not support this.
If I use your code
href=".*" title=".*">
I receive the error message: $2 is not a valid subexpression identifier
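
That error message is consistent with the pattern having no parentheses: without capture groups there is nothing for $1 or $2 to refer to. A minimal sketch of the same situation in Python (not TextPipe; Python writes the references as \1 and \2, its error text differs, and the sample line is shortened for illustration):

```python
import re

# Shortened variant of the first sample line (illustrative only).
line = '<a target="_top" href="viewforum.php?f=17&start=100" title="Page 3 TEST">Linktext</a>'

def try_replace(pattern):
    """Return the replaced line, or the error text if the replacement fails."""
    try:
        return re.sub(pattern, r'\1,\2', line)
    except re.error as exc:
        return 'error: {}'.format(exc)

# No parentheses: the pattern defines no groups, so \1 and \2 are invalid.
no_groups = try_replace(r'href=".*" title=".*">')

# Parentheses added: group 1 is the URL, group 2 is the title.
with_groups = try_replace(r'href="(.*)" title="(.*)">')

print(no_groups)
print(with_groups)
```

Only the second call has groups to substitute, so it is the only one that can produce a replacement.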

I assume there is a general filter available for extracting several parts of text from a line like
<a title="Page 3 TEST" class="" href="viewforum.php?f=17&start=100">Linktext Example John</a>
for extracting the attributes
1) the title
2) the URL
3) the Linktext
But again, maybe TextPipe Lite does not support this. Otherwise, could you attach the filter for me?
Thanks gerd
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: Extract several parts of a line

Post by DataMystic Support »

Sorry - should have been

Code:

href="(.*)" title="(.*)">
or better:

Code:

href="(.*)" title="(.*)">(.*)<
Then $3 is the link text.
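
Two caveats seem worth checking against the sample lines from the first post. The pattern expects href before title, but the second sample line has title first, so it would not match there. Also, a greedy (.*) before the final < runs to the last < on the line, so $3 can pick up trailing tags. A minimal sketch in Python (not a TextPipe filter; the names HREF, TITLE, TEXT and rebuild are made up for illustration) that finds each part on its own and uses [^"]* and [^<]* instead of .* might look like this:

```python
import re

# Sample lines from the first post.
lines = [
    '<div class="tab"><a target="_top" href="viewforum.php?f=17&start=100" title="Page 3 TEST">Linktext Example John<span></span></a></div>',
    '<div class="tab"><a title="Page 2 text" target="_top" href="viewforum.php?f=17&start=50" >Linktext Example Bob<span></span></a></div>',
]

# Pattern from the thread, kept for comparison.
THREAD = re.compile(r'href="(.*)" title="(.*)">(.*)<')
thread_group3 = THREAD.search(lines[0]).group(3)   # includes trailing tags
thread_match_line2 = THREAD.search(lines[1])       # title comes first: no match

# Each part is searched separately, so attribute order does not matter.
# [^"]* stops at the closing quote; [^<]* stops at the next tag.
HREF = re.compile(r'href="([^"]*)"')
TITLE = re.compile(r'title="([^"]*)"')
TEXT = re.compile(r'<a\b[^>]*>([^<]*)<span>')

def rebuild(line):
    """Build the required <a> line, or return the input unchanged if a part is missing."""
    href, title, text = HREF.search(line), TITLE.search(line), TEXT.search(line)
    if not (href and title and text):
        return line
    return '<a title="{}" class="" href="{}">{}</a>'.format(
        title.group(1), href.group(1), text.group(1))

results = [rebuild(line) for line in lines]
for result in results:
    print(result)
```

The same idea should carry over to a single Perl-style pattern by replacing each .* with [^"]* or [^<]*, though handling both attribute orders in one pattern would need an alternation or two passes.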