
Extract several parts of a line

Posted: Thu Sep 16, 2010 8:52 pm
by gerd
Hi,
I have tried a couple of things with Perl patterns to extract some parts from the following lines of sample text. In detail: I would like to extract 3 parts of each line and combine the extracted parts in a new line. The parts are:
- the text between href="..."
- the text between title="..."
- and, if at all possible, the link text between > and <span>
Sample Text:
<div class="tab"><a target="_top" href="viewforum.php?f=17&start=100" title="Page 3 TEST">Linktext Example John<span></span></a></div>
<div class="tab"><a title="Page 2 text" target="_top" href="viewforum.php?f=17&start=50" >Linktext Example Bob<span></span></a></div>

Required results:
<a title="Page 3 TEST" class="" href="viewforum.php?f=17&start=100">Linktext Example John</a>
<a title="Page 2 text" class="" href="viewforum.php?f=17&start=50">Linktext Example Bob</a>

I assume that a "Replace" like this will not work:
Replace with <a title="$1" class="" href="$2">$3</a>
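Because href and title appear in a different order on the two sample lines, one approach is to look for each part separately and then fill the template. As an illustration outside TextPipe, here is a minimal Python sketch, with Python's `re` module standing in for TextPipe's Perl-style engine; `rebuild`, `HREF`, `TITLE` and `TEXT` are hypothetical names. It assumes attribute values are double-quoted and the link text contains no `<`.

```python
import re

# Sample lines copied from the question.
SAMPLE = [
    '<div class="tab"><a target="_top" href="viewforum.php?f=17&start=100" title="Page 3 TEST">Linktext Example John<span></span></a></div>',
    '<div class="tab"><a title="Page 2 text" target="_top" href="viewforum.php?f=17&start=50" >Linktext Example Bob<span></span></a></div>',
]

# [^"]* cannot cross a double quote, so each capture stops at its closing quote.
HREF = re.compile(r'href="([^"]*)"')
TITLE = re.compile(r'title="([^"]*)"')
# Link text: after the '>' that closes the <a ...> tag, up to the next '<'.
TEXT = re.compile(r'<a\b[^>]*>([^<]*)<')


def rebuild(line):
    """Return the line in the requested format, or None if a part is missing."""
    href, title, text = HREF.search(line), TITLE.search(line), TEXT.search(line)
    if not (href and title and text):
        return None
    # Assumption: attribute order in the output follows the requested result.
    return '<a title="{}" class="" href="{}">{}</a>'.format(
        title.group(1), href.group(1), text.group(1))


for line in SAMPLE:
    print(rebuild(line))
```

Each search is independent of the others, so the attribute order on the input line does not matter; Python's `group(1)` corresponds to `$1` in a TextPipe replace.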

So far, I have only managed to use separate filters like
title="(.*)"
href="(.*)"
then output the results to the clipboard, paste them into two different columns of an Excel file and combine them in a third column. I guess TextPipe would do this more safely and quickly, but I do not know how to go further. Could you show me the filter?
Thanks, gerd
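The separate filters above use a greedy `.*`, which can capture more than intended: `href="(.*)"` runs to the last double quote on the line, not to the href's own closing quote. A minimal Python sketch on the first sample line, assuming Python's `re` behaves like TextPipe's Perl-style patterns for these simple constructs:

```python
import re

line = '<div class="tab"><a target="_top" href="viewforum.php?f=17&start=100" title="Page 3 TEST">Linktext Example John<span></span></a></div>'

# Greedy: .* runs to the LAST double quote on the line, swallowing the title.
greedy = re.search(r'href="(.*)"', line).group(1)
# Negated class: [^"]* cannot cross a double quote, so it stops at the href's closing quote.
strict = re.search(r'href="([^"]*)"', line).group(1)

print(greedy)  # viewforum.php?f=17&start=100" title="Page 3 TEST
print(strict)  # viewforum.php?f=17&start=100
```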

Re: Extract several parts of a line

Posted: Fri Sep 17, 2010 9:06 am
by DataMystic Support
Why don't you use

Code:

href=".*" title=".*">
And replace with

Code:

$1,$2

Re: Extract several parts of a line

Posted: Fri Sep 17, 2010 6:56 pm
by gerd
Hi Simon,
I tried href="(.*)" title="(.*)" and href="(.*)" title="(.*)">, but neither shows the requested results. Maybe my version, TextPipe Lite 8.4.8, does not support this.
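One likely reason for the mismatch is attribute order: both patterns expect `href` to be followed directly by ` title=`, which holds for the first sample line but not the second. A small Python check, with hypothetical constant names and Python's `re` standing in for TextPipe's engine:

```python
import re

LINE_HREF_FIRST = '<div class="tab"><a target="_top" href="viewforum.php?f=17&start=100" title="Page 3 TEST">Linktext Example John<span></span></a></div>'
LINE_TITLE_FIRST = '<div class="tab"><a title="Page 2 text" target="_top" href="viewforum.php?f=17&start=50" >Linktext Example Bob<span></span></a></div>'

# Requires href="..." to be followed immediately by a space and title="...".
PATTERN = re.compile(r'href="(.*)" title="(.*)">')

for line in (LINE_HREF_FIRST, LINE_TITLE_FIRST):
    m = PATTERN.search(line)
    print(m.groups() if m else 'no match')
# ('viewforum.php?f=17&start=100', 'Page 3 TEST')
# no match
```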
If I use your code
href=".*" title=".*">
I receive the error message: $2 is not a valid subexpression identifier
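That error is consistent with the pattern having no parentheses: `$1` and `$2` refer to capture groups, and `href=".*" title=".*">` defines none. Python rejects the same mistake; this sketch assumes Python's `\1,\2` in a replacement plays the role of TextPipe's `$1,$2`:

```python
import re

line = 'href="viewforum.php?f=17&start=100" title="Page 3 TEST">'

# No parentheses, so this pattern defines zero capturing groups.
no_groups = r'href=".*" title=".*">'
try:
    re.sub(no_groups, r'\1,\2', line)
except re.error as exc:
    # The replacement refers to a group that does not exist.
    print('error:', exc)

# With two capturing groups the same replacement works.
with_groups = r'href="(.*)" title="(.*)">'
print(re.sub(with_groups, r'\1,\2', line))  # viewforum.php?f=17&start=100,Page 3 TEST
```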

I assume there is a general filter available for extracting several parts of text from a line such as
<a title="Page 3 TEST" class="" href="viewforum.php?f=17&start=100">Linktext Example John</a>
for extracting the attributes
1) the title
2) the URL
3) the Linktext
But again, maybe TextPipe Lite does not support this. Otherwise, could you attach the filter?
Thanks, gerd
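For a general, order-independent extraction of title, URL and link text, an HTML parser is an alternative to regular expressions. This is a swapped-in technique, not a TextPipe filter: a minimal sketch using Python's standard `html.parser` module, where `LinkExtractor` is a hypothetical class name and the link text is taken to be all text inside the `<a>` element.

```python
from html.parser import HTMLParser

SAMPLE = (
    '<div class="tab"><a target="_top" href="viewforum.php?f=17&start=100" title="Page 3 TEST">Linktext Example John<span></span></a></div>\n'
    '<div class="tab"><a title="Page 2 text" target="_top" href="viewforum.php?f=17&start=50" >Linktext Example Bob<span></span></a></div>\n'
)


class LinkExtractor(HTMLParser):
    """Collect (title, href, link text) for each <a> element, in any attribute order."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current = None  # [title, href, text] while inside an <a> element

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            values = dict(attrs)  # attribute order no longer matters
            self._current = [values.get('title', ''), values.get('href', ''), '']

    def handle_data(self, data):
        # Accumulate any text seen inside the current <a> element.
        if self._current is not None:
            self._current[2] += data

    def handle_endtag(self, tag):
        if tag == 'a' and self._current is not None:
            self.links.append(tuple(self._current))
            self._current = None


parser = LinkExtractor()
parser.feed(SAMPLE)
parser.close()
for title, href, text in parser.links:
    print('<a title="{}" class="" href="{}">{}</a>'.format(title, href, text))
```

The parser reads attributes by name, so it handles both sample lines without one pattern per attribute order.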

Re: Extract several parts of a line

Posted: Sat Sep 18, 2010 9:14 am
by DataMystic Support
Sorry - should have been

Code:

href="(.*)" title="(.*)">
or better:

Code:

href="([^"]*)" title="([^"]*)">([^<]*)<
Then $3 is the link text.
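With a greedy `(.*)<`, the third group runs to the last `<` on the line rather than the first, so it also picks up the trailing tags; a negated class `[^<]*` stops at the first tag. Anchoring the pattern with `^.*` and `.*$` lets one replace rewrite the whole line. A minimal Python sketch for lines where href precedes title, with Python's `\1` standing in for TextPipe's `$1`:

```python
import re

line = '<div class="tab"><a target="_top" href="viewforum.php?f=17&start=100" title="Page 3 TEST">Linktext Example John<span></span></a></div>'

# Greedy (.*)< backtracks only as far as the LAST '<' on the line.
greedy = re.search(r'href="(.*)" title="(.*)">(.*)<', line).group(3)
# [^<]* cannot cross a '<', so it stops at the first tag after the link text.
strict = re.search(r'href="([^"]*)" title="([^"]*)">([^<]*)<', line).group(3)
print(greedy)  # Linktext Example John<span></span></a>
print(strict)  # Linktext Example John

# ^.* and .*$ make the match cover the whole line, so the replace
# yields only the requested link (assumes href comes before title).
whole_line = r'^.*href="([^"]*)" title="([^"]*)">([^<]*)<.*$'
print(re.sub(whole_line, r'<a title="\2" class="" href="\1">\3</a>', line))
```

Lines where title comes first, like the second sample, would need a second pattern with the groups swapped.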