How do I Extract 2 Separate Tags/Lines from HTML ??
Posted: Sun Sep 07, 2008 12:19 am
by pheagila
I am trying to extract 2 separate lines from HTML.
If I try with only 1 line it works, BUT if I try both I get 0-byte output.
The 2 lines that require extraction are:
<h1 id="Results">..DATA...</h1>
<span id="Data">..DATA...</span>
Code:
Perl pattern [<h1 id="Results">(.*)</h1> <span id="Data">(.*)</span>] with [$1\r\n $2\r\n]
[X] Extract matches
Maximum text buffer size 99999
[X] '.' matches newline
Cheers
Re: How do I Extract 2 Separate Tags/Lines from HTML ??
Posted: Mon Sep 08, 2008 7:37 am
by DataMystic Support
Naturally after you extract the first line type, there is no text left to match the second type.
You need to combine the patterns like this:
Code:
<h1 id="Results">.*</h1>|<span id="Data">.*</span>
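For readers who want to try the combined pattern outside TextPipe, here is a minimal sketch in Python. The sample HTML is made up (the thread never shows the real source data), and Python's `re` module only stands in for TextPipe's Perl pattern filter, so treat it as an illustration of the alternation rather than of TextPipe itself.

```python
import re

# Hypothetical sample input; the real source data is not shown in this thread.
html = (
    '<h1 id="Results">Results ABC</h1>\n'
    '<span id="Data">Data 123</span>\n'
    '<h1 id="Results">Results DEF</h1>\n'
    '<span id="Data">Data 456</span>\n'
)

# Same alternation as above: match either tag type, whichever comes next.
pattern = re.compile(r'<h1 id="Results">.*</h1>|<span id="Data">.*</span>')

# re.DOTALL is deliberately left off, so '.' stops at a newline
# and each match stays on a single line.
matches = [m.group(0) for m in pattern.finditer(html)]

for line in matches:
    print(line)
```

One difference from the settings in the first post: in Python, turning on `re.DOTALL` (the counterpart of "'.' matches newline") would let the greedy `.*` run from the first `<h1 id="Results">` to the last `</h1>`, so the sketch keeps it off.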
Re: How do I Extract 2 Separate Tags/Lines from HTML ??
Posted: Mon Sep 08, 2008 11:00 pm
by pheagila
DataMystic Support wrote:
Naturally after you extract the first line type, there is no text left to match the second type.
You need to combine the patterns like this:
Code:
<h1 id="Results">.*</h1>|<span id="Data">.*</span>
Thank you Support - that worked perfectly !!
1) I assume the | acts as an AND ??
2) Is there any difference between .* and (.*) ??
3) What is the difference between $0 and $1$2, as they produce different outputs ??
The final issue I need to solve is outputting the data with the correct new lines.
Code:
Perl pattern [<h1 id="Results">(.*)</h1>|<span id="Data">(.*)</span>] with [$0\r\n]
[X] Extract matches
Maximum text buffer size 99999
[X] '.' matches newline
The settings above output:
Results ABC
Data 123
Results DEF
Data 456
However, the output I need is:
Results ABC
Data 123
--- NEW LINE ---
Results DEF
Data 456
--- NEW LINE ---
Cheers
Re: How do I Extract 2 Separate Tags/Lines from HTML ??
Posted: Tue Sep 09, 2008 10:59 pm
by DataMystic Support
1) | means OR, not AND!
2) & 3) (.*) captures the bit in brackets so that it can be used as $1, $2, etc. in the output.
Why don't you replace the word 'Results' with '\r\nResults' to get the new line?
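To make the difference between $0 and $1$2 concrete, here is a rough Python sketch. It assumes TextPipe's $0 is the whole match and $1/$2 are the bracketed groups, as is usual for Perl-style patterns; in Python those correspond to `m.group(0)`, `m.group(1)` and `m.group(2)`. The sample HTML is invented for the example.

```python
import re

# Hypothetical sample input, not taken from the poster's file.
html = '<h1 id="Results">Results ABC</h1>\n<span id="Data">Data 123</span>\n'

pattern = re.compile(r'<h1 id="Results">(.*)</h1>|<span id="Data">(.*)</span>')

rows = []
for m in pattern.finditer(html):
    whole = m.group(0)  # like $0: the entire match, tags included
    # Because | means OR, only one branch matches at a time, so one of the
    # two groups is always unset (None in Python). Joining them mimics $1$2.
    inner = (m.group(1) or "") + (m.group(2) or "")
    rows.append((whole, inner))
    print(repr(whole), "->", repr(inner))
```

In this sketch the whole match keeps the surrounding tags, while the joined groups hold only the text between them.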
Re: How do I Extract 2 Separate Tags/Lines from HTML ??
Posted: Tue Sep 09, 2008 11:53 pm
by pheagila
DataMystic Support wrote:
Why don't you replace the word 'Results' with '\r\nResults' to get the new line?
Hi Support,
The examples below all produce the wrong output:
Code:
Perl pattern [<h1 id="Results">(.*)</h1>|<span id="Data">(.*)</span>] with [\r\n$1$2]
OR
Perl pattern [<h1 id="Results">(.*)</h1>|<span id="Data">(.*)</span>] with [\r\n$0]
OR
Perl pattern [<h1 id="Results">(.*)</h1>|<span id="Data">(.*)</span>] with [$1$2\r\n]
Outputs:
Results ABC
Data 123
Results DEF
Data 456
Can you provide a code example for the "Replace with"? The output I need is:
Results ABC
Data 123
--- NEW LINE ---
Results DEF
Data 456
--- NEW LINE ---
Re: How do I Extract 2 Separate Tags/Lines from HTML ??
Posted: Wed Sep 10, 2008 10:40 pm
by DataMystic Support
Do a Perl pattern search for
and replace with
Sorry - without seeing a sample of source data it is hard to help!
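For what it is worth, the earlier suggestion in this thread (replace the word 'Results' with '\r\nResults' as a second step) can be sketched in Python as below. The extracted text is a guess at what the first filter produces, based on the output quoted above, and this is not necessarily the search and replace pair intended in this post.

```python
# Hypothetical output of the extract step, using CRLF line endings.
extracted = "Results ABC\r\nData 123\r\nResults DEF\r\nData 456\r\n"

# Second pass: put an extra line break in front of every 'Results',
# then drop the blank line this leaves at the very top.
spaced = extracted.replace("Results", "\r\nResults").lstrip("\r\n")

print(repr(spaced))
```

This gives a blank line between each Results/Data pair. It does not add one after the last pair, and it would also split any data line that happens to contain the word 'Results', so the real source data still matters.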