
How do I Extract 2 Separate Tags/Lines from HTML?

Posted: Sun Sep 07, 2008 12:19 am
by pheagila
I am trying to extract 2 separate lines from HTML.

If I try with only 1 line it works, but if I try both I get 0-byte output.

The 2 lines that require extraction are:

<h1 id="Results">..DATA...</h1>
<span id="Data">..DATA...</span>

Code:

Perl pattern [<h1 id="Results">(.*)</h1> <span id="Data">(.*)</span>] with [$1\r\n $2\r\n]
   [X] Extract matches
   Maximum text buffer size 99999
   [X] '.' matches newline
Cheers

Re: How do I Extract 2 Separate Tags/Lines from HTML?

Posted: Mon Sep 08, 2008 7:37 am
by DataMystic Support
Naturally after you extract the first line type, there is no text left to match the second type.

You need to combine the patterns like this:

Code:

<h1 id="Results">.*</h1>|<span id="Data">.*</span>
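
For readers who want to try the alternation outside TextPipe, here is a minimal Python sketch of the same idea. The sample HTML is hypothetical (the thread never shows the real source), and the non-greedy `.*?` is an assumption added so that a single match cannot run across several tags when '.' matches newline.

```python
import re

# Hypothetical sample input; the real source HTML is not shown in the thread.
html = (
    '<h1 id="Results">Results ABC</h1>\n'
    '<span id="Data">Data 123</span>\n'
    '<h1 id="Results">Results DEF</h1>\n'
    '<span id="Data">Data 456</span>\n'
)

# One pattern, two alternatives: at each position either side of | may match.
# re.DOTALL mirrors the "'.' matches newline" option.
pattern = re.compile(
    r'<h1 id="Results">.*?</h1>|<span id="Data">.*?</span>',
    re.DOTALL,
)

# With no capture groups, findall returns each whole match in document order.
matches = pattern.findall(html)
for m in matches:
    print(m)
```

Because both tag types are found in one pass, the matches stay in the order they appear in the source.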

Re: How do I Extract 2 Separate Tags/Lines from HTML?

Posted: Mon Sep 08, 2008 11:00 pm
by pheagila
DataMystic Support wrote:Naturally after you extract the first line type, there is no text left to match the second type.
You need to combine the patterns like this:

Code:

<h1 id="Results">.*</h1>|<span id="Data">.*</span>
Thank you, Support. That worked perfectly! :)

1) I assume the | acts as an AND?
2) Is there any difference between .* and (.*)?
3) What is the difference between $0 and $1$2, as they produce different outputs?

The final issue I need to solve is getting the output with the correct new lines.

Code:

Perl pattern [<h1 id="Results">(.*)</h1>|<span id="Data">(.*)</span>] with [$0\r\n]
   [X] Extract matches
   Maximum text buffer size 99999
   [X] '.' matches newline
The above pattern outputs:

Results ABC
Data 123
Results DEF
Data 456

However, the output I need is:

Results ABC
Data 123
--- NEW LINE ---
Results DEF
Data 456
--- NEW LINE ---

Cheers

Re: How do I Extract 2 Separate Tags/Lines from HTML?

Posted: Tue Sep 09, 2008 10:59 pm
by DataMystic Support
1) | means OR, not AND!
2) & 3) (.*) captures the bit in brackets so that it can be used in $1, $2 etc in the output
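
To make the difference concrete, here is a rough Python sketch rather than TextPipe itself. In Perl-style regex engines $0 generally refers to the entire match, while $1 and $2 refer to the first and second bracketed groups. The sample HTML and the non-greedy `.*?` are assumptions for illustration.

```python
import re

# Hypothetical sample input; the real source HTML is not shown in the thread.
html = (
    '<h1 id="Results">Results ABC</h1>\n'
    '<span id="Data">Data 123</span>\n'
)

pattern = re.compile(
    r'<h1 id="Results">(.*?)</h1>|<span id="Data">(.*?)</span>',
    re.DOTALL,
)

# group(0) is the entire match, tags included (the $0 case).
whole = [m.group(0) for m in pattern.finditer(html)]

# With |, only one alternative matches at a time, so only one of the two
# groups is set per match; the other is None. Joining them gives the
# equivalent of $1$2: just the text between the tags.
inner = [(m.group(1) or "") + (m.group(2) or "") for m in pattern.finditer(html)]

print(whole)
print(inner)
```

Here `whole` keeps the surrounding tags and `inner` holds only the captured text, which is why the two replacement strings can produce different outputs.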

Why don't you replace the word 'Results' with '\r\nResults' to get the new line?

Re: How do I Extract 2 Separate Tags/Lines from HTML?

Posted: Tue Sep 09, 2008 11:53 pm
by pheagila
DataMystic Support wrote: Why don't you replace the word 'Results' with '\r\nResults' to get the new line?
Hi Support,

The following examples all produce incorrect output:

Code:

Perl pattern [<h1 id="Results">(.*)</h1>|<span id="Data">(.*)</span>] with [\r\n$1$2]
OR
Perl pattern [<h1 id="Results">(.*)</h1>|<span id="Data">(.*)</span>] with [\r\n$0]
OR
Perl pattern [<h1 id="Results">(.*)</h1>|<span id="Data">(.*)</span>] with [$1$2\r\n]
Outputs:
Results ABC
Data 123
Results DEF
Data 456

Can you provide a code example for the "Replace with" field? The output I need is:

Results ABC
Data 123
--- NEW LINE ---
Results DEF
Data 456
--- NEW LINE ---

Re: How do I Extract 2 Separate Tags/Lines from HTML?

Posted: Wed Sep 10, 2008 10:40 pm
by DataMystic Support
Do a Perl pattern search for

Code:

Data (\d+)
and replace with

Code:

$0\r\n
Sorry - without seeing a sample of source data it is hard to help!
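
As a sketch of what this second pass would do, here is the same search and replace in Python. It assumes the first pass already produced one item per line with CRLF line endings, and that every Data value consists of digits only, as in the sample output quoted earlier in the thread. The real data may differ.

```python
import re

# Assumed output of the extraction step, based on the sample in the thread.
extracted = "Results ABC\r\nData 123\r\nResults DEF\r\nData 456\r\n"

# \g<0> is Python's spelling of $0 (the entire match). Appending \r\n after
# each "Data <digits>" leaves an empty line after every Results/Data pair.
spaced = re.sub(r"Data (\d+)", r"\g<0>\r\n", extracted)

print(spaced)
```

If the Data values can contain anything other than digits, `\d+` would need to be widened to fit.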