How do I Extract 2 Separate Tags/Lines from HTML ??

pheagila
Posts: 9
Joined: Mon Aug 18, 2008 9:04 pm

How do I Extract 2 Separate Tags/Lines from HTML ??

Post by pheagila »

I am trying to extract 2 separate lines from HTML.

If I try with only 1 line it works, BUT if I try both I get 0-byte output.

The 2 lines that require extraction are:

<h1 id="Results">..DATA...</h1>
<span id="Data">..DATA...</span>

Code: Select all

Perl pattern [<h1 id="Results">(.*)</h1> <span id="Data">(.*)</span>] with [$1\r\n $2\r\n]
   [X] Extract matches
   Maximum text buffer size 99999
   [X] '.' matches newline
Cheers
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: How do I Extract 2 Separate Tags/Lines from HTML ??

Post by DataMystic Support »

Naturally after you extract the first line type, there is no text left to match the second type.

You need to combine the patterns like this:

Code: Select all

<h1 id="Results">.*</h1>|<span id="Data">.*</span>
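For anyone who wants to try the same idea outside TextPipe, here is a minimal Python sketch of the alternation. The sample HTML is made up (no source data was posted in this thread), and the non-greedy `.*?` is my own assumption in place of `.*`, so that each match stops at the first closing tag even when '.' matches newlines.

```python
import re

# Hypothetical sample input; the real source HTML was not posted.
html = (
    '<h1 id="Results">ABC</h1>\n'
    '<p>ignored</p>\n'
    '<span id="Data">123</span>\n'
    '<h1 id="Results">DEF</h1>\n'
    '<span id="Data">456</span>\n'
)

# Either tag may match at each position: the two sides of | are
# alternatives, not a sequence that must appear back to back.
pattern = re.compile(
    r'<h1 id="Results">.*?</h1>|<span id="Data">.*?</span>',
    re.DOTALL,  # same idea as the "'.' matches newline" option
)

matches = pattern.findall(html)
for m in matches:
    print(m)
```

Each tag is picked up wherever it occurs, and the unrelated `<p>` line is skipped.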
pheagila
Posts: 9
Joined: Mon Aug 18, 2008 9:04 pm

Re: How do I Extract 2 Separate Tags/Lines from HTML ??

Post by pheagila »

DataMystic Support wrote:Naturally after you extract the first line type, there is no text left to match the second type.
You need to combine the patterns like this:

Code: Select all

<h1 id="Results">.*</h1>|<span id="Data">.*</span>
Thank you Support, that worked perfectly !! :)

1) I assume the | acts as an AND ??
2) Is there any difference between .* and (.*) ??
3) What is the difference between $0 & $1$2 as they produce different outputs ??

The final issue I need to solve is getting the output data onto the correct new lines.

Code: Select all

Perl pattern [<h1 id="Results">(.*)</h1>|<span id="Data">(.*)</span>] with [$0\r\n]
   [X] Extract matches
   Maximum text buffer size 99999
   [X] '.' matches newline
The code above outputs:

Results ABC
Data 123
Results DEF
Data 456

However, the output I need is:

Results ABC
Data 123
--- NEW LINE ---
Results DEF
Data 456
--- NEW LINE ---

Cheers
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: How do I Extract 2 Separate Tags/Lines from HTML ??

Post by DataMystic Support »

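1) | means OR, not AND!
2) & 3) (.*) captures the bit in brackets so that it can be used as $1, $2 etc. in the output. Plain .* matches the same text but does not capture it. $0 is the whole match, so it differs from $1$2 whenever the pattern contains text outside the brackets (here, the tags).

Why don't you replace the word 'Results' with '\r\nResults' to get the new line?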
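To make the difference between the whole match and the bracketed parts concrete, here is a small Python sketch. It is only an illustration under assumptions: the sample input is invented, `.*?` replaces `.*`, and Python's `group(0)`, `group(1)` and `group(2)` stand in for $0, $1 and $2.

```python
import re

# Made-up sample; the thread does not include the real source HTML.
html = '<h1 id="Results">ABC</h1><span id="Data">123</span>'

pattern = re.compile(r'<h1 id="Results">(.*?)</h1>|<span id="Data">(.*?)</span>')

whole = []     # counterpart of $0: the entire matched text, tags included
captured = []  # counterpart of $1$2: only the text inside the brackets
for m in pattern.finditer(html):
    whole.append(m.group(0))
    # Only one side of the | matches each time, so one group is always unset.
    captured.append((m.group(1) or "") + (m.group(2) or ""))

print(whole)
print(captured)
```

Because | is OR, every match fills either the first or the second bracket, never both, which is why $1$2 together still yield one value per match.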
pheagila
Posts: 9
Joined: Mon Aug 18, 2008 9:04 pm

Re: How do I Extract 2 Separate Tags/Lines from HTML ??

Post by pheagila »

DataMystic Support wrote: Why don't you replace the word 'Results' with '\r\nResults' to get the new line?
Hi Support,

The following examples all produce the wrong output:

Code: Select all

Perl pattern [<h1 id="Results">(.*)</h1>|<span id="Data">(.*)</span>] with [\r\n$1$2]
OR
Perl pattern [<h1 id="Results">(.*)</h1>|<span id="Data">(.*)</span>] with [\r\n$0]
OR
Perl pattern [<h1 id="Results">(.*)</h1>|<span id="Data">(.*)</span>] with [$1$2\r\n]
Outputs:
Results ABC
Data 123
Results DEF
Data 456

Can you provide a code example for the "Replace with"? The output I need is:

Results ABC
Data 123
--- NEW LINE ---
Results DEF
Data 456
--- NEW LINE ---
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: How do I Extract 2 Separate Tags/Lines from HTML ??

Post by DataMystic Support »

Do a Perl pattern search for

Code: Select all

Data (\d+)
and replace with

Code: Select all

$0\r\n
Sorry - without seeing a sample of source data it is hard to help!
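As a rough check of this second pass, the same search and replace can be sketched in Python. The input below is assumed to be the extracted text shown earlier in the thread, with CRLF line endings; `\g<0>` is Python's spelling of $0.

```python
import re

# Assumed output of the extraction step, one item per line.
text = "Results ABC\r\nData 123\r\nResults DEF\r\nData 456\r\n"

# Re-emit each "Data <digits>" match followed by an extra CRLF,
# which leaves a blank line after every Results/Data pair.
result = re.sub(r"Data (\d+)", r"\g<0>" + "\r\n", text)

print(result)
```

If the real Data values are not purely digits, `\d+` would need to be widened to suit them.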