DataMystic

Posted: **Sun Sep 07, 2008 12:19 am**

I am trying to extract 2 separate lines from HTML.

If I try with only 1 line it works, BUT if I try both I get 0 byte output

2 lines that require extraction are:

<h1 id="Results">..DATA...</h1>
<span id="Data">..DATA...</span>

Code: Select all

Perl pattern [<h1 id="Results">(.*)</h1> <span id="Data">(.*)</span>] with [$1\r\n $2\r\n]
   [X] Extract matches
   Maximum text buffer size 99999
   [X] '.' matches newline

Cheers

Posted: **Mon Sep 08, 2008 7:37 am**

Naturally after you extract the first line type, there is no text left to match the second type.

You need to combine the patterns like this:

Code: Select all

<h1 id="Results">.*</h1>|<span id="Data">.*</span>
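
For anyone who wants to try the combined pattern outside TextPipe, here is a minimal sketch using Python's `re` module. The sample HTML is invented for illustration (the thread never shows the real source data), and each tag is assumed to sit on its own line so that the greedy `.*` stops at the end of that line.

```python
import re

# Hypothetical sample input; the thread does not include the real HTML.
html = (
    '<h1 id="Results">ABC</h1>\n'
    '<p>ignored</p>\n'
    '<span id="Data">123</span>\n'
    '<h1 id="Results">DEF</h1>\n'
    '<span id="Data">456</span>\n'
)

# One pattern with two alternatives: each match is either an <h1> line
# or a <span> line. '.' does not match newline here (no re.DOTALL).
pattern = re.compile(r'<h1 id="Results">.*</h1>|<span id="Data">.*</span>')

# finditer scans the text once, left to right, so both tag types are
# returned in document order.
matches = [m.group(0) for m in pattern.finditer(html)]
for line in matches:
    print(line)
```

Because a single pass tries both alternatives at every position, neither tag type can consume the text the other needs, which is the problem described above.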

Posted: **Mon Sep 08, 2008 11:00 pm**

DataMystic Support wrote: Naturally after you extract the first line type, there is no text left to match the second type.
You need to combine the patterns like this:
Code: Select all
<h1 id="Results">.*</h1>|<span id="Data">.*</span>

thank you Support - that worked perfectly !!

1) I assume the | acts as an AND ??
2) Is there any difference between .* and (.*) ??
3) What is the difference between $0 & $1$2 as they produce different outputs ??

Final issue I need to solve is with output of data with correct new lines

Code: Select all

Perl pattern [<h1 id="Results">(.*)</h1>|<span id="Data">(.*)</span>] with [$0\r\n]
   [X] Extract matches
   Maximum text buffer size 99999
   [X] '.' matches newline

the following code outputs:

Results ABC
Data 123
Results DEF
Data 456

However, the output I need is:

Results ABC
Data 123
--- NEW LINE ---
Results DEF
Data 456
--- NEW LINE ---

Cheers

Posted: **Tue Sep 09, 2008 10:59 pm**

1) | means OR, not AND!
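2) & 3) (.*) captures the bit in brackets so that it can be used as $1, $2 etc. in the output, whereas a plain .* matches the same text without capturing it. $0 is the whole match, tags included, which is why it gives different output from $1$2.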
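
A rough illustration of these answers in Python's `re` module, on the assumption that `group(0)`, `group(1)` and `group(2)` behave like `$0`, `$1` and `$2` in the replacement text. The helper name `describe` and the sample lines are made up for this sketch.

```python
import re

# Same alternation as in the thread, with one capture group per alternative.
pattern = re.compile(r'<h1 id="Results">(.*)</h1>|<span id="Data">(.*)</span>')

def describe(line):
    """Return (whole match, group 1, group 2) for the first match in line."""
    m = pattern.search(line)
    return m.group(0), m.group(1), m.group(2)

# '|' means OR: only one side matches at a time, so only one group is filled.
h1 = describe('<h1 id="Results">ABC</h1>')
span = describe('<span id="Data">123</span>')

print(h1)    # whole match keeps the tags; group 2 did not take part
print(span)  # whole match keeps the tags; group 1 did not take part

# Joining both groups (an unused group counts as empty) gives just the
# captured text, whichever alternative matched.
joined = [(g1 or '') + (g2 or '') for _, g1, g2 in (h1, span)]
print(joined)
```

Since only one alternative matches at a time, writing `$1$2` is a way of saying "whichever group was filled".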

Why don't you replace the word 'Results' with '\r\nResults' to get the new line?

Posted: **Tue Sep 09, 2008 11:53 pm**

DataMystic Support wrote: Why don't you replace the word 'Results' with '\r\nResults' to get the new line?

Hi Support,

The following examples all produce the wrong output:

Code: Select all

Perl pattern [<h1 id="Results">(.*)</h1>|<span id="Data">(.*)</span>] with [\r\n$1$2]
OR
Perl pattern [<h1 id="Results">(.*)</h1>|<span id="Data">(.*)</span>] with [\r\n$0]
OR
Perl pattern [<h1 id="Results">(.*)</h1>|<span id="Data">(.*)</span>] with [$1$2\r\n]

Outputs:
Results ABC
Data 123
Results DEF
Data 456

Can you provide a code example for the "Replace with"?

Results ABC
Data 123
--- NEW LINE ---
Results DEF
Data 456
--- NEW LINE ---

Posted: **Wed Sep 10, 2008 10:40 pm**

Do a perl pattern search for

Code: Select all

Data (\d+)

and replace with

Code: Select all

$0\r\n
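
As a sketch of what this second pass does, here is the same search and replace in Python, run over the already-extracted text quoted earlier in the thread. It assumes the Data values are plain digits (as `\d+` requires) and uses `\n` rather than `\r\n` to keep the example short.

```python
import re

# Output of the extraction step, as quoted earlier in the thread.
extracted = "Results ABC\nData 123\nResults DEF\nData 456\n"

# r'\g<0>' is Python's spelling of "the whole match" ($0 in the thread);
# appending a newline leaves a blank line after every Data line.
spaced = re.sub(r'Data (\d+)', r'\g<0>' + '\n', extracted)

print(spaced)
```

Each `Data` line keeps its original line ending and gains one more, so every Results/Data pair ends up followed by an empty line.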

Sorry - without seeing a sample of source data it is hard to help!

How do I Extract 2 Separate Tags/Lines from HTML ??