Extract certain HTML tags

Posted: Tue Oct 02, 2012 6:01 am
by JimC
I am trying to extract some data from web pages. All of the content I need is contained within two tags of this form:
<div class="someclass">content</div>
What is the best filter to extract just these two tags from a file and then proceed with further processing?

Something like an "extract HTML/XML pair" filter would be perfect, but I don't see that as an option.

Re: Extract certain HTML tags

Posted: Tue Oct 02, 2012 9:11 am
by DataMystic Support
Hi Jim,

Just use a Perl pattern:

<div class="someclass">(.*?)</div>
replace with

$1
and check 'Extract'.
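
For anyone who wants to try the same idea outside the filter tool, here is a minimal sketch in Python. It is not part of the original answer: the function name `extract_divs` and the sample input are made up for illustration. It assumes the target divs are not nested, and it uses a non-greedy capture with `re.DOTALL` so that each div is matched separately and content spanning several lines is still picked up.

```python
import re
from typing import List

# Non-greedy (.*?) stops at the first closing </div>, so two divs on the
# same line are captured separately rather than as one long match.
# re.DOTALL lets "." match newlines (an assumption: content may span lines).
PATTERN = re.compile(r'<div class="someclass">(.*?)</div>', re.DOTALL)


def extract_divs(html: str) -> List[str]:
    """Return the inner content of every matching div, in document order."""
    return PATTERN.findall(html)


# Hypothetical sample input, for illustration only.
sample = (
    '<p>skip me</p>'
    '<div class="someclass">first</div>'
    '<span>noise</span>'
    '<div class="someclass">second</div>'
)

print(extract_divs(sample))  # ['first', 'second']
```

Note that a regex like this cuts the match at the first closing div tag it sees, so nested divs inside the target would be truncated. For that kind of input a real HTML parser is the safer choice.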