Filter Help

Posted: Sat Dec 03, 2005 7:16 am
by edjerum
Hello Simon,

I'm trying to build a filter that will extract just the unique text from several HTML pages on our website. I've tried using find and replace to remove the template (recurring) HTML, JavaScript, text and images from each page, but I can't figure out how to get it to remove more than one line at a time. An example page is at http://newcar101.com/auto-loan.html

Any help would be appreciated.

Posted: Mon Dec 05, 2005 8:53 am
by DataMystic Support
Why don't you try using the search/replace filter - and just replace with nothing? The search/replace filter handles multi-line text too.
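
For readers working outside TextPipe, the same idea (find a multi-line block and replace it with nothing) can be sketched in Python. The sample page and the template block below are made up for illustration and are not taken from the site in question.

```python
# Hypothetical page: a recurring multi-line template block followed by
# the unique text we want to keep.
page = """<div id="nav">
  <a href="/">Home</a>
  <a href="/loans">Loans</a>
</div>
<p>Unique text about auto loans.</p>
"""

# Hypothetical template block, copied exactly as it appears on every page.
template = """<div id="nav">
  <a href="/">Home</a>
  <a href="/loans">Loans</a>
</div>
"""

def remove_template(text: str, block: str) -> str:
    """Replace every literal occurrence of a (multi-line) block with nothing."""
    return text.replace(block, "")

cleaned = remove_template(page, template)
print(cleaned)
```

The match is literal, so whitespace and line breaks in the block have to agree exactly with the page.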

Search and Replace

Posted: Mon Dec 05, 2005 10:32 am
by edjerum
Hi Simon,

That worked. Thank you.

One other question. When I use the filter to remove HTML tags, can I exclude certain tags that I want to keep?

Thanks for your help.

Ed

Posted: Mon Dec 05, 2005 11:32 am
by DataMystic Support
Hi Ed,

No - but you can precede the Remove HTML filter with a filter that rewrites the tags you want to keep as, say, [...] instead of <...>. The Remove HTML filter will then leave them alone.

e.g. Find perl pattern

<(/?(body|table|tr)[^>]*)>

Replace with

\[$1\]
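
As a rough illustration of the two-step approach outside TextPipe, here is a minimal Python sketch. It assumes the pattern is meant to allow an optional `/` so that closing tags are protected as well. The catch-all tag pattern and the sample HTML are assumptions for the example, not TextPipe's actual Remove HTML behaviour.

```python
import re

# Tags to keep: an optional "/" followed by body, table or tr.
KEEP = re.compile(r"<(/?(body|table|tr)[^>]*)>")
# Simplistic stand-in for a "remove HTML" step: anything between < and >.
ANY_TAG = re.compile(r"<[^>]*>")

def strip_html_except(html: str) -> str:
    """Rewrite kept tags as [..], then delete every remaining <..> tag."""
    protected = KEEP.sub(r"[\1]", html)
    return ANY_TAG.sub("", protected)

sample = "<body><p>Hi <b>there</b></p><table><tr><td>x</td></tr></table></body>"
result = strip_html_except(sample)
print(result)
```

Note that the pattern protects any tag whose name merely starts with one of the listed names (for example `<track>`), so a stricter version would need a word boundary after the tag name. A final step could turn `[...]` back into `<...>` if real tags are needed in the output.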

Thank you

Posted: Mon Dec 05, 2005 12:00 pm
by edjerum
Thanks Simon.