Filter Help

Get help with installation and running here.

Moderators: DataMystic Support, Moderators

edjerum
Posts: 3
Joined: Sat Dec 03, 2005 1:26 am
Location: San Diego

Filter Help

Post by edjerum »

Hello Simon,

I'm trying to build a filter that will extract just the unique text from several HTML pages on our website. I've tried using find and replace to remove the template (recurring) HTML, JavaScript, text, and images from each page, but I can't figure out how to get it to remove more than one line at a time. An example page is at http://newcar101.com/auto-loan.html

Any help would be appreciated.

ED
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Post by DataMystic Support »

Why don't you try using the search/replace filter - and just replace with nothing? The search/replace filter handles multi-line text too.
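
For readers working outside TextPipe, the same idea (search for a multi-line block and replace it with nothing) can be sketched in Python. This is only an illustration, not how the search/replace filter is implemented; the template fragment and the `strip_template` helper are hypothetical and not taken from the real site.

```python
# Hypothetical template fragment that repeats on every page (an assumption
# for illustration; it is not copied from the actual website).
TEMPLATE_BLOCK = """<div id="nav">
  <a href="/">Home</a>
  <a href="/loans.html">Loans</a>
</div>
"""


def strip_template(page: str, block: str = TEMPLATE_BLOCK) -> str:
    """Remove every literal occurrence of a multi-line block from a page."""
    # Replacing with the empty string is the "replace with nothing" step.
    return page.replace(block, "")


page = "<h1>Auto Loans</h1>\n" + TEMPLATE_BLOCK + "<p>Unique text.</p>\n"
cleaned = strip_template(page)
print(cleaned)
```

A literal match like this only works when the template block is byte-for-byte identical on every page; if whitespace or attributes vary, a pattern match would be needed instead.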
edjerum
Posts: 3
Joined: Sat Dec 03, 2005 1:26 am
Location: San Diego

Search and Replace

Post by edjerum »

Hi Simon,

That worked. Thank you.

One other question: when I use the filter to remove HTML tags, can I exempt certain tags that I want to keep?

Thanks for your help.

Ed
ED
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Post by DataMystic Support »

Hi Ed,

No - but if you precede the Remove HTML filter with a filter to modify the tags you want to keep to, say, [...] instead of <...> then this will work too.

e.g. Find perl pattern

<(/?(body|table|tr)[^>]*)>

Replace with

\[$1\]
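
The protect-then-strip approach can be tried outside TextPipe with a short Python sketch. It is an approximation under stated assumptions: the pattern uses an optional `/` so that closing tags are protected too, adds `\b` so that `tr` does not also match longer tag names, and a generic `<[^>]*>` substitution stands in for the Remove HTML filter, whose real behavior may differ. The function name `strip_tags_except` is hypothetical.

```python
import re

# Tags to keep. The optional "/" lets closing tags match as well, and \b
# stops "tr" from matching longer names such as "track" (both are
# assumptions of this sketch).
KEEP = re.compile(r"<(/?(?:body|table|tr)\b[^>]*)>", re.IGNORECASE)

# Crude stand-in for the Remove HTML filter: drop anything that looks like a tag.
ANY_TAG = re.compile(r"<[^>]*>")


def strip_tags_except(html: str) -> str:
    """Rewrite kept tags as [...] and then remove all remaining <...> tags."""
    protected = KEEP.sub(r"[\1]", html)  # e.g. <tr> becomes [tr]
    return ANY_TAG.sub("", protected)


sample = '<body><p>Rates</p><table class="x"><tr><td>5%</td></tr></table></body>'
result = strip_tags_except(sample)
print(result)  # [body]Rates[table class="x"][tr]5%[/tr][/table][/body]
```

If the kept tags need to be real HTML again afterwards, a final search/replace can turn the square brackets back into angle brackets, provided the page text contains no other square brackets.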
edjerum
Posts: 3
Joined: Sat Dec 03, 2005 1:26 am
Location: San Diego

Thank you

Post by edjerum »

Thanks Simon.
ED