Hello Simon,
I'm trying to build a filter that will extract just the unique text from several HTML pages on our website. I've tried using find and replace to remove the template (recurring) HTML, JavaScript, text and images from each page, but I can't figure out how to get it to remove more than one line at a time. An example page is at http://newcar101.com/auto-loan.html
Any help would be appreciated.
Filter Help
Moderators: DataMystic Support, Moderators
- DataMystic Support
- Site Admin
- Posts: 2229
- Joined: Mon Jun 30, 2003 12:32 pm
- Location: Melbourne, Australia
- Contact:
Why don't you try using the search/replace filter - and just replace with nothing? The search/replace filter handles multi-line text too.
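For readers working outside TextPipe, the same idea (search for a multi-line block and replace it with nothing) can be sketched in Python. This is only an illustration under stated assumptions: the `TEMPLATE_BLOCK` markup and the function names `strip_template` and `strip_between` are hypothetical and not taken from the page above or from TextPipe itself.

```python
import re

# Hypothetical template fragment spanning several lines
# (an assumption for illustration, not copied from the real page).
TEMPLATE_BLOCK = """<div id="nav">
  <a href="/">Home</a>
  <a href="/loans.html">Loans</a>
</div>
"""


def strip_template(page: str, block: str = TEMPLATE_BLOCK) -> str:
    """Remove every literal occurrence of a multi-line block (replace with nothing)."""
    return page.replace(block, "")


def strip_between(page: str, start: str, end: str) -> str:
    """Remove everything from a start marker to the nearest end marker, across lines."""
    # re.DOTALL lets "." match newlines, so the match can span several lines.
    pattern = re.escape(start) + r".*?" + re.escape(end)
    return re.sub(pattern, "", page, flags=re.DOTALL)


if __name__ == "__main__":
    page = "<p>Unique text</p>\n" + TEMPLATE_BLOCK + "<p>More unique text</p>\n"
    print(strip_template(page))
    print(strip_between("a<script>\nvar x = 1;\n</script>b", "<script>", "</script>"))
```

The literal replace suits template blocks that are identical on every page; the marker-based version is a possible fallback when the block's contents vary between pages.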
Regards,
Simon Carter, https://www.DataMystic.com
https://www.JadeDiabetes.com - Insulin dose calculator for Type 1 diabetes
https://www.DownloadPipe.com - 250,000 free software downloads
Search and Replace
Hi Simon,
That worked. Thank you.
One other question. When I use the filter to remove HTML tags, can I exclude certain tags that I want to keep?
Thanks for your help.
Ed
- DataMystic Support
Hi Ed,
No, but if you precede the Remove HTML filter with a filter that changes the tags you want to keep to, say, [...] instead of <...>, they will survive the removal.
e.g. Find perl pattern
<(/?(body|table|tr)[^>]*)>
Replace with
\[$1\]
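The protect-then-strip approach described above can be sketched in Python for anyone who wants to test the pattern outside TextPipe. Several things here are assumptions rather than part of the original answer: the pattern is taken to mean an optional leading slash (`/?`), a `\b` is added so that `tr` does not also match tags such as `<track>`, `ANY_TAG` is only a crude stand-in for the Remove HTML filter, and the final `restore_tags` step (turning `[...]` back into `<...>`) is an optional extra.

```python
import re

# Tags to keep. The \b word boundary is an addition to the pattern in the post.
KEEP = re.compile(r"<(/?(?:body|table|tr)\b[^>]*)>", re.IGNORECASE)

# Crude stand-in for a "Remove HTML" filter: deletes anything that looks like a tag.
ANY_TAG = re.compile(r"<[^>]*>")

# Reverse of KEEP, for turning protected [tags] back into <tags> afterwards.
PROTECTED = re.compile(r"\[(/?(?:body|table|tr)\b[^\]]*)\]", re.IGNORECASE)


def protect_tags(html: str) -> str:
    """Rewrite the tags to keep from <...> to [...] so a tag stripper ignores them."""
    return KEEP.sub(r"[\1]", html)


def strip_html_keeping(html: str) -> str:
    """Protect the wanted tags, then remove every remaining <...> tag."""
    return ANY_TAG.sub("", protect_tags(html))


def restore_tags(text: str) -> str:
    """Optional last step: convert the protected [...] tags back to <...>."""
    return PROTECTED.sub(r"<\1>", text)


if __name__ == "__main__":
    sample = '<table class="x"><tr><td><b>Hi</b></td></tr></table>'
    stripped = strip_html_keeping(sample)
    print(stripped)
    print(restore_tags(stripped))
```

Note that the restore step would also convert any square-bracketed text that already looked like one of these tags before the filter ran, so it is only safe when the input contains no such text.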
Regards,
Simon Carter, https://www.DataMystic.com
https://www.JadeDiabetes.com - Insulin dose calculator for Type 1 diabetes
https://www.DownloadPipe.com - 250,000 free software downloads