Delete words with less than X characters in a HTML Tag

Get help with installation and running here.

Moderators: DataMystic Support, Moderators

gerd
Posts: 39
Joined: Wed Mar 12, 2008 10:52 pm

Delete words with less than X characters in a HTML Tag

Post by gerd »

I need to delete every word inside an HTML tag such as H1 unless it has at least 6 characters and starts with a capital letter.

Example: <h1>I need to delete all strings and words within a HTML Tag like H1 which are less than 6 Characters and do not start with a Capital Letter</h1>
Requested result: <h1>Characters Capital Letter</h1>

I have been playing with [A-Z]{6,}? but have no idea how to proceed. What are the filter commands for deleting strings within a certain HTML tag?
Thanks in advance
gerd
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: Delete words with less than X characters in a HTML Tag

Post by DataMystic Support »

Try:

<h1>^[A-Z][^<]{5,}</h1>
gerd
Posts: 39
Joined: Wed Mar 12, 2008 10:52 pm

Re: Delete words with less than X characters in a HTML Tag

Post by gerd »

Simon,

<h1>^[A-Z][^<]{6,}</h1>
This does not work. Example: <h1>here some words BBQ Änderung Österreich</h1>
The requested result should be
<h1>Änderung Österreich</h1>

In other words: extract only whole words that start with a capital letter and consist of at least X characters.

I assume the replacement would be $0 with the Extract option activated.
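
To see why the pattern quoted in this post cannot match, here is a small sketch using Python's re module as a stand-in. TextPipe's regex engine may differ in details, so treat this as an illustration only; the variable names are made up for the example.

```python
import re

text = "<h1>here some words BBQ Änderung Österreich</h1>"

# '^' directly after '<h1>' demands a line start at that position,
# which never happens in the middle of a line.
suggested = re.compile(r"<h1>^[A-Z][^<]{6,}</h1>")
print(suggested.search(text))      # None

# Even without '^', the pattern describes the WHOLE heading, not single
# words: it only matches when the heading text begins with a capital.
whole_heading = re.compile(r"<h1>[A-Z][^<]{6,}</h1>")
print(whole_heading.search(text))  # None
print(whole_heading.search("<h1>Here some words</h1>").group(0))
# <h1>Here some words</h1>
```

So the length and capital-letter test has to be applied per word, inside the heading, rather than to the heading as one unit.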

Thanks
gerd
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: Delete words with less than X characters in a HTML Tag

Post by DataMystic Support »

Ah, that is much clearer.

Ok, first add a regex

<h1>([^<]+)</h1>
and set the Replace Action to Send variable 1 to subfilter

Then add a second regex inside this, with a regex pattern of

\b[A-Z]\w+?\b
Do not use the Extract option. Just set the replacement to blank and ensure Match Case is ON. Note: you will also have to extend the A-Z character class to include any special letters, such as the Ä and Ö in your example.

You will also have to add a Filters\Remove\Blanks from start of line filter after the second regex (still inside the first regex).
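
For readers without TextPipe at hand, the two-stage idea (an outer regex isolates the heading text, an inner regex works on single words) can be sketched in Python. This is only an analogy and does not reproduce TextPipe's subfilter or replacement settings: the sketch collects the capitalised words, which is the result asked for earlier in the thread. The function names and the extended class A-ZÄÖÜ are assumptions made for the example.

```python
import re

# Outer pattern: capture the text between <h1> and </h1>.
H1_CONTENT = re.compile(r"<h1>([^<]+)</h1>")
# Inner pattern: a whole word starting with a capital letter.
# Ä, Ö and Ü are added by hand because A-Z covers ASCII capitals only.
CAPITALISED_WORD = re.compile(r"\b[A-ZÄÖÜ]\w+?\b")

def keep_capitalised(match):
    # Runs once per heading, like a subfilter on captured group 1.
    words = CAPITALISED_WORD.findall(match.group(1))
    return "<h1>" + " ".join(words) + "</h1>"

def filter_headings(html):
    # Text outside <h1>...</h1> is left untouched.
    return H1_CONTENT.sub(keep_capitalised, html)

print(filter_headings("<p>some Text</p><h1>here some words BBQ Änderung Österreich</h1>"))
# <p>some Text</p><h1>BBQ Änderung Österreich</h1>
```

Joining the kept words with single spaces also takes care of stray leading blanks, which is the job of the extra remove-blanks filter in the TextPipe version.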
gerd
Posts: 39
Joined: Wed Mar 12, 2008 10:52 pm

Re: Delete words with less than X characters in a HTML Tag

Post by gerd »

Thanks Simon,

That regex is very close and works. The only thing still missing from the example above is the length limit: keep only words that start with a capital letter and consist of at least X characters.

How do I put e.g. {5,}? or something similar into your regex above?

Thanks
gerd
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: Delete words with less than X characters in a HTML Tag

Post by DataMystic Support »

Sorry, I missed that. Here it is:

\b[A-Z]\w{5,}?\b
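
Note that {5,} counts the characters after the leading capital, so this pattern accepts words of 6 or more characters in total. A rough Python sketch of the complete task follows, again only as an illustration outside TextPipe. The function name, the min_len parameter and the default capitals class are assumptions for the example.

```python
import re

def keep_long_capitalised(html, min_len=6, capitals="A-ZÄÖÜ"):
    """Keep only words inside <h1> that start with a capital letter
    and have at least min_len characters in total."""
    # One capital plus (min_len - 1) further word characters.
    word = re.compile(r"\b[%s]\w{%d,}?\b" % (capitals, min_len - 1))

    def rebuild(match):
        # match.group(1) is the heading text captured by the outer pattern.
        return "<h1>" + " ".join(word.findall(match.group(1))) + "</h1>"

    return re.sub(r"<h1>([^<]+)</h1>", rebuild, html)

print(keep_long_capitalised("<h1>here some words BBQ Änderung Österreich</h1>"))
# <h1>Änderung Österreich</h1>
```

With the default min_len=6 the inner pattern is the one from this post, with the capitals class extended for the German example. Pass a different min_len for another value of X.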