Delete words with less than X characters in a HTML Tag

Posted: Fri Mar 08, 2013 12:40 am
by gerd
I need to delete all words within an HTML tag such as H1 that have fewer than 6 characters or do not start with a capital letter.

Example: <h1>I need to delete all strings and words within a HTML Tag like H1 which are less than 6 Characters and do not start with a Capital Letter</h1>
Requested result: <h1>Characters Capital Letter</h1>

I have been playing with [A-Z]{6,}? but have no idea how to proceed. What are the filter commands for deleting strings within a certain HTML tag?
Thanks in advance
gerd

Re: Delete words with less than X characters in a HTML Tag

Posted: Tue Mar 12, 2013 11:05 am
by DataMystic Support
Try:

<h1>^[A-Z][^<]{5,}</h1>

Re: Delete words with less than X characters in a HTML Tag

Posted: Tue Mar 12, 2013 7:30 pm
by gerd
Simon,

<h1>^[A-Z][^<]{6,}</h1>
this does not work. Example: <h1>here some words BBQ Änderung Österreich</h1>
The requested result should be
<h1>Änderung Österreich</h1>

In other words: extract only whole words that start with a capital letter and consist of at least X characters.

I assume the replacement would be $0 with the Extract option activated.

Thanks
gerd

Re: Delete words with less than X characters in a HTML Tag

Posted: Wed Mar 13, 2013 10:17 am
by DataMystic Support
Ah, that is much clearer.

Ok, first add a regex

<h1>([^<]+)</h1>
and set the Replace Action to Send variable 1 to subfilter

Then add a second regex inside this, with a regex pattern of

\b[A-Z]\w+?\b
Do not use the Extract option, just set the replacement to blank, and ensure Match Case is ON. Note: you will also have to extend the A-Z range to include any special capital letters, such as the Ä and Ö in your example.

You will also have to add a Filters\Remove\Blanks from start of line filter after the second regex (inside the first regex).
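
For readers who want to try the same two-stage idea outside the filter tool, here is a minimal Python sketch. It is not the tool's own mechanism: the outer pattern plays the role of the first regex (hand the tag body to an inner step), and the inner pattern is the second regex from this post. The function name keep_capitalised is made up for illustration, and the sketch keeps the matching words rather than blanking anything.

```python
import re

# Outer pattern: isolates the text inside <h1>...</h1> (same pattern as in the post).
H1_BODY = re.compile(r"<h1>([^<]+)</h1>")
# Inner pattern: a word that starts with an ASCII capital (no length limit yet).
CAPITALISED = re.compile(r"\b[A-Z]\w+?\b")

def keep_capitalised(html: str) -> str:
    """Rewrite every <h1> body so that only capitalised words remain."""
    def rewrite(match: re.Match) -> str:
        kept = CAPITALISED.findall(match.group(1))
        return "<h1>" + " ".join(kept) + "</h1>"
    # Text outside <h1> is never passed to the inner pattern.
    return H1_BODY.sub(rewrite, html)

print(keep_capitalised("<p>Keep This</p><h1>here some words BBQ Berlin</h1>"))
# <p>Keep This</p><h1>BBQ Berlin</h1>
```

Joining the kept words with single spaces also avoids leftover leading blanks, which is what the extra remove-blanks filter takes care of in the filter list.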

Re: Delete words with less than X characters in a HTML Tag

Posted: Wed Mar 13, 2013 8:53 pm
by gerd
Thanks Simon,

that regex is very close and works. The only thing still missing from the above example is the length requirement:

Words starting with a capital letter and consisting of at least X characters.

How do I put e.g. {5,}? or something similar into your regex above?

Thanks
gerd

Re: Delete words with less than X characters in a HTML Tag

Posted: Thu Mar 14, 2013 6:16 am
by DataMystic Support
Sorry - I missed that, here it is.

\b[A-Z]\w{5,}?\b
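
The pattern above can be checked quickly in Python against both examples from this thread. This is only a sketch under some assumptions: capitalised_words, min_len and upper are hypothetical names, the capital-letter class is extended by hand with Ä, Ö and Ü (as suggested earlier for special letters), and Python's re treats \w and \b as Unicode-aware for text strings, which may differ from the regex engine in your filter tool.

```python
import re

def capitalised_words(text: str, min_len: int = 6, upper: str = "A-ZÄÖÜ") -> list:
    """Return whole words that start with a capital and have at least min_len characters."""
    # One capital letter plus at least (min_len - 1) further word characters,
    # e.g. min_len=6 gives \b[A-ZÄÖÜ]\w{5,}\b
    pattern = re.compile(rf"\b[{upper}]\w{{{min_len - 1},}}\b")
    return pattern.findall(text)

print(capitalised_words("here some words BBQ Änderung Österreich"))
# ['Änderung', 'Österreich']

sentence = ("I need to delete all strings and words within a HTML Tag like H1 "
            "which are less than 6 Characters and do not start with a Capital Letter")
print(" ".join(capitalised_words(sentence)))
# Characters Capital Letter
```

Note that {5,} counts the characters after the leading capital, so the whole word has at least 6 characters; for a different X, use X - 1 inside the braces.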