Delete words with less than X characters in a HTML Tag
Posted: Fri Mar 08, 2013 12:40 am
by gerd
I need to delete all strings and words inside an HTML tag such as H1 that have fewer than 6 characters or do not start with a capital letter.
Example: <h1>I need to delete all strings and words within a HTML Tag like H1 which are less than 6 Characters and do not start with a Capital Letter</h1>
Requested result: <h1>Characters Capital Letter</h1>
I have been playing with [A-Z]{6,}? but have no idea how to proceed. Which filter commands are needed to delete strings within a certain HTML tag?
Thanks in advance
gerd
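For readers who want to check the requested result outside TextPipe, here is a minimal Python sketch of the rule as the example implies it: keep only words that start with a capital letter and have at least 6 characters. The function name keep_long_capitalized and the min_len parameter are made up for illustration; this is not TextPipe filter syntax.

```python
import re

def keep_long_capitalized(html, min_len=6):
    """Keep only the words inside <h1>...</h1> that start with an uppercase
    letter and have at least min_len characters (hypothetical helper)."""
    def filter_words(match):
        open_tag, text, close_tag = match.groups()
        kept = [w for w in text.split()
                if len(w) >= min_len and w[0].isupper()]
        return open_tag + " ".join(kept) + close_tag

    # Only the text between <h1> and </h1> is rewritten; the rest is untouched.
    return re.sub(r"(<h1>)(.*?)(</h1>)", filter_words, html, flags=re.DOTALL)

sample = ("<h1>I need to delete all strings and words within a HTML Tag like H1 "
          "which are less than 6 Characters and do not start with a Capital "
          "Letter</h1>")
print(keep_long_capitalized(sample))  # <h1>Characters Capital Letter</h1>
```

Note that "HTML", "Tag" and "H1" are dropped even though they are capitalized, because they are shorter than 6 characters.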
Re: Delete words with less than X characters in a HTML Tag
Posted: Tue Mar 12, 2013 11:05 am
by DataMystic Support
Re: Delete words with less than X characters in a HTML Tag
Posted: Tue Mar 12, 2013 7:30 pm
by gerd
Simon,
<h1>^[A-Z][^<]{6,}</h1>
this does not work. Example: <h1>here some words BBQ Änderung Österreich</h1>
The requested result should be
<h1>Änderung Österreich</h1>
In other words: extract only whole words that start with a capital letter and consist of at least X characters.
I assume the replacement would be $0 with the Extract option activated.
Thanks
gerd
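A small Python sketch of the extract idea, for comparison only. It assumes X = 6 (taken from the first post) and lists the German umlauts Ä, Ö, Ü explicitly as capitals; both are assumptions, and the helper name extract_words is hypothetical. It also shows why the pattern quoted above finds nothing: the "^" after "<h1>" cannot match in the middle of the text, and "[A-Z]" would have to match the lowercase "h" of "here".

```python
import re

text = "<h1>here some words BBQ Änderung Österreich</h1>"

# The pattern from the post. Outside multi-line mode "^" only matches at the
# very start of the input, so it can never match right after "<h1>".
tried = re.search(r"<h1>^[A-Z][^<]{6,}</h1>", text)
print(tried)  # None

# One capital letter (umlauts listed explicitly) followed by at least 5 more
# word characters, i.e. at least 6 characters in total.
WORD = re.compile(r"\b[A-ZÄÖÜ]\w{5,}\b")

def extract_words(html):
    """Rebuild the first <h1> so it contains only the qualifying words."""
    inner = re.search(r"<h1>(.*?)</h1>", html, flags=re.DOTALL)
    if inner is None:
        return html
    words = WORD.findall(inner.group(1))
    return "<h1>" + " ".join(words) + "</h1>"

print(extract_words(text))  # <h1>Änderung Österreich</h1>
```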
Re: Delete words with less than X characters in a HTML Tag
Posted: Wed Mar 13, 2013 10:17 am
by DataMystic Support
Ah, that is much clearer.
Ok, first add a regex
and set the Replace Action to
Send variable 1 to subfilter
Then add a second regex inside this, with a regex pattern of
Do not use the Extract option, just set the replacement to blank, and ensure Match Case is ON. Note - you will also have to change the definition of A-Z to include any special letters.
You will also have to add a Filters\Remove\Blanks from start of line filter, after the 2nd regex (inside the first regex) as well.
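The two regex patterns are not shown in this copy of the thread, so the following Python sketch only illustrates the structure of the steps described: an outer regex that hands the heading text to an inner step, an inner case-sensitive regex whose matches are replaced with nothing, and a final removal of leading blanks. The patterns OUTER and UNWANTED and the character class [A-ZÄÖÜ] are assumptions, not the patterns that were posted.

```python
import re

# Outer step: capture the text between <h1> and </h1> (stand-in for
# "send variable 1 to subfilter").
OUTER = re.compile(r"(<h1>)(.*?)(</h1>)", re.DOTALL)

# Inner step: any word that does not start with a capital letter, together
# with the whitespace in front of it. No IGNORECASE flag, so case matters.
UNWANTED = re.compile(r"\s*\b(?![A-ZÄÖÜ])\w+")

def clean_heading(match):
    inner = UNWANTED.sub("", match.group(2))  # replacement is blank
    # Removing words can leave blanks at the start, so strip them.
    return match.group(1) + inner.lstrip() + match.group(3)

def clean_html(html):
    return OUTER.sub(clean_heading, html)

print(clean_html("<h1>here some words BBQ Änderung Österreich</h1>"))
# <h1>BBQ Änderung Österreich</h1>
```

In this sketch "BBQ" survives because only the capital-letter condition is checked; a minimum length is not part of the inner pattern.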
Re: Delete words with less than X characters in a HTML Tag
Posted: Wed Mar 13, 2013 8:53 pm
by gerd
Thanks Simon,
that regex is very close and works. The only thing still missing from the example above is the length condition:
words starting with a capital letter and consisting of at least X characters.
How do I put e.g. {5,}? or something similar into your regex above?
Thanks
gerd
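On where a quantifier such as {5,} could go: the leading capital already counts as one character, so "at least X characters" becomes X - 1 further word characters after it. With a word boundary after the quantifier, the lazy form {5,}? and the greedy form {5,} match the same words. The Python sketch below builds such a removal pattern for any X; removal_pattern and strip_short_words are hypothetical names, the umlaut class is an assumption, and the code works on the heading text only.

```python
import re

def removal_pattern(min_len):
    """Regex matching every word that is NOT 'capital letter plus enough
    characters', including the whitespace in front of it."""
    # The capital is character 1, so min_len - 1 more word characters follow.
    keep = rf"[A-ZÄÖÜ]\w{{{min_len - 1},}}\b"
    return re.compile(rf"\s*\b(?!{keep})\w+")

def strip_short_words(text, min_len):
    # Replace the unwanted words with nothing, then drop leading blanks.
    return removal_pattern(min_len).sub("", text).lstrip()

print(removal_pattern(6).pattern)
# \s*\b(?![A-ZÄÖÜ]\w{5,}\b)\w+
print(strip_short_words("here some words BBQ Änderung Österreich", 6))
# Änderung Österreich
```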
Re: Delete words with less than X characters in a HTML Tag
Posted: Thu Mar 14, 2013 6:16 am
by DataMystic Support
Sorry - I missed that, here it is.