I need to delete all words within an HTML tag such as H1 that have fewer than 6 characters or do not start with a capital letter.
Example: <h1>I need to delete all strings and words within a HTML Tag like H1 which are less than 6 Characters and do not start with a Capital Letter</h1>
Requested result: <h1>Characters Capital Letter</h1>
I have been playing with [A-Z]{6,}? but have no idea how to proceed. What are the filter commands for deleting strings within a certain HTML tag?
Thanks in advance
gerd
Delete words with less than X characters in a HTML Tag
Moderators: DataMystic Support, Moderators
- DataMystic Support
- Site Admin
- Posts: 2227
- Joined: Mon Jun 30, 2003 12:32 pm
- Location: Melbourne, Australia
- Contact:
Re: Delete words with less than X characters in a HTML Tag
Try:
<h1>^[A-Z][^<]{5,}</h1>
Re: Delete words with less than X characters in a HTML Tag
Simon,
<h1>^[A-Z][^<]{6,}</h1>
this does not work. Example: <h1>here some words BBQ Änderung Österreich</h1>
The requested result should be
<h1>Änderung Österreich</h1>
In other words: extract only whole words that start with a capital letter and consist of at least X characters.
I assume that the replacement would read $0 with the Extract option activated.
Thanks
gerd
- DataMystic Support
Re: Delete words with less than X characters in a HTML Tag
Ah, that is much clearer.
Ok, first add a regex
<h1>([^<]+)</h1>
and set the Replace Action to Send variable 1 to subfilter.
Then add a second regex inside this, with a regex pattern of
\b[A-Z]\w+?\b
Do not use the Extract option, just set the replacement to blank, and ensure Match Case is ON. Note - you will also have to change the definition of A-Z to include any special letters.
You will also have to add a Filters\Remove\Blanks from start of line filter, after the 2nd regex (inside the first regex) as well.
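For readers without TextPipe, the same two-stage idea (an outer pattern that isolates the heading text, and an inner step that runs only on that text) can be sketched in Python. This is only an illustration under assumptions, not TextPipe's own behaviour: the function names are made up, the inner pattern is the greedy form of \w+? (which matches the same text here), and the inner step keeps the matching words with findall, as the original question asks, instead of blanking text and trimming afterwards.

```python
import re

# Outer pattern from the thread: capture the text between <h1> and </h1>.
HEADING = re.compile(r"<h1>([^<]+)</h1>")
# Inner pattern: a word that starts with an ASCII capital letter.
# Python's re is case-sensitive by default, like "Match Case ON".
CAP_WORD = re.compile(r"\b[A-Z]\w+\b")


def keep_capitalised(text):
    """Inner step (hypothetical helper): keep only words matched by CAP_WORD."""
    return " ".join(CAP_WORD.findall(text))


def filter_headings(html):
    """Outer step: apply the inner step to the text of each <h1> only."""
    return HEADING.sub(
        lambda m: "<h1>" + keep_capitalised(m.group(1)) + "</h1>", html
    )


print(filter_headings("<p>leave This alone</p><h1>here some words BBQ Test</h1>"))
```

Text outside the heading is left untouched because the inner step only ever sees capture group 1. With [A-Z] as written, a word such as Änderung is not matched, which is why the definition of A-Z has to be widened for special letters.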
Re: Delete words with less than X characters in a HTML Tag
Thanks Simon,
that regex is very close and works. The only point still missing from the above example is how to restrict it to words that start with a capital letter and consist of at least X characters.
How do I put e.g. {5,}? or something similar into your regex above?
Thanks
gerd
- DataMystic Support
Re: Delete words with less than X characters in a HTML Tag
Sorry - I missed that, here it is.
\b[A-Z]\w{5,}?\b
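To check the length rule outside TextPipe, here is a rough Python sketch of the final pattern. One capital letter followed by at least 5 word characters means 6 or more characters in total, so the quantifier is always X minus 1. The function names, the min_len parameter and the widened character class A-ZÄÖÜ are my own assumptions for the German examples in this thread; add other letters as needed.

```python
import re


def capitalised_words(text, min_len=6, capitals="A-ZÄÖÜ"):
    """Return words that start with a capital and have at least min_len characters.

    `capitals` is an assumed character-class body; extend it for other alphabets.
    """
    # One capital plus (min_len - 1) or more word characters, e.g. \w{5,} for 6.
    pattern = re.compile(rf"\b[{capitals}]\w{{{min_len - 1},}}\b")
    return pattern.findall(text)


def filter_h1(html, min_len=6):
    """Rewrite each <h1> so that it contains only the qualifying words."""
    return re.sub(
        r"<h1>([^<]+)</h1>",
        lambda m: "<h1>" + " ".join(capitalised_words(m.group(1), min_len)) + "</h1>",
        html,
    )


print(filter_h1("<h1>here some words BBQ Änderung Österreich</h1>"))
# prints <h1>Änderung Österreich</h1>
```

BBQ is dropped because it has only 2 characters after the capital, and lowercase words never match the first character class.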