Hi,
I am struggling with a filter that should do the following:
Find all words in a text that start with a capital letter and consist of at least 6 characters, and extract them to a CSV file.
Example Text:
You can also perform Partial Trial Runs by right-clicking on filters in the Filter list.
Target of Extraction:
Partial Filter
because those two words consist of at least 6 characters.
I have been using trial and error with ([A-Z](\w{6,})[a-z]) and other variants, without any success. Any ideas?
thanks gerd
Find words starting with a capital letter
Moderators: DataMystic Support, Moderators
- DataMystic Support
- Site Admin
- Posts: 2227
- Joined: Mon Jun 30, 2003 12:32 pm
- Location: Melbourne, Australia
- Contact:
Re: Find words starting with a capital letter
Hi Gerd,
Try:
Find (match case turned on)
[A-Z][a-z]{5,}?
Replace with
$0\r\n
Extract option on.
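
For anyone who wants to check the expected result outside TextPipe, here is a minimal sketch in Python. It is not the TextPipe filter itself: the function names are made up for illustration, the \b word boundaries are an addition so that a match cannot start or stop in the middle of a word, and no trailing ? is needed because Python's re module is greedy by default.

```python
import csv
import io
import re

# One capital A-Z followed by at least five lowercase a-z letters,
# i.e. six or more characters in total. The \b anchors are an assumption
# added here; they are not part of the pattern posted above.
PATTERN = re.compile(r"\b[A-Z][a-z]{5,}\b")


def extract_capitalized(text):
    """Return all capitalized words of six or more ASCII letters."""
    return PATTERN.findall(text)


def to_csv(words):
    """Write one word per row and return the CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\r\n")
    for word in words:
        writer.writerow([word])
    return buf.getvalue()


sample = ("You can also perform Partial Trial Runs by right-clicking "
          "on filters in the Filter list.")
words = extract_capitalized(sample)
print(words)          # ['Partial', 'Filter']
print(to_csv(words))  # Partial and Filter, one per line
```

"Trial" and "Runs" are skipped because they have fewer than six characters.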
Re: Find words starting with a capital letter
Thanks a lot,
that's the result I was looking for. I think I had tried [A-Z][a-z]{5,} at some point, but certainly without the question mark. So far I have used ? only to mean "at most one match". I guess I should take the time to go through pages 77 - 106 of your manual carefully. Or can you point me to the explanation of how ? is used in this way?
Anyhow, your hint is a great help for me.
gerd
-
- Posts: 988
- Joined: Sun Dec 09, 2007 2:49 am
- Location: UK
Re: Find words starting with a capital letter
Caveat!
Any case operations or case patterns are highly dependent on the alphabet for the language of the text being processed.
For languages with diacritics, the whole topic becomes much more complex.
And for some languages, there are further pitfalls to catch out the unwary. See
http://en.wikipedia.org/wiki/Dotless_i
which is a feature of Turkish, and a few other languages.
David
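
To make the caveat concrete, here is a small Python sketch (Python rather than TextPipe, and the sample sentence and helper name are invented for illustration). It shows the ASCII class [A-Z] missing accented capitals, one possible Unicode-aware alternative, and the dotless-i case mappings behaving in a way a Turkish reader would not expect.

```python
import re

# Escapes are used so the letters are unambiguous precomposed characters:
# \u00c9 = E acute, \u00c5 = A ring, \u00f6 = o diaeresis, \u0130 = dotted capital I
text = "\u00c9lodie visited \u00c5ngstr\u00f6m and \u0130stanbul with Partial success."

# ASCII-only classes miss every word whose letters fall outside A-Z / a-z.
ascii_words = re.findall(r"\b[A-Z][a-z]{5,}\b", text)


def is_capitalized(word, min_len=6):
    """True for words of min_len+ letters: first uppercase, rest lowercase."""
    return len(word) >= min_len and word[0].isupper() and word[1:].islower()


# [^\W\d_]+ matches runs of letters in any script; case is tested afterwards.
unicode_words = [w for w in re.findall(r"[^\W\d_]+", text) if is_capitalized(w)]

print(ascii_words)    # ['Partial']
print(unicode_words)  # all four capitalized words, accents included

# Python's case mappings are not locale-aware, so Turkish rules are not applied:
print("i".upper() == "I")             # True, although Turkish expects \u0130
print("\u0131".upper() == "I")        # True, dotless i maps to plain I
print(len("\u0130".lower()))          # 2, an i plus a combining dot above
```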
- DataMystic Support
- Site Admin
- Posts: 2227
- Joined: Mon Jun 30, 2003 12:32 pm
- Location: Melbourne, Australia
- Contact:
Re: Find words starting with a capital letter
Hi David,
You could try using the Perl regex '\w' to match word characters in a locale-specific way.
The ? at the end of a +, * or {} repetition reverses the normal greediness.
In TextPipe, the default is non-greedy, so [a-z]{5,} matches only 5 characters if it can, whereas
[a-z]{5,}? matches as many characters as it can.
You can toggle the default greediness for each pattern using the pattern options button [...].
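
As a rough illustration of what "reversing the greediness" means, here is a sketch using Python's re module. Note the assumption: Python follows the conventional Perl default (greedy unless ? is appended), so the roles of the two patterns are swapped compared with the TextPipe default described above.

```python
import re

word = "Partial"

# Greedy by default in Python: {5,} takes as many lowercase letters as it can.
greedy = re.match(r"[A-Z][a-z]{5,}", word).group()

# A trailing ? flips that: {5,}? stops as soon as the minimum of 5 is met.
lazy = re.match(r"[A-Z][a-z]{5,}?", word).group()

print(greedy)  # Partial
print(lazy)    # Partia

# \w matches Unicode letters in Python 3 string patterns, so accented
# characters are treated as word characters without any locale setting.
print(re.findall(r"\w+", "na\u00efve caf\u00e9"))  # two words, accents kept
```

Whichever default a tool uses, the trailing ? selects the opposite behaviour for that one repetition.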
-
- Posts: 988
- Joined: Sun Dec 09, 2007 2:49 am
- Location: UK
Re: Find words starting with a capital letter
Simon,
Although TextPipe is locale sensitive, I keep the English locale settings (region and language) on my PC,
even though I work on any number of foreign-language text files.
David