Find words starting with a capital letter


Moderators: DataMystic Support, Moderators

gerd
Posts: 39
Joined: Wed Mar 12, 2008 10:52 pm

Find words starting with a capital letter

Post by gerd »

Hi,
I am struggling with a filter that should do the following:
Find all words in a text that start with a capital letter and are at least 6 characters long, and extract them to a CSV file.
Example Text:
You can also perform Partial Trial Runs by right-clicking on filters in the Filter list.

Target of extraction:
Partial, Filter

because these are the only two capitalized words with at least 6 characters.

I have been trying ([A-Z](\w{6,})[a-z]) and other variations by trial and error, without any success. Any ideas?
thanks gerd
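For illustration only, here is a quick Python check of why that attempt finds nothing (Python's `re` module stands in for TextPipe's engine here, which is an assumption). `[A-Z]`, then at least six `\w`, then one more `[a-z]` needs a word of at least 8 characters, while "Partial" has 7 and "Filter" has 6. The helper `whole_matches` is hypothetical.

```python
import re

# Example sentence from the question above.
TEXT = ("You can also perform Partial Trial Runs by right-clicking "
        "on filters in the Filter list.")

# The attempted pattern: one capital + 6 or more word chars + 1 lowercase letter,
# i.e. a minimum of 8 characters in total.
ATTEMPT = re.compile(r"([A-Z](\w{6,})[a-z])")

def whole_matches(pattern, text):
    """Hypothetical helper: full text of every match (group 0), ignoring groups."""
    return [m.group(0) for m in pattern.finditer(text)]

print(whole_matches(ATTEMPT, TEXT))  # [] : no capitalized word here has 8+ chars
```

An 8+ letter word such as "Extraction" would match, which is why the pattern looks right but silently skips the 6 and 7 letter targets.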
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: Find words starting with a capital letter

Post by DataMystic Support »

Hi Gerd,

Try:

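Find (match case turned on):
[A-Z][a-z]{5,}?
Replace with:
$0\r\n
and turn the Extract option on.

One capital letter followed by at least 5 lower-case letters gives a word of at least 6 characters; $0 is the whole match, and \r\n puts each match on its own line.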
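The same extraction can be sketched in Python as a rough cross-check outside TextPipe. This is a stand-in under stated assumptions, not TextPipe's behaviour: Python quantifiers are greedy by default, so the plain `[A-Z][a-z]{5,}` (no trailing `?`) is the equivalent here, and the hypothetical helper `extract_to_csv` writes one match per row, much like `$0\r\n` with Extract on.

```python
import csv
import io
import re

# One capital letter followed by at least 5 lowercase letters = 6+ characters.
# Python's re is case-sensitive by default ("match case" on) and greedy by default.
CAPITALIZED_6PLUS = re.compile(r"[A-Z][a-z]{5,}")

def extract_to_csv(text):
    """Hypothetical helper: return the matches as CSV text, one match per row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for match in CAPITALIZED_6PLUS.finditer(text):
        writer.writerow([match.group(0)])
    return buf.getvalue()

text = ("You can also perform Partial Trial Runs by right-clicking "
        "on filters in the Filter list.")
print(extract_to_csv(text), end="")
# Partial
# Filter
```

To write a real file, pass an `open("out.csv", "w", newline="")` handle to `csv.writer` instead of the `StringIO` buffer.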
gerd
Posts: 39
Joined: Wed Mar 12, 2008 10:52 pm

Re: Find words starting with a capital letter

Post by gerd »

Thanks a lot,

that's exactly the result I was looking for. I think I had tried [A-Z][a-z]{5,} at some point, but certainly without the question mark. So far I have only used ? to mean "at most one match". I guess I should take the time to go carefully through pages 77 - 106 of your manual. Or can you point me to where this use of ? is explained?
Anyway, your hint is a great help.
gerd
dfhtextpipe
Posts: 986
Joined: Sun Dec 09, 2007 2:49 am
Location: UK

Re: Find words starting with a capital letter

Post by dfhtextpipe »

Caveat!

Any case operations or case patterns are highly dependent on the alphabet for the language of the text being processed.

For languages with diacritics, the whole topic becomes much more complex.

And for some languages, there are further pitfalls to catch out the unwary. See
http://en.wikipedia.org/wiki/Dotless_i

which is a feature of Turkish, and a few other languages.
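A small Python illustration of the dotless-i pitfall (an aside, not TextPipe-specific). Python's `str.upper()` and `str.lower()` apply the default, non-Turkish Unicode case mappings regardless of the system locale, so a round trip through upper case turns Turkish dotless ı into a dotted i.

```python
# Turkish has four distinct letters: i / İ (dotted) and ı / I (dotless).
dotless_lower = "\u0131"  # ı
dotted_upper = "\u0130"   # İ

# These are the default Unicode mappings, not Turkish-specific ones:
print(dotless_lower.upper())          # I  (also correct for Turkish)
print(dotless_lower.upper().lower())  # i  (Turkish would expect ı)
print("i".upper())                    # I  (Turkish would expect İ)
print(len(dotted_upper.lower()))      # 2  ("i" + U+0307 COMBINING DOT ABOVE)
```

Any filter that upper-cases or lower-cases text before matching can therefore change Turkish letters unless a language-aware case mapping is used.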

David
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: Find words starting with a capital letter

Post by DataMystic Support »

Hi David,

You could try the Perl regex class \w, which matches word characters in a locale-specific way.
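An illustrative Python sketch of the same idea, under an explicit assumption: in Python 3, `\w` in a str pattern is Unicode-aware by default rather than tied to the system locale, which is not necessarily how TextPipe handles locales. `[A-Z][a-z]{5,}` misses capitals with diacritics, whereas matching runs of letters and testing the first one with `str.isupper()` does not. The helper name `capitalized_words` is hypothetical.

```python
import re

ASCII_ONLY = re.compile(r"[A-Z][a-z]{5,}")

def capitalized_words(text, min_len=6):
    """Hypothetical helper: letter runs of min_len+ chars that start upper-case."""
    # [^\W\d_] = word characters minus digits and underscore, i.e. letters
    # (Unicode-aware for str patterns in Python 3).
    pattern = re.compile(r"[^\W\d_]{%d,}" % min_len)
    return [w for w in pattern.findall(text) if w[0].isupper()]

sample = "Die Ärztin fährt nach Österreich."
print(ASCII_ONLY.findall(sample))   # []
print(capitalized_words(sample))    # ['Ärztin', 'Österreich']
```

Testing the first letter with `isupper()` sidesteps the lack of a Unicode upper-case class such as `\p{Lu}` in Python's built-in `re`.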

A ? placed directly after a +, * or {} repetition does not mean "optional"; it reverses the default greediness of that repetition.

In TextPipe, the default is non-greedy, so [a-z]{5,} matches only 5 characters if it can, whereas [a-z]{5,}? matches as many characters as it can.

You can toggle the default greediness using the pattern options button [...] for each pattern.
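To see the effect of the trailing ? in isolation, here is a Python sketch. Note the assumption: Python's `re` is greedy by default, the reverse of TextPipe's default described above, so here the plain quantifier takes the whole word and the `?` form stops after the minimum 5 letters. It also shows the other meaning of `?`, "zero or one", when it follows a single item rather than a repetition.

```python
import re

text = "Partial Trial Runs in the Filter list"

greedy = re.findall(r"[A-Z][a-z]{5,}", text)   # as many letters as possible
lazy = re.findall(r"[A-Z][a-z]{5,}?", text)    # as few as possible (exactly 5)
print(greedy)  # ['Partial', 'Filter']
print(lazy)    # ['Partia', 'Filter']

# After a single item (not a repetition), ? means "zero or one":
print(re.findall(r"colou?r", "color colour colouur"))  # ['color', 'colour']
```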
dfhtextpipe
Posts: 986
Joined: Sun Dec 09, 2007 2:49 am
Location: UK

Re: Find words starting with a capital letter

Post by dfhtextpipe »

Simon,

Although TextPipe is locale-sensitive, I keep the English locale settings (region and language) on my PC, even though I work on text files in many different foreign languages.

David