Splitting CamelCase words in Unicode?
Posted: Sun Oct 14, 2012 2:39 am
I have a non-English text that includes lots of CamelCase words.
The language is based on an extended Latin alphabet with various diacritics.
The text is encoded as UTF-8 (without BOM) and the Unicode is normalized NFC.
The following regexp finds them in the Notepad++ search feature (with Match case ticked):
Code:
[a-zàáãèéìíòóõùúṣẹẽọ][A-ZÀÁÃÈÉÌÍÒÓÕÙÚṢẸẼỌ]
Now suppose I'd like to use TextPipe to split these words at the point where the case changes,
i.e. where a lowercase letter is immediately followed by an uppercase letter.
e.g. change "CamelCase" to "Camel Case".
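To pin down the behaviour I'm after, here is a minimal sketch in Python. Python is used only for illustration and says nothing about how TextPipe handles this. The names LOWER, UPPER, CAMEL_BOUNDARY and split_camel_case are my own, and the character classes are simply the ones from the regexp above, so it assumes the text is NFC, with each accented letter stored as a single precomposed code point.

```python
import re

# Character classes copied from the Notepad++ regexp (assumes NFC text).
LOWER = "a-zàáãèéìíòóõùúṣẹẽọ"
UPPER = "A-ZÀÁÃÈÉÌÍÒÓÕÙÚṢẸẼỌ"

# One lowercase letter immediately followed by one uppercase letter.
# No IGNORECASE flag: the match must be case-sensitive.
CAMEL_BOUNDARY = re.compile(f"([{LOWER}])([{UPPER}])")


def split_camel_case(text: str) -> str:
    """Insert a space at every lowercase-to-uppercase boundary."""
    return CAMEL_BOUNDARY.sub(r"\1 \2", text)


print(split_camel_case("CamelCase"))  # Camel Case
```

Runs of capitals are left alone, because the pattern only fires on a lowercase letter followed by an uppercase one.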
How should I do this in TextPipe?
What is the simplest way to implement this requirement?
Bear in mind that I can't just paste this into a Perl replace filter (with Match case ticked):
Code:
Replace ([a-zàáãèéìíòóõùúṣẹẽọ])([A-ZÀÁÃÈÉÌÍÒÓÕÙÚṢẸẼỌ]) by $1 $2
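For comparison, the same replacement can be described without listing every accented letter, by going through the Unicode general categories (Ll for lowercase letters, Lu for uppercase letters). The following is only a sketch in Python's standard library, not TextPipe syntax, and the function name is hypothetical. It normalizes to NFC first (harmless for my text, which is already NFC) and treats combining marks as part of the preceding letter, which is an assumption about how letters carrying an extra tone mark should behave.

```python
import unicodedata


def split_camel_case_unicode(text: str) -> str:
    """Insert a space wherever a lowercase letter (Ll), optionally carrying
    combining marks, is directly followed by an uppercase letter (Lu)."""
    text = unicodedata.normalize("NFC", text)
    pieces = []
    after_lower = False  # was the last base character a lowercase letter?
    for ch in text:
        category = unicodedata.category(ch)
        if category == "Lu" and after_lower:
            pieces.append(" ")
        pieces.append(ch)
        # Combining marks (Mn, Mc, Me) do not change the state.
        if not category.startswith("M"):
            after_lower = category == "Ll"
    return "".join(pieces)


print(split_camel_case_unicode("CamelCase"))  # Camel Case
```

Unlike the explicit character classes, this version would also split after letters that are not in my list, which may or may not be wanted.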
I can do this in Notepad++, but the same approach simply doesn't work in TextPipe.
If it's so easy in Notepad++, then why is it so difficult in TextPipe?
David