User-named character classes?
Posted: Sat Dec 31, 2011 4:09 am
Suppose I wish to match a pattern that, for example, consists of any single letter of the Czech alphabet (in either case) in UTF-8 text.
See http://en.wikipedia.org/wiki/Czech_alphabet
Excluding the "Ch" digraph, a Perl pattern that does this would be as follows:
Code: Select all
[A-Za-z\x{00C1}\x{00C9}\x{00CD}\x{00D3}\x{00DA}\x{00DD}\x{00E1}\x{00E9}\x{00ED}\x{00F3}\x{00FA}\x{00FD}\x{010C}\x{010D}\x{010E}\x{010F}\x{011A}\x{011B}\x{0147}\x{0148}\x{0158}\x{0159}\x{0160}\x{0161}\x{0164}\x{0165}\x{016E}\x{016F}\x{017D}\x{017E}]
This is equivalent to the shorter pattern
Code: Select all
[A-Za-zÁÉÍÓÚÝáéíóúýČčĎďĚěŇňŘřŠšŤťŮůŽž]
The latter will not work when entered as a simple Perl pattern in TextPipe, so one has to use the more complicated one with all the hexadecimal codes.
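As a sanity check outside TextPipe, the two classes can be compared in a few lines of Python. This is only a sketch: the names `CZECH_CODEPOINTS`, `LITERAL_LETTERS` and `same_matches` are made up for illustration, and Python's `re` module writes the escapes as `\uXXXX` rather than Perl's `\x{XXXX}`.

```python
import re

# Code points copied from the hexadecimal pattern in the post.
CZECH_CODEPOINTS = [
    0x00C1, 0x00C9, 0x00CD, 0x00D3, 0x00DA, 0x00DD,
    0x00E1, 0x00E9, 0x00ED, 0x00F3, 0x00FA, 0x00FD,
    0x010C, 0x010D, 0x010E, 0x010F, 0x011A, 0x011B,
    0x0147, 0x0148, 0x0158, 0x0159, 0x0160, 0x0161,
    0x0164, 0x0165, 0x016E, 0x016F, 0x017D, 0x017E,
]

# Accented letters copied from the shorter, literal pattern.
LITERAL_LETTERS = "ÁÉÍÓÚÝáéíóúýČčĎďĚěŇňŘřŠšŤťŮůŽž"

# Decode the code points so they can be compared with the literal letters.
decoded = "".join(chr(cp) for cp in CZECH_CODEPOINTS)

# Build both character classes; Python spells the escape as \uXXXX.
hex_class = re.compile(
    "[A-Za-z" + "".join(f"\\u{cp:04X}" for cp in CZECH_CODEPOINTS) + "]"
)
literal_class = re.compile("[A-Za-z" + LITERAL_LETTERS + "]")


def same_matches(text):
    """Return True if both classes pick out the same characters in text."""
    return hex_class.findall(text) == literal_class.findall(text)
```

If `decoded == LITERAL_LETTERS` holds, the hexadecimal list and the literal list name the same 30 accented letters in the same order, so the two patterns differ only in how they are written.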
It would be much simpler if there were a facility to define user-named character classes, so that a much shorter pattern name could be used, perhaps by extending the POSIX notation such that
Code: Select all
[:czech:]
would be equivalent to the above pattern.
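To make the idea concrete, here is a rough Python sketch of what such a facility might do: expand user-named tokens into full character classes before the pattern reaches the regex engine. Everything in it is an assumption for illustration (the `NAMED_CLASSES` registry, the `expand_named_classes` helper, and the bare `[:name:]` token syntax); it is not an existing TextPipe or Perl feature.

```python
import re

# Hypothetical registry of user-named classes (illustration only).
NAMED_CLASSES = {
    "czech": "A-Za-zÁÉÍÓÚÝáéíóúýČčĎďĚěŇňŘřŠšŤťŮůŽž",
}

# Matches a token such as [:czech:] and captures the name.
_NAME_TOKEN = re.compile(r"\[:([a-z]+):\]")


def expand_named_classes(pattern):
    """Replace each known [:name:] token with a bracketed character class.

    Tokens whose name is not in NAMED_CLASSES are left unchanged.
    """
    def substitute(match):
        body = NAMED_CLASSES.get(match.group(1))
        return match.group(0) if body is None else "[" + body + "]"

    return _NAME_TOKEN.sub(substitute, pattern)


# Usage: write the short form, expand it, then hand it to the regex engine.
expanded = expand_named_classes(r"\b[:czech:]+\b")
words = re.findall(expanded, "Žluťoučký kůň úpěl")
```

The same pre-processing step could in principle be done by any script that rewrites the pattern text before it is pasted into a filter, which keeps the long class definition in one place.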
I can't use captured text and store it in a global variable, as the files to be processed will not contain it.
Am I forced to resort to VBScript, or is there a simpler, more open method?
David