See http://en.wikipedia.org/wiki/Czech_alphabet
Excluding the "Ch" digraph, a Perl pattern that does this would be as follows:
[A-Za-z\x{00C1}\x{00C9}\x{00CD}\x{00D3}\x{00DA}\x{00DD}\x{00E1}\x{00E9}\x{00ED}\x{00F3}\x{00FA}\x{00FD}\x{010C}\x{010D}\x{010E}\x{010F}\x{011A}\x{011B}\x{0147}\x{0148}\x{0158}\x{0159}\x{0160}\x{0161}\x{0164}\x{0165}\x{016E}\x{016F}\x{017D}\x{017E}]
or, equivalently, with the characters written literally:
[A-Za-zÁÉÍÓÚÝáéíóúýČčĎďĚěŇňŘřŠšŤťŮůŽž]
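The two forms are meant to denote the same set of characters. A quick sanity check, sketched in Python rather than in the editor's own regex engine (the variable names here are my own, and Python's `re` module is only an assumption standing in for a Perl-style engine):

```python
import re
import unicodedata

# Code points of the accented Czech letters, copied from the \x{...} pattern.
CODE_POINTS = [
    0x00C1, 0x00C9, 0x00CD, 0x00D3, 0x00DA, 0x00DD,
    0x00E1, 0x00E9, 0x00ED, 0x00F3, 0x00FA, 0x00FD,
    0x010C, 0x010D, 0x010E, 0x010F, 0x011A, 0x011B,
    0x0147, 0x0148, 0x0158, 0x0159, 0x0160, 0x0161,
    0x0164, 0x0165, 0x016E, 0x016F, 0x017D, 0x017E,
]

# The same letters written literally, copied from the second pattern.
# NFC normalisation guards against a copy that arrived in decomposed form.
LITERALS = unicodedata.normalize("NFC", "ÁÉÍÓÚÝáéíóúýČčĎďĚěŇňŘřŠšŤťŮůŽž")

# Build the escaped form as real characters and compare it with the literal form.
from_escapes = "".join(chr(cp) for cp in CODE_POINTS)
same_letters = from_escapes == LITERALS

# Character class for a single letter, and for a whole word.
czech_letter = re.compile("[A-Za-z" + from_escapes + "]")
czech_word = re.compile("[A-Za-z" + from_escapes + "]+")
```

If `same_letters` is true, the escaped and literal patterns are interchangeable, and the choice between them only depends on whether the file encoding can be trusted to preserve the literal characters.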
It would be much simpler if there were a facility for defining user-named character classes, so that a much shorter name could be used in place of the full pattern, perhaps by extending the POSIX notation to allow something like:
[:czech:]
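To make the idea concrete, here is a rough sketch in Python of how such a facility might behave if the pattern were preprocessed before reaching the regex engine. This is not a feature of any editor or of POSIX: the `NAMED_CLASSES` table and the `expand_named_classes` helper are hypothetical names of my own, and `czech` is the made-up class name from above. The sketch assumes the name is written inside a bracket expression, as real POSIX classes are.

```python
import re

# Hypothetical table of user-named classes: name -> contents of the bracket
# expression, without the surrounding [ ].
NAMED_CLASSES = {
    "czech": "A-Za-z"
             "\u00C1\u00C9\u00CD\u00D3\u00DA\u00DD"
             "\u00E1\u00E9\u00ED\u00F3\u00FA\u00FD"
             "\u010C\u010D\u010E\u010F\u011A\u011B"
             "\u0147\u0148\u0158\u0159\u0160\u0161"
             "\u0164\u0165\u016E\u016F\u017D\u017E",
}

# Matches a POSIX-style class token such as [:czech:].
_NAME = re.compile(r"\[:([a-z]+):\]")


def expand_named_classes(pattern: str) -> str:
    """Replace each [:name:] listed in NAMED_CLASSES with its members.

    Names that are not in the table (for example real POSIX classes)
    are left untouched.
    """
    def substitute(match):
        return NAMED_CLASSES.get(match.group(1), match.group(0))

    return _NAME.sub(substitute, pattern)


# Usage: the short name stands in for the long class.
word = re.compile(expand_named_classes(r"[[:czech:]]+"))
```

The expansion is plain text substitution, so the short name can be combined with other members of a bracket expression, e.g. `[[:czech:]0-9]`.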
I can't capture the class from the text and store it in a global variable, because the files to be processed will not contain it.
Am I forced to resort to VBScript, or is there a simpler, more open method?
David