User-named character classes?


Moderators: DataMystic Support, Moderators

dfhtextpipe
Posts: 986
Joined: Sun Dec 09, 2007 2:49 am
Location: UK

User-named character classes?

Post by dfhtextpipe »

Suppose I wish to match a pattern that consists of, for example, any character of the Czech alphabet (in either case) in UTF-8 text.
See http://en.wikipedia.org/wiki/Czech_alphabet

Excluding the "Ch" digraph, a Perl pattern that does this would be as follows:

Code:

[A-Za-z\x{00C1}\x{00C9}\x{00CD}\x{00D3}\x{00DA}\x{00DD}\x{00E1}\x{00E9}\x{00ED}\x{00F3}\x{00FA}\x{00FD}\x{010C}\x{010D}\x{010E}\x{010F}\x{011A}\x{011B}\x{0147}\x{0148}\x{0158}\x{0159}\x{0160}\x{0161}\x{0164}\x{0165}\x{016E}\x{016F}\x{017D}\x{017E}]

This is equivalent to the shorter pattern:

Code:

[A-Za-zÁÉÍÓÚÝáéíóúýČčĎďĚěŇňŘřŠšŤťŮůŽž]

The shorter pattern does not work when entered as a simple Perl pattern in TextPipe, so one has to use the longer one with all the hexadecimal codes.
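
Until the short form is accepted, the hexadecimal escapes need not be typed by hand. The following is a minimal sketch in Python, outside TextPipe, that converts the literal class into the `\x{HHHH}` form. The function name `to_perl_hex_class` is hypothetical, and the sketch assumes the accented letters are precomposed (NFC) code points.

```python
import unicodedata

def to_perl_hex_class(literal_class):
    """Replace every non-ASCII character with a Perl-style \\x{HHHH} escape."""
    # Normalise to NFC so that each accented letter is a single code point.
    text = unicodedata.normalize("NFC", literal_class)
    return "".join(
        ch if ord(ch) < 128 else "\\x{%04X}" % ord(ch)
        for ch in text
    )

short = "[A-Za-zÁÉÍÓÚÝáéíóúýČčĎďĚěŇňŘřŠšŤťŮůŽž]"
print(to_perl_hex_class(short))
```

The printed text can be pasted into a Perl pattern field as-is; ASCII characters such as the brackets and the `A-Za-z` ranges are passed through unchanged.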

It would be much simpler if there were a facility to define user-named character classes, so that a much shorter pattern name could be used, perhaps by extending the POSIX notation such that

Code:

[:czech:]
would be equivalent to the above pattern.
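
To make the request concrete, here is a rough sketch in Python of how such a facility could behave: named classes are expanded in the pattern text before it reaches the regex engine. This is an illustration only, not an existing TextPipe or Perl feature; `NAMED_CLASSES` and `expand_named_classes` are hypothetical names, and `[:czech:]` is treated as standing for the whole bracketed class, as in the proposal above.

```python
import re

# Hypothetical registry of user-named classes (not provided by TextPipe).
NAMED_CLASSES = {
    "czech": "A-Za-zÁÉÍÓÚÝáéíóúýČčĎďĚěŇňŘřŠšŤťŮůŽž",
}

def expand_named_classes(pattern):
    """Replace each registered [:name:] with its full bracketed class."""
    def substitute(match):
        name = match.group(1)
        if name not in NAMED_CLASSES:
            # Leave unknown names, such as standard POSIX classes, untouched.
            return match.group(0)
        return "[" + NAMED_CLASSES[name] + "]"
    return re.sub(r"\[:([a-z]+):\]", substitute, pattern)

word = re.compile(expand_named_classes("[:czech:]+"))
print(bool(word.fullmatch("Řeřicha")))   # every letter is in the class
print(bool(word.fullmatch("Straße")))    # contains a letter outside the class
```

Because the expansion is plain text substitution, the same idea would work as a preprocessing step in any scripting language.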

I can't use captured text and store it in a global variable, as the files to be processed will not contain it.

Am I forced to resort to VBScript, or is there a simpler, more open method?

David
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: User-named character classes?

Post by DataMystic Support »

Hi David,

A proposed solution: in the Perl search/replace mode, when UTF-8 support is checked, the Unicode data entered is converted to UTF-8 before being passed to the Perl module.

This allows your simpler pattern to pass through without any problems, and results in the same output as the more complex sample.
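
The claim that both patterns give the same output can be checked independently of TextPipe. The sketch below uses Python's `re` module, which is an assumption and not the engine TextPipe uses; since `re` has no `\x{HHHH}` escape, the hypothetical helper `perl_hex_to_python` rewrites each one as `\uHHHH` first. It only handles code points up to U+FFFF, which covers every letter here.

```python
import re

LONG = (r"[A-Za-z\x{00C1}\x{00C9}\x{00CD}\x{00D3}\x{00DA}\x{00DD}"
        r"\x{00E1}\x{00E9}\x{00ED}\x{00F3}\x{00FA}\x{00FD}"
        r"\x{010C}\x{010D}\x{010E}\x{010F}\x{011A}\x{011B}"
        r"\x{0147}\x{0148}\x{0158}\x{0159}\x{0160}\x{0161}"
        r"\x{0164}\x{0165}\x{016E}\x{016F}\x{017D}\x{017E}]")
SHORT = "[A-Za-zÁÉÍÓÚÝáéíóúýČčĎďĚěŇňŘřŠšŤťŮůŽž]"

def perl_hex_to_python(pattern):
    """Rewrite Perl \\x{HHHH} escapes as \\uHHHH, which Python's re accepts."""
    return re.sub(r"\\x\{([0-9A-Fa-f]{1,4})\}",
                  lambda m: "\\u%04X" % int(m.group(1), 16),
                  pattern)

def matched_chars(pattern):
    """Return every character below U+0250 that the class matches."""
    compiled = re.compile(pattern)
    return {chr(cp) for cp in range(0x250) if compiled.fullmatch(chr(cp))}

long_set = matched_chars(perl_hex_to_python(LONG))
short_set = matched_chars(SHORT)
print(long_set == short_set)
```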

You can see this trial in action in
http://www.datamystic.com/textpipestandard2.exe (available in an hour or so).

Let me know if it meets your needs, and also if there are any side effects.
dfhtextpipe
Posts: 986
Joined: Sun Dec 09, 2007 2:49 am
Location: UK

Re: User-named character classes?

Post by dfhtextpipe »

Hi Simon,

I was away when you posted that - if I get time today, I'll give it a try.

David