Pattern matching UTF-8 support

Get help with installation and running here.

Moderators: DataMystic Support, Moderators

dfhtextpipe
Posts: 986
Joined: Sun Dec 09, 2007 2:49 am
Location: UK

Pattern matching UTF-8 support

Post by dfhtextpipe »

The help page includes the following:
2. In a pattern, the escape sequence \x{...}, where the contents of the braces is a string of hexadecimal digits, is interpreted as a UTF-8 character whose code number is the given hexadecimal number, for example: \x{1234}. If a non-hexadecimal digit appears between the braces, the item is not recognized. This escape sequence can be used either as a literal, or within a character class.
Yet when I enter \x{1234} in the "Find pattern (perl style)" field of a replace filter, with UTF-8 enabled, TextPipe reports that "the character value in the x{...} sequence is too large".

This makes the escape sequence impossible to use as documented.

I have many UTF-8 files containing the single character U+FEFF ZERO WIDTH NO-BREAK SPACE.
I would like to remove this character. How can I do it? TextPipe does not allow \x{feff}.

For U+FEFF, the equivalent UTF-8 byte sequence is \xEF\xBB\xBF, yet TextPipe doesn't find this pattern either!
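For reference, the relationship between the code point and its bytes can be checked outside TextPipe. The following is only a minimal Python sketch, not TextPipe or PCRE itself; the variable names are made up for illustration. It also shows one plausible reason the byte pattern fails: when a regex engine works on decoded characters (as PCRE does in UTF-8 mode, as far as I understand it), \xEF means the character U+00EF rather than the raw byte 0xEF.

```python
import re

# U+FEFF as a single character, and as its UTF-8 encoding.
bom_char = "\ufeff"
bom_bytes = bom_char.encode("utf-8")  # three bytes: EF BB BF

sample_text = "\ufeffabc"
sample_bytes = sample_text.encode("utf-8")

# Character mode: the pattern is the three characters U+00EF U+00BB U+00BF,
# which is not the same thing as the single character U+FEFF.
text_hit = re.search("\xef\xbb\xbf", sample_text)

# Byte mode: the same escapes now denote raw bytes, so they match the
# encoded form of U+FEFF.
byte_hit = re.search(rb"\xef\xbb\xbf", sample_bytes)

print(bom_bytes)
print(text_hit is None, byte_hit is not None)
```

So in character mode the single-escape form (\x{feff} in PCRE syntax) is the one to use, and the three-byte form only applies when the data is treated as raw bytes.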
David
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia
Re: Pattern matching UTF-8 support

Post by DataMystic Support »

Hi David,

I checked it out with the PCRE guys. You need to enable the UTF-8 flag for this to work.
dfhtextpipe
Posts: 986
Joined: Sun Dec 09, 2007 2:49 am
Location: UK

Re: Pattern matching UTF-8 support

Post by dfhtextpipe »

I had enabled UTF-8 support. That's the point - I did all the right things but it still won't work.

David
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia
Re: Pattern matching UTF-8 support

Post by DataMystic Support »

Hi David,

Just checked and you are right. This is a display validation issue only - the filter works properly, but the error message it gives is not correct.

It will be fixed in the next release.
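
In the meantime, anyone who wants to cross-check the result outside TextPipe could use something like the sketch below. It is Python, not a TextPipe filter, and the helper names `strip_feff` and `strip_feff_utf8` are hypothetical. Note that Python's `re` module does not accept the PCRE spelling \x{feff}; the equivalent escape there is \ufeff.

```python
import re

# Python's spelling of the PCRE pattern \x{feff} (U+FEFF ZERO WIDTH NO-BREAK SPACE).
FEFF = re.compile("\ufeff")


def strip_feff(text: str) -> str:
    """Remove every U+FEFF character from already-decoded text."""
    return FEFF.sub("", text)


def strip_feff_utf8(data: bytes) -> bytes:
    """Decode UTF-8 bytes, remove U+FEFF, and re-encode as UTF-8."""
    return strip_feff(data.decode("utf-8")).encode("utf-8")


if __name__ == "__main__":
    cleaned = strip_feff_utf8("\ufeffHello\ufeff world".encode("utf-8"))
    print(cleaned.decode("utf-8"))
```

This removes U+FEFF wherever it occurs, not only at the start of the file, which matches what a global find-and-replace on \x{feff} would do.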
dfhtextpipe
Posts: 986
Joined: Sun Dec 09, 2007 2:49 am
Location: UK

Re: Pattern matching UTF-8 support

Post by dfhtextpipe »

Thanks for the response, Simon.

I was almost starting to imagine I was going mad, but you have allayed my fears!

David