Pattern matching UTF-8 support
Posted: Fri Oct 14, 2011 10:54 pm
The help page includes the following:
2. In a pattern, the escape sequence \x{...}, where the contents of the braces is a string of hexadecimal digits, is interpreted as a UTF-8 character whose code number is the given hexadecimal number, for example: \x{1234}. If a non-hexadecimal digit appears between the braces, the item is not recognized. This escape sequence can be used either as a literal, or within a character class.
Yet if, in a replace filter, in the "Find pattern (perl style)" field, and with UTF-8 enabled, I enter \x{1234}, TextPipe indicates that "the character value in the x{...} sequence is too large".
This makes it impossible to use as documented.
I have many UTF-8 files containing the single character U+FEFF ZERO WIDTH NO-BREAK SPACE.
I would like to remove this character. How can I do it? TextPipe does not allow \x{feff}.
For U+FEFF, the equivalent hexadecimal byte codes are \xEF\xBB\xBF yet TextPipe doesn't find this pattern either!
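For comparison outside TextPipe, here is a minimal Python sketch of what I am trying to achieve. It only confirms that U+FEFF encodes to the bytes EF BB BF in UTF-8 and shows the two removals I would expect to work. The function names are my own, made up for illustration, and this says nothing about how TextPipe handles the pattern internally.

```python
import re

BOM = "\ufeff"  # U+FEFF ZERO WIDTH NO-BREAK SPACE

# The UTF-8 encoding of U+FEFF is the three bytes EF BB BF.
assert BOM.encode("utf-8") == b"\xef\xbb\xbf"


def strip_feff_text(text: str) -> str:
    """Remove every U+FEFF from already-decoded text (hypothetical helper)."""
    # Python's re module spells the code point as \uFEFF rather than \x{feff}.
    return re.sub(r"\uFEFF", "", text)


def strip_feff_bytes(data: bytes) -> bytes:
    """Remove every EF BB BF sequence from raw UTF-8 bytes (hypothetical helper)."""
    # In valid UTF-8 this byte sequence can only be the encoding of U+FEFF.
    return data.replace(b"\xef\xbb\xbf", b"")
```

Either form should give the same result on a valid UTF-8 file: the first matches the character after decoding, the second matches its encoded bytes directly.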