
Analyzing Unicode Text with Regular Expressions

Posted: Mon Oct 26, 2009 6:59 pm
by dfhtextpipe
Here's an article which should be helpful to others:

http://icu-project.org/docs/papers/iuc26_regexp.pdf

Using regular expressions with Unicode text can be a nightmare, largely because (too) much of the public documentation is geared towards using them with ANSI characters only.

This 18-page article from 2004 rectifies a lot of that.

Re: Analyzing Unicode Text with Regular Expressions

Posted: Tue Oct 27, 2009 8:54 pm
by DataMystic Support
Thanks David,

TextPipe uses the PCRE (Perl-compatible regular expressions) library, so all the Unicode regex functions are implemented. Generally you need to check the 'Allow UTF-8' option of the Perl or EasyPattern replacement.
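To show why a UTF-8 option matters, here is a minimal sketch using Python's built-in `re` module. It is only an analogy and is not TextPipe or PCRE itself. A bytes pattern run over UTF-8 encoded data behaves roughly like an engine without UTF-8 mode: it sees individual bytes, so multi-byte characters get split and `\w` matches only ASCII word characters. A str pattern is Unicode-aware by default in Python 3, which is roughly the behaviour you get with UTF-8 mode switched on. The sample text is an arbitrary illustration.

```python
import re

# Sample text with two accented letters, written with escapes so the
# code points are unambiguous: U+00E9 (e-acute) and U+00EF (i-diaeresis).
text = "caf\u00e9 na\u00efve"
raw = text.encode("utf-8")  # each accented letter becomes 2 bytes

# Byte mode (analogous to no UTF-8 support): \w matches only ASCII
# word bytes, so words are broken apart at the accented letters.
byte_words = re.findall(rb"\w+", raw)

# Character mode: str patterns are Unicode-aware, so whole words match.
char_words = re.findall(r"\w+", text)

# "." matches one byte in byte mode but one code point in character mode.
byte_dots = len(re.findall(rb".", "\u00e9".encode("utf-8")))
char_dots = len(re.findall(r".", "\u00e9"))

print(byte_words)            # [b'caf', b'na', b've']
print(char_words)            # ['café', 'naïve']
print(byte_dots, char_dots)  # 2 1
```

The same pattern, `\w+`, gives very different results depending on whether the engine treats the input as bytes or as characters. That is why a pattern that seems correct on plain ANSI text can silently break on Unicode input unless the UTF-8 option is enabled.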