Analyzing Unicode Text with Regular Expressions


Moderators: DataMystic Support, Moderators

dfhtextpipe
Posts: 986
Joined: Sun Dec 09, 2007 2:49 am
Location: UK

Analyzing Unicode Text with Regular Expressions

Post by dfhtextpipe »

Here's an article which should be helpful to others:

http://icu-project.org/docs/papers/iuc26_regexp.pdf

Using regular expressions with Unicode text can be a nightmare, largely because too much of the public documentation is geared towards using them with ANSI characters only.

This 18-page article from 2004 rectifies a lot of that.
David
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: Analyzing Unicode Text with Regular Expressions

Post by DataMystic Support »

Thanks David,

TextPipe uses the PCRE (Perl Compatible Regular Expressions) library, hence all the Unicode regex functions are implemented. Generally you need to check the 'Allow UTF-8' option of the Perl or EasyPattern replacement.
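
As a rough illustration of why a UTF-8 option matters, here is a minimal sketch using Python's built-in re module. It is not PCRE or TextPipe itself, and the sample string is made up for the example. It assumes the general point carries over: a pattern applied to raw bytes treats each byte of a multi-byte character separately, while a Unicode-aware match treats each character as one unit.

```python
import re

# Hypothetical sample text with two non-ASCII letters (i-diaeresis, e-acute).
text = "na\u00efve caf\u00e9"
data = text.encode("utf-8")  # the same text as raw UTF-8 bytes

# Byte-oriented matching: '.' matches one byte, so each accented
# letter (two bytes in UTF-8) is matched as two separate units.
byte_hits = re.findall(rb".", data)

# Character-oriented matching: '.' matches one code point.
char_hits = re.findall(r".", text)

# ASCII-only \w stops at accented letters and splits the words.
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)

# Unicode-aware \w (the default for str patterns) keeps words whole.
unicode_words = re.findall(r"\w+", text)

print(len(byte_hits), len(char_hits))
print(ascii_words)
print(unicode_words)
```

The byte count comes out higher than the character count, and the ASCII-only word match breaks both words at the accented letter. That is the kind of surprise you can get when a pattern is run without Unicode handling switched on.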