How does TextPipe handle the Unicode line separator U+2028?
For example, when the files to be processed use it as the EOL marker.
Assume that these are Unicode files - encoded in either UTF-16 LE or UTF-8 (with or without BOM).
Also, how is it handled in Perl pattern matching?
For example, in the dialog behind the Patterns options button [...], which includes the tick option "'.' matches newline".
David
PS. The attachment contains a simple TP filter to convert EOLs to U+2028.
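For anyone reading without the attachment, here is a rough Python sketch of the same kind of conversion. It is not the TextPipe filter itself (the contents of the zip are not reproduced in this thread), and the names `convert_eols`, `LINE_SEPARATOR` and `PARAGRAPH_SEPARATOR` are made up for illustration. It assumes the EOLs to be replaced are \r\n, \r and \n.

```python
import re

LINE_SEPARATOR = "\u2028"       # Unicode LINE SEPARATOR
PARAGRAPH_SEPARATOR = "\u2029"  # Unicode PARAGRAPH SEPARATOR

# Try \r\n first so a Windows line ending becomes ONE separator, not two.
_EOL = re.compile(r"\r\n|\r|\n")


def convert_eols(text, replacement=LINE_SEPARATOR):
    """Replace every \\r\\n, \\r or \\n in text with the given separator."""
    return _EOL.sub(replacement, text)


sample = "one\r\ntwo\nthree\rfour"
converted = convert_eols(sample)
print(converted.count(LINE_SEPARATOR))  # 3
```

Passing `PARAGRAPH_SEPARATOR` as the second argument gives the U+2029 variant. When reading the input from a file in Python, opening it with `newline=""` keeps the original line endings intact so the pattern sees them unchanged.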
Unicode line separator U+2028
Moderators: DataMystic Support, Moderators
- Posts: 988
- Joined: Sun Dec 09, 2007 2:49 am
- Location: UK
Unicode line separator U+2028
- Attachments
- Change EOLs to U+2028.zip
- TextPipe filter to change EOLs to U+2028.
- (762 Bytes) Downloaded 572 times
David
- DataMystic Support
- Site Admin
- Posts: 2227
- Joined: Mon Jun 30, 2003 12:32 pm
- Location: Melbourne, Australia
Re: Unicode line separator U+2028
Thanks David - we've included your filter in a new 'Unicode' filter subfolder.
I don't believe that PCRE (the library we use) pattern matching handles anything other than \r, \r\n and \n as line endings.
- Posts: 988
- Joined: Sun Dec 09, 2007 2:49 am
- Location: UK
Re: Unicode line separator U+2028
Well, well, well.
The help page entitled Unicode Pattern Reference includes this:
Definitions
Separator - any one of U+2028, U+2029, NL, CR.
So this suggests that TextPipe ought to be able to handle U+2028 and U+2029.
Something overlooked, perhaps?
David
PS. Of the various Unicode-compatible text editors (for Windows) that I use regularly, only SC Unipad handles these correctly.
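To illustrate what treating U+2028 and U+2029 as separators would mean in practice, here is a small Python sketch. It uses Python's `re` module, not TextPipe or PCRE, so it says nothing about what either of those actually does. The function name `split_on_separators` is hypothetical, and reading "NL" in the quoted definition as \n is my assumption.

```python
import re

# Separators per the quoted definition: U+2028, U+2029, NL, CR.
# Assumption: "NL" is taken to mean \n. \r\n is listed first so that a
# CR+LF pair counts as a single line break rather than two.
_SEPARATOR = re.compile(r"\r\n|[\n\r\u2028\u2029]")


def split_on_separators(text):
    """Split text into lines on any of the separators above."""
    return _SEPARATOR.split(text)


lines = split_on_separators("a\u2028b\u2029c\nd\r\ne\rf")
print(lines)  # ['a', 'b', 'c', 'd', 'e', 'f']
```

A tool that only recognises \r, \r\n and \n would see the first three items here as one long line.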
David
- Posts: 988
- Joined: Sun Dec 09, 2007 2:49 am
- Location: UK
Re: Unicode line separator U+2028
FWIW, here's a similar filter to change EOLs to the U+2029 Paragraph Separator.
- Attachments
- Change EOLs to U+2029.zip
- TP filter to change EOLs to U+2029
- (767 Bytes) Downloaded 792 times
David