Unicode sorts?

Posted: Wed Jul 14, 2010 6:10 am
by dfhtextpipe
Sorting lines of Unicode text is a huge topic in its own right, but TextPipe doesn't yet offer a way to sort UTF-8 (or other Unicode encodings), even on the basis of codepoint values.

Although a plain codepoint sort is no substitute for intelligent, language-aware sorting, it would still have some useful applications, such as analysing character frequencies in Unicode text files.

cf. My recent post on this topic in the Help and Support section.
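
To illustrate what a codepoint-order sort and a character-frequency count would look like, here is a minimal Python sketch. It is not TextPipe functionality; the function names `sort_lines_by_codepoint` and `char_frequencies` and the sample text are made up for illustration. It relies on the fact that Python compares strings codepoint by codepoint.

```python
from collections import Counter


def sort_lines_by_codepoint(utf8_bytes: bytes) -> bytes:
    """Sort the lines of UTF-8 text by Unicode codepoint value."""
    lines = utf8_bytes.decode("utf-8").splitlines()
    # Python compares str values codepoint by codepoint, so a plain sort
    # gives codepoint order with no language-aware collation applied.
    lines.sort()
    return "\n".join(lines).encode("utf-8")


def char_frequencies(utf8_bytes: bytes) -> Counter:
    """Count how often each character occurs, ignoring line breaks."""
    text = utf8_bytes.decode("utf-8")
    return Counter(ch for ch in text if ch not in "\r\n")


# Hypothetical sample input, for illustration only.
sample = "zebra\nÉcole\napple\néclair\n".encode("utf-8")
sorted_sample = sort_lines_by_codepoint(sample).decode("utf-8")
print(sorted_sample)
print(char_frequencies(sample).most_common(3))
```

In this sample "zebra" sorts before "École" because "z" (U+007A) has a lower codepoint than "É" (U+00C9), which is exactly why a codepoint sort cannot replace language-aware sorting.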

Re: Unicode sorts?

Posted: Wed Jul 14, 2010 8:58 am
by DataMystic Support
Thanks David - we'll look into adding it shortly.

Re: Unicode sorts?

Posted: Wed Jul 14, 2010 10:23 am
by DataMystic Support
Hi David,

Windows does not provide native functions for sorting anything except ANSI and Unicode (UTF-16) strings.

So UTF-8 is out of the question - you would need to convert the text from UTF-8 to UTF-16LE first, then sort (with a new widestring sort we can add), and then convert it back afterwards.

How does that sound?
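
The convert, sort, convert-back sequence can be sketched in Python as follows. This is only a model of the idea, not TextPipe's implementation: the proposed widestring sort is assumed here to be a binary comparison of 16-bit code units, and the names `utf16_code_units` and `sort_via_utf16le` are hypothetical.

```python
def utf16_code_units(line: str) -> tuple:
    """Return the UTF-16 code units of a line as a tuple of integers."""
    raw = line.encode("utf-16-le")
    return tuple(
        int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)
    )


def sort_via_utf16le(utf8_bytes: bytes) -> bytes:
    """Sort lines of UTF-8 text as a binary UTF-16 sort would order them."""
    # Step 1: decode the UTF-8 input into text.
    lines = utf8_bytes.decode("utf-8").splitlines()
    # Step 2: sort on the UTF-16LE code units of each line
    # (assumption: the wide sort compares raw 16-bit units).
    lines.sort(key=utf16_code_units)
    # Step 3: convert the sorted text back to UTF-8.
    return "\n".join(lines).encode("utf-8")


# Hypothetical sample: "a", fullwidth "z" (U+FF5A) and an emoji (U+1F600).
sample2 = "\uff5a\n\U0001f600\na\n".encode("utf-8")
wide_sorted = sort_via_utf16le(sample2).decode("utf-8").split("\n")
codepoint_sorted = sorted(sample2.decode("utf-8").splitlines())
print(wide_sorted == codepoint_sorted)
```

Under that assumption the result can differ from codepoint order for characters outside the Basic Multilingual Plane: U+1F600 is stored as the surrogate pair D83D DE00, which compares lower than U+FF5A as code units even though its codepoint is higher.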

Re: Unicode sorts?

Posted: Thu Jul 15, 2010 2:05 am
by dfhtextpipe
Might be a useful (though somewhat awkward) workaround.

For now I'm content to just use Notepad++ | TextFX Tools | Sort.

Still, I can imagine that other users may find a wide sort of UTF-16LE text a benefit.