DataMystic

Posted: **Sun Jan 10, 2016 2:53 am**

The Sort filter is described as:

The sort type controls the method by which items are sorted. The available options are:
· ANSI sort (case insensitive)
· ANSI sort (case sensitive) - faster than case insensitive as no case-mapping is performed
· ASCII sort (case insensitive)
· ASCII sort (case sensitive) - faster than case insensitive as no case-mapping is performed
· Numeric sort
· Sort by length of line
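
For readers unfamiliar with these modes, here is a rough Python sketch of what each kind of ordering means. The function name `sort_lines` and the mode names are hypothetical, and this is only an illustration of the general ideas (code-point order, case folding, numeric value, line length), not how the Sort filter is actually implemented.

```python
def sort_lines(lines, mode):
    """Return a sorted copy of lines. Mode names are made up for this sketch."""
    if mode == "case_sensitive":
        # Plain comparison by character code: uppercase sorts before lowercase.
        return sorted(lines)
    if mode == "case_insensitive":
        # Map every line to lowercase before comparing (the extra case-mapping step).
        return sorted(lines, key=str.lower)
    if mode == "numeric":
        # Assumes every line parses as a number.
        return sorted(lines, key=float)
    if mode == "length":
        # Shortest line first; equal lengths keep their input order.
        return sorted(lines, key=len)
    raise ValueError(f"unknown sort mode: {mode}")


sample = ["pear", "apple", "Fig", "Banana"]
case_sensitive = sort_lines(sample, "case_sensitive")
case_insensitive = sort_lines(sample, "case_insensitive")
by_length = sort_lines(sample, "length")
numeric = sort_lines(["10", "9", "2.5"], "numeric")
```

The case-sensitive variant skips the lowercase mapping, which is consistent with the description's note that it is the faster of the two.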

It doesn't support sorting of UTF-8 text.

On the other hand, I regularly use the Count Duplicate Lines filter, and find that it handles UTF-8 text quite happily, and that its output is sorted.

So why not improve the Sort filter using the code underlying the Count Duplicate Lines filter?

Please!

David

Posted: **Tue Jan 19, 2016 4:39 pm**

Hi David,

That is really strange, because they both use the same underlying list to do comparisons.

Do you have a set of test files and filters that you could share with me?

Thanks,

Simon

Posted: **Thu Jan 21, 2016 10:12 pm**

I'll get back to you with some test files.

Remind me in a week if I forget, please.

David

Posted: **Thu Jan 21, 2016 11:44 pm**

Will do!

Posted: **Fri Jan 22, 2016 3:21 am**

The Sort filter (with ANSI case sensitive selected) and the Count Duplicate Lines filter give identical results (once the count column has been removed).

However, neither filter is good at sorting Unicode text files.
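
To illustrate the kind of problem I mean, here is a small Python sketch (not TextPipe code). A plain code-point sort pushes accented words to the end of the list. The helper `base_letters_key` is a hypothetical, crude workaround that strips combining marks before comparing; it is only a rough approximation for Latin text and is not a substitute for real Unicode collation.

```python
import unicodedata


def base_letters_key(line):
    """Crude sort key: decompose, drop combining marks, fold case.

    The original line is kept as a tie-breaker so the order is deterministic.
    """
    decomposed = unicodedata.normalize("NFD", line)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (stripped.casefold(), line)


# \u00e9 is "e with acute", \u00c9 is "E with acute" (precomposed forms).
words = ["zebra", "\u00e9clair", "apple", "\u00c9tude", "eagle"]

# Code-point order: every accented word lands after "zebra".
by_code_point = sorted(words)

# Accent-insensitive order: accented words sort next to their base letters.
by_base_letters = sorted(words, key=base_letters_key)
```

With the plain sort the two accented words come last; with the key function they sit between "eagle" and "zebra", which is closer to what a reader would expect.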

Worse than that, the Count Duplicate Lines filter Help page doesn't inform users of its sort limitations.
At least the Sort filter shows the nine available options in its drop-down selector.

What's really needed (IMHO) is a Sort filter that provides the following further options:

UCA = Unicode collation algorithm
CLDR = Common Locale Data Repository
EOR = European Ordering Rules

In addition, it would be very useful to provide a custom sort method for some scripts, such as Unicode Hebrew with accents and points.
For another slant on this in particular, see https://github.com/ninjaaron/ivsort.py

Unicode text sorts should be applicable to both UTF-8 and UTF-16LE input data,
i.e. one shouldn't have to convert UTF-8 to UTF-16 before doing the sort.
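
As a sketch of what I mean, in Python rather than TextPipe: decode the input to code points, sort, and re-encode in the original encoding, so the result does not depend on the input encoding. The function name `sort_encoded_lines` is hypothetical, and the sketch assumes input without a byte order mark and with `\n` line endings. It matters because raw UTF-16 code-unit order differs from code-point order for characters outside the Basic Multilingual Plane, whereas raw UTF-8 byte order does not.

```python
def sort_encoded_lines(data, encoding):
    """Decode, sort lines by code point, and re-encode in the same encoding."""
    lines = data.decode(encoding).splitlines()
    return "\n".join(sorted(lines)).encode(encoding)


# U+FF21 is a fullwidth "A"; U+1D11E (musical G clef) needs a surrogate pair in UTF-16.
text = "b\u00e9ta\nalpha\n\U0001d11e clef\n\uff21 wide"

from_utf8 = sort_encoded_lines(text.encode("utf-8"), "utf-8").decode("utf-8")
from_utf16 = sort_encoded_lines(text.encode("utf-16-le"), "utf-16-le").decode("utf-16-le")
```

Both calls give the same line order, with the fullwidth "A" line ahead of the clef line; comparing the raw UTF-16 code units instead would put the clef line first.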

Best regards,

David

Improve the Sort filter