It's reposted here to focus attention on the Count Duplicate Lines filter.
My existing Text to Word List filter didn't cope properly with soft hyphens,
presumably because U+00AD lies outside ASCII (U+0000 to U+007F); it is the byte 0xAD in Windows-1252 (aka ANSI).
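A quick Python check confirms the encoding facts. Python is chosen only for illustration and says nothing about how the filters work internally. U+00AD has no ASCII encoding but is the single byte 0xAD in Windows-1252, and every code point from U+00A0 to U+00FF maps to the byte of the same value. The helper name `encodes_as` is made up for this sketch.

```python
# Check where the soft hyphen (U+00AD) sits relative to ASCII and Windows-1252.
SOFT_HYPHEN = "\u00ad"


def encodes_as(text: str, codec: str):
    """Hypothetical helper: return the encoded bytes, or None if the codec can't represent the text."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None


print(encodes_as(SOFT_HYPHEN, "ascii"))   # ASCII stops at U+007F, so this is None
print(encodes_as(SOFT_HYPHEN, "cp1252"))  # a single byte in Windows-1252

# In Windows-1252, every code point from U+00A0 to U+00FF is the byte with the same value.
same_value = all(
    encodes_as(chr(cp), "cp1252") == bytes([cp]) for cp in range(0xA0, 0x100)
)
print(same_value)
```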
There's no indication that the Count Duplicate Lines filter, which follows the Text to Word List subfilter in my two-stage filter,
doesn't support characters U+00A0 to U+00FF.
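To make the two-stage arrangement concrete, here is a minimal Python sketch. The functions `text_to_word_list` and `count_duplicate_lines` are hypothetical stand-ins, not the filters' actual implementation, and removing soft hyphens before splitting is an assumption about the desired behaviour. Because Python strings are Unicode, characters from U+00A0 to U+00FF pass through both stages unchanged.

```python
import re
from collections import Counter

SOFT_HYPHEN = "\u00ad"


def text_to_word_list(text: str) -> list[str]:
    """Stage 1 (hypothetical stand-in for Text to Word List): one word per item.

    Assumption: soft hyphens are invisible break hints, so they are removed
    before splitting rather than treated as word separators.
    """
    cleaned = text.replace(SOFT_HYPHEN, "")
    # For str patterns, \w is Unicode-aware, so letters such as é and ï are word characters.
    return re.findall(r"\w+", cleaned)


def count_duplicate_lines(lines: list[str]) -> Counter:
    """Stage 2 (hypothetical stand-in for Count Duplicate Lines): count identical items."""
    return Counter(lines)


sample = "co\u00adoperate cooperate café café naïve"
counts = count_duplicate_lines(text_to_word_list(sample))
print(counts.most_common())
```

Without the `replace` call, `\w+` would split "co\u00adoperate" into "co" and "operate", because the soft hyphen is a format character rather than a word character.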
By contrast, the Sort filter's help spells out the distinction:

Sort Type
The sort type controls the method by which items are sorted. The available options are:
· ANSI sort (case insensitive)
· ANSI sort (case sensitive) - faster than case insensitive as no case-mapping is performed
· ASCII sort (case insensitive)
· ASCII sort (case sensitive) - faster than case insensitive as no case-mapping is performed
...

So before extending the Text to Word List filter to cope with Unicode in general, please could you first extend the Count Duplicate Lines filter to support ANSI.
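What "support ANSI" might mean in practice can be sketched in Python, assuming the input arrives as Windows-1252 bytes: decode with the cp1252 codec, then count lines. The function name `count_duplicate_lines_ansi` and its `case_sensitive` flag are hypothetical, loosely echoing the Sort filter's case options; this is not how the Count Duplicate Lines filter is implemented.

```python
from collections import Counter


def count_duplicate_lines_ansi(data: bytes, case_sensitive: bool = True) -> Counter:
    """Hypothetical sketch: count duplicate lines in Windows-1252 ("ANSI") input.

    Decoding with cp1252 turns bytes 0xA0..0xFF into U+00A0..U+00FF, so lines
    such as 'café' compare and count like pure-ASCII lines. Note that Python's
    strict cp1252 decoder raises on the five undefined bytes 0x81, 0x8D, 0x8F,
    0x90 and 0x9D.
    """
    text = data.decode("cp1252")
    lines = text.splitlines()  # handles CRLF as well as LF line endings
    if not case_sensitive:
        # Case-insensitive counting; casefold maps É to é as well as A-Z to a-z.
        lines = [line.casefold() for line in lines]
    return Counter(lines)


raw = "café\r\nCAFÉ\r\ncafé\r\nsoft\u00adhyphen\r\n".encode("cp1252")
print(count_duplicate_lines_ansi(raw))
print(count_duplicate_lines_ansi(raw, case_sensitive=False))
```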
Meanwhile, I'll tweak my two-stage filter to investigate further.
David