Character cAsE filters and the Turkish alphabet


Moderators: DataMystic Support, Moderators

dfhtextpipe
Posts: 986
Joined: Sun Dec 09, 2007 2:49 am
Location: UK


Post by dfhtextpipe

The help pages for the various Character cAsE filters state the following:
This filter expects UTF-8 data and will handle foreign character sets.
This is not quite true: there are exceptions in some bicameral alphabets, such as Turkish and Northern Azeri.
Both these alphabets include the following two letters:

Code:

U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE (informal name: i dot)
U+0131 LATIN SMALL LETTER DOTLESS I
For example, paste the following into the Trial Run area:

Code:

İı
Running the tOGGLE cASE filter on it makes no change.
On the other hand, it does change most accented Latin letters, e.g. Š (U+0160) becomes š (U+0161).
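To see what the default (language-independent) Unicode case mappings do with these letters, here is a minimal Python sketch. Python and its standard `str.lower`, `str.upper`, `str.swapcase` and `unicodedata.name` are used purely for illustration; TextPipe's own implementation is not shown here and may work differently. The helper names `codepoints` and `default_case_report` are hypothetical.

```python
import unicodedata


def codepoints(s: str) -> str:
    """Format a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


def default_case_report(ch: str) -> str:
    """Describe one character's default (non-language-specific) Unicode case mappings."""
    return (f"{codepoints(ch)} {unicodedata.name(ch)}: "
            f"lower -> {codepoints(ch.lower())}, upper -> {codepoints(ch.upper())}")


# Š, İ, ı
for ch in "\u0160\u0130\u0131":
    print(default_case_report(ch))

# str.swapcase toggles case using the same default (non-Turkish) mappings
print("toggle:", codepoints("\u0130\u0131".swapcase()))
```

Under the default mappings, İ lowercases to i followed by U+0307 COMBINING DOT ABOVE, ı uppercases to plain I, and i uppercases to I rather than İ, so the letters do change, but not in the way Turkish requires.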
Perhaps the sentence in the help pages should be qualified, for example:
This filter expects UTF-8 data and will handle some foreign character sets.
I'm not sure how you might implement the proper case rules for the Turkish alphabet and others like it.
These filters would first need the user to specify the writing-system context (e.g. the language).
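One way to supply that context is a language parameter that switches in the Turkic dotted/dotless i pairs before falling back to the default mappings. The Python sketch below is hypothetical: `toggle_case` and its `lang` parameter are illustrative names, not TextPipe options. It covers only the four letters i, I, İ and ı, and ignores the context-sensitive Turkish rules in Unicode's SpecialCasing.txt (such as dropping U+0307 COMBINING DOT ABOVE after I when lowercasing).

```python
# Hypothetical language-aware toggle case; `lang` stands in for a user-chosen
# writing-system context and is not an existing TextPipe setting.
TURKIC_TOGGLE = {
    "i": "\u0130",  # i -> İ (dotted capital)
    "I": "\u0131",  # I -> ı (dotless small)
    "\u0130": "i",  # İ -> i
    "\u0131": "I",  # ı -> I
}
TURKIC_LANGS = {"tr", "az"}  # ISO 639-1 codes for Turkish and Azerbaijani


def toggle_case(text: str, lang: str = "") -> str:
    """Swap upper and lower case, using Turkic i rules when lang is 'tr' or 'az'."""
    turkic = lang.lower() in TURKIC_LANGS
    out = []
    for ch in text:
        if turkic and ch in TURKIC_TOGGLE:
            out.append(TURKIC_TOGGLE[ch])
        else:
            # Everything else uses Python's default Unicode case mappings
            out.append(ch.swapcase())
    return "".join(out)


print(toggle_case("\u0130\u0131", lang="tr"))  # İı with Turkish rules
print(toggle_case("\u0130zmir", lang="tr"))    # İzmir with Turkish rules
print(toggle_case("\u0130zmir"))               # İzmir with default rules
```

With `lang="tr"`, İzmir toggles to iZMİR; without it, the default mappings give i + U+0307 followed by ZMIR, which shows why the filter cannot choose correctly without being told the language.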

Furthermore, I would guess that you have not yet considered extending these Character cAsE filters to cover the Cherokee Supplement block (U+AB70..U+ABBF) of small letters, which was defined in Unicode 8.0 (June 2015).
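As a starting point for test data, the Cherokee case pairs can be generated from any Unicode 8.0 or later character database. This Python sketch assumes Python 3.5 or newer (whose `unicodedata.unidata_version` is 8.0.0 or later); the sample word ᏣᎳᎩ (Tsalagi, "Cherokee") is only illustrative. It also shows a quirk a case filter should allow for: Cherokee case folding maps to the capital letters, not the small ones.

```python
import unicodedata

# Assumes Python 3.5+, whose Unicode database (8.0 or later) has Cherokee small letters.
print("Unicode database:", unicodedata.unidata_version)


def cherokee_case_pairs():
    """Return (capital, small) pairs for the Cherokee capitals U+13A0..U+13F5."""
    return [(chr(cp), chr(cp).lower()) for cp in range(0x13A0, 0x13F6)]


pairs = cherokee_case_pairs()
# Most small letters are in the Cherokee Supplement block (U+AB70..U+ABBF);
# the last six (U+13F8..U+13FD) are in the main Cherokee block.
in_supplement = [small for _, small in pairs if 0xAB70 <= ord(small) <= 0xABBF]
print(len(pairs), "capitals,", len(in_supplement), "small letters in the supplement")

# Sample word: Tsalagi ("Cherokee"), U+13E3 U+13B3 U+13A9
sample = "\u13e3\u13b3\u13a9"
print(sample, "->", sample.lower(), "->", sample.lower().upper())

# Quirk: case folding maps Cherokee small letters back to the capitals.
print(sample.lower().casefold() == sample)
```

Running it should list 86 capitals, 80 of whose small forms fall in the supplement block, which gives both the affected ranges and a small round-trip sample.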


Best regards,
David
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: Character cAsE filters and the Turkish alphabet

Post by DataMystic Support

Thanks David, we have made the clarification to the help file.

Can you please provide sample text for the Cherokee Supplement block?