Locale-sensitive filters and multilingual texts?
Posted: Sat Jun 02, 2012 6:15 am
Several TextPipe filters are sensitive to the locale, especially those that involve sorting or case comparison.
Locales are currently configured as part of the regional settings of the Windows operating system.
Yet a monoglot programmer may well be tasked with processing multilingual text files,
i.e. several different projects, each for a specific language.
Furthermore, the programmer may largely bring IT skills to these projects, rather than skills in each of the languages.
It makes next to no sense for the programmer to keep changing the locale at the OS level.
That just leads to incomprehensible GUIs in all of his Windows applications.
It may also mean putting up with unfamiliar keyboard layouts for different alphabets, syllabaries, etc.
To work using TextPipe in such circumstances, it would be much better for each locale-sensitive TextPipe filter
to include options for specifying the locale to use for that filter.
e.g. if you are processing a text written in French, German or Turkish, then the chosen filter would have an option to select one
of these locales from a whole host of locales that TextPipe is designed to support.
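To illustrate the kind of option I have in mind, here is a minimal Python sketch of a sort filter that takes its locale as a parameter instead of reading it from the operating system. This is not TextPipe code: `sort_lines`, `sort_key` and the `locale_tag` parameter are names made up for this example, and the Turkish rule is a simplified stand-in (lowercase letters only) for real locale collation.

```python
# Hypothetical sketch: a sort filter with a per-filter locale option.
# Only lowercase Turkish is tailored here; everything else falls back
# to plain code point order.

TURKISH_ALPHABET = "abcçdefgğhıijklmnoöprsştuüvyz"


def sort_key(word, locale_tag=None):
    """Return a sort key for word under the requested locale."""
    if locale_tag == "tr":
        rank = {ch: i for i, ch in enumerate(TURKISH_ALPHABET)}
        # Characters outside the Turkish alphabet sort after all letters in it.
        return [rank.get(ch, len(rank) + ord(ch)) for ch in word]
    return [ord(ch) for ch in word]


def sort_lines(lines, locale_tag=None):
    """Sort lines with the locale chosen for this filter only."""
    return sorted(lines, key=lambda w: sort_key(w, locale_tag))


words = ["zor", "çay", "cam", "dal"]
print(sort_lines(words))        # ['cam', 'dal', 'zor', 'çay']
print(sort_lines(words, "tr"))  # ['cam', 'çay', 'dal', 'zor']
```

The point is only that the locale travels with the filter, so the same machine can sort a Turkish file and a French file without touching the regional settings.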
Yet even extending the available locales to cover more of the Latin-alphabet-based locales would be a good start.
Is this something that you would be prepared to develop as an enhancement to TextPipe?
Best regards,
David
I mention French because of its accented vowels, the cedilla, etc.
I mention German because of its umlauted vowels and the character ß (U+00DF LATIN SMALL LETTER SHARP S).
I mention Turkish because of the dotted and dotless I in the Turkish alphabet.
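The Turkish and German cases can be shown with a short Python sketch of a case-conversion filter that accepts a locale. Again, this is only an illustration under stated assumptions: `to_upper`, `to_lower` and `locale_tag` are hypothetical names, and only the Turkish/Azerbaijani I rules are tailored; Python's built-in `str.upper()` and `str.lower()` handle everything else, including ß.

```python
# Hypothetical sketch: case conversion with a per-filter locale option.

def _is_turkic(locale_tag):
    # "tr", "tr-TR", "az" ... all share the dotted/dotless I rules.
    return locale_tag.split("-")[0] in ("tr", "az")


def to_upper(text, locale_tag="en"):
    """Uppercase text using the casing rules of the given locale."""
    if _is_turkic(locale_tag):
        # dotted i -> İ (U+0130), dotless ı (U+0131) -> I
        text = text.replace("i", "\u0130").replace("\u0131", "I")
    return text.upper()


def to_lower(text, locale_tag="en"):
    """Lowercase text using the casing rules of the given locale."""
    if _is_turkic(locale_tag):
        # İ (U+0130) -> dotted i, I -> dotless ı (U+0131)
        text = text.replace("\u0130", "i").replace("I", "\u0131")
    return text.lower()


print(to_upper("istanbul"))        # ISTANBUL
print(to_upper("istanbul", "tr"))  # İSTANBUL
print(to_lower("ISTANBUL", "tr"))  # ıstanbul
print(to_upper("straße", "de"))    # STRASSE
```

With the default rules, "istanbul" and "ISTANBUL" compare equal when case is ignored; under Turkish rules they do not, which is exactly why a case-insensitive filter needs to know the language of the text rather than the language of the desktop.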
Yet even extending the available locales to cover more of the Latin-alphabet-based locales would be a good start.
Block Name          Range       Code Points  Characters  Unicode Version
Basic Latin         0000..007F  128          128         1.0.0
Latin-1 Supplement  0080..00FF  128          128         1.0.0
Latin Extended-A    0100..017F  128          128         1.0.0
Latin Extended-B    0180..024F  208          208         1.0.0
Best regards,
David