Unicode support beyond the Basic Multilingual Plane?

Get help with installation and running here.

Moderators: DataMystic Support, Moderators

dfhtextpipe
Posts: 986
Joined: Sun Dec 09, 2007 2:49 am
Location: UK

Unicode support beyond the Basic Multilingual Plane?

Post by dfhtextpipe »

It rather looks as though TextPipe does not support Unicode characters beyond the Basic Multilingual Plane.

cf. Unicode 11.0, whose code space extends to Plane 16 (100000..10FFFF, Supplementary Private Use Area-B).
See https://www.unicode.org/versions/Unicode11.0.0/

I've just been testing the filter Convert Numeric HTML/XML Entities to text in the Trial Run area.

Codes beyond the BMP are converted incorrectly, e.g.

&#x112B0;

becomes

ኰ

which is U+12B0 ETHIOPIC SYLLABLE KWA.
The proper conversion is U+112B0 KHUDAWADI LETTER A (𑊰).

Thus any file containing NCRs with more than four hex digits will be converted with errors in the output.
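One plausible explanation (an assumption, not confirmed by DataMystic) is that the decoded code point is squeezed into 16 bits: 0x112B0 & 0xFFFF is exactly 0x12B0. The Python sketch below contrasts that truncation with a correct decode and shows the UTF-16 surrogate pair that a character beyond the BMP needs. The function names are hypothetical, chosen for illustration only.

```python
# Sketch: how a 16-bit truncation would turn U+112B0 into U+12B0.
# The truncation model is an assumption about the bug, not TextPipe's code.

def decode_ncr_hex(digits: str) -> str:
    """Correctly decode the hex digits of an NCR such as '112B0'."""
    cp = int(digits, 16)
    if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: U+{cp:X}")
    return chr(cp)

def decode_ncr_hex_16bit(digits: str) -> str:
    """Buggy variant: keeps only the low 16 bits of the code point."""
    return chr(int(digits, 16) & 0xFFFF)

def utf16_units(ch: str) -> list[int]:
    """UTF-16 code units for one character (a surrogate pair above the BMP)."""
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

good = decode_ncr_hex("112B0")
bad = decode_ncr_hex_16bit("112B0")
print(f"U+{ord(good):04X}", [hex(u) for u in utf16_units(good)])
print(f"U+{ord(bad):04X}", [hex(u) for u in utf16_units(bad)])
```

The correct result needs two UTF-16 code units (0xD804, 0xDEB0), whereas the truncated one fits in a single unit, which is why the error is silent rather than producing an obviously invalid character.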

When will TextPipe become more fully compliant with the latest Unicode standard?

Best regards,

David
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: Unicode support beyond the Basic Multilingual Plane?

Post by DataMystic Support »

We are currently looking into what is required here.
dfhtextpipe

Re: Unicode support beyond the Basic Multilingual Plane?

Post by dfhtextpipe »

What's New in TextPipe v11 – 12 December, 2019
==============================================
...
  • Upgraded Unicode support to Unicode 12.1.
...

Thanks!

David
dfhtextpipe

Re: Unicode support beyond the Basic Multilingual Plane?

Post by dfhtextpipe »

Bug alert!

Convert Numeric HTML/XML Entities to text converted

&#x112B0; &#xAA00;

to

; ;

NB. I have also just tried this same entity data in a UTF-8 input file, as well as in the Trial Run area.
The output file likewise contained only the semicolons, exactly as in the Trial Run area.

This has now become a very serious software bug!
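For comparison, a conforming decoder handles both references in a single pass. Assuming the test data were the hex NCRs for U+112B0 and U+AA00, the Python sketch below (a hypothetical stand-in for the filter, not TextPipe's implementation) decodes hex and decimal NCRs across the full range up to U+10FFFF and cross-checks the result against the standard library's html.unescape.

```python
import html
import re

# Hex (&#x...;) or decimal (&#...;) numeric character reference.
NCR = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")

def convert_numeric_entities(text: str) -> str:
    """Hypothetical 'numeric entities to text' filter covering all 17 planes."""
    def repl(m: re.Match) -> str:
        cp = int(m.group(1), 16) if m.group(1) else int(m.group(2))
        # Leave references that are not Unicode scalar values untouched.
        if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            return m.group(0)
        return chr(cp)
    return NCR.sub(repl, text)

sample = "&#x112B0; &#xAA00; &#70320; &#43520;"
out = convert_numeric_entities(sample)
assert out == html.unescape(sample)  # agrees with the stdlib on this input
print([f"U+{ord(c):04X}" for c in out if c != " "])
```

html.unescape could be used directly instead; the explicit version simply makes the range check visible. Note one design difference: html.unescape replaces invalid references with U+FFFD, whereas this sketch leaves them as written.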

Regards,

David
Last edited by dfhtextpipe on Thu Mar 05, 2020 2:28 am, edited 3 times in total.
dfhtextpipe

Re: Unicode support beyond the Basic Multilingual Plane?

Post by dfhtextpipe »

I have also retested the similar filter called Convert HTML/XML entities to text.

Please refer to https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references

It's apparent from the Help page entitled Convert HTML/XML entities to text that this filter only supports the HTML 4.0 entity set.

It would therefore be a further essential improvement to expand the supported entities to the larger set of character entity references defined in HTML 5.0, complete with the alternative names that some characters have.
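To illustrate those alternative names: Python's standard library ships the HTML5 named character reference table as html.entities.html5, whose keys carry the trailing semicolon. This short sketch, built around a hypothetical helper names_for, lists every HTML5 name that decodes to a given character; for U+27E8 the result should include lang;, langle; and LeftAngleBracket;.

```python
from html.entities import html5

def names_for(ch: str) -> list[str]:
    """Every HTML5 named reference (key as stored in html5) that decodes to ch."""
    return sorted(name for name, value in html5.items() if value == ch)

print(names_for("\u27e8"))  # MATHEMATICAL LEFT ANGLE BRACKET
print(names_for("\u27e9"))  # MATHEMATICAL RIGHT ANGLE BRACKET
```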

Furthermore, I have just tested all 252 covered entities with the filter.
Judged against the HTML 5.0 standard, two of these are converted incorrectly by TextPipe 11.4:

Entity	Character	TextPipe	Exact?
&lang;	⟨ (U+27E8)	〈 (U+2329)	FALSE
&rang;	⟩ (U+27E9)	〉 (U+232A)	FALSE

&lang; should be U+27E8 (moved to its current code point in HTML 5.0; in HTML 4.0 it was mapped to U+2329, decimal 9001).
&rang; should be U+27E9 (moved to its current code point in HTML 5.0; in HTML 4.0 it was mapped to U+232A, decimal 9002).

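Python's standard library happens to ship both tables: html.entities.name2codepoint follows HTML 4.01 and html.entities.html5 follows HTML5. The sketch below prints both mappings for &lang; and &rang;, which makes the change in code points (and their decimal values) easy to verify independently of TextPipe.

```python
from html.entities import html5, name2codepoint

for name in ("lang", "rang"):
    old = name2codepoint[name]      # HTML 4.01 mapping
    new = ord(html5[name + ";"])    # HTML5 mapping
    print(f"&{name};  HTML 4.01: U+{old:04X} ({old})  HTML5: U+{new:04X} ({new})")
```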
Best regards,

David
Last edited by dfhtextpipe on Thu Mar 05, 2020 2:24 am, edited 1 time in total.
dfhtextpipe

Re: Unicode support beyond the Basic Multilingual Plane?

Post by dfhtextpipe »

See also http://www.datamystic.com/forums/viewtopic.php?f=17&t=2505
David
dfhtextpipe

Re: Unicode support beyond the Basic Multilingual Plane?

Post by dfhtextpipe »

Hi Simon,

Anything to report on this critical issue and the related one?

Best regards,

David
DataMystic Support

Re: Unicode support beyond the Basic Multilingual Plane?

Post by DataMystic Support »

Hi David,

This is being prepared for v11.6.

Regards,

Simon
dfhtextpipe

Re: Unicode support beyond the Basic Multilingual Plane?

Post by dfhtextpipe »

Excellent news!

David