Convert XML & HTML entities to Unicode?
Posted: Tue May 15, 2012 5:36 pm
by dfhtextpipe
Hi Simon,
TextPipe has a filter to convert Numeric HTML entities to text.
This is fine as far as it goes, but it is rather narrow in scope, as it only works for numeric entities.
Please consider enhancing TextPipe with a filter that converts all XML & HTML entities to Unicode.
See
http://en.wikipedia.org/wiki/List_of_XM ... references
This would greatly improve the usefulness of TextPipe.
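To illustrate the behaviour being requested (this is not TextPipe's implementation, only a minimal sketch in Python), the standard library function `html.unescape` converts named entities and both decimal and hexadecimal numeric references in one pass. The sample string is made up for the example.

```python
import html

# Hypothetical sample mixing a named HTML entity, a predefined XML entity,
# a decimal reference and a hexadecimal reference.
sample = "caf&eacute; &amp; &#169; &#x20AC;"

# html.unescape handles all of these forms at once.
converted = html.unescape(sample)
print(converted)
```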
David
Re: Convert XML & HTML entities to Unicode?
Posted: Wed May 16, 2012 4:09 pm
by DataMystic Support
This filter has been changed to convert ALL entities, and the name and documentation updated accordingly.
Thanks for the link! It proved very useful.
Re: Convert XML & HTML entities to Unicode?
Posted: Wed May 16, 2012 8:29 pm
by dfhtextpipe
That's great, Simon.
As Hannibal Smith used to say, "I love it when a plan comes together".
http://en.wikipedia.org/wiki/John_%22Hannibal%22_Smith
David
Re: Convert XML & HTML entities to Unicode?
Posted: Wed May 16, 2012 11:30 pm
by DataMystic Support
I love that A-Team movie! Very amusing...
Re: Convert XML & HTML entities to Unicode?
Posted: Thu May 17, 2012 4:12 pm
by dfhtextpipe
Afterthoughts:
If the input file is XML, or the output file is or will be XML, then the user may wish to exclude the XML entities.
Therefore, please provide a tick box option for the XML entities to be excluded or included.
Give the user a choice for each subset; this makes good sense and makes the filter more versatile.
☑ Include the predefined XML entities
☑ Include the defined HTML entities
☑ Include NCRs (numeric character references)
NB. For the latter, one might also find it useful to distinguish between the decimal and hexadecimal forms of NCR.
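The per-subset choice described above could look roughly like the following Python sketch. It is only an illustration of the idea, not how TextPipe works internally: the function name `convert_entities` and its four flags are hypothetical, and the named HTML set is taken from Python's `html.entities.name2codepoint` table, which covers the HTML 4 entities only.

```python
import re
from html.entities import name2codepoint  # HTML 4 named entities

# The five entities predefined by XML.
XML_PREDEFINED = {"amp": "&", "lt": "<", "gt": ">", "quot": '"', "apos": "'"}

# Matches &name;, &#123; and &#x1F; style references.
_ENTITY = re.compile(r"&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);")

def convert_entities(text, xml=True, html_named=True,
                     decimal=True, hexadecimal=True):
    """Convert only the selected subsets; leave everything else untouched."""
    def repl(match):
        body = match.group(1)
        if body[0] == "#":
            is_hex = body[1] in "xX"
            if (is_hex and not hexadecimal) or (not is_hex and not decimal):
                return match.group(0)
            code = int(body[2:], 16) if is_hex else int(body[1:])
            try:
                return chr(code)
            except (ValueError, OverflowError):
                return match.group(0)  # not a valid code point
        if body in XML_PREDEFINED:
            return XML_PREDEFINED[body] if xml else match.group(0)
        if html_named and body in name2codepoint:
            return chr(name2codepoint[body])
        return match.group(0)  # unknown name: keep as written
    return _ENTITY.sub(repl, text)

sample = "&lt;p&gt; caf&eacute; &#169; &#x20AC;"
print(convert_entities(sample))             # convert every subset
print(convert_entities(sample, xml=False))  # keep &lt; and &gt; for XML output
```

Because the text is scanned in a single pass, a doubly escaped sequence such as `&amp;lt;` becomes `&lt;` rather than `<`, which is usually what is wanted when the output is still XML.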
btw. The original name for this filter was somewhat inaccurate, in that it referred to Numeric HTML entities, whereas in fact the proper name for those is numeric character references (NCRs).
See
http://en.wikipedia.org/wiki/Numeric_ch ... _reference
David