Hi Simon,
TextPipe has a filter to convert Numeric HTML entities to text.
This is fine as far as it goes, but rather narrow in scope, as it only works for numerical entities.
Please consider enhancing TextPipe with a filter that converts all XML & HTML entities to Unicode.
See http://en.wikipedia.org/wiki/List_of_XM ... references
This would greatly improve the usefulness of TextPipe.
David
Convert XML & HTML entities to Unicode?
Moderators: DataMystic Support, Moderators
- Posts: 988
- Joined: Sun Dec 09, 2007 2:49 am
- Location: UK
- DataMystic Support
- Site Admin
- Posts: 2227
- Joined: Mon Jun 30, 2003 12:32 pm
- Location: Melbourne, Australia
Re: Convert XML & HTML entities to Unicode?
This filter has been changed to convert ALL entities, and the name and documentation updated accordingly.
Thanks for the link! It proved very useful.
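For readers who want to see what "convert ALL entities" means in practice, here is a minimal sketch in Python. It uses the standard library's `html.unescape` purely as an illustration. It is not TextPipe's implementation, and the sample string is made up for this example.

```python
import html

# Hypothetical sample mixing a named entity, decimal NCRs and a hex NCR.
sample = "caf&eacute; &amp; cr&#232;me br&#xFB;l&#233;e"

# html.unescape decodes named HTML entities as well as decimal and
# hexadecimal numeric character references in one pass.
decoded = html.unescape(sample)
print(decoded)
```

A filter limited to numeric references would leave `&eacute;` and `&amp;` untouched in this sample, which is the gap the original request describes.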
- Posts: 988
- Joined: Sun Dec 09, 2007 2:49 am
- Location: UK
Re: Convert XML & HTML entities to Unicode?
That's great, Simon.
As Hannibal Smith used to say, "I love it when a plan comes together".
http://en.wikipedia.org/wiki/John_%22Hannibal%22_Smith
David
- DataMystic Support
- Site Admin
- Posts: 2227
- Joined: Mon Jun 30, 2003 12:32 pm
- Location: Melbourne, Australia
Re: Convert XML & HTML entities to Unicode?
I love that A-Team movie! Very amusing...
- Posts: 988
- Joined: Sun Dec 09, 2007 2:49 am
- Location: UK
Re: Convert XML & HTML entities to Unicode?
Afterthoughts:
If the input file is XML, or the output file is or will be XML, then the user may wish to exclude the XML entities.
Therefore please provide a tick box option to include or exclude the XML entities.
Better still, give the user a choice for each subset, as this makes the filter more versatile.
- ☑ Include the predefined XML entities
- ☑ Include the defined HTML entities
- ☑ Include NCRs (numeric character references)
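The three tick boxes above could behave roughly like the following Python sketch. It is only an illustration under stated assumptions: the function name `decode_entities` and its three flags are hypothetical, it says nothing about how TextPipe implements the filter, and it uses the HTML 4 entity table from Python's standard library (`html.entities.name2codepoint`) rather than the longer HTML5 list.

```python
import re
from html.entities import name2codepoint

# The five entities predefined by XML. They are treated here as their own
# subset, even though four of them are also HTML entities (an assumption).
XML_PREDEFINED = {"amp": "&", "lt": "<", "gt": ">", "quot": '"', "apos": "'"}

# Matches &name;  &#123;  and  &#x1F;  style references.
_REF = re.compile(r"&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);")


def decode_entities(text, xml=True, html_named=True, ncr=True):
    """Decode only the selected subsets and leave everything else as-is."""

    def replace(match):
        body = match.group(1)
        if body.startswith("#"):
            if not ncr:
                return match.group(0)
            try:
                code = int(body[2:], 16) if body[1] in "xX" else int(body[1:])
                return chr(code)
            except (ValueError, OverflowError):
                # Code point outside the Unicode range: keep the original text.
                return match.group(0)
        if body in XML_PREDEFINED:
            return XML_PREDEFINED[body] if xml else match.group(0)
        if html_named and body in name2codepoint:
            return chr(name2codepoint[body])
        return match.group(0)  # unknown name or subset switched off

    return _REF.sub(replace, text)


# Hypothetical sample: XML entities around a named entity and an NCR.
sample = "&lt;b&gt; caf&eacute; &#8364;5 &lt;/b&gt;"
xml_safe = decode_entities(sample, xml=False)  # keeps &lt; and &gt; escaped
everything = decode_entities(sample)           # decodes all three subsets
```

With `xml=False` the markup-significant characters stay escaped, so the result can still be embedded in an XML document, which is the case the request is about.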
BTW, the original name for this filter was somewhat inaccurate, in that it referred to Numeric HTML entities,
whereas the proper name for those is numeric character references (NCRs).
See http://en.wikipedia.org/wiki/Numeric_ch ... _reference
David