Convert Word documents to text filter
Moderators: DataMystic Support, Moderators
-
- Posts: 988
- Joined: Sun Dec 09, 2007 2:49 am
- Location: UK
Convert Word documents to text filter
Is there any good reason why the filter to Convert Word documents to text leaves the BOM in the UTF-8 text output file?
The help page for this filter does not state that the BOM may need to be removed afterwards!
Best regards,
David
- DataMystic Support
- Site Admin
- Posts: 2227
- Joined: Mon Jun 30, 2003 12:32 pm
- Location: Melbourne, Australia
Re: Convert Word documents to text filter
At the start of the file? Or at other locations?
-
- Posts: 988
- Joined: Sun Dec 09, 2007 2:49 am
- Location: UK
Re: Convert Word documents to text filter
At the usual place for a BOM - the start of the file.
Aside: I don't regard U+FEFF ZERO WIDTH NO-BREAK SPACE (BOM, ZWNBSP) at any other location as a BOM.
In other locations, it functions as a ZWNBSP, which was not part of my report.
Best regards,
David
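The distinction drawn in the post above (U+FEFF is a BOM only at the very start of the file, and a ZWNBSP anywhere else) can be illustrated with a minimal Python sketch. The helper name `strip_leading_bom` and the sample string are hypothetical; this is not how TextPipe itself handles the character.

```python
BOM = "\ufeff"  # U+FEFF: BOM at the start of a file, ZWNBSP elsewhere


def strip_leading_bom(text: str) -> str:
    """Remove one U+FEFF at the start of the text; leave later ones untouched."""
    return text[1:] if text.startswith(BOM) else text


# Hypothetical sample: a leading BOM plus an interior ZWNBSP.
sample = BOM + "first" + BOM + "second"
cleaned = strip_leading_bom(sample)

print(repr(cleaned))
```

Only the leading character is dropped, so an interior U+FEFF that the author intended as a ZWNBSP survives.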
- DataMystic Support
- Site Admin
- Posts: 2227
- Joined: Mon Jun 30, 2003 12:32 pm
- Location: Melbourne, Australia
Re: Convert Word documents to text filter
I will adjust the help file to include this.
-
- Posts: 988
- Joined: Sun Dec 09, 2007 2:49 am
- Location: UK
Re: Convert Word documents to text filter
In the early days of Unicode, the naming convention was
- UTF-8 without BOM
- UTF-8
The latter had the BOM implicitly.
More recently, the naming convention has changed to
- UTF-8
- UTF-8 with BOM
The former now has no BOM implicitly.
The convention change was recognized and implemented by the Notepad++ text editor developers several years back, i.e. in the respective options of its Encoding menu.
It behoves TextPipe to also recognize the change, and for the UI and Help files to be consistent with the current convention.
David
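For comparison, Python's standard codecs follow the newer naming described in the post above: `utf-8` writes no BOM, and the BOM variant has its own name, `utf-8-sig`. A minimal sketch (the sample text is arbitrary):

```python
import codecs

text = "Word document text"

without_bom = text.encode("utf-8")      # plain UTF-8, no BOM
with_bom = text.encode("utf-8-sig")     # UTF-8 prefixed with EF BB BF

# The two encodings differ only by the three-byte BOM prefix.
has_prefix = with_bom == codecs.BOM_UTF8 + without_bom

# Decoding with plain "utf-8" keeps U+FEFF as the first character;
# "utf-8-sig" drops it, and also accepts input that has no BOM.
kept = with_bom.decode("utf-8")
dropped = with_bom.decode("utf-8-sig")
tolerant = without_bom.decode("utf-8-sig")
```

Reading with `utf-8-sig` is therefore a safe default when a file may or may not start with a BOM.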
-
- Posts: 988
- Joined: Sun Dec 09, 2007 2:49 am
- Location: UK
Re: Convert Word documents to text filter
My comment of June 6 last year is also relevant to the use of UTF-8 in the new JSON text format for saved filter files in TextPipe v11.4.
Has anything been done to remove the BOM from text files output by the Word documents to text filter?
David
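As an illustration of why a BOM matters for JSON text, here is a minimal sketch using Python's standard `json` module. The payload and its `"filter"` key are hypothetical and do not reflect TextPipe's actual filter file layout; other JSON parsers may behave differently.

```python
import codecs
import json

# Hypothetical JSON payload saved with a UTF-8 BOM.
payload = codecs.BOM_UTF8 + b'{"filter": "example"}'

# Decoded as plain UTF-8, the string starts with U+FEFF,
# which json.loads refuses to parse.
try:
    json.loads(payload.decode("utf-8"))
    rejected = False
except json.JSONDecodeError:
    rejected = True

# Decoding with utf-8-sig strips the BOM first, so parsing succeeds.
parsed = json.loads(payload.decode("utf-8-sig"))
```

Writing such files as UTF-8 without a BOM avoids the problem for consumers that do not strip it themselves.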
- DataMystic Support
- Site Admin
- Posts: 2227
- Joined: Mon Jun 30, 2003 12:32 pm
- Location: Melbourne, Australia
Re: Convert Word documents to text filter
Yes - this was removed in TextPipe 11.4.
-
- Posts: 988
- Joined: Sun Dec 09, 2007 2:49 am
- Location: UK