Unexpectedly removed "end of line characters" on UNICODE files
Posted: Mon Aug 24, 2009 5:33 pm
I'm a newbie when it comes to Unicode, Big5 and end-of-line characters, and my English is limited too.
I have a problem converting some Big5-encoded files to GBK.
In Notepad on a Chinese Windows XP system, the file content looks like this:
材彻笿
材彻鹤眔褐
材彻皑初
材き彻刊┬动
材せ彻ド
材彻硔
材彻疷隔硔
Please note that this block is multi-line.
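For reference: garbled text like this is what you would expect when Big5 bytes are displayed with the GBK code page. Below is a minimal Python sketch of that misreading. It assumes the Chinese Windows XP here is a Simplified Chinese edition (ANSI code page 936, i.e. GBK) and that the file is Big5; `big5_seen_as_gbk` is a hypothetical helper, and Notepad's exact rendering may differ.

```python
# Illustrative sketch: show Big5 bytes the way a GBK (code page 936) reader would.
# Assumption: the source file is Big5 and the viewer decodes it as GBK.

def big5_seen_as_gbk(text: str) -> str:
    """Hypothetical helper: encode text as Big5, then misread the bytes as GBK."""
    raw = text.encode("big5")  # the bytes actually stored in the file
    # Byte sequences with no GBK mapping become U+FFFD instead of raising an error.
    return raw.decode("gbk", errors="replace")

sample = "第一章\r\n第二章\r\n"
shown = big5_seen_as_gbk(sample)
print(repr(shown))  # garbled characters, but both "\r\n" pairs are still there
```

CR and LF are plain ASCII bytes, so they survive even this wrong decoding, which is consistent with the block above still being multi-line.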
I set up a "Convert from BIG5 to GBK" filter and ran it (a Trial run shows the same thing). The Chinese characters were converted as expected, and the output now looks like this:
卷九 第一章 陰癸魅影第二章 荒村奇遇第三章 因禍得福第四章 飛馬牧場第五章 膳房爭雄第六章 美人如玉第七章 後山奇逢第八章 狹路相逢
This is all on ONE line. I expected the result to be:
卷九 第一章 陰癸魅影
第二章 荒村奇遇
第三章 因禍得福
第四章 飛馬牧場
第五章 膳房爭雄
第六章 美人如玉
第七章 後山奇逢
I tried some related filters, such as "End of line characters" and ANSI to/from Unicode, but as a newbie I haven't been able to get the result I want so far.
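For comparison, a plain Big5 to GBK conversion should only change the Chinese characters. The following is a minimal Python sketch, not how the filter works internally. It assumes the input is valid Big5 and that every character has a GBK mapping. `big5_to_gbk` and `convert_file` are hypothetical names. CR (0x0D) and LF (0x0A) are single-byte ASCII in both encodings, so the round trip keeps them.

```python
# Sketch: convert Big5 bytes to GBK bytes without touching line endings.
# Assumption: input is valid Big5; a character with no GBK mapping raises an error.

def big5_to_gbk(data: bytes) -> bytes:
    """Decode Big5 bytes and re-encode them as GBK.

    CR (0x0D) and LF (0x0A) are plain ASCII in both encodings, so they
    pass through the round trip unchanged.
    """
    return data.decode("big5").encode("gbk")

def convert_file(src: str, dst: str) -> None:
    # Binary mode on both sides, so Python does no newline translation itself.
    with open(src, "rb") as f:
        data = f.read()
    with open(dst, "wb") as f:
        f.write(big5_to_gbk(data))

original = "卷九 第一章 陰癸魅影\r\n第二章 荒村奇遇\r\n".encode("big5")
converted = big5_to_gbk(original)
print(original.count(b"\r\n"), converted.count(b"\r\n"))
```

If the tool's output has fewer CRLF pairs than its input, the line endings were dropped by something other than the character mapping itself.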
I ran "Analyze file" on the original Big5 file and got: Encoding: ASCII or ANSI (or UTF-8 without BOM), No BOM, No end of line characters found - likely a mainframe or fixed-length record format, Unknown format. Does this information help? Why does it report "No end of line characters", while the conversion result above shows that the #13#10 characters were removed? (I checked with WinHex, and the original file does contain the 0x0D 0x0A characters.)
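One way to double-check the WinHex observation independently of "Analyze file" is to count the raw line-ending bytes yourself. Here is a minimal Python sketch (an illustration, not the tool's analyzer; `count_line_endings` and `count_file` are hypothetical names). It relies on the fact that Big5 trail bytes fall in 0x40 to 0x7E and 0xA1 to 0xFE, so a 0x0D or 0x0A byte in a Big5 file can only be a real line-ending character.

```python
# Sketch: count line-ending bytes directly, like inspecting the file in a hex editor.
# Assumption: the data is Big5 (0x0D/0x0A never occur inside a Big5 character).

def count_line_endings(data: bytes) -> dict:
    """Return counts of CRLF pairs, lone CR bytes and lone LF bytes."""
    crlf = data.count(b"\r\n")          # CRLF pairs cannot overlap each other
    lone_cr = data.count(b"\r") - crlf  # every CRLF contains exactly one CR
    lone_lf = data.count(b"\n") - crlf  # ... and exactly one LF
    return {"CRLF": crlf, "CR": lone_cr, "LF": lone_lf}

def count_file(path: str) -> dict:
    with open(path, "rb") as f:  # binary mode: no newline translation
        return count_line_endings(f.read())

print(count_line_endings("第一章\r\n第二章\r\n".encode("big5")))
# {'CRLF': 2, 'CR': 0, 'LF': 0}
```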
Another note: on the "File input" page, I have to set Binary files to Process; otherwise all the files are skipped.
Thanks.