Unexpectedly removed "end of line characters" on UNICODE files


Moderators: DataMystic Support, Moderators

cutebuddy
Posts: 2
Joined: Mon Aug 24, 2009 4:29 pm

Unexpectedly removed "end of line characters" on UNICODE files

Post by cutebuddy »

I am a newbie when it comes to Unicode, Big5, end-of-line characters, and English :D

I have run into a problem while trying to convert some Big5-encoded files to GBK.

The file content looks like this in Notepad on a Chinese Windows XP system:
材彻笿
材彻鹤眔褐
材彻皑初
材き彻刊┬动
材せ彻ド
材彻硔
材彻疷隔硔
Please note that this block spans multiple lines.

I set up a "Convert from BIG5 to GBK" filter and ran it (a Trial run shows the same thing). The Chinese characters were converted as expected, and the output now looks like this:
卷九 第一章 陰癸魅影第二章 荒村奇遇第三章 因禍得福第四章 飛馬牧場第五章 膳房爭雄第六章 美人如玉第七章 後山奇逢第八章 狹路相逢
and this is all on ONE line. I expected the result to be:
卷九 第一章 陰癸魅影
第二章 荒村奇遇
第三章 因禍得福
第四章 飛馬牧場
第五章 膳房爭雄
第六章 美人如玉
第七章 後山奇逢
I tried some related filters, such as "End of line characters" and ANSI to/from Unicode, but as a newbie I have not managed to get the desired result so far.

I also ran "Analyze file" on the original Big5 file. It reported: Encoding: ASCII or ANSI (or UTF-8 without BOM), No BOM, No end of line characters found - likely a mainframe or fixed-length record format, Unknown format. Might this information help? Why does it report "No end of line characters", and why does the conversion result above show that the #13#10 characters have been removed? (Using WinHex, I can see that the original file does contain the 0x0D 0x0A characters.)

One more observation: on the "File input" page, I have to set Binary files to Process; otherwise all the files are skipped.

Thanks.
cutebuddy
Posts: 2
Joined: Mon Aug 24, 2009 4:29 pm

Re: Unexpectedly removed "end of line characters" on UNICODE files

Post by cutebuddy »

The version is: TextPipe Pro 8.1.10 Evaluation Edition.
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: Unexpectedly removed "end of line characters" on UNICODE files

Post by DataMystic Support »

Hi there. Please upgrade to v8.3.7.

TextPipe never removes or inserts extra characters on its own; you must tell it what to do. So I am not sure why this is happening.

If you convert from BIG5 to UTF-8, what do you see in Notepad? The same problem?
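One way to cross-check this outside TextPipe is to count the line-ending bytes directly and run the same BIG5 to UTF-8 conversion in a few lines of Python. This is only a sketch, not a TextPipe feature: the function names count_line_endings and big5_to_utf8 are made up for illustration, and it assumes the input really is Big5. Counting raw bytes is safe here because Big5 trail bytes never take the values 0x0D or 0x0A.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF pairs, lone CRs, and lone LFs in raw bytes."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        "CR": data.count(b"\r") - crlf,   # CRs that are not part of a CRLF
        "LF": data.count(b"\n") - crlf,   # LFs that are not part of a CRLF
    }


def big5_to_utf8(data: bytes) -> bytes:
    """Decode Big5 bytes and re-encode as UTF-8.

    bytes.decode / str.encode do no newline translation, so any
    0x0D 0x0A pairs in the input survive unchanged.
    """
    return data.decode("big5").encode("utf-8")


if __name__ == "__main__":
    # Two chapter titles from the post above, joined with CRLF (assumed sample).
    sample = "第二章 荒村奇遇\r\n第三章 因禍得福\r\n".encode("big5")
    print("before:", count_line_endings(sample))
    print("after: ", count_line_endings(big5_to_utf8(sample)))
```

If the counts before and after match, the line endings are present in the source and survive a plain conversion, which points at the filter setup rather than the file.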
dfhtextpipe
Posts: 986
Joined: Sun Dec 09, 2007 2:49 am
Location: UK

Re: Unexpectedly removed "end of line characters" on UNICODE files

Post by dfhtextpipe »

If all you wish to do is to change a file encoding, then perhaps TextPipe is not the most appropriate tool.

I'm not being a "wet blanket", as I do regard TextPipe as one of the best pieces of software I have ever purchased.

Nevertheless, what you want to achieve might be as simple as opening a file with a suitable Windows text editor, changing the encoding, and resaving.

[Ed: Which ignores the whole point of using TextPipe to automate text processing]
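For anyone who would rather script the conversion than use an editor, here is a minimal Python sketch of a Big5 to GBK batch conversion that leaves the 0x0D 0x0A bytes alone. It is an illustration under stated assumptions, not how TextPipe works internally: transcode_file and transcode_folder are hypothetical helper names, and the *.txt pattern is just an example. Files are read and written in binary mode so that Python does no newline translation.

```python
from pathlib import Path


def transcode_file(src: Path, dst: Path,
                   src_enc: str = "big5", dst_enc: str = "gbk") -> None:
    """Re-encode one file, byte for byte apart from the character encoding.

    Strict error handling is used, so a character that cannot be
    decoded or encoded raises UnicodeError instead of being dropped.
    """
    text = src.read_bytes().decode(src_enc)
    dst.write_bytes(text.encode(dst_enc))


def transcode_folder(src_dir: Path, dst_dir: Path, pattern: str = "*.txt") -> int:
    """Convert every matching file in src_dir into dst_dir; return the count."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    count = 0
    for src in sorted(src_dir.glob(pattern)):
        transcode_file(src, dst_dir / src.name)
        count += 1
    return count
```

Writing to a separate output folder keeps the original Big5 files untouched, so a failed run can simply be repeated.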

Best regards,

David
dfhtextpipe
Posts: 986
Joined: Sun Dec 09, 2007 2:49 am
Location: UK

Re: Unexpectedly removed "end of line characters" on UNICODE files

Post by dfhtextpipe »

As you are confessedly a Unicode newbie, this website should be of immense benefit.

Alan Wood’s Unicode Resources: http://www.alanwood.net/unicode/

David