split files question

hello

split files question

Post by hello »

Dear sir:

I cannot split Chinese text files with this software: the resulting files are unreadable, and the characters turn into gibberish.
How do I configure it to split Chinese text correctly?
Or does it perhaps not support splitting Chinese text?
If so, will a future version support this?

I hope you can help me.
hello

split at certain characters

Post by hello »

I want to split files that contain Chinese text so that a new file starts after every 700 Chinese characters.
How do I configure this?
In other words, each file should contain 700 Chinese characters, and the split should follow this rule.

Thank you
hello

wrap and split in chinese

Post by hello »

Sorry, I have one more question.
How do I wrap Chinese text so that it does not turn into gibberish? Chinese uses double-byte characters, and when I split or wrap it, the output becomes unreadable in both cases.

Thanks
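The garbling described above can be reproduced in a few lines of Python. This sketch assumes the files are Big5-encoded (the thread later suggests Big5, but the actual encoding is not confirmed) and cuts the bytes without regard to character boundaries, which is what a plain byte-size split does. `split_bytes` is a hypothetical helper, not a TextPipe function.

```python
def split_bytes(data: bytes, size: int) -> list[bytes]:
    """Cut a byte string into chunks of at most `size` bytes, ignoring character boundaries."""
    return [data[i:i + size] for i in range(0, len(data), size)]

# Hypothetical sample text: two Chinese characters, each 2 bytes in Big5.
raw = "中文".encode("big5")
first, second = split_bytes(raw, 3)  # 3 bytes = one and a half characters

try:
    first.decode("big5")
    cut_cleanly = True
except UnicodeDecodeError:
    # The chunk ends with a lead byte whose second half went into the next chunk.
    cut_cleanly = False

print(len(raw), cut_cleanly)
```

Cutting at an even offset keeps each character whole in this all-Chinese sample, but Big5 mixes 1-byte ASCII with 2-byte Chinese characters, so a fixed even byte count is still not safe for real mixed text.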
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Post by DataMystic Support »

First convert the file to Unicode (UTF16LE), and then ensure that the split position is a multiple of 2 bytes, e.g. for 700 Chinese characters, split at 1400 bytes.
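Why the multiple of 2 works: in UTF-16LE every character in the Basic Multilingual Plane (BMP), which includes the common CJK ideographs and plain ASCII, takes exactly 2 bytes, so 700 characters take 1400 bytes. A minimal Python sketch of the same idea, run outside TextPipe; `split_utf16le` is a hypothetical name, and the sketch assumes no characters outside the BMP, since those take 4 bytes (a surrogate pair) and could still be cut in half.

```python
def split_utf16le(text: str, chars_per_file: int = 700) -> list[bytes]:
    """Encode text as UTF-16LE and cut it every 2 * chars_per_file bytes.

    Assumes every character is in the BMP (2 bytes in UTF-16LE); a
    surrogate pair straddling a cut point would still be broken.
    """
    data = text.encode("utf-16-le")  # the -le codec writes no byte-order mark
    size = 2 * chars_per_file        # 700 characters -> 1400 bytes
    return [data[i:i + size] for i in range(0, len(data), size)]

# Hypothetical input: 1500 Chinese characters.
sample = "中" * 1500
parts = split_utf16le(sample)
print([len(p) for p in parts])                      # [1400, 1400, 200]
print([len(p.decode("utf-16-le")) for p in parts])  # [700, 700, 100]
```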
Guest

Post by Guest »

I converted it from ANSI to Unicode in TextPipe. After that, I want to split it after every 700 characters (which you told me equals 1400), but I cannot find this split option. In the characters pull-down menu, what should I choose to split once a given number of characters is reached? (I can only find ',', '.' and some English letters in the drop-down menu.)

Next question: when I convert from ANSI to Unicode and then wrap at a column width of 58, the result is not fully wrapped.
I can only wrap it successfully when it is in ANSI (the original encoding). Why is that?
Guest

Post by Guest »

I mean the document is not wrapped at all; the lines are still very long. Some documents are longer than others in terms of lines, so I have to wrap all of them to the same width (58) before splitting, so that each piece has the same length. Otherwise it is difficult to predict the length of each page.
DataMystic Support

Post by DataMystic Support »

The wrap filter only works on ANSI/ASCII data, not on Unicode. So perhaps you could apply the wrap filter first, before the conversion?

For the split filter, choose Filters Menu\Special\Split Files, then choose option 1 - 'Split at size of (bytes)', and type in 1400 in the box.
Guest

Post by Guest »

But if I wrap the ANSI text first, the characters become garbled and unreadable because Chinese is double-byte, so it is the same problem again.

Or is there a way to make many long lines in a Notepad file shorter (similar to wrapping)?

For example:
1 Styles can be applied quickly to selected text.Styles can
be applied quickly to selected text.Styles can be applied quickly to selected text.Styles can be applied quickly to selected text.
2 Styles can be applied quickly to selected text.

Change it to fewer characters per line:

1 text.Styles can be applied quickly to selected text.Styles
2 can be applied quickly to selected text.Styles can be
3 applied quickly to selected text.
4 Styles can be applied quickly to selected text.
5 Styles can be applied quickly to selected text.

In some files a single line is very long and spans multiple lines. I want all files to have the same line length so that I can split them more evenly (using the split filter at 18 lines).
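The wrap-then-split workflow described in this post can be sketched in Python on decoded text, where each Chinese character counts as one character however many bytes it takes on disk. This is an illustration outside TextPipe: the function names are hypothetical, the widths 58 and 18 come from the post, and it counts characters rather than display width (in many fixed-width fonts a Chinese character is drawn about twice as wide as a Latin letter).

```python
def wrap_fixed(text: str, width: int = 58) -> list[str]:
    """Hard-wrap each line into pieces of at most `width` characters."""
    out: list[str] = []
    for line in text.splitlines():
        if not line:
            out.append("")  # keep blank lines as they are
            continue
        out.extend(line[i:i + width] for i in range(0, len(line), width))
    return out

def chunk_lines(lines: list[str], per_file: int = 18) -> list[list[str]]:
    """Group wrapped lines into pieces of `per_file` lines each."""
    return [lines[i:i + per_file] for i in range(0, len(lines), per_file)]

# Hypothetical input: one 200-character line and one short line.
doc = "字" * 200 + "\n" + "短行"
# 3 lines per piece keeps the demo small; the post uses 18.
pages = chunk_lines(wrap_fixed(doc), per_file=3)
print([len(page) for page in pages])  # [3, 2]
```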
DataMystic Support

Post by DataMystic Support »

Yes, the wrap filter is ANSI-based; for some reason I thought you were dealing with multi-byte rather than double-byte data. Ignore my earlier comment.

The Split filter at 18 lines also will not work on Unicode data.

Use the Unicode pattern match.

To find 58 characters:

.{58}

Replace with those 58 characters plus a line break (carriage return and line feed):

$0\r\n
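For comparison, the same find-and-replace written with Python's `re` module (an illustration outside TextPipe, not TextPipe's own engine). It only behaves as intended on text that has already been decoded to characters; run over raw double-byte data as `bytes`, `.` would match single bytes and a cut could land inside a character. `wrap_every` is a hypothetical name.

```python
import re

def wrap_every(text: str, width: int = 58) -> str:
    """Insert CR LF after every run of `width` characters (like .{58} -> $0 plus CR LF)."""
    # '.' does not match a newline, so runs never cross existing line breaks.
    # r"\g<0>" is Python's spelling of the whole match ($0 in the pattern above);
    # re.sub turns the \r\n escapes in the replacement into CR and LF.
    return re.sub(r".{%d}" % width, r"\g<0>\r\n", text)

# Hypothetical input: 130 Chinese characters on one line, already decoded to str.
line = "漢" * 130
wrapped = wrap_every(line)
print([len(piece) for piece in wrapped.split("\r\n")])  # [58, 58, 14]
```

A line whose length is an exact multiple of 58 picks up an extra line break at its end, since the inserted CR LF is followed by the line's original one.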
Guest

Post by Guest »

I tried splitting at 1400 bytes after converting from ANSI to Unicode. After that, most of the gibberish disappeared, but out of the twenty files produced by the split, 3 or 4 appear to be blank, with no content.

I also tried a search and replace using .{58} (easy pattern, Perl pattern, etc.), but the operation had no effect (the text stayed the same as before). I guess it may not apply to Chinese characters.

Anyway, I find this software very good.
I hope future versions will add features that support other character sets.

Thanks for your kind reply.

:p
DataMystic Support

Post by DataMystic Support »

Chinese characters cannot be represented in ANSI - so why are you doing an ANSI to Unicode conversion?

If the data is in Big5 format, first do a Big5 -> Unicode conversion.
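Outside TextPipe, a Big5 to Unicode conversion amounts to decoding the bytes as Big5 and re-encoding the text as UTF-16LE. A minimal Python sketch, assuming the source file really is Big5 (Python's codec name is `"big5"`); `big5_to_utf16le` is a hypothetical name.

```python
def big5_to_utf16le(data: bytes) -> bytes:
    """Decode Big5 bytes to text, then re-encode that text as UTF-16LE."""
    text = data.decode("big5")       # raises UnicodeDecodeError if data is not valid Big5
    return text.encode("utf-16-le")  # every BMP character becomes exactly 2 bytes

# Hypothetical sample: two Chinese characters (2 bytes each in Big5)
# followed by five ASCII characters (1 byte each in Big5).
big5_bytes = "中文 text".encode("big5")
utf16 = big5_to_utf16le(big5_bytes)
print(len(big5_bytes), len(utf16))  # 9 14
```

The byte counts show why the conversion matters for splitting: in Big5 a character takes 1 or 2 bytes, while after conversion every character in this sample takes 2, so a split every 1400 bytes falls on a character boundary. To convert a whole file, the bytes can be read with `pathlib.Path.read_bytes()` and written with `write_bytes()`; prefixing `b"\xff\xfe"` adds the UTF-16LE byte-order mark that Windows tools such as Notepad use to recognize the encoding.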