split files question

Posted: Sun Jun 12, 2005 9:27 am
by hello
Dear sir:

I cannot split Chinese text files with this software. The resulting files cannot be read; the characters turn into gibberish.
How do I configure it to split Chinese text?
Or does it not support splitting Chinese strings?
If so, will a future version implement this?

I hope you can help me.

split at certain characters

Posted: Sun Jun 12, 2005 11:18 am
by hello
I want to split files that contain Chinese characters: after every 700 Chinese characters, I want a split.
How do I configure it to do so?
In other words, each output file should contain 700 Chinese characters, and the split follows this rule.

Thank you.

wrap and split in chinese

Posted: Sun Jun 12, 2005 10:18 pm
by hello
Sorry, I have one more question.
How do I wrap Chinese text so that it does not become gibberish? Chinese uses double-byte characters, so the output becomes unreadable in both cases, splitting and wrapping.

thanks

Posted: Mon Jun 13, 2005 10:24 am
by DataMystic Support
First convert the file to Unicode (UTF-16LE), and then ensure that the split position is a multiple of 2, e.g. for 700 Chinese characters, split at 1400 bytes.
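
To illustrate the arithmetic outside TextPipe, here is a minimal Python sketch of the same idea. The helper name `split_utf16le` and the sample text are hypothetical. It assumes every character takes exactly one 16-bit unit (2 bytes) in UTF-16LE, which holds for common Chinese characters; rarer characters outside the Basic Multilingual Plane take 4 bytes and could still be cut in half by a fixed-size split.

```python
def split_utf16le(data: bytes, chars_per_part: int = 700) -> list[bytes]:
    """Split UTF-16LE bytes into parts of chars_per_part 2-byte units.

    Hypothetical helper: assumes each character is one 16-bit code unit.
    """
    step = chars_per_part * 2  # 700 characters -> 1400 bytes
    return [data[i:i + step] for i in range(0, len(data), step)]


# Sample data: 1600 Chinese characters, encoded without a byte order mark.
text = "中文" * 800
parts = split_utf16le(text.encode("utf-16-le"))

# Every part decodes cleanly because each cut falls on a 2-byte boundary.
decoded = [p.decode("utf-16-le") for p in parts]
```

If the split size were odd, a cut would land in the middle of a character and the rest of that part would decode as gibberish.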

Posted: Tue Jun 14, 2005 10:58 am
by Guest
I converted it from ANSI to Unicode in TextPipe. After that, I want to split it after every 700 characters (which you told me equals 1400 bytes),
but I cannot find this split option. In the characters pull-down
menu, what should I choose to do this, i.e. split once a given number of characters is reached? (I can only find , . and some English letters in the drop-down menu.)

Next question: when I convert from ANSI to Unicode and then try to wrap at a column width of 58, the result is not wrapped.
I can wrap it successfully only when it is in ANSI (the original encoding). Why is that?

Posted: Tue Jun 14, 2005 11:02 am
by Guest
I mean the document is not wrapped at all; the lines are still very long. Some documents have longer lines than others, so I have to wrap all the documents to the same width (58) before splitting them, so that each resulting document has the same length. Otherwise it is difficult to predict the length of each page.

Posted: Tue Jun 14, 2005 1:21 pm
by DataMystic Support
The wrap filter only works on ANSI/ASCII data, not on Unicode. So perhaps you can apply the wrap filter first?

For the split filter, choose Filters Menu\Special\Split Files, then choose option 1 - 'Split at size of (bytes)', and type in 1400 in the box.

Posted: Wed Jun 15, 2005 9:40 am
by Guest
But if I wrap the ANSI file first, the characters become garbled and unreadable, because Chinese is double-byte.
So it is the same problem again.

Or is there a way to shorten lines that run very long in Notepad (similar to wrapping)?

For example:
1 Styles can be applied quickly to selected text.Styles can
be applied quickly to selected text.Styles can be applied quickly to selected text.Styles can be applied quickly to selected text.
2 Styles can be applied quickly to selected text.

Change it to fewer characters per line:

1 text.Styles can be applied quickly to selected text.Styles
2 can be applied quickly to selected text.Styles can be
3 applied quickly to selected text.
4 Styles can be applied quickly to selected text.
5 Styles can be applied quickly to selected text.

In some files a line is very long and spans multiple lines, and I want all files to have the same line length so that I can split them more evenly (so that I can use the split filter at 18 lines).

Posted: Wed Jun 15, 2005 11:32 am
by DataMystic Support
Yes, the wrap filter is ANSI-based. For some reason I thought you were dealing with multi-byte instead of double-byte data. Ignore my earlier comment.

The Split filter at 18 lines also will not work on Unicode data.

Use the Unicode pattern match:

To find 58 characters:

.{58}

Replace with those 58 characters plus a line break (CR LF):

$0\r\n
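
For comparison, the same find-and-replace can be sketched in Python. This is only an illustration, not TextPipe's own behaviour: `wrap_every` is a hypothetical helper, and it assumes the text has already been decoded to a string, so that `.` matches one character rather than one byte.

```python
import re


def wrap_every(text: str, width: int = 58) -> str:
    """Insert CR LF after every run of `width` characters (hypothetical helper)."""
    pattern = re.compile(".{%d}" % width)
    # Equivalent of replacing the match with "$0\r\n": the match plus a line break.
    return pattern.sub(lambda m: m.group(0) + "\r\n", text)


# Sample data: one long line of 130 Chinese characters.
sample = "汉" * 130
wrapped = wrap_every(sample)
lines = wrapped.split("\r\n")
```

Because the pattern counts characters and not bytes, a double-byte character is never cut in half. Note that `.` does not match existing line breaks, so runs are counted within each existing line.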

Posted: Wed Jun 15, 2005 10:13 pm
by Guest
I tried splitting at 1400 bytes after converting from ANSI to Unicode. Most of the gibberish disappeared, but of the twenty files produced by the split, 3 or 4 appear to be blank, with no content.

I also tried a search and replace using .{58} (easy pattern, Perl pattern, etc.), but the operation had no effect (the file was the same as before). Maybe it does not apply to Chinese characters, I guess.

Anyway, I find this software excellent.
I hope future versions will have more features to support other character sets.

Thanks for your kind reply.

:p

Posted: Thu Jun 16, 2005 9:57 am
by DataMystic Support
Chinese characters cannot be represented in ANSI - so why are you doing an ANSI to Unicode conversion?

If the data is in Big5 format, first do a Big5 -> Unicode conversion.
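
As a rough sketch of what such a conversion does, here is the equivalent step in Python, using its standard `big5` and `utf-16-le` codecs. The helper name `big5_to_utf16le` and the sample string are hypothetical, and the sketch assumes the input really is valid Big5; bytes in another encoding would raise an error or decode to the wrong characters.

```python
def big5_to_utf16le(raw: bytes) -> bytes:
    """Re-encode Big5 bytes as UTF-16LE (hypothetical helper, no BOM added)."""
    return raw.decode("big5").encode("utf-16-le")


# Sample data: four Traditional Chinese characters stored as Big5.
big5_bytes = "中文測試".encode("big5")
utf16 = big5_to_utf16le(big5_bytes)

# Each of these characters takes 2 bytes in UTF-16LE.
round_trip = utf16.decode("utf-16-le")
```

After this step every one of these characters occupies exactly 2 bytes, which is what makes a split position that is a multiple of 2 safe.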