split files question

hello · Post by **hello** » Sun Jun 12, 2005 9:27 am

Dear sir:

I cannot split Chinese text files with this software. The resulting files are unreadable; the characters turn into gibberish.
How do I configure it to split Chinese text?
Or does it not support splitting Chinese strings?
If not, will a future version implement this?

I hope you can help me.

hello · Post by **hello** » Sun Jun 12, 2005 11:18 am

I want to split files that contain Chinese characters, with a split after every 700 Chinese characters.
How do I configure it to do that?
In other words, each output file should contain 700 Chinese characters, and the splitting should follow this rule.

Thank you.

hello · Post by **hello** » Sun Jun 12, 2005 10:18 pm

Sorry, I have one more question.
How do I wrap Chinese text so that it does not turn into gibberish? Chinese uses double-byte characters, so the output becomes unreadable in both cases, splitting and wrapping.

Thanks.

Post by **DataMystic Support** » Mon Jun 13, 2005 10:24 am

First convert the file to Unicode (UTF-16LE), and then ensure that the split position is a multiple of 2, e.g. for 700 Chinese characters, split at 1400 bytes.
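The arithmetic behind this advice can be sketched in Python. This is only an illustration outside TextPipe, and the function name `split_utf16le` is made up for the example. It assumes every character takes exactly 2 bytes in UTF-16LE (true for common Chinese characters in the Basic Multilingual Plane) and that the data carries no byte order mark.

```python
def split_utf16le(data: bytes, chars_per_file: int = 700) -> list[bytes]:
    """Cut UTF-16LE bytes into pieces of chars_per_file 2-byte characters."""
    step = chars_per_file * 2  # 700 characters -> 1400 bytes
    # Every cut position is a multiple of 2, so no character is cut in half.
    return [data[i:i + step] for i in range(0, len(data), step)]


text = "中文" * 750  # 1500 characters of sample text
pieces = split_utf16le(text.encode("utf-16-le"))
for number, piece in enumerate(pieces, 1):
    # Each piece decodes cleanly because it holds whole characters only.
    print(number, len(piece), "bytes,", len(piece.decode("utf-16-le")), "characters")
```

If the file starts with a 2-byte byte order mark, the cut positions stay even, but the first piece then holds the mark plus 699 characters.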

Guest · Post by **Guest** » Tue Jun 14, 2005 10:58 am

I converted the file from ANSI to Unicode in TextPipe. After that, I want to split it after every 700 characters (which you told me equals 1400 bytes),
but I cannot find this split option. In the characters pull-down
menu, what should I choose to split after a given number of characters? (I can only find ',', '.' and some English letters in the drop-down menu.)

Next question: when I convert from ANSI to Unicode and then try to wrap at a column width of 58, the result is not wrapped properly.
I can wrap it successfully only when it is in ANSI (the original encoding). Why is that?

Guest · Post by **Guest** » Tue Jun 14, 2005 11:02 am

I mean the document is not wrapped at all; the lines are still very long. Because some documents have more lines than others, I have to wrap all the documents to the same width (58) before splitting them, so that each split document has the same length. Otherwise it is difficult to predict the length of each page.

Post by **DataMystic Support** » Tue Jun 14, 2005 1:21 pm

The wrap filter only works on ANSI/ASCII data, not on Unicode. So perhaps you can perform the wrapping filter first?

For the split filter, choose Filters Menu\Special\Split Files, then choose option 1 - 'Split at size of (bytes)', and type in 1400 in the box.

Guest · Post by **Guest** » Wed Jun 15, 2005 9:40 am

But if I wrap the ANSI file first, the characters become garbled and unreadable, because Chinese is double-byte.
So it is the same problem again.

Or is there a way to make long lines in a Notepad file shorter (similar to wrapping)?

For example:
1 Styles can be applied quickly to selected text.Styles can
be applied quickly to selected text.Styles can be applied quickly to selected text.Styles can be applied quickly to selected text.
2 Styles can be applied quickly to selected text.

Change it to fewer characters per line:

1 text.Styles can be applied quickly to selected text.Styles
2 can be applied quickly to selected text.Styles can be
3 applied quickly to selected text.
4 Styles can be applied quickly to selected text.
5 Styles can be applied quickly to selected text.

In some files a line is rather long and spans multiple lines, and I want all files to have the same line length so that I can split them more evenly (so that I can use the split filter at 18 lines).

Post by **DataMystic Support** » Wed Jun 15, 2005 11:32 am

Yes, the wrap filter is ANSI-based. For some reason I thought you were dealing with multi-byte instead of double-byte data. Ignore my earlier comment.

The Split filter at 18 lines also will not work on Unicode data.

Use a Unicode pattern match.

To find 58 characters:

.{58}

Replace with those 58 characters plus a line break (carriage return and line feed):

$0\r\n
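The same find-and-replace idea can be tried outside TextPipe with a short Python sketch. The function name `wrap_every` is hypothetical, and Python's `re` module stands in for TextPipe's pattern engine, so replacement syntax such as `$0` does not carry over; a callback appends the line break instead. It assumes the text has already been decoded to Unicode, so each Chinese character counts as one character.

```python
import re


def wrap_every(text: str, width: int = 58) -> str:
    """Insert CR+LF after every run of `width` characters."""
    # "." does not match line breaks, so each existing line is wrapped on its own.
    pattern = ".{%d}" % width
    return re.sub(pattern, lambda match: match.group(0) + "\r\n", text)


sample = "字" * 120  # one long line of 120 Chinese characters
wrapped = wrap_every(sample)
for line in wrapped.split("\r\n"):
    print(len(line))
```

A line whose length is an exact multiple of the width ends up with a trailing line break, which may or may not be wanted before splitting by line count.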

Guest · Post by **Guest** » Wed Jun 15, 2005 10:13 pm

I tried splitting at 1400 bytes after converting from ANSI to Unicode. Much of the gibberish disappeared, but out of twenty split files, 3 or 4 appear blank, with no content.

I also tried search and replace using .{58} (easy pattern, Perl pattern, etc.), but the operation had no effect (the output is the same as before). Maybe it does not apply to Chinese characters, I guess.

Anyway, I find this software excellent.
I hope future versions will add features to support other character sets.

Thanks for your kind reply.

:p

Post by **DataMystic Support** » Thu Jun 16, 2005 9:57 am

Chinese characters cannot be represented in ANSI - so why are you doing an ANSI to Unicode conversion?

If the data is in Big5 format, first do a Big5 -> Unicode conversion.
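As a rough illustration of what a Big5 to Unicode conversion does, here is a Python sketch using the standard `big5` codec. The helper name `big5_to_utf16le` is made up for the example, and it assumes the input really is Big5; if the file uses another Chinese encoding (for example GBK), decoding with `big5` would fail or produce wrong characters.

```python
def big5_to_utf16le(raw: bytes) -> bytes:
    """Decode Big5 bytes to text, then re-encode the text as UTF-16LE."""
    return raw.decode("big5").encode("utf-16-le")


sample = "中文測試"
big5_bytes = sample.encode("big5")       # stands in for the original file content
converted = big5_to_utf16le(big5_bytes)  # 2 bytes per character after conversion
print(len(converted), "bytes for", len(sample), "characters")
```

Once the data is in UTF-16LE, every character here occupies exactly 2 bytes, which is what makes splitting at an even byte offset safe.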

DataMystic
