
Splitting word files

Posted: Mon Aug 15, 2005 10:30 pm
by MJ

I need to split a Word file into lots of small snippets of text and have succeeded in doing that with the 'split file' filter. However, there's a problem with the 'garbage' at the beginning and the end of the Word file.

I used a non-alphabetic character (code 134) as the split marker, which produced a few hundred text snippet files as intended. Because the split filter works so well on the original file, I would rather not convert the Word file into another format before applying it.
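
For readers who want to see what splitting on a single byte value amounts to, here is a minimal Python sketch. It is not the 'split file' filter itself, only an illustration of the idea; the function name `split_on_byte` and the sample data are made up for this example, and 134 is the character code mentioned above.

```python
from typing import List


def split_on_byte(data: bytes, delimiter: int = 134) -> List[bytes]:
    """Split raw bytes at every occurrence of one delimiter byte.

    The delimiter itself is dropped from the resulting pieces.
    """
    return data.split(bytes([delimiter]))


# Hypothetical sample: three snippets separated by byte 134 (0x86).
sample = b"first snippet\x86second snippet\x86third snippet"
snippets = split_on_byte(sample)

for number, snippet in enumerate(snippets, start=1):
    print(number, snippet)
```

Because this works on the raw bytes, anything Word stores before and after the actual text ends up in the first and last pieces, which matches the side effects described below.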

However, there are some unwelcome side effects that I want to get rid of:
a) a corrupted first file
b) a 'last' meaningful file filled with some text + some 'garbage'
c) after that, dozens of snippet files filled exclusively with 'garbage'.

What's the best way to clean the two useful files (a and b) of the garbage and get rid of all the variant c) files?
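
One possible way to weed out the variant c) files is to measure how much of each snippet is readable text. The Python sketch below is only a rough heuristic under stated assumptions: it assumes the useful text is mostly plain ASCII, the 0.9 threshold is an arbitrary guess, and the names `printable_ratio` and `looks_like_text` are invented for this example. Snippets with many accented or other non-ASCII characters would be misjudged.

```python
def printable_ratio(chunk: bytes) -> float:
    """Return the share of bytes that are printable ASCII or tab/CR/LF."""
    if not chunk:
        return 0.0
    readable = sum(1 for b in chunk if 32 <= b <= 126 or b in (9, 10, 13))
    return readable / len(chunk)


def looks_like_text(chunk: bytes, threshold: float = 0.9) -> bool:
    """Guess whether a snippet is real text (threshold is an assumption)."""
    return printable_ratio(chunk) >= threshold


# Hypothetical snippets: one readable, one binary garbage, one empty.
chunks = [b"Hello, world.\r\n", b"\x00\x01\xff\xfe\x00\x00", b""]
kept = [c for c in chunks if looks_like_text(c)]
print(kept)
```

This only helps with files that are garbage throughout. The mixed files a) and b) would still need trimming, for which converting the document to text first is the simpler fix.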

Any help is much appreciated.


Posted: Wed Aug 17, 2005 12:22 pm
by DataMystic Support
Hi MJ,

Word uses a binary format, so the file contains a lot of non-text data that shows up as garbage. Why not precede your split file filter with a Convert\Convert Word document to text filter?

Posted: Wed Aug 17, 2005 5:21 pm
by Guest
Hello Simon,

Yes, thanks for the hint. In the end I went that route.

In my first attempt to split the files I wanted to avoid the conversion, because I was splitting on a binary code that is lost once the file is converted to text. However, I found alternative catchwords that give me the splits where I want them.
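
For anyone following the same route, splitting already converted text on a catchword could look roughly like the Python sketch below. The marker `##SPLIT##` and the function name `split_on_marker` are placeholders made up for this example, not the catchwords actually used.

```python
from typing import List


def split_on_marker(text: str, marker: str) -> List[str]:
    """Split text at each marker and drop empty or whitespace-only pieces."""
    pieces = (piece.strip() for piece in text.split(marker))
    return [piece for piece in pieces if piece]


# Hypothetical converted text with a made-up catchword between snippets.
sample = "##SPLIT## alpha text ##SPLIT## beta text ##SPLIT##"
snippets = split_on_marker(sample, "##SPLIT##")
print(snippets)
```

Since the input is plain text, there is no binary data before or after the content, so no garbage files are produced.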

Thanks again