Splitting word files

Get help with installation and running here.

Moderators: DataMystic Support, Moderators

MJ

Post by MJ »

Hello,

I need to split a Word file into many small text snippets and have managed to do that with the 'split file' filter. However, there is a problem with the 'garbage' at the beginning and end of the Word file.

I used a non-alphabetic character (134) as the split marker, which produced a few hundred text snippet files. Since that split works rather well, I would prefer not to convert the Word file into another format before applying the filter.

However, there are some unwelcome side effects that I want to get rid of:
a) a corrupted first file
b) a 'last' meaningful file filled with some text + some 'garbage'
c) after that, dozens of snippet files filled exclusively with 'garbage'.

What's the best way to clean the garbage out of the two useful files (a and b) and get rid of all the files of type c)?

Any help is highly appreciated

MJ
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Post by DataMystic Support »

Hi MJ,

Word uses a binary format, so the file contains a lot of data that shows up as garbage when treated as text. Why not precede your split file filter with a Convert\Convert Word document to text filter?
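
For readers working outside the filter list, here is a rough Python sketch of the same idea: split text that has already been converted from Word, and drop pieces that are empty. The function name `split_text` and the marker `#SPLIT#` are made up for illustration; they are not part of any DataMystic product, and the Word-to-text conversion itself is assumed to have been done beforehand.

```python
def split_text(text, marker):
    """Split already-converted plain text on a marker string.

    Hypothetical helper: 'marker' is whatever delimiter survives the
    Word-to-text conversion. Snippets are trimmed, and pieces that
    contain only whitespace are dropped.
    """
    snippets = []
    for piece in text.split(marker):
        piece = piece.strip()
        if piece:  # skip empty pieces between adjacent markers
            snippets.append(piece)
    return snippets


if __name__ == "__main__":
    sample = "first snippet#SPLIT#second snippet#SPLIT##SPLIT# third snippet "
    for number, snippet in enumerate(split_text(sample, "#SPLIT#"), start=1):
        print(number, snippet)
```

Because the input is plain text rather than the binary file, there is no leading or trailing binary data to end up in the first and last snippets.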
Guest

Post by Guest »

Hello Simon,

Yes, thanks for the hint. In the end I went that route.

In my first attempt to split the files I wanted to avoid the conversion, because I was relying on a binary code that is lost when the file is converted to text. However, I found alternative catchwords that give me the splits where I want them.

Thanks again
MJ
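
A catchword-based split like the one MJ describes could look roughly like this in Python. This is only a sketch under assumptions: the thread does not say which catchwords were used, so `ENTRY` below is a placeholder, and `split_on_catchword` is a hypothetical helper, not a TextPipe filter. It starts a new snippet at every line that begins with the catchword and keeps the catchword in the snippet.

```python
import re


def split_on_catchword(text, catchword):
    """Start a new snippet at each line beginning with 'catchword'.

    Hypothetical helper for illustration. Text before the first
    catchword is kept as its own snippet if it is not blank.
    """
    pattern = re.compile(r"^" + re.escape(catchword), re.MULTILINE)
    starts = [match.start() for match in pattern.finditer(text)]
    if not starts:
        # No catchword found: return the whole text, unless it is blank.
        return [text.strip()] if text.strip() else []

    snippets = []
    leading = text[:starts[0]].strip()
    if leading:
        snippets.append(leading)

    # Each snippet runs from one catchword to the next (or to the end).
    bounds = starts + [len(text)]
    for begin, end in zip(bounds, bounds[1:]):
        snippets.append(text[begin:end].strip())
    return snippets


if __name__ == "__main__":
    sample = "intro\nENTRY one\nbody\nENTRY two\nmore\n"
    for snippet in split_on_catchword(sample, "ENTRY"):
        print(repr(snippet))
```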