Remove CRLF To Reformat Text Files


Moderators: DataMystic Support, Moderators

MilesDadRobin
Posts: 3
Joined: Sat Jan 28, 2012 3:04 am

Remove CRLF To Reformat Text Files

Post by MilesDadRobin »

Hi,

I've searched for hours, both in the help and on this forum, but can't find the answer to what I'm sure must be a simple question, so sorry if the answer is obvious.


I have text files that have been hard-formatted, with lines word-wrapped at a fixed number of characters using a CRLF.

I want to remove the CRLF between words, so the text can flow in longer paragraphs.

So I've tried lots of permutations of finding:

word character - space - CR - LF - word character

and replacing with:

word character - space - word character



I can successfully find the pattern, but can't replace it.

I've tried loads of variations of:

$1 $5

to put back the first word character, then a space, then the last word character, but without the CRLF.

Each time I get a "$1i is not a valid subexpression" error.

I'm obviously making a simple mistake and, after all these hours spent searching, I would be really grateful for some help.

Thanks,

Robin
dfhtextpipe
Posts: 986
Joined: Sun Dec 09, 2007 2:49 am
Location: UK

Re: Remove CRLF To Reformat Text Files

Post by dfhtextpipe »

If you don't have too many files to process, it's a simple task in Notepad++ to select any paragraph and use the Join option in the Edit menu.
The shortcut {Ctrl-J} does this quickly. Visit http://notepad-plus-plus.org/ for details.
Notepad++ is a great editor to have in your system alongside TextPipe - it's what I use to design my filters and test the output files.

With TextPipe, the trick is to know what needs to be kept separate.
It helps if each of the paragraphs to be joined is separated by a blank line.

A TextPipe filter to reformat such paragraphs should not need to use any $ variables.
After all, you merely want to replace the CR LF by a single space within successive sections of text.

However, if the EOLs are also splitting words, then the task becomes more complex.
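
The blank-line idea above can be sketched outside TextPipe. This is only an illustration in Python, not a TextPipe filter; it assumes the file uses CRLF line endings and that paragraph breaks are truly empty lines. The names SINGLE_EOL and join_paragraph_lines are made up for this example.

```python
import re

# A single CRLF (with an optional trailing space before it) that is neither
# preceded nor followed by another CRLF, and is not the last line ending in
# the text. Blank lines between paragraphs are therefore left alone.
SINGLE_EOL = re.compile(r"(?<!\r\n) ?\r\n(?!\r\n|\Z)")


def join_paragraph_lines(text: str) -> str:
    """Replace each in-paragraph CRLF with one space; no $ variables needed."""
    return SINGLE_EOL.sub(" ", text)


sample = (
    "line one\r\nline two \r\nline three\r\n"
    "\r\n"
    "second para\r\ncontinues\r\n"
)
joined = join_paragraph_lines(sample)
```

Like the plain replace it imitates, this does not repair words that were split across lines.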

David
MilesDadRobin
Posts: 3
Joined: Sat Jan 28, 2012 3:04 am

Re: Remove CRLF To Reformat Text Files

Post by MilesDadRobin »

Hi David,

Thanks for replying. It's a little more than just replacing plain CRLFs.

I'm looking for:

---------------------
....word
CRLF
word.....
---------------------


where a line was word-wrapped with a hard CRLF, and I want to replace it with:

---------------------
...word word...
---------------------


with just a space between.

If I just replace all CRLFs with a space, the entire document just becomes one long unreadable line.

I can find the ...wordCRLFword.... pattern, but just can't find the right syntax to put back the words on either side with a space in between instead of a CRLF.

The Notepad++ hint looks very useful by the way.

Thanks again.

Robin
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: Remove CRLF To Reformat Text Files

Post by DataMystic Support »

Hi Robin,

Find Perl pattern:
(\w) ?\r\n(\w)

Replace with:
$1 $2

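Where \w means a 'word' character, and ' ?' means to find an optional space at the end of the line. Each pair of parentheses captures one group, numbered in order, so $1 and $2 are the word characters on either side of the line break.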
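
For anyone who wants to check the pattern outside TextPipe, here is a small sketch of the same find/replace in Python. It is an illustration only, not TextPipe syntax: Python writes the back-references as \1 and \2 instead of $1 and $2, and the names WRAP and unwrap are invented for this example.

```python
import re

# Same pattern as above: a word character, an optional trailing space,
# a CRLF, then the word character that starts the next line.
WRAP = re.compile(r"(\w) ?\r\n(\w)")


def unwrap(text: str) -> str:
    """Rejoin hard-wrapped lines, keeping the captured characters on each side."""
    return WRAP.sub(r"\1 \2", text)


sample = (
    "The quick brown \r\nfox jumps over\r\nthe lazy dog.\r\n"
    "\r\n"
    "Next paragraph\r\nstarts here.\r\n"
)
result = unwrap(sample)
```

Because \w does not match punctuation, a line that ends in a comma or full stop is not joined to the next one, and the blank line between paragraphs is left untouched. When reading a real file in Python, open it with newline="" so the CRLFs are not translated before the pattern sees them.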
MilesDadRobin
Posts: 3
Joined: Sat Jan 28, 2012 3:04 am

Re: Remove CRLF To Reformat Text Files

Post by MilesDadRobin »

That was exactly what I needed.

Thanks for your help!

Robin