
Remove CRLF To Reformat Text Files

Posted: Sat Jan 28, 2012 3:20 am
by MilesDadRobin
Hi,

I've searched for hours, both in the help and on this forum, but can't find the answer to what I'm sure must be a simple question, so sorry if the answer is obvious.


I have text files which have been hard formatted with lines word-wrapped at a fixed number of characters with a CRLF.

I want to remove the CRLF between words, so the text can flow in longer paragraphs.

So I've tried lots of permutations of finding:

word character - space - CR - LF - word character

and replacing with:

word character - space - word character



I can successfully find the pattern, but can't replace.

I've tried loads of variations of:

$1 $5

to replace the word character, then a space, then the last word character, but without the CRLF

Each time I get a "$1i is not a valid subexpression" error.

I'm obviously making a simple mistake, and after all these hours spent searching, I would be really grateful for some help.

Thanks,

Robin

Re: Remove CRLF To Reformat Text Files

Posted: Tue Jan 31, 2012 2:42 am
by dfhtextpipe
If you don't have too many files to process, it's a simple task in Notepad++ to select any paragraph and use the Join option in the Edit menu.
The shortcut {Ctrl-J} does this quickly. Visit http://notepad-plus-plus.org/ for details.
Notepad++ is a great editor to have in your system alongside TextPipe - it's what I use to design my filters and test the output files.

With TextPipe, the trick is to know what needs to be kept separate.
It helps if the paragraphs to be joined are separated from each other by a blank line.

A TextPipe filter to reformat such paragraphs should not need to use any $ variables.
After all, you merely want to replace the CR LF with a single space within successive sections of text.

However, if the EOLs are also splitting words, then the task becomes more complex.
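
As a rough illustration of the blank-line approach described above, here is a minimal sketch in Python rather than as a TextPipe filter. It assumes the line endings are CRLF and that paragraphs are separated by at least one blank line; the function name join_wrapped_lines is made up for this example.

```python
import re

def join_wrapped_lines(text: str) -> str:
    """Join hard-wrapped lines within each paragraph (hypothetical helper)."""
    # Assumption: a run of two or more CRLFs marks a paragraph break.
    paragraphs = re.split(r"(?:\r\n){2,}", text)
    joined = []
    for paragraph in paragraphs:
        # Strip whitespace from both ends of each line, drop empty pieces,
        # then rejoin the lines with a single space.
        lines = [line.strip() for line in paragraph.split("\r\n")]
        joined.append(" ".join(line for line in lines if line))
    # Put exactly one blank line back between paragraphs.
    return "\r\n\r\n".join(joined)
```

Note that this sketch collapses several blank lines into one and drops a trailing line break at the end of the text. It does nothing for words that were split across lines.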

David

Re: Remove CRLF To Reformat Text Files

Posted: Tue Jan 31, 2012 8:39 am
by MilesDadRobin
Hi David,

Thanks for replying. It's a little bit more than just replacing plain CRLFs.

I'm looking for:

---------------------
....word
CRLF
word.....
---------------------


where a line was word-wrapped with a hard CRLF, and replace with:

---------------------
...word word...
---------------------


with just a space between.

If I just replace all CRLFs with a space, the entire document just becomes one long unreadable line.

I can find the ...wordCRLFword.... pattern, but just can't find the right syntax to replace the words on either side, with a space in between instead of a CRLF.

The Notepad++ hint looks very useful by the way.

Thanks again.

Robin

Re: Remove CRLF To Reformat Text Files

Posted: Tue Jan 31, 2012 4:29 pm
by DataMystic Support
Hi Robin,

Find Perl pattern:
(\w) ?\r\n(\w)

Replace with:
$1 $2

Where \w means a 'word' character, and ' ?' means to find an optional space at the end of the line.
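
For anyone who wants to try the pattern outside TextPipe, the same find and replace can be sketched in Python. The pattern is the one given above; the only difference is that Python's re module writes the group references as \1 and \2 rather than $1 and $2. The names WRAP, unwrap and sample are just for this example.

```python
import re

# Same pattern as in the post: a word character, an optional space,
# a CRLF, then another word character.
WRAP = re.compile(r"(\w) ?\r\n(\w)")

def unwrap(text: str) -> str:
    # Python spells the group references \1 and \2 instead of $1 and $2.
    return WRAP.sub(r"\1 \2", text)

sample = "The quick brown \r\nfox jumps over\r\nthe lazy dog.\r\n\r\nNew paragraph\r\nhere."
print(repr(unwrap(sample)))
```

Because \w only matches letters, digits and the underscore, a line that ends in punctuation such as a comma or full stop is left as it is. That keeps blank-line paragraph breaks intact, but wrapped lines ending in a comma would need a wider pattern.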

Re: Remove CRLF To Reformat Text Files

Posted: Wed Feb 01, 2012 7:04 am
by MilesDadRobin
That was exactly what I needed.

Thanks for your help!

Robin