Text formatting

Posted: Sun Oct 28, 2012 2:16 pm
by Aircut
I face the job of cleaning up malformed essays.

Some of the writers leave no space after the full stop, and others put an extra space before it. The same goes for commas, exclamation marks and question marks.

My question is how to create a filter that removes any unwanted space between a word and the full stop that follows it, and adds one space after the full stop, applied to the entire block of text BUT skipping email addresses and URLs.

Thanks for any hints.

Re: Text formatting

Posted: Mon Oct 29, 2012 11:29 pm
by DataMystic Support
The Perl pattern you want to use for commas, exclamation marks and question marks is:

Code:

[ ]*([,!?])[ ]*
Replace with ($1 followed by a single space):

Code:

$1 
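
For anyone who wants to try the idea outside TextPipe, here is a minimal Python sketch of the same search and replace. The names `PUNCT`, `_fix` and `tidy_punctuation` are made up for illustration. It differs from the filter in two small ways, both assumptions about what the essays need: a run of marks such as "?!" is kept together, and no space is added when the mark ends a line.

```python
import re

# Spaces before a run of , ! ? marks, the marks themselves, spaces after.
# The lookahead captures the next character if it is not whitespace.
PUNCT = re.compile(r" *([,!?]+) *(?=(\S?))")

def _fix(match):
    marks, next_char = match.group(1), match.group(2)
    # Add the single space only when more text follows on the same line.
    return marks + (" " if next_char else "")

def tidy_punctuation(text):
    """Remove spaces before , ! ? and leave exactly one space after."""
    return PUNCT.sub(_fix, text)
```

With this sketch, `tidy_punctuation("Hello ,world !Fine")` returns `"Hello, world! Fine"`. Numbers written with a comma, such as 1,000, would be split into "1, 000", so they would need their own exception.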
Periods need a different strategy because of emails and URLs. One option is to temporarily replace the periods inside those emails and URLs with tabs (using a restriction), apply the Perl pattern below, which also covers periods, and finally change the tabs back into periods:

Code:

[ ]*([.,!?])[ ]*
Replace with ($1 followed by a single space):

Code:

$1 
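
The hide, tidy, restore sequence can also be sketched in Python. This is only an approximation under stated assumptions: the `PROTECTED` pattern is a rough, illustrative matcher for URLs and email addresses, not a complete one; the input is assumed to contain no tabs or `\x01` to `\x03` characters of its own; and all names are hypothetical. One addition to the approach described above: commas, exclamation marks and question marks inside URLs are hidden as well, because query strings often contain them.

```python
import re

# Illustrative matchers only; real URLs and addresses are messier than this.
PROTECTED = re.compile(r"(?:https?://|www\.)\S+|[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

# Periods become tabs, as suggested in the post. The other three marks are
# hidden behind control characters (an addition made by this sketch).
HIDE = str.maketrans({".": "\t", ",": "\x01", "!": "\x02", "?": "\x03"})
SHOW = str.maketrans({"\t": ".", "\x01": ",", "\x02": "!", "\x03": "?"})

# Spaces, a run of . , ! ? marks, spaces; the lookahead captures the next
# character if it is not whitespace.
PUNCT = re.compile(r" *([.,!?]+) *(?=(\S?))")

def tidy_text(text):
    """Normalise spacing around . , ! ? while leaving URLs and emails alone."""
    # Step 1: hide the marks that sit inside URLs and email addresses.
    hidden = PROTECTED.sub(lambda m: m.group(0).translate(HIDE), text)
    # Step 2: no space before the mark, one space after it if text follows.
    tidied = PUNCT.sub(
        lambda m: m.group(1) + (" " if m.group(2) else ""), hidden
    )
    # Step 3: put the hidden marks back.
    return tidied.translate(SHOW)
```

For example, `tidy_text("One .Two... three")` returns `"One. Two... three"`. Decimal numbers such as 3.14 are not protected here and would come out as "3. 14".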