Text formatting

Get help with installation and running here.

Moderators: DataMystic Support, Moderators

Aircut
Posts: 12
Joined: Sun Oct 28, 2012 2:09 pm

Text formatting

Post by Aircut »

I face the job of cleaning up malformed essays.

Some of the writers leave no space after the full stop, and others put an extra space before it. The same goes for commas, exclamation marks and question marks.

My question is how to create a filter that removes any unwanted space between a word and the full stop and adds one space after it, applied to the entire block of text BUT skipping email addresses and URLs.

Thanks for any hints.
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: Text formatting

Post by DataMystic Support »

The Perl pattern you want to use for commas, exclamation marks and question marks is:


 *([,\!\?]) *
Replace with


$1 
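
As a rough illustration outside TextPipe, the same search-and-replace can be sketched in Python with the standard `re` module. The function name `tidy_punctuation` is made up for this example, and the sketch uses a greedy ` *` after the punctuation mark so that any spaces already present are consumed before the single space is added.

```python
import re

# Optional spaces, one punctuation mark (captured), optional spaces.
# Both " *" parts are greedy, so existing spaces on either side are swallowed.
PUNCT = re.compile(r" *([,!?]) *")

def tidy_punctuation(text: str) -> str:
    """Remove spaces before , ! ? and leave exactly one space after each mark."""
    # r"\1 " is the Python spelling of the "$1 " replacement (note the trailing space).
    return PUNCT.sub(r"\1 ", text)

print(tidy_punctuation("Hello ,world!How are you ?Fine"))
# Hello, world! How are you? Fine
```

A mark at the very end of the text also gets a trailing space, so the result may need a final trim.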
For email addresses and URLs, you will need a different strategy for handling periods. One option is to temporarily replace the periods inside URLs and email addresses with tabs (using a restriction) and convert them back afterwards. With those periods protected, use a Perl pattern of:


 *([\.,\!\?]) *
Replace with


$1 
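
The protect, replace, restore idea can be sketched in Python as well. This is only an approximation of what a TextPipe restriction would do: the `PROTECTED` pattern for URLs and email addresses is a loose assumption, `tidy_text` is a hypothetical name, and the sketch assumes the input contains no tab characters of its own (they would be turned into periods in the last step).

```python
import re

# Assumed, deliberately loose patterns: a URL starting with http(s):// or www.,
# or a simple email address. Neither swallows trailing sentence punctuation.
PROTECTED = re.compile(
    r"(?:https?://|www\.)\S*[^\s.,!?]"      # URL, must not end in . , ! ?
    r"|[\w.+-]+@[\w-]+(?:\.[\w-]+)+"        # email address
)
PUNCT = re.compile(r" *([.,!?]) *")

def tidy_text(text: str) -> str:
    """Normalise spacing around . , ! ? while leaving periods in URLs/emails alone."""
    # Step 1: temporarily turn periods inside URLs and emails into tabs.
    masked = PROTECTED.sub(lambda m: m.group(0).replace(".", "\t"), text)
    # Step 2: strip spaces before each mark and leave exactly one after it.
    fixed = PUNCT.sub(r"\1 ", masked)
    # Step 3: turn the tabs back into periods.
    return fixed.replace("\t", ".")

sample = "Mail me at bob@example.com .Or visit www.example.com/a.html ,thanks"
print(tidy_text(sample))
# Mail me at bob@example.com. Or visit www.example.com/a.html, thanks
```

Only periods are protected here, as in the suggestion above, so a URL containing `?` or `,` (a query string, for example) would still be altered unless those characters are masked in the same way. Decimal numbers such as 3.14 would also gain a space and may need similar protection.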