remove errant CRLF from customer emails

Posted: Thu Mar 06, 2008 4:14 am
by sheridany
I have to remove CRLFs from emails that customers write to us and organize the data into one row per customer. The CRLFs in the text data need to be removed until we encounter a three-digit number, which starts a new line.
It looks like this:

001|DEC2007|I am frustrated with the bank
being able to help me
Please call me.
002|DEC2007| This is crazy. I have never seen
anything
like
this.

Posted: Thu Mar 06, 2008 8:11 am
by DataMystic Support
Use the EasyPattern:

[ 3 digits, 1+ chars, mustEndWith( cr, lf, 3 digits ) ]
Replace with:

$0
Add a subfilter to this that replaces the EasyPattern

[cr, lf]
with nothing.

Perhaps I am missing something?

Posted: Thu Mar 06, 2008 9:13 am
by sheridany
I tried what you said, but it did not work correctly. I assume you meant an EasyPattern subfilter under the filter that replaces
[ 3 digits, 1+ chars, mustEndWith( cr, lf, 3 digits ) ] with $0?

I must be missing something....

Here is the filter export.

Input from file(s)
| [ ] Confirm before processing each file
| [ ] Confirm before processing read/only files
| [ ] Delete input files after processing
| Process binary files
|
|--Remove multiple whitespace
|
|--EasyPattern [[ 3 digits, 1+ chars, mustEndWith( cr, lf, 3 digits ) ]] with [$0]
| | [ ] Match case
| | [ ] Whole words only
| | [ ] Case sensitive replace
| | [ ] Prompt on replace
| | [ ] Skip prompt if identical
| | [ ] First only
| | [ ] Extract matches
| | Maximum text buffer size 4096
| |
| +--EasyPattern [[cr, lf]] with []
| [ ] Match case
| [ ] Whole words only
| [ ] Case sensitive replace
| [ ] Prompt on replace
| [ ] Skip prompt if identical
| [ ] First only
| [ ] Extract matches
| Maximum text buffer size 4096

Posted: Thu Mar 06, 2008 9:46 am
by DataMystic Support
OK, I missed a couple of things. It does not process the last line, and it should be replacing the embedded cr, lf with a space.

I worked around this by adding an ascii(255) (hex \xff) character at the start of each record to prevent it from joining all the records together. These get removed at the end.

Here is the new filter:

|   
|--EasyPattern [[ linestart, 3 digits ]] with [\xff$0]
|     [ ] Match case
|     [ ] Whole words only
|     [ ] Case sensitive replace
|     [ ] Prompt on replace
|     [ ] Skip prompt if identical
|     [ ] First only
|     [ ] Extract matches
|     Maximum text buffer size 4096
|   
|--EasyPattern [[ ascii($ff), capture(3 digits, longest 1+ not ascii($ff)) ]] with [$0]
|  |  [ ] Match case
|  |  [ ] Whole words only
|  |  [ ] Case sensitive replace
|  |  [X] Prompt on replace
|  |  [ ] Skip prompt if identical
|  |  [ ] First only
|  |  [ ] Extract matches
|  |  Maximum text buffer size 4096
|  |
|  |--EasyPattern [[cr, lf]] with [ ]
|  |     [ ] Match case
|  |     [ ] Whole words only
|  |     [ ] Case sensitive replace
|  |     [X] Prompt on replace
|  |     [ ] Skip prompt if identical
|  |     [ ] First only
|  |     [ ] Extract matches
|  |     Maximum text buffer size 4096
|  |   
|  +--Add footer [\r\n]
|      
|--EasyPattern [[ ascii($ff) ]] with []
|     [ ] Match case
|     [ ] Whole words only
|     [ ] Case sensitive replace
|     [ ] Prompt on replace
|     [ ] Skip prompt if identical
|     [ ] First only
|     [ ] Extract matches
|     Maximum text buffer size 4096
|   
I can also email you this filter if you drop us an email.
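
For readers without TextPipe, the marker technique in the filter above can be sketched in Python. This is only an illustration under stated assumptions, not the filter itself: the function name join_records is hypothetical, the data is assumed to be text that never contains the \xff character, and any line starting with three digits is treated as a new record, as in the [ linestart, 3 digits ] pattern.

```python
import re

SENTINEL = "\xff"  # same marker character as the filter; assumed absent from the data


def join_records(text: str) -> str:
    """Join each multi-line record into one CRLF-terminated line (hypothetical helper)."""
    # Step 1: put the sentinel in front of every line that starts with three digits.
    marked = re.sub(r"^(?=\d{3})", SENTINEL, text, flags=re.MULTILINE)

    records = []
    # Step 2: every sentinel-delimited chunk is one record.
    for chunk in marked.split(SENTINEL):
        if not chunk.strip():
            continue  # skip the empty piece before the first sentinel
        # Step 3: replace embedded line breaks with a single space.
        body = re.sub(r"(?:\r?\n)+", " ", chunk.rstrip("\r\n"))
        # Step 4: terminate the record with CRLF (the "Add footer" step).
        records.append(body + "\r\n")

    # Splitting on the sentinel has already removed it (the final filter step).
    return "".join(records)
```

A continuation line that happens to begin with three digits would be mistaken for a new record; matching the following pipe as well (for example \d{3}\|) would be a stricter test.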

Not quite yet

Posted: Fri Mar 07, 2008 6:47 am
by sheridany
All I want to do is get rid of the carriage returns before the start of a new line. A new line always starts with an id of 001, 002, 003, etc. The customer inserted the cr lf while typing the message to us. Our downstream application can't handle the cr lf because it treats it as a new line when it is not. This is a pipe-delimited file, if that helps. Perhaps the simple thing to do is use the restrict filter and just remove the cr lf from the third field?

I apologize for not clarifying this better up front.

The first and second lines, once parsed and processed, would look like this:
001|DEC2007|I am frustrated with the bank being able to help.....
002|DEC2007|This is crazy. I have never seen anything like this.


Old
001|DEC2007|I am frustrated with the bank
being able to help me
Please call me.
002|DEC2007| This is crazy. I have never seen
anything
like
this.
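
The before/after above can also be sketched line by line in Python, outside TextPipe. This is a rough illustration only: the names merge_continuations and normalize_message are hypothetical, and it assumes every record starts with a three-digit id followed by a pipe and that the message is the third pipe-delimited field.

```python
import re

# Assumption: a record starts with a three-digit id followed by a pipe.
RECORD_START = re.compile(r"^\d{3}\|")


def merge_continuations(lines):
    """Fold continuation lines into the record that precedes them."""
    records = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if RECORD_START.match(line) or not records:
            # A new record (or leading text before the first id) starts a new row.
            records.append(line)
        elif line.strip():
            # A non-empty continuation line is appended with a single space.
            records[-1] = records[-1].rstrip() + " " + line.strip()
    return records


def normalize_message(record):
    """Collapse whitespace in the third field only, leaving id and date untouched."""
    parts = record.split("|", 2)
    if len(parts) == 3:
        parts[2] = " ".join(parts[2].split())
    return "|".join(parts)
```

Calling normalize_message on each merged record also removes the stray space after the second pipe in record 002, which matches the processed lines shown above.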

Posted: Fri Mar 07, 2008 7:42 am
by DataMystic Support
Just send us an email and we can send you the filter above, which fixes the problem.

did you get my email?

Posted: Sat Mar 08, 2008 3:59 am
by sheridany
I sent it to simon.carter at datamystic.com. Is that correct?

Posted: Tue Mar 11, 2008 11:20 am
by DataMystic Support
Yes - we already emailed you the filter (last week). Check your corporate filtering for it.

Posted: Wed Mar 12, 2008 9:55 am
by DataMystic Support
Hi Sheridan,

Clearly something is wrong with your company's email filtering, and I'm prepared to bet that my company is not the only one with the problem. It is a huge waste of our resources to constantly resend and re-reply to emails because of external filtering issues - and understandably you get frustrated with our apparent lack of response.

Please get a gmail account and use that for contacting us in future. I'll be happy to send you the filter to your gmail account.

Email Filtering

Posted: Thu Mar 13, 2008 5:19 am
by sheridany
At least now we know it is on our side. I have a Yahoo account that I can access at work. It is the same as my screen name in the forum here @yahoo.com. Will that work? At least there I have access to anything that goes to their spam filter and can flag it as not spam.