Is it possible to delete non-word characters from the start and the end of each line?
Thank You.
Delete non-word characters
Moderators: DataMystic Support, Moderators
-
- Posts: 988
- Joined: Sun Dec 09, 2007 2:49 am
- Location: UK
Deleting non-word characters
Use a Perl pattern replace list
^(\W+) by nothing
(\W+)$ by nothing
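For anyone who wants to try the two patterns outside TextPipe, here is a minimal sketch in Python. It is an assumption that Python's `re` module treats `^`, `$` and `\W` closely enough to TextPipe's Perl pattern filter for this purpose; the helper names `strip_non_word` and `strip_lines` are made up for illustration.

```python
import re

def strip_non_word(line: str) -> str:
    """Apply the two replacements above to a single line."""
    line = re.sub(r"^\W+", "", line)   # ^(\W+) by nothing
    line = re.sub(r"\W+$", "", line)   # (\W+)$ by nothing
    return line

def strip_lines(text: str) -> str:
    """Process line by line so that \\W never swallows a newline."""
    return "\n".join(strip_non_word(line) for line in text.split("\n"))

if __name__ == "__main__":
    print(strip_lines("  --hello, world!! \n\n!!second line??"))
```

Note that `\W` does not match the underscore, so a leading or trailing `_` is kept.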
Perl patterns - greedy or not greedy
Click on the button next to the Perl pattern (labelled with 3 dots).
Ensure that greedy matching is ticked. This works with your example.
Filter List
-----------
Filter options
| [ ] Log to file
| [X] Append to logfile
| Log filename: textpipe.log
| Threshold 500
|
|--Input from file(s)
| [ ] Confirm before processing each file
| [ ] Confirm before processing read/only files
| [ ] Delete input files after processing
| Process binary files
|
|--Comment...
| Remove non Word characters from start and end of lines
|
|--Perl pattern [^(\W+)] with []
| [ ] Match case
| [ ] Whole words only
| [ ] Case sensitive replace
| [ ] Prompt on replace
| [ ] Skip prompt if identical
| [ ] First only
| [ ] Extract matches
| Maximum text buffer size 4096
| [X] Maximum match (greedy)
| [ ] Allow comments
| [ ] '.' matches newline
| [ ] UTF-8 Support
|
|--Perl pattern [(\W+)$] with []
| [ ] Match case
| [ ] Whole words only
| [ ] Case sensitive replace
| [ ] Prompt on replace
| [ ] Skip prompt if identical
| [ ] First only
| [ ] Extract matches
| Maximum text buffer size 4096
| [X] Maximum match (greedy)
| [ ] Allow comments
| [ ] '.' matches newline
| [ ] UTF-8 Support
|
+--Output to file(s)
[ ] Only update date on changed files
[X] Keep original file's date and time
[ ] Append mode
[ ] Change extension to: .txt
Backup mode
Files List
----------
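To see why the greedy option matters for the leading pattern, here is a small sketch in Python. It assumes that Python's lazy quantifier `+?` is a fair stand-in for a pattern with "Maximum match (greedy)" unticked; that mapping is not confirmed for TextPipe. The function name `strip_leading` is hypothetical.

```python
import re

def strip_leading(text: str, greedy: bool = True) -> str:
    """Remove leading non-word characters, greedily or lazily."""
    # Greedy \W+ takes the whole run; lazy \W+? stops after one character.
    pattern = r"^(\W+)" if greedy else r"^(\W+?)"
    return re.sub(pattern, "", text)

if __name__ == "__main__":
    line = "--- hello ---"
    print(repr(strip_leading(line, greedy=True)))
    print(repr(strip_leading(line, greedy=False)))
```

With the lazy version only the first non-word character is removed, which is why greedy matching needs to be ticked. In Python the trailing pattern is less sensitive, because the `$` anchor forces even a lazy `\W+?` to extend to the end of the line.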
Thanks, it works!
One more question: can I use an IF condition?
For example, I have a line with Latin and Cyrillic characters:
some text, <cyrillic1> some text <cyrillic2> <cyrillic3> some text <cyrillic4>...
If there are more than X characters between two Cyrillic words, I need to place a carriage return after the last Cyrillic word before that gap.
For example, in this case I need:
some text, <cyrillic1>
some text <cyrillic2> <cyrillic3>
some text <cyrillic4>...
I know how to identify Cyrillic words - [a-z].
Is it possible to use an IF condition, or is there another way to make such a transformation?
Cyrillic and Latin text in the same line
Coping with Cyrillic and Latin in the same line is much more difficult. The Cyrillic text could be encoded as Unicode, as Codepage 1251 (MS-Windows ANSI), or in one of several other encodings such as KOI8-R (common on Unix systems and in email) or MacCyrillic (Apple Macintosh).
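To illustrate why the encoding matters, here is a small sketch in Python (not TextPipe) that shows the same Cyrillic word as bytes in three encodings. The sample word and the helper name `encode_all` are made up for illustration; the codec names are Python's standard ones.

```python
ENCODINGS = ("utf-8", "cp1251", "koi8_r")

def encode_all(word: str) -> dict:
    """Return the raw bytes of `word` in each encoding."""
    return {enc: word.encode(enc) for enc in ENCODINGS}

if __name__ == "__main__":
    word = "текст"
    for enc, raw in encode_all(word).items():
        print(f"{enc:8} {raw.hex(' ')}")
    # Decoding with the wrong codec raises no error, it just yields garbage.
    print(word.encode("cp1251").decode("koi8_r"))
```

A pattern written for one byte encoding will therefore not match the same word stored in another, so the encoding has to be known before any matching is attempted.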
Since you are processing text found in emails, you presumably can't be sure which platform it originated from.
Do you already know how the Cyrillic text is encoded?
Do you anticipate coping with any scripts other than Latin and Cyrillic?
By the way, TextPipe Standard can process Unicode files, but the special filters needed to convert between Unicode text encodings are in TextPipe Pro.
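Once the encoding is known, the "IF" part can be expressed as a replacement function rather than a single pattern. Below is a rough sketch in Python, not a TextPipe filter. It assumes the text has already been decoded to Unicode, that a Cyrillic word is a run of characters in the Unicode Cyrillic block (U+0400 to U+04FF), and that a plain newline is acceptable as the line break. The name `break_long_gaps` and the parameter `max_gap` (the X in the question) are made up for illustration.

```python
import re

CYR = r"\u0400-\u04FF"  # Unicode Cyrillic block
# A run of non-Cyrillic characters sitting directly between two Cyrillic ones.
GAP = re.compile(rf"(?<=[{CYR}])([^{CYR}\n]+)(?=[{CYR}])")

def break_long_gaps(line: str, max_gap: int) -> str:
    """Start a new line after a Cyrillic word when the following gap is too long."""
    def repl(match):
        gap = match.group(1)
        if len(gap) > max_gap:          # the IF condition
            return "\n" + gap.lstrip()  # break, dropping the leading space
        return gap                      # short gap: leave unchanged
    return GAP.sub(repl, line)

if __name__ == "__main__":
    sample = "some text, раз some text два три some text четыре..."
    print(break_long_gaps(sample, max_gap=3))
```

The gap length here counts every character between the two Cyrillic words, including spaces and punctuation; adjust the condition if only letters should count.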