Restrict according to line length?
Posted: Fri Dec 17, 2010 7:50 pm
by dfhtextpipe
I recently encountered a need to restrict a sub-filter according to line length.
The text is UTF-8 (it's actually Arabic script), so length must be based on the number of Unicode characters in the line, rather than the number of bytes.
For one sub-filter, I'd like to restrict to lines shorter than a specified length.
For another sub-filter, I'd like to restrict to lines longer than a specified length.
Any suggestions?
David
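For readers who can handle this step outside TextPipe, the requirement above can be sketched in Python. This is only an illustration, not a TextPipe filter: the helper names `is_shorter` and `is_longer` are hypothetical, and it assumes "Unicode characters" means code points, so Arabic diacritics (combining marks) each count as one character.

```python
def is_shorter(line: str, limit: int) -> bool:
    """True if the line has fewer than `limit` Unicode code points."""
    # In Python 3, len() on a str counts code points, not UTF-8 bytes.
    return len(line) < limit


def is_longer(line: str, limit: int) -> bool:
    """True if the line has more than `limit` Unicode code points."""
    return len(line) > limit


# Hypothetical sample lines: four Arabic letters, then forty ASCII letters.
lines = ["\u0633\u0644\u0627\u0645", "x" * 40]
short_lines = [ln for ln in lines if is_shorter(ln, 30)]
long_lines = [ln for ln in lines if is_longer(ln, 30)]
```

Each list could then be passed to its own processing step, much as each sub-filter would only see the lines that passed its restriction.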
Re: Restrict according to line length?
Posted: Fri Dec 17, 2010 9:59 pm
by dfhtextpipe
The following idea already occurred to me.
This filter inserts "£££" at the start of text lines shorter than 30 characters:
Code:
Comment...
| Restrict on line length shorter than specified value
|
|--Add right margin [###]
|
|--Restrict columns:Column 1 .. column 33
| |
| +--Perl pattern [^(.+)###$] with []
| | [X] Match case
| | [ ] Whole words only
| | [ ] Case sensitive replace
| | [ ] Prompt on replace
| | [ ] Skip prompt if identical
| | [ ] First only
| | [ ] Extract matches
| | Maximum text buffer size 4096
| | [X] Maximum match (greedy)
| | [ ] Allow comments
| | [ ] '.' matches newline
| | [X] UTF-8 Support
| |
| +--Add left margin [£££]
|
+--Perl pattern [###$] with []
[X] Match case
[ ] Whole words only
[ ] Case sensitive replace
[ ] Prompt on replace
[ ] Skip prompt if identical
[ ] First only
[ ] Extract matches
Maximum text buffer size 4096
[ ] Maximum match (greedy)
[ ] Allow comments
[ ] '.' matches newline
[X] UTF-8 Support
It works on ANSI text in the trial area.
Code:
£££You can type sample text in
£££the Trial Run Input Area to
test if your filter is working
properly. Click the [Trial Run]
button below to start the test.
You can also perform Partial Trial
Runs by right-clicking on filters
£££in the Filter list.
To clear this text, just right
click it and select 'Clear Entire
£££Field' from the menu. Most
£££of TextPipe's fields have
£££similar helpful menus.
I have not yet tried it with an Arabic input file.
Re: Restrict according to line length?
Posted: Fri Dec 17, 2010 10:04 pm
by dfhtextpipe
With UTF-8 text, the problem would be that a restrict filter based on columns would in effect be counting bytes rather than characters.
This is the real difficulty I am seeking to solve.
So still seeking suggestions.
David
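The size of the mismatch is easy to show with a small Python sketch. The sample line is hypothetical (twenty copies of one Arabic letter), and the sketch assumes, as the post above suspects, that a column restriction would count bytes.

```python
# Hypothetical sample: 20 repetitions of ARABIC LETTER BEH (U+0628).
line = "\u0628" * 20

char_count = len(line)                  # Unicode code points
byte_count = len(line.encode("utf-8"))  # bytes as stored in a UTF-8 file

LIMIT = 30
print(char_count, byte_count)  # 20 40
print(char_count < LIMIT)      # True: short when counting characters
print(byte_count < LIMIT)      # False: a byte-based test disagrees
```

Letters in the basic Arabic block take two bytes each in UTF-8, so a byte-based column limit would cut in at roughly half the intended character count, and unevenly wherever ASCII spaces, digits, or punctuation are mixed in.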
Re: Restrict according to line length?
Posted: Mon Dec 20, 2010 5:05 pm
by DataMystic Support
Can you use the UTF-8 mode of perl regex to capture the multi-byte characters?
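As a rough illustration of that suggestion, here is how a character-counting pattern behaves in Python, whose `re` module matches one code point per `.` on `str` input. Whether TextPipe's UTF-8 mode counts the same way is an assumption that would need testing; the function name `mark_short_lines` is hypothetical, and the threshold of 30 is taken from the filter posted earlier in the thread.

```python
import re

LIMIT = 30  # mark non-empty lines shorter than this many characters

# '.' consumes one code point (never a newline), so {1,29} counts characters.
SHORT_LINE = re.compile(r"^(.{1,%d})$" % (LIMIT - 1), re.MULTILINE)


def mark_short_lines(text: str, marker: str = "£££") -> str:
    """Prefix every non-empty line shorter than LIMIT characters with marker."""
    return SHORT_LINE.sub(lambda m: marker + m.group(1), text)


# Hypothetical sample: a 29-letter Arabic line and a 30-letter one.
sample = "\u0628" * 29 + "\n" + "\u0628" * 30 + "\n"
marked = mark_short_lines(sample)  # only the first line gains the marker
```

The same idea, a bounded quantifier anchored at both ends of the line, would avoid the column restriction altogether, provided the regex engine really treats each multi-byte sequence as a single character.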
Re: Restrict according to line length?
Posted: Wed Dec 22, 2010 4:48 am
by dfhtextpipe
Hello Simon,
Actually I gave up on this idea, and resorted to designing a filter to detect and mark "text styles" within the RTF file used at an earlier stage in my file preprocessing.
This was more reliable than using line length within a UTF-8 file as a differentiator between two kinds of text line.
Thanks anyway.
David