Restrict according to line length?

dfhtextpipe
Posts: 986
Joined: Sun Dec 09, 2007 2:49 am
Location: UK

Restrict according to line length?

Post by dfhtextpipe

I recently encountered a need to restrict a sub-filter according to line length.

The text is UTF-8 (it's actually Arabic script), so length must be based on the number of Unicode characters in the line, rather than the number of bytes.
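The difference matters because Arabic letters are multi-byte in UTF-8. A quick check in Python (used here only for illustration; it is not part of TextPipe):

```python
# Compare the character count and the UTF-8 byte count of a short Arabic word.
word = "مرحبا"  # "marhaba": five Arabic letters from the U+0600 block

char_count = len(word)                  # len() on a str counts code points
byte_count = len(word.encode("utf-8"))  # each of these letters is 2 bytes in UTF-8

print(char_count, byte_count)  # prints: 5 10
```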

For one sub-filter, I'd like to restrict to lines shorter than a specified length.
For another sub-filter, I'd like to restrict to lines longer than a specified length.
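Outside TextPipe, the split into the two cases can be sketched in Python as below. The function name `split_by_length` is hypothetical, and leaving lines of exactly `limit` characters in neither group is an assumption that simply follows the "shorter than" / "longer than" wording.

```python
from __future__ import annotations


def split_by_length(text: str, limit: int) -> tuple[list[str], list[str]]:
    """Return (lines shorter than limit, lines longer than limit).

    Lengths are counted in characters (code points), not UTF-8 bytes.
    Lines exactly `limit` characters long go in neither list.
    """
    shorter: list[str] = []
    longer: list[str] = []
    for line in text.splitlines():
        n = len(line)  # len() on a str counts code points
        if n < limit:
            shorter.append(line)
        elif n > limit:
            longer.append(line)
    return shorter, longer


short, long_ = split_by_length("ab\nabcd\nabcdef", 4)
print(short)  # ['ab']
print(long_)  # ['abcdef']
```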

Any suggestions?

David
dfhtextpipe
Posts: 986
Joined: Sun Dec 09, 2007 2:49 am
Location: UK

Re: Restrict according to line length?

Post by dfhtextpipe

The following idea already occurred to me.

This filter inserts "£££" at the start of non-empty text lines of 30 characters or fewer:

Comment...
|  Restrict on line length shorter than specified value
|
|--Add right margin [###]
|   
|--Restrict columns:Column 1 .. column 33
|  |
|  +--Perl pattern [^(.+)###$] with []
|     |  [X] Match case
|     |  [ ] Whole words only
|     |  [ ] Case sensitive replace
|     |  [ ] Prompt on replace
|     |  [ ] Skip prompt if identical
|     |  [ ] First only
|     |  [ ] Extract matches
|     |  Maximum text buffer size 4096
|     |  [X] Maximum match (greedy)
|     |  [ ] Allow comments
|     |  [ ] '.' matches newline
|     |  [X] UTF-8 Support
|     |
|     +--Add left margin [£££]
|         
+--Perl pattern [###$] with []
      [X] Match case
      [ ] Whole words only
      [ ] Case sensitive replace
      [ ] Prompt on replace
      [ ] Skip prompt if identical
      [ ] First only
      [ ] Extract matches
      Maximum text buffer size 4096
      [ ] Maximum match (greedy)
      [ ] Allow comments
      [ ] '.' matches newline
      [X] UTF-8 Support
    
It works on ANSI text in the Trial Run area:

£££You can type sample text in 
£££the Trial Run Input Area to
test if your filter is working 
properly. Click the [Trial Run]
button below to start the test.

You can also perform Partial Trial 
Runs by right-clicking on filters
£££in the Filter list.

To clear this text, just right 
click it and select 'Clear Entire 
£££Field' from the menu. Most 
£££of TextPipe's fields have 
£££similar helpful menus.

I have not yet tried it with an Arabic input file.
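As a way of checking the logic of the margin trick, here is a rough Python emulation of the filter above. It is a sketch, not TextPipe itself: the name `mark_short_lines` is hypothetical, and it assumes LF line endings. Because Python slices strings by character, the 33-column window here is measured in characters, which is what the UTF-8 case needs.

```python
import re

SENTINEL = "###"  # the right margin added by the first filter step
PREFIX = "£££"    # the left margin added to short lines


def mark_short_lines(text: str, max_len: int = 30) -> str:
    """Emulate the margin trick: append a sentinel, keep only the first
    max_len + len(SENTINEL) columns, and prefix the line if the sentinel
    still ends that window. Python slices count characters, not bytes."""
    window_width = max_len + len(SENTINEL)  # 33 columns in the original filter
    out = []
    for line in text.split("\n"):
        window = (line + SENTINEL)[:window_width]
        # Same test as the Perl pattern ^(.+)###$ on the restricted columns;
        # (.+) needs at least one character, so empty lines are never marked.
        if re.fullmatch(r"(.+)###", window):
            line = PREFIX + line
        out.append(line)  # the sentinel is never written back, so no cleanup step
    return "\n".join(out)


print(mark_short_lines("in the Filter list.\n" + "x" * 31))
```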
David
dfhtextpipe
Posts: 986
Joined: Sun Dec 09, 2007 2:49 am
Location: UK

Re: Restrict according to line length?

Post by dfhtextpipe

With UTF-8 text, the problem would be that a restrict filter based on columns would in effect be counting bytes rather than characters, and most Arabic letters take two bytes each in UTF-8.
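To illustrate the concern, assuming the column restriction really does count bytes (not confirmed here), this Python sketch compares what a 33-byte window keeps from a 20-letter Arabic line. The helper `byte_columns_vs_chars` is hypothetical.

```python
from __future__ import annotations


def byte_columns_vs_chars(line: str, columns: int) -> tuple[int, int, int]:
    """Return (characters in line, UTF-8 bytes in line, characters that
    survive when only the first `columns` bytes are kept)."""
    encoded = line.encode("utf-8")
    window = encoded[:columns]  # a byte-based "columns 1..N" window
    # errors="ignore" silently drops a trailing half of a split character
    survivors = window.decode("utf-8", errors="ignore")
    return len(line), len(encoded), len(survivors)


print(byte_columns_vs_chars("م" * 20, 33))  # (20, 40, 16)
```

So a 20-letter line would look longer than 30 columns, and the window would end partway through the 17th letter.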

This is the real difficulty I am seeking to solve.

So I am still seeking suggestions.

David
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: Restrict according to line length?

Post by DataMystic Support

Can you use the UTF-8 mode of Perl regex to capture the multi-byte characters?
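Along the lines of that suggestion: in a Unicode-aware regex, `^.{1,30}$` matches a whole line of 1 to 30 characters and `^.{31,}$` a line of 31 or more, with no margin trick needed. The sketch below uses Python's `re` module on `str` data, where `.` matches one code point; whether TextPipe's Perl pattern filter with UTF-8 Support counts the same way is an assumption worth testing on a real Arabic file. `prefix_matches` is a hypothetical helper.

```python
import re

# In a str pattern, "." matches one Unicode code point (never a newline),
# so these counts are in characters rather than UTF-8 bytes.
SHORT_LINE = re.compile(r"^.{1,30}$", re.MULTILINE)  # 1 to 30 characters
LONG_LINE = re.compile(r"^.{31,}$", re.MULTILINE)    # 31 or more characters


def prefix_matches(pattern: re.Pattern, text: str, prefix: str = "£££") -> str:
    """Insert prefix at the start of every line the pattern matches."""
    return pattern.sub(lambda m: prefix + m.group(0), text)


sample = "short line\n" + "x" * 31
print(prefix_matches(SHORT_LINE, sample))  # only "short line" gets the prefix
```

One caveat: Arabic vowel marks (harakat) such as fatha are separate code points, so vowelled text counts them as characters. With CRLF line endings, the `\r` would also be counted unless it is stripped first.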
dfhtextpipe
Posts: 986
Joined: Sun Dec 09, 2007 2:49 am
Location: UK

Re: Restrict according to line length?

Post by dfhtextpipe

Hello Simon,

Actually, I gave up on this idea and instead designed a filter to detect and mark "text styles" in the RTF file used at an earlier stage of my file preprocessing.

This was more reliable than using line length within a UTF-8 file to differentiate between two kinds of text line.

Thanks anyway.

David