
Maximum match size?

Posted: Wed Dec 22, 2010 4:42 am
by dfhtextpipe
In the find type filters there is a dialog field for Maximum match size.

For a newly inserted filter, the default is always 4096 bytes, yet the legend reads Maximum match size (default is 32768 bytes).

The mouseover tooltip says,
"If the text you are trying to match is longer than 4096 characters,
increase this value to the length of text you are trying to match"

Software bug:
The legend is clearly wrong in giving 32768 bytes as the "default" value, since a new filter actually starts at 4096. This is a UI error.

Yet there does seem to be an absolute maximum limit in terms of how these filters work.
Entering values higher than 32768 seems to make no difference to what is found.

This raises the question:
Why allow the user to enter a value beyond what the filter is capable of supporting?

Application issue:
For the use I have in mind, 32768 is too small as an absolute limit for the maximum match size.
I'm using a filter to detect matched pairs of double angle quotation marks.
Some of the chunkier quotations in the text file are longer than 32768 characters.

It seems that the wrong type of variable is being used for the match counter.
Probably the programmed type is byte rather than long integer. The latter would be a real improvement.
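
For anyone who wants to cross-check quotation pairing outside TextPipe, here is a minimal Python sketch. It is not how TextPipe works internally; the function name check_guillemets and the problem messages are made up for illustration. It scans the text one character at a time, so the length of a quotation does not matter.

```python
def check_guillemets(text, open_mark="\u00ab", close_mark="\u00bb"):
    """Return a list of (offset, problem) tuples for unbalanced quotation marks."""
    problems = []
    open_at = None  # offset of the currently open quotation, if any
    for offset, ch in enumerate(text):
        if ch == open_mark:
            if open_at is not None:
                # A second opening mark before the previous one was closed.
                problems.append((offset, "opening mark inside an open quotation"))
            open_at = offset
        elif ch == close_mark:
            if open_at is None:
                problems.append((offset, "closing mark with no opening mark"))
            open_at = None
    if open_at is not None:
        problems.append((open_at, "quotation never closed"))
    return problems


if __name__ == "__main__":
    # One well-formed quotation longer than 32768 characters, then a bad pair.
    long_quote = "\u00ab" + "x" * 40000 + "\u00bb"
    sample = long_quote + " \u00abone \u00abtwo\u00bb"
    for offset, problem in check_guillemets(sample):
        print(offset, problem)
```

The offsets are character positions in the decoded text, so a UTF-8 file would need to be read with encoding="utf-8" first.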

David Haslam

Re: Maximum match size?

Posted: Wed Dec 22, 2010 8:07 am
by DataMystic Support
Hi David,

1. We've fixed the default showing in the caption (thanks for that)
2. The coding allows for a match size of 2GB, although this would likely be very slow. If you have an example where the match size is large and does not work I'd be happy to check it for you.
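
For reference, the numbers in this thread line up with common integer widths. The small Python sketch below only does the arithmetic; which type TextPipe actually uses is not stated here, and reading the 2GB figure as a signed 32-bit limit is an assumption.

```python
# Largest value each integer type can hold.
LIMITS = {
    "unsigned 8-bit (byte)": 2**8 - 1,   # 255
    "signed 16-bit": 2**15 - 1,          # 32767
    "signed 32-bit": 2**31 - 1,          # 2147483647, just under 2 GiB
}

if __name__ == "__main__":
    for name, largest in LIMITS.items():
        print(f"{name}: {largest}")
```

A byte tops out at 255, so a cutoff near 32768 would point to a 16-bit quantity rather than a byte, while a 2GB ceiling is what a signed 32-bit counter would give.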

Re: Maximum match size?

Posted: Thu Dec 23, 2010 4:20 am
by dfhtextpipe
Thanks, Simon.

The original task involved searching for matching pairs of «quotation marks» over many lines of Arabic text.

Today, I adopted a different approach for the issue I was using TextPipe to investigate.
I used TextPipe on the RTF files instead of the UTF-8 text files, and inserted color highlighting.

Comment...
|  Special filter to highlight all quotations in a Wordpad RTF file
|
|--Comment...
|  |  Insert the color table to define yellow highlighting
|  |
|  +--Comment...
|     |  highlight1 = yellow
|     |  highlight2 = red
|     |  highlight3 = green
|     |  highlight4 = magenta
|     |
|     +--Insert lines at line 2 [{\\\\colortbl ;\\\\red255\\\\green255\\\\blue0;\\\\red255\\\\green0\\\\blue0;\\\\red0\\\\green255\\\\blue0;\\\\red255\\\\green0\\\\blue255;}\r\n]
|         
|--Comment...
|  |  Highlight text from each left pointing double angle quotation mark.
|  |  Unhighlight at each right pointing double angle quotation mark.
|  |
|  +--Perl pattern [^(.*)(\\'ab|\\'bb)(.*)$] with []
|     |  [X] Match case
|     |  [ ] Whole words only
|     |  [ ] Case sensitive replace
|     |  [ ] Prompt on replace
|     |  [ ] Skip prompt if identical
|     |  [ ] First only
|     |  [ ] Extract matches
|     |  Maximum text buffer size 32768
|     |  [X] Maximum match (greedy)
|     |  [ ] Allow comments
|     |  [ ] '.' matches newline
|     |  [X] UTF-8 Support
|     |
|     |--Perl pattern [(\\'ab)] with [\\highlight1$1]
|     |     [X] Match case
|     |     [ ] Whole words only
|     |     [ ] Case sensitive replace
|     |     [ ] Prompt on replace
|     |     [ ] Skip prompt if identical
|     |     [ ] First only
|     |     [ ] Extract matches
|     |     Maximum text buffer size 4096
|     |     [ ] Maximum match (greedy)
|     |     [ ] Allow comments
|     |     [ ] '.' matches newline
|     |     [X] UTF-8 Support
|     |   
|     +--Perl pattern [(\\'bb)] with [$1\\highlight0]
|           [X] Match case
|           [ ] Whole words only
|           [ ] Case sensitive replace
|           [ ] Prompt on replace
|           [ ] Skip prompt if identical
|           [ ] First only
|           [ ] Extract matches
|           Maximum text buffer size 4096
|           [ ] Maximum match (greedy)
|           [ ] Allow comments
|           [ ] '.' matches newline
|           [X] UTF-8 Support
|         
+--Comment...
   |  Find successive left pointing double angle quotation marks 
   |  with no right pointing one in between.
   |
   +--Perl pattern [(\\'ab)(.+)(\\'ab)] with []
      |  [X] Match case
      |  [ ] Whole words only
      |  [ ] Case sensitive replace
      |  [ ] Prompt on replace
      |  [ ] Skip prompt if identical
      |  [ ] First only
      |  [ ] Extract matches
      |  Maximum text buffer size 32768
      |  [ ] Maximum match (greedy)
      |  [ ] Allow comments
      |  [X] '.' matches newline
      |  [X] UTF-8 Support
      |
      +--Perl pattern [(.*)(\\'bb)(.*)] with []
         |  [X] Match case
         |  [ ] Whole words only
         |  [ ] Case sensitive replace
         |  [ ] Prompt on replace
         |  [ ] Skip prompt if identical
         |  [ ] First only
         |  [ ] Extract matches
         |  Maximum text buffer size 32768
         |  [X] Maximum match (greedy)
         |  [ ] Allow comments
         |  [X] '.' matches newline
         |  [X] UTF-8 Support
         |
         +--Perl pattern [^(\\'ab)] with [\\highlight2$1]
               [X] Match case
               [ ] Whole words only
               [ ] Case sensitive replace
               [ ] Prompt on replace
               [ ] Skip prompt if identical
               [ ] First only
               [ ] Extract matches
               Maximum text buffer size 4096
               [ ] Maximum match (greedy)
               [ ] Allow comments
               [ ] '.' matches newline
               [X] UTF-8 Support
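
The yellow-highlighting part of the filter above can be roughly sketched in Python for readers without TextPipe. This is an illustration only, not an export of the filter: the function name highlight_quotations is invented, and it assumes a WordPad RTF file with \r\n line endings and no existing color table. The red highlighting of successive opening marks is left out.

```python
# Color table from the filter: highlight1 = yellow, 2 = red, 3 = green, 4 = magenta.
COLOR_TABLE = (
    r"{\colortbl ;\red255\green255\blue0;\red255\green0\blue0;"
    r"\red0\green255\blue0;\red255\green0\blue255;}"
)

OPEN_MARK = r"\'ab"   # RTF escape for the left-pointing double angle quotation mark
CLOSE_MARK = r"\'bb"  # RTF escape for the right-pointing double angle quotation mark


def highlight_quotations(rtf):
    """Insert the color table as line 2 and wrap each quotation in yellow."""
    lines = rtf.split("\r\n")
    lines.insert(1, COLOR_TABLE)  # mirrors "Insert lines at line 2"
    body = "\r\n".join(lines)
    # Switch highlighting on before each opening mark, off after each closing mark.
    body = body.replace(OPEN_MARK, r"\highlight1" + OPEN_MARK)
    body = body.replace(CLOSE_MARK, CLOSE_MARK + r"\highlight0")
    return body


if __name__ == "__main__":
    sample = "{\\rtf1\\ansi\r\n\\'abhello\\'bb\\par\r\n}"
    print(highlight_quotations(sample))
```

Because the replacements are plain string substitutions, the length of a quotation has no effect on the result.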
             
Shared in case it might inspire others facing a similar challenge.

David

Re: Maximum match size?

Posted: Thu Dec 23, 2010 4:26 am
by dfhtextpipe
Simon,

I just spotted a quirk that arose when I copied part of my filter to paste into the code lines of my previous reply.

The actual text my filter inserts for the color table is

{\\colortbl ;\\red255\\green255\\blue0;\\red255\\green0\\blue0;\\red0\\green255\\blue0;\\red255\\green0\\blue255;}

Double backslashes are to force a literal backslash rather than a Perl pattern.

When I copied the filter using the TextPipe context menu, it doubled the number of backslashes. Weird!
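
One way such doubling can arise in general is an escaping step being applied to text that is already escaped. The Python sketch below only illustrates that effect on a shortened color table; the helper escape_backslashes is hypothetical, and nothing in this thread shows that TextPipe's copy function works this way.

```python
def escape_backslashes(s):
    """Double every backslash so each one is read literally."""
    return s.replace("\\", "\\\\")


if __name__ == "__main__":
    intended = r"{\colortbl ;\red255\green255\blue0;}"  # text meant for the RTF file
    in_filter = escape_backslashes(intended)   # escaped once, as typed into the filter
    copied = escape_backslashes(in_filter)     # escaped again, as seen in the pasted listing
    for label, text in (("intended", intended), ("in filter", in_filter), ("copied", copied)):
        print(f"{label:10} {text}")
```

Each pass doubles the count, so single backslashes become double in the filter and quadruple after a second pass.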

Is this a bug?

David

Re: Maximum match size?

Posted: Thu Dec 23, 2010 9:05 am
by DataMystic Support
Yes - this is definitely a bug - fixed for the next release.

Let me know if you can reproduce the error with the large match.