TextPipe 8.0 stops during a previously well-behaved filter

Posted: Sun Dec 09, 2007 2:59 am
by dfhtextpipe
One of my filters, which I have been using without a hitch since I bought TextPipe Standard 7.9.5 in August, suddenly stopped working after I upgraded to version 8.0, which I installed in accordance with the instructions.

The filter just stops at a certain number (98,304) of bytes, and I have to Cancel.

Since emailing Simon, I have isolated the problem to be somewhere within the following lines:

Code:

|--Comment...
|  |  Convert the <div2> attribute values using a search & replace list
|  |
|  |--Restrict to attribute title="..."
|  |  |  [ ] Include quotes
|  |  |  [ ] Include text
|  |  |  [ ] Match case
|  |  | Max size: 65536
|  |  |
|  |  +--Perl pattern [(\w\w\w)] with [$1]
|  |     |  [ ] Match case
|  |     |  [ ] Whole words only
|  |     |  [ ] Case sensitive replace
|  |     |  [ ] Prompt on replace
|  |     |  [ ] Skip prompt if identical
|  |     |  [ ] First only
|  |     |  [ ] Extract matches
|  |     |  Maximum text buffer size 4096
|  |     |  [ ] Maximum match (greedy)
|  |     |  [ ] Allow comments
|  |     |  [X] '.' matches newline
|  |     |  [ ] UTF-8 Support
|  |     |
|  |     +--Replace list: C:\Program Files\TextPipe\My Filters\book_names.csv Replace
|  |           [X] Match case
|  |           [X] Whole words only
|  |           [ ] Case sensitive replace
|  |           [ ] Prompt on replace
|  |           [ ] Skip prompt if identical
|  |           [ ] First only
|  |           [ ] Extract matches
|  |         
|  +--Restrict to attribute n="..."
|     |  [ ] Include quotes
|     |  [ ] Include text
|     |  [ ] Match case
|     | Max size: 65536
|     |
|     +--Perl pattern [(\w\w\w)] with [$1]
|        |  [ ] Match case
|        |  [ ] Whole words only
|        |  [ ] Case sensitive replace
|        |  [ ] Prompt on replace
|        |  [ ] Skip prompt if identical
|        |  [ ] First only
|        |  [ ] Extract matches
|        |  Maximum text buffer size 4096
|        |  [ ] Maximum match (greedy)
|        |  [ ] Allow comments
|        |  [X] '.' matches newline
|        |  [ ] UTF-8 Support
|        |
|        +--Replace list: C:\Program Files\TextPipe\My Filters\book_numbers.csv Replace
|              [X] Match case
|              [X] Whole words only
|              [ ] Case sensitive replace
|              [ ] Prompt on replace
|              [ ] Skip prompt if identical
|              [ ] First only
|              [ ] Extract matches

Further evidence on the software bug in TextPipe 8.0

Posted: Sun Dec 09, 2007 5:00 am
by dfhtextpipe
I have now isolated the problem to the search filter action "Send matching text to subfilter".
98,304 in decimal is 18000 in hexadecimal.
That this is a round number in hex strongly suggests that this is a software bug.
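
The arithmetic is easy to check. The short Python sketch below is not part of TextPipe; it only confirms the hexadecimal value and shows that 98,304 bytes is 96 KiB, i.e. one and a half times the 65536 "Max size" shown in the restrict filters above. Whether that setting has anything to do with the stall is purely an assumption.

```python
# Byte count at which the filter stalls, as reported in this thread.
STOP_OFFSET = 98_304
# "Max size" value shown for the restrict filters in the exported listing.
MAX_SIZE = 65_536


def describe(offset):
    """Summarise a byte offset in hex, KiB, and as a multiple of MAX_SIZE."""
    return {
        "hex": format(offset, "X"),
        "kib": offset / 1024,
        "multiple_of_max_size": offset / MAX_SIZE,
    }


print(describe(STOP_OFFSET))
# {'hex': '18000', 'kib': 96.0, 'multiple_of_max_size': 1.5}
```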

Changing both the external replace list files to imported replace lists made no difference to the bug, nor did changing the search pattern to an equivalent expression.

Posted: Tue Dec 11, 2007 2:26 pm
by DataMystic Support
Hi David - we'll be getting back to you on this shortly

Further information

Posted: Wed Dec 12, 2007 5:15 am
by dfhtextpipe
The bug is not unique to my own computer. Same thing happened to a contact of mine in Sweden, who was using my filter.

Any progress to report on solving this?

Posted: Thu Dec 13, 2007 10:55 pm
by dfhtextpipe
Simon,

Any progress towards solving this? Do you need any further detailed information from me?

Kind regards,
David

Posted: Fri Dec 14, 2007 5:38 am
by DataMystic Support
Stay tuned David

Email received and sent

Posted: Sat Dec 15, 2007 9:59 pm
by dfhtextpipe
Simon,

I found your email dated the 11th only last night; my ISP's spam filter had diverted it to the bulk folder.

Responded with attachments today.

David

Further clues about the replace list bug

Posted: Sun Dec 16, 2007 1:13 am
by dfhtextpipe
If you make a simple filter in which a comment has a replace list as a subfilter and then export the filter to the clipboard, only the first row of the replace list is indented as part of the subfilter. The remaining rows are NOT indented, but sit at the same level as the comment.

Code:

|--Comment...
|  |  Miscellaneous punctuation corrections
|  |
|  +--Replace [..] with [.]
|        [ ] Match case
|        [ ] Whole words only
|        [ ] Case sensitive replace
|        [ ] Prompt on replace
|        [ ] Skip prompt if identical
|        [ ] First only
|        [ ] Extract matches
Replace [!.] with [!]
|        [ ] Match case
|        [ ] Whole words only
|        [ ] Case sensitive replace
|        [ ] Prompt on replace
|        [ ] Skip prompt if identical
|        [ ] First only
|        [ ] Extract matches
Replace [.,] with [,]
|        [ ] Match case
|        [ ] Whole words only
|        [ ] Case sensitive replace
|        [ ] Prompt on replace
|        [ ] Skip prompt if identical
|        [ ] First only
|        [ ] Extract matches
Replace [,.] with [.]
|        [ ] Match case
|        [ ] Whole words only
|        [ ] Case sensitive replace
|        [ ] Prompt on replace
|        [ ] Skip prompt if identical
|        [ ] First only
|        [ ] Extract matches
This could be the real bug. In this example the parent is a comment, so the misplaced rows have no effect. However, if the parent were a restrict filter, the effect would be very significant. The same bug might just as well apply to a replace list from an external file as to an internal replace list. This observation matches the symptoms.

The above example has been like this for a long time

Posted: Mon Dec 17, 2007 6:48 pm
by dfhtextpipe
I just tried something similar with the earlier version of TextPipe that is installed on my computer at work. Version 7.1.7 is a few years old.
The same issue affects a replace list subfilter of (for example) a restrict filter.

Code:

|   
|--Comment...
|   
|--Restrict lines:Line 8 .. line 18
|  |
|  +--Perl pattern [A] with [Z]
|        [ ] Match case
|        [ ] Whole words only
|        [ ] Case sensitive replace
|        [ ] Prompt on replace
|        [ ] Skip prompt if identical
|        [ ] First only
|        [ ] Extract matches
|        Maximum text buffer size 4096
|        [X] Maximum match (greedy)
|        [ ] Allow comments
|        [ ] '.' matches newline
|        [X] UTF-8 Support
Perl pattern [B] with [Y]
|        [ ] Match case
|        [ ] Whole words only
|        [ ] Case sensitive replace
|        [ ] Prompt on replace
|        [ ] Skip prompt if identical
|        [ ] First only
|        [ ] Extract matches
|        Maximum text buffer size 4096
|        [X] Maximum match (greedy)
|        [ ] Allow comments
|        [ ] '.' matches newline
|        [X] UTF-8 Support
Perl pattern [C] with [X]
|        [ ] Match case
|        [ ] Whole words only
|        [ ] Case sensitive replace
|        [ ] Prompt on replace
|        [ ] Skip prompt if identical
|        [ ] First only
|        [ ] Extract matches
|        Maximum text buffer size 4096
|        [X] Maximum match (greedy)
|        [ ] Allow comments
|        [ ] '.' matches newline
|        [X] UTF-8 Support
|      
Every row in the replace list table should be indented below the restrict filter, rather than as shown above.
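
A listing like the one above can be checked mechanically. The following Python sketch is only an illustration and makes one assumption: that every row of a correctly exported filter tree starts with the `|` tree prefix. The function name `unindented_rows` is hypothetical and has nothing to do with TextPipe itself.

```python
def unindented_rows(export_text):
    """Return (line_number, line) for non-blank rows lacking the '|' tree prefix.

    Assumption: in a well-formed export every row begins with '|'.
    """
    flagged = []
    for number, line in enumerate(export_text.splitlines(), start=1):
        if line.strip() and not line.lstrip().startswith("|"):
            flagged.append((number, line))
    return flagged


sample = (
    "|--Restrict lines:Line 8 .. line 18\n"
    "|  |\n"
    "|  +--Perl pattern [A] with [Z]\n"
    "|        [ ] Match case\n"
    "Perl pattern [B] with [Y]\n"
    "|        [ ] Match case\n"
)
print(unindented_rows(sample))
# [(5, 'Perl pattern [B] with [Y]')]
```

Rows flagged this way are the ones that have escaped from their parent in the exported text. The check says nothing about whether the filter itself runs them outside the restriction.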

Either the filter itself is significantly in error, or the export to clipboard feature is.
If it turns out to be merely the latter, then please raise it as a separate issue.

Posted: Wed Dec 19, 2007 10:33 pm
by DataMystic Support
Hi David,

The key problem seems to be a horrendously inefficient search filter, which then has 86 search attempts made against it.

We can make this way more efficient by only sending characters that we know will match, and Matching Case to prevent false positives.

Code:

[1234ABCDEGHIJLMNOPRSTWZ][abCdehiJKmoprSTuxz][1abcdeghiJklmnoprstuvz]
This also turns out to be very slow.

We'll look into it some more in the morning.
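
To illustrate what the pattern above is meant to do, here is a minimal Python sketch comparing it with the original `\w\w\w` search. Python's `re` module stands in for TextPipe's Perl pattern engine, and the sample text is invented; the three-letter codes in it are only assumed to resemble entries in the replace list.

```python
import re

# The original search: any three word characters.
BROAD = re.compile(r"\w\w\w")

# The narrowed search quoted above: one character class per position,
# matched case-sensitively.
NARROW = re.compile(
    r"[1234ABCDEGHIJLMNOPRSTWZ][abCdehiJKmoprSTuxz][1abcdeghiJklmnoprstuvz]"
)

# Invented sample text (an assumption, not data from the thread).
sample = "Gen the 1Sa xyz Rev"

print(BROAD.findall(sample))   # ['Gen', 'the', '1Sa', 'xyz', 'Rev']
print(NARROW.findall(sample))  # ['Gen', '1Sa', 'Rev']
```

The narrowed pattern hands fewer candidates to the replace list, which is the intended saving. As noted above, it still turned out to be slow in TextPipe 8.0.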

If so, then why did it work so smoothly in v7.9.5?

Posted: Thu Dec 20, 2007 1:40 am
by dfhtextpipe
Dear Simon,

Your response does not explain why my filter worked so smoothly with TextPipe Standard v7.9.5 and just stops in version 8.0.

Processing a whole [Bible] VPL text file took less than 5 seconds before. If my filter was really that inefficient, why was it so fast?

Or have I misunderstood something you wrote?

NB. I did not understand the above code snippet at all. What was it intended to do?

Best regards,
David Haslam

Reverted to use version 7.9.5

Posted: Fri Dec 21, 2007 4:11 am
by dfhtextpipe
Dear Simon,

While DataMystic are still working towards a solution, I have re-installed version 7.9.5 in place of version 8.0 - so that I can continue using the filter as before.

Best regards,
David H.

Any progress to report on solving this?

Posted: Fri Jan 04, 2008 11:15 pm
by dfhtextpipe
Hi Simon,

Happy New Year!

Has there been any further progress towards solving the bug?

Best regards,
David Haslam

Posted: Tue Jan 08, 2008 7:57 pm
by DataMystic Support
It should be done this week.

Posted: Wed Jan 09, 2008 9:31 am
by DataMystic Support
Status: This bug was found to be due to the inefficient use of filters. In particular, the title attribute of every tag (and not just the div2 tag) was being checked for \w\w\w (i.e. 3 word characters) without the whole word option, which led to a lot of backtracking, and this was in turn searched against a replace list of 30 items.

A div2 tag restriction around the whole lot returned performance to previous levels.
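
The idea behind the fix can be sketched outside TextPipe. The Python below is a rough analogue under stated assumptions, not the actual filter: it restricts work to div2 tags first, then to the title attribute, and only then looks up whole three-character words. The `BOOK_NAMES` table and the function name `expand_div2_titles` are hypothetical stand-ins for the book_names.csv replace list, whose contents are not shown in this thread.

```python
import re

# Hypothetical stand-in for the replace list (real contents unknown).
BOOK_NAMES = {"Gen": "Genesis", "Exo": "Exodus"}

DIV2_TAG = re.compile(r"<div2\b[^>]*>")           # outer restriction: div2 tags only
TITLE_ATTR = re.compile(r'(\btitle=")([^"]*)(")')  # inner restriction: title="..."
WORD3 = re.compile(r"\b\w\w\w\b")                  # whole three-character words


def expand_div2_titles(text):
    """Expand three-character codes, but only inside title="..." of <div2> tags."""

    def fix_title(match):
        value = WORD3.sub(
            lambda w: BOOK_NAMES.get(w.group(0), w.group(0)), match.group(2)
        )
        return match.group(1) + value + match.group(3)

    def fix_tag(match):
        return TITLE_ATTR.sub(fix_title, match.group(0))

    return DIV2_TAG.sub(fix_tag, text)


print(expand_div2_titles('<div1 title="Gen"><div2 title="Gen 1">'))
# <div1 title="Gen"><div2 title="Genesis 1">
```

Because the outer pattern discards every tag that is not a div2 before the inner search runs, the three-character lookup sees only a small fraction of the input.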