TextPipe 8.0 stops during a previously well-behaved filter

Get help with installation and running here.

Moderators: DataMystic Support, Moderators

dfhtextpipe
Posts: 986
Joined: Sun Dec 09, 2007 2:49 am
Location: UK

TextPipe 8.0 stops during a previously well-behaved filter

Post by dfhtextpipe »

One of my filters, which I have been using without a hitch since I bought TextPipe Standard 7.9.5 in August, suddenly stopped working after I upgraded to version 8.0 and installed it in accordance with the instructions.

The filter just stops after a certain number of bytes (98,304), and I have to click Cancel.

Since emailing Simon, I have narrowed the problem down to somewhere within the following lines:

|--Comment...
|  |  Convert the <div2> attribute values using a search & replace list
|  |
|  |--Restrict to attribute title="..."
|  |  |  [ ] Include quotes
|  |  |  [ ] Include text
|  |  |  [ ] Match case
|  |  | Max size: 65536
|  |  |
|  |  +--Perl pattern [(\w\w\w)] with [$1]
|  |     |  [ ] Match case
|  |     |  [ ] Whole words only
|  |     |  [ ] Case sensitive replace
|  |     |  [ ] Prompt on replace
|  |     |  [ ] Skip prompt if identical
|  |     |  [ ] First only
|  |     |  [ ] Extract matches
|  |     |  Maximum text buffer size 4096
|  |     |  [ ] Maximum match (greedy)
|  |     |  [ ] Allow comments
|  |     |  [X] '.' matches newline
|  |     |  [ ] UTF-8 Support
|  |     |
|  |     +--Replace list: C:\Program Files\TextPipe\My Filters\book_names.csv Replace
|  |           [X] Match case
|  |           [X] Whole words only
|  |           [ ] Case sensitive replace
|  |           [ ] Prompt on replace
|  |           [ ] Skip prompt if identical
|  |           [ ] First only
|  |           [ ] Extract matches
|  |         
|  +--Restrict to attribute n="..."
|     |  [ ] Include quotes
|     |  [ ] Include text
|     |  [ ] Match case
|     | Max size: 65536
|     |
|     +--Perl pattern [(\w\w\w)] with [$1]
|        |  [ ] Match case
|        |  [ ] Whole words only
|        |  [ ] Case sensitive replace
|        |  [ ] Prompt on replace
|        |  [ ] Skip prompt if identical
|        |  [ ] First only
|        |  [ ] Extract matches
|        |  Maximum text buffer size 4096
|        |  [ ] Maximum match (greedy)
|        |  [ ] Allow comments
|        |  [X] '.' matches newline
|        |  [ ] UTF-8 Support
|        |
|        +--Replace list: C:\Program Files\TextPipe\My Filters\book_numbers.csv Replace
|              [X] Match case
|              [X] Whole words only
|              [ ] Case sensitive replace
|              [ ] Prompt on replace
|              [ ] Skip prompt if identical
|              [ ] First only
|              [ ] Extract matches
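For readers unfamiliar with TextPipe, the logic of the first restrict branch above (title attribute, then `(\w\w\w)`, then the book_names.csv replace list) can be approximated in Python. This is a minimal sketch, not TextPipe's implementation. It assumes the subfilter sees only the three matched characters, so "Match case" plus "Whole words only" amounts to an exact, case-sensitive key lookup. `BOOK_NAMES` is a hypothetical stand-in for book_names.csv, whose contents are not shown in the thread. Because the restriction names no tag, the title attribute of any tag is processed.

```python
import re

# Hypothetical stand-in for book_names.csv (its real contents are not shown).
BOOK_NAMES = {"Gen": "Genesis", "Exo": "Exodus"}

# "Restrict to attribute title=..." with no tag name: applies to every tag.
TITLE_ATTR = re.compile(r'\btitle="([^"]*)"')
# Perl pattern (\w\w\w): each run of three word characters is sent on.
THREE_CHARS = re.compile(r"\w\w\w")


def replace_list(chunk: str) -> str:
    # Assumed behaviour: with Match case and Whole words only, a chunk is
    # replaced only if it equals a key exactly; otherwise it is unchanged.
    return BOOK_NAMES.get(chunk, chunk)


def convert_titles(doc: str) -> str:
    def fix_attr(m: re.Match) -> str:
        value = THREE_CHARS.sub(lambda w: replace_list(w.group(0)), m.group(1))
        return 'title="' + value + '"'

    return TITLE_ATTR.sub(fix_attr, doc)


print(convert_titles('<div2 title="Gen" n="1"><note title="Exo">'))
# <div2 title="Genesis" n="1"><note title="Exodus">
```

The `<note>` tag is a hypothetical example showing that a title attribute outside `<div2>` is converted too.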
dfhtextpipe

Further evidence on the software bug in TextPipe 8.0

Post by dfhtextpipe »

I have now isolated the problem to the search filter action "Send matching text to subfilter".
98,304 in decimal is 18000 in hexadecimal.
That this is a round number in hex strongly suggests that this is a software bug.
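The arithmetic checks out. A quick Python check is below. For comparison, the restrict filters in the first post use Max size 65536, which is 0x10000. The check only confirms the numbers; it says nothing by itself about the cause.

```python
n = 98_304  # bytes processed before TextPipe 8.0 stopped

print(hex(n))                 # 0x18000
print(n == 0x18000)           # True
print(n == 96 * 1024)         # True: exactly 96 KiB
print(n == 0x10000 + 0x8000)  # True: 65,536 + 32,768
```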

Changing both the external replace list files to imported replace lists made no difference to the bug, nor did changing the search pattern to an equivalent expression.
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Post by DataMystic Support »

Hi David, we'll be getting back to you on this shortly.
dfhtextpipe

Further information

Post by dfhtextpipe »

The bug is not unique to my own computer. Same thing happened to a contact of mine in Sweden, who was using my filter.
dfhtextpipe

Any progress to report on solving this?

Post by dfhtextpipe »

Simon,

Any progress towards solving this? Do you need any further detailed information from me?

Kind regards,
David
DataMystic Support

Post by DataMystic Support »

Stay tuned, David.
dfhtextpipe

Email received and sent

Post by dfhtextpipe »

Simon,

I found your email dated the 11th only last night; my ISP's spam filter had diverted it to the bulk folder.

I replied with attachments today.

David
dfhtextpipe

Further clues about the replace list bug

Post by dfhtextpipe »

If you make a simple filter in which a comment has a replace list as a subfilter and then export the filter to the clipboard, only the first row of the replace list is indented as part of the subfilter. The remaining rows are NOT indented, but appear at the same level as the comment.

|--Comment...
|  |  Miscellaneous punctuation corrections
|  |
|  +--Replace [..] with [.]
|        [ ] Match case
|        [ ] Whole words only
|        [ ] Case sensitive replace
|        [ ] Prompt on replace
|        [ ] Skip prompt if identical
|        [ ] First only
|        [ ] Extract matches
Replace [!.] with [!]
|        [ ] Match case
|        [ ] Whole words only
|        [ ] Case sensitive replace
|        [ ] Prompt on replace
|        [ ] Skip prompt if identical
|        [ ] First only
|        [ ] Extract matches
Replace [.,] with [,]
|        [ ] Match case
|        [ ] Whole words only
|        [ ] Case sensitive replace
|        [ ] Prompt on replace
|        [ ] Skip prompt if identical
|        [ ] First only
|        [ ] Extract matches
Replace [,.] with [.]
|        [ ] Match case
|        [ ] Whole words only
|        [ ] Case sensitive replace
|        [ ] Prompt on replace
|        [ ] Skip prompt if identical
|        [ ] First only
|        [ ] Extract matches

This could be the real bug. In this example, where the parent is a comment, it has no effect. However, if the parent were a restrict filter, the effect would be very significant. The same bug might just as well apply to a replace list from an external file as to an internal one. This observation matches the symptoms.
dfhtextpipe

The above example has been like this for a long time

Post by dfhtextpipe »

I just tried something similar with the earlier version of TextPipe installed on my computer at work. Version 7.1.7 is a few years old.
The same issue affects a replace list subfilter of (for example) a restrict filter.

|   
|--Comment...
|   
|--Restrict lines:Line 8 .. line 18
|  |
|  +--Perl pattern [A] with [Z]
|        [ ] Match case
|        [ ] Whole words only
|        [ ] Case sensitive replace
|        [ ] Prompt on replace
|        [ ] Skip prompt if identical
|        [ ] First only
|        [ ] Extract matches
|        Maximum text buffer size 4096
|        [X] Maximum match (greedy)
|        [ ] Allow comments
|        [ ] '.' matches newline
|        [X] UTF-8 Support
Perl pattern [B] with [Y]
|        [ ] Match case
|        [ ] Whole words only
|        [ ] Case sensitive replace
|        [ ] Prompt on replace
|        [ ] Skip prompt if identical
|        [ ] First only
|        [ ] Extract matches
|        Maximum text buffer size 4096
|        [X] Maximum match (greedy)
|        [ ] Allow comments
|        [ ] '.' matches newline
|        [X] UTF-8 Support
Perl pattern [C] with [X]
|        [ ] Match case
|        [ ] Whole words only
|        [ ] Case sensitive replace
|        [ ] Prompt on replace
|        [ ] Skip prompt if identical
|        [ ] First only
|        [ ] Extract matches
|        Maximum text buffer size 4096
|        [X] Maximum match (greedy)
|        [ ] Allow comments
|        [ ] '.' matches newline
|        [X] UTF-8 Support
|      
Every row in the replace list table should be indented below the restrict filter, rather than as shown above.

Either the filter itself must be significantly in error, or the export to clipboard feature must be in error.
If it turns out to be merely the latter, then please raise it as a separate issue.
DataMystic Support

Post by DataMystic Support »

Hi David,

The key problem seems to be a horrendously inefficient search filter, which then has 86 search attempts made against it.

We can make this much more efficient by only searching for character combinations that could match a replace-list entry, and by turning on Match case to prevent false positives.

[1234ABCDEGHIJLMNOPRSTWZ][abCdehiJKmoprSTuxz][1abcdeghiJklmnoprstuvz]
This also turns out to be very slow.

We'll look into it some more in the morning.
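The character-class pattern in the post above appears to list, for each of the three positions, every character that occurs at that position in the replace-list keys. A minimal Python sketch of that construction follows. It assumes all keys have the same length, and the keys used are hypothetical, since book_names.csv is not shown. The helper name `position_classes` is likewise illustrative only.

```python
import re


def position_classes(keys: list[str]) -> str:
    """Build one character class per position from equal-length keys."""
    width = len(keys[0])
    if any(len(k) != width for k in keys):
        raise ValueError("all keys must have the same length")
    classes = []
    for i in range(width):
        chars = sorted({k[i] for k in keys})  # distinct characters at position i
        classes.append("[" + "".join(re.escape(c) for c in chars) + "]")
    return "".join(classes)


# Hypothetical three-character keys; the real list is in book_names.csv.
keys = ["Gen", "Exo", "Lev", "1Co"]
pattern = position_classes(keys)

print(pattern)                                # [1EGL][Cex][nov]
print(bool(re.fullmatch(pattern, "Lev")))     # True
print(bool(re.fullmatch(pattern, "Eev")))     # True, although "Eev" is not a key
print(bool(re.fullmatch(pattern, "The")))     # False
```

Such a pattern can only narrow the candidates. It still admits combinations that are not keys (such as "Eev" here), so the replace list is still needed afterwards. Python's `re` is case-sensitive by default, which corresponds to having Match case switched on.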
dfhtextpipe

If so, then why did it work so smoothly in v7.9.5?

Post by dfhtextpipe »

Dear Simon,

Your response does not explain why my filter worked so smoothly with TextPipe Standard v7.9.5 and just stops in version 8.0.

Processing a whole [Bible] VPL text file took less than 5 seconds before. If my filter was really that inefficient, why was it so fast?

Or have I misunderstood something you wrote?

NB. I did not understand the above code snippet at all. What was it intended to do?

Best regards,
David Haslam
dfhtextpipe

Reverted to use version 7.9.5

Post by dfhtextpipe »

Dear Simon,

While DataMystic are still working towards a solution, I have re-installed version 7.9.5 in place of version 8.0 - so that I can continue using the filter as before.

Best regards,
David H.
dfhtextpipe

Any progress to report on solving this?

Post by dfhtextpipe »

Hi Simon,

Happy New Year!

Has there been any further progress towards solving the bug?

Best regards,
David Haslam
DataMystic Support

Post by DataMystic Support »

It should be done this week.
DataMystic Support

Post by DataMystic Support »

Status: This problem was found to be due to inefficient use of filters. In particular, the title attribute of every tag (not just the div2 tag) was being checked for \w\w\w (i.e. three word characters) without the Whole words only option, which led to a lot of backtracking, and each match was in turn searched against a replace list of 30 items.
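To see why the missing whole-word option matters, the sketch below uses Python's `re` module (not TextPipe's engine) on a short hypothetical sample, approximating "Whole words only" with `\b` word-boundary anchors. Without the anchors, the pattern also returns fragments of longer words, and in the filter each such fragment would then be looked up in the 30-item replace list.

```python
import re

text = "In the beginning God created"

# \w\w\w without word boundaries: any three consecutive word characters,
# including fragments from inside longer words.
loose = re.findall(r"\w\w\w", text)

# Approximating "Whole words only" with \b anchors: only three-letter words.
whole = re.findall(r"\b\w\w\w\b", text)

print(loose)  # ['the', 'beg', 'inn', 'ing', 'God', 'cre', 'ate']
print(whole)  # ['the', 'God']
```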

Wrapping the whole section in a div2 tag restriction returned performance to previous levels.
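A rough Python sketch of the idea behind this fix follows. The title lookup happens only inside `<div2 ...>` opening tags, so titles on other tags are never searched. It is an approximation under stated assumptions. Python's `re` stands in for TextPipe. The `BOOK_NAMES` entries are hypothetical. For brevity, it also replaces the `\w\w\w` chunking with a single whole-word alternation of the keys, which is a substitution made here and not necessarily what DataMystic did. The `n="..."` branch with book_numbers.csv would be handled the same way.

```python
import re

# Hypothetical entries standing in for book_names.csv.
BOOK_NAMES = {"Gen": "Genesis", "Exo": "Exodus"}

# One case-sensitive alternation of the keys, anchored as whole words.
KEYS = re.compile(r"\b(?:" + "|".join(map(re.escape, BOOK_NAMES)) + r")\b")
DIV2_TAG = re.compile(r"<div2\b[^>]*>")           # opening div2 tags only
TITLE_ATTR = re.compile(r'(\btitle=")([^"]*)(")')


def convert_value(value: str) -> str:
    return KEYS.sub(lambda k: BOOK_NAMES[k.group(0)], value)


def fix_div2_tag(tag: re.Match) -> str:
    # Look for title="..." only inside the matched <div2 ...> tag.
    return TITLE_ATTR.sub(
        lambda a: a.group(1) + convert_value(a.group(2)) + a.group(3),
        tag.group(0),
    )


def convert_div2_titles(doc: str) -> str:
    return DIV2_TAG.sub(fix_div2_tag, doc)


print(convert_div2_titles('<div2 title="Gen" n="1"><note title="Exo">'))
# <div2 title="Genesis" n="1"><note title="Exo">
```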