Help for Capture text filter

Get help with installation and running here.

Moderators: DataMystic Support, Moderators

dfhtextpipe
Posts: 988
Joined: Sun Dec 09, 2007 2:49 am
Location: UK

Help for Capture text filter

Post by dfhtextpipe »

In the help for the Capture text filter, there is no explanation for the tick box option Break on value change.
I've tried it, and it's fairly obvious what it means, but nonetheless, the help file is incomplete and/or out of date.

There is also no explanation of what the dropdown option Reset at start of each region means.
This is really not obvious, as I've no idea what the word region might denote in the context of this filter.

This section of the help needs better explaining!

The help states:
It is VERY important to understand how TextPipe processes data to use this filter successfully. For most applications, this will involve placing this filter inside a Restrict to each line in turn filter. If you do not, the text captured will most likely be the very last value found in the file. If you have any questions about this, please contact us.
I can't make sense of how TextPipe processes the data, in order to use this filter successfully.
  • When I include the Restrict to each line in turn filter as advised, I don't get any captured values. The variable remains empty!
    When I exclude the Restrict to each line in turn filter, some values are captured to the variable, but not all of those shown when using Break on value change.
The captured subfilter below illustrates what I'm trying to do, but I could not get it working.

Code: Select all

Comment...
|  Output merged VPL file
|  
|  Bug: The captured variable has 
|  388 values instead of 1190 values.
|  
|  e.g. First value is GENESE 43 when it should be GENESE 1
|  Yet, Break on change value works as expected.
|
+--T-Filter
   |
   |--Comment...
   |  |  Capture book name and chapter number
   |  |
   |  +--Perl pattern [^(.+) (\d+)$] with []
   |     |  [ ] Match case
   |     |  [ ] Whole words only
   |     |  [ ] Case sensitive replace
   |     |  [ ] Prompt on replace
   |     |  [ ] Skip prompt if identical
   |     |  [ ] First only
   |     |  [ ] Extract matches
   |     |  Maximum text buffer size 4096
   |     |  [X] Maximum match (greedy)
   |     |  [ ] Allow comments
   |     |  [ ] '.' matches newline
   |     |  [X] UTF-8 Support
   |     |
   |     +--Capture to variable @BookChapter
   |         Reset: 3
   |         
   +--Comment...
      |  Convert verse lines to VPL format
      |
      |--Remove non-matching lines [^(\d+)\.\t(.+)$]
      |     [ ] Include line numbers
      |     [ ] Include filename
      |     [ ] Match case
      |     [ ] Count matches
      |     Pattern type: 0
      |     [X] UTF8 Support
      |     [ ] Ignore empty matches
      |     Context before: 0
      |     Context after: 0
      |   
      |--Perl pattern [\.\t] with [ ]
      |     [X] Match case
      |     [ ] Whole words only
      |     [ ] Case sensitive replace
      |     [ ] Prompt on replace
      |     [ ] Skip prompt if identical
      |     [ ] First only
      |     [ ] Extract matches
      |     Maximum text buffer size 4096
      |     [ ] Maximum match (greedy)
      |     [ ] Allow comments
      |     [ ] '.' matches newline
      |     [X] UTF-8 Support
      |   
      |--Add left margin [@BookChapter\t]
      |   
      +--Merge output to file D:\Download\Java\GoBibleCreator\Captured\French David Martin 1744\VPL\Martin1744.vpl.txt
          
The attached ZIP file contains a text file listing all the values captured (suitably renamed before being compressed).

There are only 388 values captured, yet the whole [French] Bible contains 1189 chapters.
For the shorter books, many of the captures are just the very last value found in the file.
For the longer books, more values are captured, but the proportion thus captured is very uneven.
e.g. 4 out of 50 for GENESE, yet 19 out of 40 for EXODE.
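For comparison, here is a minimal Python sketch of the result the filter above is meant to produce, written outside TextPipe. It assumes the two patterns from the export (a header line such as GENESE 1, and verse lines of the form number, full stop, tab, text). The function name `to_vpl` and the sample lines are made up for illustration; this is not how TextPipe works internally.

```python
import re

HEADER = re.compile(r"^(.+) (\d+)$")    # book name and chapter number, e.g. "GENESE 1"
VERSE = re.compile(r"^(\d+)\.\t(.+)$")  # verse number, full stop, tab, verse text


def to_vpl(lines):
    """Prefix every verse line with the most recent book/chapter header."""
    book_chapter = ""  # plays the role of @BookChapter
    out = []
    for line in lines:
        line = line.rstrip("\r\n")
        if VERSE.match(line):
            # Same steps as the export: replace ".<TAB>" with a space,
            # then add the captured value and a tab as the left margin.
            out.append(book_chapter + "\t" + re.sub(r"\.\t", " ", line))
        elif HEADER.match(line):
            book_chapter = line
        # Any other line is dropped, like "Remove non-matching lines".
    return out


# Invented sample input, not taken from the Martin 1744 text.
sample = [
    "GENESE 1",
    "1.\tPremier verset",
    "2.\tSecond verset",
    "GENESE 2",
    "1.\tAutre verset",
]

for row in to_vpl(sample):
    print(row)
```

With one header per chapter, the variable should take a new value at every chapter heading, which is why far fewer distinct values than chapters points to a capture problem.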
Martin1744.CaptureBookChapter.zip
What actually gets captured - too few values! See text file inside the Zip.
(1.38 KiB) Downloaded 627 times
David
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: Help for Capture text filter

Post by DataMystic Support »

Hi David, updated help entries below.

I'm not sure if the Export is giving the complete picture here, but it looks like the filter
Perl pattern [^(.+) (\d+)$] with []
is removing everything before the capture. It may be set to an action where $1 or $2 is sent to the subfilter - not sure.

I'd like to check it anyway.

Can you please email me (or upload) the filter and source file?

Break on value change
    When the value of the global variable is changed, TextPipe displays a window allowing you to review (and edit) the value before continuing.

Reset to initial value
  • Do not reset - never reset the variable to the initial value unless at the start of the job
  • Reset at start of each file - set the variable to the Initial Value at the start of each file
  • Reset at start of each region - set the variable to the Initial Value at the start of each fragment of text coming into a subfilter
  • Reset for both - set the variable to the Initial Value at the start of each file, and at the start of each fragment of text coming into a subfilter
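The four reset options can be modelled in a few lines of Python. This is only a sketch of the behaviour as worded in the help entries above, not TextPipe's implementation. The names `Reset` and `trace` are hypothetical, and the model assumes each file reaches the subfilter as a sequence of fragments ("regions"), each of which either captures one value or captures nothing (`None`).

```python
from enum import Enum


class Reset(Enum):
    NEVER = "Do not reset"
    PER_FILE = "Reset at start of each file"
    PER_REGION = "Reset at start of each region"
    BOTH = "Reset for both"


def trace(files, mode, initial="(initial)"):
    """Return the variable's value as it stands after each region."""
    value = initial  # every mode starts the job with the initial value
    seen = []
    for regions in files:
        if mode in (Reset.PER_FILE, Reset.BOTH):
            value = initial
        for captured in regions:
            if mode in (Reset.PER_REGION, Reset.BOTH):
                value = initial
            if captured is not None:
                value = captured  # a capture overwrites the variable
            seen.append(value)
    return seen


# Two invented files: the first captures in its first region only,
# the second captures in its second region only.
files = [["A 1", None], [None, "B 1"]]

for mode in Reset:
    print(mode.value, "->", trace(files, mode))
```

In this simplified model, Reset for both gives the same trace as Reset at start of each region, because every file begins with a region. The difference would only show if a file's text could reach later filters without entering the subfilter as a fragment, which the model does not cover.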
dfhtextpipe
Posts: 988
Joined: Sun Dec 09, 2007 2:49 am
Location: UK

Re: Help for Capture text filter

Post by dfhtextpipe »

It's not the Perl pattern that causes the problems, because the same pattern works OK in my simpler test filter with normal output.

Code: Select all

TextPipe Single User Edition
Purchased by: David Haslam, David Haslam

Filter Title: C:\Users\David\TextPipe Filters\Custom\French\Special filter to convert Martin 1744 raw text file to VPL format.fll

Filter List
-----------
Filter options
|  [X] Log to file
|  [ ] Append to logfile
|  Log filename: textpipe.log
|  Threshold 500
|
|--Input from file(s)
|     [ ] Confirm before processing each file
|     [ ] Confirm before processing read/only files
|     [ ] Delete input files after processing
|     Confirm each binary file
|       Sample size 100 characters
|   
|--Comment...
|  |  Special filter to convert Martin 1744 raw text file to VPL format
|  |  (after cleanup & merge)
|  |  
|  |  Test of Capture to variable filter
|  |  but not using a T-filter or secondary output
|  |
|  +--Restrict to each line in turn
|     |
|     |--Comment...
|     |  |  Capture book name and chapter number to a variable
|     |  |
|     |  |--Comment...
|     |  |     Issue analysis
|     |  |     
|     |  |     When I didn't use 
|     |  |     Restrict to each line in turn 
|     |  |     the captured variable @BookChapter has 
|     |  |     147 values instead of 1190 values.
|     |  |     
|     |  |     e.g. First value is GENESE 6 when it should be GENESE 1
|     |  |     Yet, Break on value change works as expected.
|     |  |     
|     |  |     It all works OK when everything is a subfilter of
|     |  |     Restrict to each line in turn 
|     |  |     
|     |  |     For complex filters this is not always possible,
|     |  |     as some of the earlier filters may need 
|     |  |     to process patterns that span more than one line!
|     |  |     
|     |  |     Furthermore, the simplicity of this test filter meant that it was 
|     |  |     much faster than when I was using a T-filter within my complex one.
|     |  |     
|     |  |     Being speedier, when I don't use 
|     |  |     Restrict to each line in turn 
|     |  |     the number of unique values captured is far fewer
|     |  |     than for the slower more complex one. 
|     |  |      147 compared to 388.
|     |  |     
|     |  |     Ergo: All to do with multi-threading and dual-core.
|     |  |     
|     |  |     The total number of captures probably also depends on what other 
|     |  |     applications are open in the background.
|     |  |     
|     |  |     
|     |  |   
|     |  +--Perl pattern [^(.+) (\d+)$] with []
|     |     |  [ ] Match case
|     |     |  [ ] Whole words only
|     |     |  [ ] Case sensitive replace
|     |     |  [ ] Prompt on replace
|     |     |  [ ] Skip prompt if identical
|     |     |  [ ] First only
|     |     |  [ ] Extract matches
|     |     |  Maximum text buffer size 4096
|     |     |  [X] Maximum match (greedy)
|     |     |  [ ] Allow comments
|     |     |  [ ] '.' matches newline
|     |     |  [X] UTF-8 Support
|     |     |
|     |     +--Capture to variable @BookChapter
|     |         Reset: 0
|     |         
|     +--Comment...
|        |  Convert verse lines to VPL format
|        |
|        |--Remove non-matching lines [^(\d+)\.\t(.+)$]
|        |     [ ] Include line numbers
|        |     [ ] Include filename
|        |     [ ] Match case
|        |     [ ] Count matches
|        |     Pattern type: 0
|        |     [X] UTF8 Support
|        |     [ ] Ignore empty matches
|        |     Context before: 0
|        |     Context after: 0
|        |   
|        |--Perl pattern [\.\t] with [ ]
|        |     [X] Match case
|        |     [ ] Whole words only
|        |     [ ] Case sensitive replace
|        |     [ ] Prompt on replace
|        |     [ ] Skip prompt if identical
|        |     [ ] First only
|        |     [ ] Extract matches
|        |     Maximum text buffer size 4096
|        |     [ ] Maximum match (greedy)
|        |     [ ] Allow comments
|        |     [ ] '.' matches newline
|        |     [X] UTF-8 Support
|        |   
|        |--Add left margin [@BookChapter\t]
|        |   
|        +--** DISABLED ** Comment...
|           |  Optional subfilter to remove verse text and duplicate lines
|           |
|           |--Remove fields:Tab-delimited field 2 .. field END - 0
|           |     Delimiter Type: 1
|           |     Custom delimiter: 
|           |     [ ] Has Header
|           |   
|           +--Remove duplicate lines
|                 [ ] Ignore case
|                 Start column 1
|                 Length 100
|                 [X] Include One
|                 format: %1:s\t%0:d
|               
+--Output to file(s)
      [ ] Only update date on changed files
      [ ] Append mode
      [X] Change extension to: .fr.vpl.txt
      [ ] Open output file
    Only output modified files      [ ] Remove empty output files    

Files List
----------
D:\Download\Java\GoBibleCreator\Captured\French David Martin 1744\VPL\Martin1744.raw.txt
Even though the exported clipboard view of the filter has the line

Code: Select all

Perl pattern [^(.+) (\d+)$] with []
this is in fact set to Send matching text to subfilter,
the subfilter being Capture to variable @BookChapter.
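One way to read the help's warning about the "very last value found in the file" is as a question of ordering: if the capture runs over a whole block of text before the later filters read the variable, only the last match in that block survives. The Python sketch below contrasts that with line-by-line processing. It is an assumption about the cause, not confirmed TextPipe behaviour; the function names and sample text are invented.

```python
import re

HEADER = re.compile(r"^(.+) (\d+)$", re.MULTILINE)


def whole_buffer(text):
    """Capture over the entire text first, then add the margin."""
    value = ""
    for match in HEADER.finditer(text):
        value = match.group(0)  # each match overwrites the previous one
    return [value + "\t" + ln for ln in text.splitlines() if not HEADER.match(ln)]


def per_line(text):
    """Capture and add the margin one line at a time."""
    value = ""
    out = []
    for ln in text.splitlines():
        if HEADER.match(ln):
            value = ln
        else:
            out.append(value + "\t" + ln)
    return out


# Invented sample text, not taken from the Martin 1744 files.
text = "GENESE 1\nverse a\nGENESE 2\nverse b\n"

print(whole_buffer(text))
print(per_line(text))
```

If the text reached the filter in several larger chunks rather than as one buffer, the same effect would apply per chunk. That would be consistent with seeing some but not all chapter values, though this is a guess.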

David
dfhtextpipe
Posts: 988
Joined: Sun Dec 09, 2007 2:49 am
Location: UK

Re: Help for Capture text filter

Post by dfhtextpipe »

My complex filter calls some external replace list files, and the list of 66 input files was originally loaded from a text file, so that the books are processed in the right order.
It processes the 66 books of the Martin1744 French Bible, each file having been previously saved in Unicode format from Wordpad, hence UTF-16 LE.

cf. When I pasted the whole filter from the clipboard into Notepad++ and counted the occurrences of ":\", there were 84 hits.
Subtract 66 inputs, you get 18, some of which are paths to [secondary] output files.

Setting it up to run on your system would be too messy, as you'd need to mimic the directory structure that I am using.
More of a last resort than the next thing to try.

Image view of My Filter List (for the more complex filter).
My Filter List.png (25.09 KiB) Viewed 6635 times
David
dfhtextpipe
Posts: 988
Joined: Sun Dec 09, 2007 2:49 am
Location: UK

Re: Help for Capture text filter

Post by dfhtextpipe »

Observe how I usually make my complex filters well structured into groups of subfilters, each with a descriptive comment.
Sensible way of working, eh?
David
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: Help for Capture text filter

Post by DataMystic Support »

Very neat David! I know that David Johnson, Brent Huesers and I all 'TextPipe' in the same way. If you know of any other TextPipe gurus please let me know!

If you could trim back your complex filter to just the section of interest (by deleting subsequent filters, and generating intermediate files for the preceding filters) then I would love to be able to explain how this works to you!