T-filters - bug with Sort or Count Duplicate Lines?

Moderators: DataMystic Support, Moderators

dfhtextpipe
Posts: 986
Joined: Sun Dec 09, 2007 2:49 am
Location: UK

T-filters - bug with Sort or Count Duplicate Lines?

Post by dfhtextpipe »

I can't get either the Sort filter or the Count Duplicate Lines filter to work as a subfilter of a T-Filter when 66 input files are being processed and the output from the T-Filter is then written to a single merged file.
  • Sort fails to sort.
  • Count Duplicate Lines fails to count.
Any ideas?

David
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia
Contact:

Re: T-filters - bug with Sort or Count Duplicate Lines?

Post by DataMystic Support »

Remember the order here, David.

What is most likely happening is that, inside the T-Filter, you are passing 66 separate file fragments through the Sort or Count Duplicate Lines filter, and only THEN are they merged as the final step. Each fragment is therefore sorted or counted on its own.

You need to add a Filters\Special\Merge (join) files filter as the first step of the T-Filter, followed by the Sort or Count Duplicate Lines filter.
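
As a rough illustration of why the order matters, here is a small Python sketch (not TextPipe itself; the sample words and the `count_lines` helper are made up for the example). Counting inside each fragment and then merging leaves repeated entries, while merging first and then counting gives one total per word:

```python
from collections import Counter

def count_lines(lines):
    """Return 'line<TAB>count' rows, one per distinct line, sorted by line."""
    counts = Counter(lines)
    return [f"{line}\t{n}" for line, n in sorted(counts.items())]

# Hypothetical extracted words from two of the input files.
file_a = ["cœur", "sœur", "cœur"]
file_b = ["cœur", "œuvre"]

# Count inside each fragment, then merge: 'cœur' shows up in two separate rows.
per_file_then_merge = count_lines(file_a) + count_lines(file_b)

# Merge first, then count: each word gets a single row with its overall total.
merge_then_count = count_lines(file_a + file_b)
```

Here `per_file_then_merge` has four rows with two partial counts for the same word, whereas `merge_then_count` has three rows.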
dfhtextpipe
Posts: 986
Joined: Sun Dec 09, 2007 2:49 am
Location: UK

Re: T-filters - bug with Sort or Count Duplicate Lines?

Post by dfhtextpipe »

Simon,

Doing what you suggested didn't work: nothing new got written to the merged output.

The difficulty I have is this: how do I extract the patterns I wish to count before the merge, and yet do the counting on the merged output?

I'm trying to count all the old French words that contain the ligatures "œ" and "Œ".
I have an extract matches filter using the Perl pattern

Code:

(\w*)(\x{0152}|\x{0153})(\w*)
I'm using the Extract option of the Find and Replace filter, rather than the Extract filter.

What I get is the [unsorted or] uncounted list of these words.
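
For anyone wanting to try the pattern outside TextPipe, a minimal Python sketch of the same extraction might look like this. It is only an approximation: Python's `re` module writes the code points as `\u0152` and `\u0153` rather than `\x{0152}` and `\x{0153}`, and the sample sentence and function name are invented for the example.

```python
import re

# A word is any run of word characters containing Œ (U+0152) or œ (U+0153).
LIGATURE_WORD = re.compile(r"\w*[\u0152\u0153]\w*")

def extract_ligature_words(text):
    """Return every word in `text` that contains an œ or Œ ligature."""
    return LIGATURE_WORD.findall(text)

sample = "Le cœur de ma sœur; Œuvre et œil, sans autre mot."
words = extract_ligature_words(sample)
```

The result is an unsorted, uncounted list in order of appearance, which is the same kind of output the extract step produces on its own.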

I think the real difficulty lies in how to get the merged output back as an input stream for the counting sub-filter.

David
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia
Contact:

Re: T-filters - bug with Sort or Count Duplicate Lines?

Post by DataMystic Support »

Hi David - can you upload your filter (zip it first), or paste an Export here?
dfhtextpipe
Posts: 986
Joined: Sun Dec 09, 2007 2:49 am
Location: UK

Re: T-filters - bug with Sort or Count Duplicate Lines?

Post by dfhtextpipe »

Simon,

Rather than exporting the whole filter, which might be confusing, I have just pasted the part that I'm having difficulties with.
NB: some subfilters are currently disabled, being things I was trying earlier, e.g. the two commented as "Begin" and "End".

The position of the merge subfilter has been changed several times, all without success.

David

Code:

Comment...
|  Temporary output to count words with ligatures
|  
|
+--T-Filter
   |
   |--Comment...
   |  |  Extract words containing ligatures
   |  |
   |  +--Perl pattern [(\w*)(\x{0152}|\x{0153})(\w*)] with []
   |     |  [X] Match case
   |     |  [ ] Whole words only
   |     |  [ ] Case sensitive replace
   |     |  [ ] Prompt on replace
   |     |  [ ] Skip prompt if identical
   |     |  [ ] First only
   |     |  [X] Extract matches
   |     |  Maximum text buffer size 4096
   |     |  [X] Maximum match (greedy)
   |     |  [ ] Allow comments
   |     |  [ ] '.' matches newline
   |     |  [X] UTF-8 Support
   |     |
   |     +--Perl pattern [^(.+)$] with [$1\r\n]
   |           [ ] Match case
   |           [ ] Whole words only
   |           [ ] Case sensitive replace
   |           [ ] Prompt on replace
   |           [ ] Skip prompt if identical
   |           [ ] First only
   |           [ ] Extract matches
   |           Maximum text buffer size 4096
   |           [X] Maximum match (greedy)
   |           [ ] Allow comments
   |           [ ] '.' matches newline
   |           [X] UTF-8 Support
   |         
   |--Comment...
   |  |  Counting or sorting cannot be done after merging!
   |  |
   |  |--** DISABLED ** Comment...
   |  |  |  Begin
   |  |  |
   |  |  |--Replace list: D:\Download\Java\GoBibleCreator\Captured\French David Martin 1744\lig2oe.tab Perl pattern
   |  |  |     [X] Match case
   |  |  |     [ ] Whole words only
   |  |  |     [ ] Case sensitive replace
   |  |  |     [ ] Prompt on replace
   |  |  |     [ ] Skip prompt if identical
   |  |  |     [ ] First only
   |  |  |     [ ] Extract matches
   |  |  |     Maximum text buffer size 4096
   |  |  |     [ ] Maximum match (greedy)
   |  |  |     [ ] Allow comments
   |  |  |     [ ] '.' matches newline
   |  |  |     [X] UTF-8 Support
   |  |  |   
   |  |  +--Convert from UTF-8 to ANSI
   |  |      
   |  |--Comment...
   |  |  |  Count or Sort
   |  |  |
   |  |  |--Count duplicate lines
   |  |  |     [ ] Ignore case
   |  |  |     Start column 1
   |  |  |     Length 15
   |  |  |     [X] Include One
   |  |  |     format: %1:s\t%0:d
   |  |  |   
   |  |  +--** DISABLED ** Ascending ANSI sort (case sensitive), remove duplicates, length 16
   |  |      
   |  +--** DISABLED ** Comment...
   |     |  End
   |     |
   |     |--Convert from ANSI to UTF-8
   |     |   
   |     |--Remove BOM (Byte Order Mark)
   |     |   
   |     +--Replace list: D:\Download\Java\GoBibleCreator\Captured\French David Martin 1744\oe2lig.tab Perl pattern
   |           [X] Match case
   |           [ ] Whole words only
   |           [ ] Case sensitive replace
   |           [ ] Prompt on replace
   |           [ ] Skip prompt if identical
   |           [ ] First only
   |           [ ] Extract matches
   |           Maximum text buffer size 4096
   |           [ ] Maximum match (greedy)
   |           [ ] Allow comments
   |           [ ] '.' matches newline
   |           [X] UTF-8 Support
   |         
   |--Comment...
   |  |  Merge to file
   |  |
   |  |--** DISABLED ** Add file footer [==================]
   |  |   
   |  |--Merge output to file D:\Download\Java\GoBibleCreator\Captured\French David Martin 1744\ligatures.txt
   |  |   
   |  +--** DISABLED ** Merge into D:\Download\Java\GoBibleCreator\Captured\French David Martin 1744\ligatures.txt
   |      
   +--** DISABLED ** Comment...
      |  Debug
      |
      +--Display debug window
          
David
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia
Contact:

Re: T-filters - bug with Sort or Count Duplicate Lines?

Post by DataMystic Support »

Hi David,

After the merge filter you still need a Secondary Output Filter; otherwise the output from that part of the T-Filter goes nowhere.

You put the merge filter either above or below your sort/count filter, depending on whether you want the sort/count to run on the individual files before the merge or on the resulting large file after the merge.
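
To picture why the branch needs its own output, here is a loose Python sketch. It assumes a T-Filter behaves like a Unix `tee`: the main stream passes through unchanged while a copy runs through the subfilters. That is a reading of this thread, not a description of TextPipe internals, and `t_filter`, `subfilters` and `secondary_output` are hypothetical names.

```python
def t_filter(lines, subfilters, secondary_output):
    """Pass `lines` through unchanged and send a processed copy to `secondary_output`."""
    branch = list(lines)
    for subfilter in subfilters:
        branch = subfilter(branch)
    # Without this step the branch result would simply be discarded.
    secondary_output.extend(branch)
    return lines

main_in = ["b", "a", "b"]
side = []
main_out = t_filter(main_in, [sorted], side)
```

After the call, `main_out` is still in the original order and only `side` holds the sorted copy.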