T-filters - bug with Sort or Count Duplicate Lines?
Posted: Thu Jun 30, 2011 7:19 am
by dfhtextpipe
I can't get either the Sort filter or the Count Duplicate Lines filter to work when it is used as a subfilter of a T-Filter while 66 input files are being processed and the output from the T-Filter is subsequently written to a single merged file.
- Sort fails to sort.
- Count Duplicate Lines fails to count.
Any ideas?
David
Re: T-filters - bug with Sort or Count Duplicate Lines?
Posted: Fri Jul 01, 2011 2:32 pm
by DataMystic Support
Remember the order of operations here, David.
What is most likely happening is that, inside the T-Filter, you are passing 66 file fragments through a sort or count duplicates filter, and only THEN are they merged as the final step.
You need to add a Filters\Special\Merge (join) files filter as the first step of the T-Filter, followed by the Sort or Count Duplicate Lines filter.
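To illustrate why the order matters, here is a minimal Python sketch (not TextPipe itself). The two short lists are hypothetical stand-ins for two of the 66 files, and `count_duplicate_lines` is an illustrative helper whose tab-separated output only loosely resembles what the Count Duplicate Lines filter produces.

```python
from collections import Counter

def count_duplicate_lines(lines):
    """Return 'line<TAB>count' rows, one per distinct line, sorted by line."""
    counts = Counter(lines)
    return [f"{line}\t{n}" for line, n in sorted(counts.items())]

# Hypothetical stand-ins for two of the 66 input files (one word per line).
fragments = [["cœur", "œuvre", "cœur"], ["œuvre", "cœur"]]

# Count inside each fragment, then merge: words repeated across files
# end up with one row per file instead of a single total.
count_then_merge = [row for frag in fragments
                    for row in count_duplicate_lines(frag)]

# Merge first, then count: one row per distinct word, with overall totals.
merged = [line for frag in fragments for line in frag]
merge_then_count = count_duplicate_lines(merged)

print(count_then_merge)
print(merge_then_count)
```

In the first result each word appears once per file; only the second gives a single count per word across all the input.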
Re: T-filters - bug with Sort or Count Duplicate Lines?
Posted: Sat Jul 02, 2011 2:26 am
by dfhtextpipe
Simon,
Doing what you suggested didn't work: nothing new got written to the merged output.
The difficulty I have is how to extract the patterns that I wish to count before I do the merge, and yet do the counting on the merged output.
I'm trying to count all the old French words that contain the ligatures "œ" and "Œ".
I have an extract matches filter using the Perl pattern
I'm using the Extract option of the Find and Replace filter, rather than the Extract filter.
What I get is the [unsorted or] uncounted list of these words.
I think the real difficulty lies in how to get the merged output back as an input stream for the counting sub-filter.
David
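For reference, the intended result (extract the ligature words from every file, then count them across all files together) can be sketched outside TextPipe in Python. This is only an illustration under assumptions: the sample strings are hypothetical, and the pattern is a Python approximation of a "word containing œ or Œ" match, not the TextPipe filter itself.

```python
import re
from collections import Counter

# Any run of word characters containing Œ (U+0152) or œ (U+0153).
# In Python 3, \w is Unicode-aware for str patterns by default.
LIGATURE_WORD = re.compile(r"\w*[\u0152\u0153]\w*")

def extract_ligature_words(text):
    """Return every word in text that contains an oe ligature."""
    return LIGATURE_WORD.findall(text)

# Hypothetical sample text standing in for the input files.
files = ["Le cœur et l'Œuvre.", "Un cœur, un coeur, une sœur."]

# Extract per file, pool the matches, then count over the pooled list.
words = [w for text in files for w in extract_ligature_words(text)]
totals = Counter(words)

for word, n in sorted(totals.items()):
    print(f"{word}\t{n}")
```

The key point is that the counting step sees the pooled matches from all files, not each file's matches separately.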
Re: T-filters - bug with Sort or Count Duplicate Lines?
Posted: Sat Jul 02, 2011 6:43 am
by DataMystic Support
Hi David - can you upload your filter (zip it first), or paste an Export here?
Re: T-filters - bug with Sort or Count Duplicate Lines?
Posted: Mon Jul 04, 2011 12:51 am
by dfhtextpipe
Simon,
Rather than exporting the whole filter, which might be confusing, I have just pasted the part that I'm having difficulties with.
NB: Some subfilters are currently disabled, being things I was trying earlier, e.g. the two commented as "Begin" and "End".
The position of the merge subfilter has been changed several times, all without success.
David
Code:
Comment...
| Temporary output to count words with ligatures
|
|
+--T-Filter
|
|--Comment...
| | Extract words containing ligatures
| |
| +--Perl pattern [(\w*)(\x{0152}|\x{0153})(\w*)] with []
| | [X] Match case
| | [ ] Whole words only
| | [ ] Case sensitive replace
| | [ ] Prompt on replace
| | [ ] Skip prompt if identical
| | [ ] First only
| | [X] Extract matches
| | Maximum text buffer size 4096
| | [X] Maximum match (greedy)
| | [ ] Allow comments
| | [ ] '.' matches newline
| | [X] UTF-8 Support
| |
| +--Perl pattern [^(.+)$] with [$1\r\n]
| [ ] Match case
| [ ] Whole words only
| [ ] Case sensitive replace
| [ ] Prompt on replace
| [ ] Skip prompt if identical
| [ ] First only
| [ ] Extract matches
| Maximum text buffer size 4096
| [X] Maximum match (greedy)
| [ ] Allow comments
| [ ] '.' matches newline
| [X] UTF-8 Support
|
|--Comment...
| | Counting or sorting cannot be done after merging!
| |
| |--** DISABLED ** Comment...
| | | Begin
| | |
| | |--Replace list: D:\Download\Java\GoBibleCreator\Captured\French David Martin 1744\lig2oe.tab Perl pattern
| | | [X] Match case
| | | [ ] Whole words only
| | | [ ] Case sensitive replace
| | | [ ] Prompt on replace
| | | [ ] Skip prompt if identical
| | | [ ] First only
| | | [ ] Extract matches
| | | Maximum text buffer size 4096
| | | [ ] Maximum match (greedy)
| | | [ ] Allow comments
| | | [ ] '.' matches newline
| | | [X] UTF-8 Support
| | |
| | +--Convert from UTF-8 to ANSI
| |
| |--Comment...
| | | Count or Sort
| | |
| | |--Count duplicate lines
| | | [ ] Ignore case
| | | Start column 1
| | | Length 15
| | | [X] Include One
| | | format: %1:s\t%0:d
| | |
| | +--** DISABLED ** Ascending ANSI sort (case sensitive), remove duplicates, length 16
| |
| +--** DISABLED ** Comment...
| | End
| |
| |--Convert from ANSI to UTF-8
| |
| |--Remove BOM (Byte Order Mark)
| |
| +--Replace list: D:\Download\Java\GoBibleCreator\Captured\French David Martin 1744\oe2lig.tab Perl pattern
| [X] Match case
| [ ] Whole words only
| [ ] Case sensitive replace
| [ ] Prompt on replace
| [ ] Skip prompt if identical
| [ ] First only
| [ ] Extract matches
| Maximum text buffer size 4096
| [ ] Maximum match (greedy)
| [ ] Allow comments
| [ ] '.' matches newline
| [X] UTF-8 Support
|
|--Comment...
| | Merge to file
| |
| |--** DISABLED ** Add file footer [==================]
| |
| |--Merge output to file D:\Download\Java\GoBibleCreator\Captured\French David Martin 1744\ligatures.txt
| |
| +--** DISABLED ** Merge into D:\Download\Java\GoBibleCreator\Captured\French David Martin 1744\ligatures.txt
|
+--** DISABLED ** Comment...
| Debug
|
+--Display debug window
Re: T-filters - bug with Sort or Count Duplicate Lines?
Posted: Mon Jul 04, 2011 12:58 pm
by DataMystic Support
Hi David,
After the merge filter you still need a Secondary Output Filter; otherwise the output from that part of the T-Filter goes nowhere.
Put the merge filter above your sort/count filter if you want the sort/count applied to the resulting large file after the merge, or below it if you want the sort/count applied to the individual files before the merge.