Restrict while variable=value ?

Posted: Sat Mar 09, 2013 8:01 pm
by dfhtextpipe
Suggestion for a new restrict filter type.

If values are being captured from the text into a global variable, and the variable changes value several times while an input file is being processed, would it be feasible to implement a restrict filter based on the variable's current value?

The available alternative is restrict to pattern, which requires a prior estimate of the number of characters in the section of the input file where the restriction should apply. This is an awkward requirement, and one which, when overlooked, can all too easily lead to errors of omission.

David

Re: Restrict while variable=value ?

Posted: Tue Mar 12, 2013 11:08 am
by DataMystic Support
Hi David,

The problem here is that depending on where the variable is being tested in the filter tree, it may only have one value - that of the final occurrence in the file.

To prevent this, the best option is to use a subfilter of the pattern match expression that captures the value into a global and then, using a script filter, test the various global values and perhaps set a new global value accordingly, which can then be output.

Re: Restrict while variable=value ?

Posted: Wed Mar 13, 2013 10:19 pm
by dfhtextpipe
OK, let's give a practical example.

A whole Bible file in which book identifications are (USFM) tagged with

\id GEN
...
\id REV

Unless we know in advance how many characters there are in each book,
we can't set the Maximum match size parameter in the restrict filter Perl pattern.
e.g. Restrict to Perl pattern \\id PSA(.+)\\id PRO would cause sub-filters to ignore matches outside the book of Psalms.
I suppose we could overestimate, and set it to 250000, but that might be a bit silly!

However, if we capture the values of the \id tags to a global variable, we should get 66 different values as we process the file, each line in turn.
Each \id <CODE> value can be captured to a variable called Book.

Then restrict while @Book=PSA would be the sort of method I'm looking for.

Does this make the requirement clearer?
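
To illustrate the behaviour being requested, here is a minimal sketch in Python rather than in TextPipe itself. It assumes each \id tag starts a line, and the names restrict_while and transform are hypothetical, chosen only for this illustration. The current book is tracked line by line, and the transform is applied only while the tracked value equals the wanted code.

```python
import re

# Matches a USFM book identification line such as "\id PSA".
ID_TAG = re.compile(r"^\\id\s+(\S+)")

def restrict_while(lines, book, transform):
    """Apply `transform` only to lines read while the current book equals `book`.

    `restrict_while` and `transform` are hypothetical names for this sketch.
    """
    current = None  # plays the role of the global variable "Book"
    out = []
    for line in lines:
        m = ID_TAG.match(line)
        if m:
            current = m.group(1)  # a new \id tag changes the variable
        # The \id line itself is passed through unchanged.
        if current == book and not m:
            out.append(transform(line))
        else:
            out.append(line)
    return out

lines = ["\\id PSA", "\\v 1 blessed", "\\id PRO", "\\v 1 proverbs"]
result = restrict_while(lines, "PSA", str.upper)
```

No character count is needed here: the restriction starts and ends wherever the variable changes value.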

Re: Restrict while variable=value ?

Posted: Thu Mar 14, 2013 6:19 am
by DataMystic Support
I understand the problem. A better approach might be to split files on the \id BOOKNAME, and give each new filename the same name as the book.

Then you can use the filename restriction.

And as the last step, join all the files back again.

Using this approach, no extra files should appear on disk - they are split and joined virtually.
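
The split, restrict and join idea can be sketched outside TextPipe as follows. This is only an illustration in Python under stated assumptions: every \id tag starts a line, and split_by_book and process_book are hypothetical helper names, not TextPipe filters. The sections are held in memory, so nothing is written to disk.

```python
import re

def split_by_book(text):
    """Split text into (book_code, chunk) pairs, one chunk per \\id line.

    Hypothetical helper for this sketch. Text before the first \\id tag,
    if any, is returned with an empty book code.
    """
    # Zero-width split just before each line that starts with "\id ".
    parts = re.split(r"(?m)^(?=\\id\s)", text)
    sections = []
    for part in parts:
        if not part:
            continue
        m = re.match(r"\\id\s+(\S+)", part)
        sections.append((m.group(1) if m else "", part))
    return sections

def process_book(text, book, transform):
    """Apply `transform` to the chunk named `book`, then join all chunks in order."""
    return "".join(
        transform(chunk) if code == book else chunk
        for code, chunk in split_by_book(text)
    )

sample = "\\id GEN\nin the beginning\n\\id PSA\nblessed\n\\id REV\nend\n"
joined = process_book(sample, "PSA", lambda c: c.replace("blessed", "BLESSED"))
```

Because the chunks are joined in the order they were split, the output keeps the original book order.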

Re: Restrict while variable=value ?

Posted: Fri Mar 15, 2013 7:32 pm
by dfhtextpipe
A worked example would be helpful.

Assigning file names to split files (other than numbering them) is something I found especially difficult.

To meet the restrict to filename requirement, the file names would need to include the value of a captured variable.
To ensure the split files are merged in the correct order, the file names would also need to be numbered.
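
One possible naming scheme that meets both requirements, sketched in Python under the assumption that the book codes are already known in file order (the function name numbered_names and the .txt extension are hypothetical): prefix each captured book code with a zero-padded sequence number, so that a plain sort restores the original order while the code remains available for a filename restriction.

```python
def numbered_names(codes, ext=".txt"):
    """Build names like "01_GEN.txt" from book codes given in file order.

    Hypothetical helper for this sketch. Zero padding keeps a plain
    alphabetical sort identical to the original order.
    """
    width = max(2, len(str(len(codes))))
    return [f"{i:0{width}d}_{code}{ext}" for i, code in enumerate(codes, start=1)]

names = numbered_names(["GEN", "PSA", "REV"])

# Selecting by book code ignores the numeric prefix.
psalms = [n for n in names if n.split("_", 1)[1].startswith("PSA")]
```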


David