Suggestion for a new restrict filter type.
If certain variables are being captured from text to a global variable, and supposing the variable changes value several times while an input file is being processed, would it be feasible to implement a restrict filter based on the current value of such a variable?
The only alternative at present is restrict to pattern, which requires a prior estimate of the number of characters in the section of the input file where the restriction should apply. This is an awkward requirement, and one that, when overlooked, can all too easily lead to errors of omission.
David
Restrict while variable=value ?
Moderators: DataMystic Support, Moderators
-
- Posts: 988
- Joined: Sun Dec 09, 2007 2:49 am
- Location: UK
- DataMystic Support
- Site Admin
- Posts: 2227
- Joined: Mon Jun 30, 2003 12:32 pm
- Location: Melbourne, Australia
- Contact:
Re: Restrict while variable=value ?
Hi David,
The problem here is that depending on where the variable is being tested in the filter tree, it may only have one value - that of the final occurrence in the file.
To prevent this, the best option is to work in a subfilter of the pattern match that captures the value into a global. There, a script filter can test the various global values and perhaps set a new global value accordingly, which can then be output.
-
- Posts: 988
- Joined: Sun Dec 09, 2007 2:49 am
- Location: UK
Re: Restrict while variable=value ?
OK, let's give a practical example.
A whole Bible file in which book identifications are (USFM) tagged with
\id GEN
...
\id REV
Unless we know in advance how many characters there are in each book,
we can't set the Maximum match size parameter in the restrict filter Perl pattern.
e.g. Restrict to Perl pattern \\id PSA(.+)\\id PRO would cause sub-filters to ignore matches outside the book of Psalms.
I suppose we could overestimate, and set it to 250000, but that might be a bit silly!
However, if we capture the values of the \id tags to a global variable, we should get 66 different values as we process the file, each line in turn.
Each \id <CODE> value can be captured to a variable called Book.
Then restrict while @Book=PSA would be the sort of method I'm looking for.
Does this make the requirement clearer?
David
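To make the requested behaviour concrete, here is a minimal sketch in Python rather than in TextPipe itself. It assumes the file is read line by line, and the names restrict_while, ID_TAG and transform are hypothetical, chosen only for illustration. The current book code is updated each time an \id line is seen, and the transform is applied only while that value equals the wanted code, so no character count is needed.

```python
import re

# Matches a USFM book identification line such as "\id PSA" (assumed format).
ID_TAG = re.compile(r"^\\id\s+(\w+)")

def restrict_while(lines, wanted, transform):
    """Apply transform only to lines seen while the current book == wanted."""
    book = None  # plays the role of the global variable "Book"
    out = []
    for line in lines:
        m = ID_TAG.match(line)
        if m:
            book = m.group(1)  # the variable changes value at each \id tag
        out.append(transform(line) if book == wanted else line)
    return out

sample = [
    "\\id JOB",
    "\\v 1 the lord gave",
    "\\id PSA",
    "\\v 1 the lord is my shepherd",
    "\\id PRO",
    "\\v 1 the lord gives wisdom",
]
result = restrict_while(sample, "PSA", lambda s: s.replace("lord", "LORD"))
```

Only the line inside the book of Psalms is changed; the lines under JOB and PRO pass through untouched.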
- DataMystic Support
- Site Admin
- Posts: 2227
- Joined: Mon Jun 30, 2003 12:32 pm
- Location: Melbourne, Australia
- Contact:
Re: Restrict while variable=value ?
I understand the problem. A better approach might be to split the file on the \id BOOKNAME, and give each new file the same name as the book.
Then you can use the filename restriction.
And as the last step, join all the files back again.
Using this approach, no extra files should appear on disk - they are split and joined virtually.
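The split, restrict and join steps above can be sketched in Python as follows. This is only an illustration of the idea, not TextPipe's own mechanism: the function names split_books and process are hypothetical, and the pieces are held in memory to mirror the "virtual" split, so nothing is written to disk.

```python
import re

# Matches a USFM book identification line such as "\id PSA" (assumed format).
ID_TAG = re.compile(r"^\\id\s+(\w+)")

def split_books(text):
    """Split text into [book_code, chunk] pairs, one per \\id tag, in file order."""
    books = []
    for line in text.splitlines(keepends=True):
        m = ID_TAG.match(line)
        if m or not books:
            # Start a new piece; text before the first \id gets an empty name.
            books.append([m.group(1) if m else "", ""])
        books[-1][1] += line
    return books

def process(text, wanted, transform):
    """Transform only the piece named wanted, then join all pieces back."""
    return "".join(
        transform(chunk) if name == wanted else chunk
        for name, chunk in split_books(text)
    )

text = "\\id JOB\nlord\n\\id PSA\nlord\n\\id PRO\nlord\n"
joined = process(text, "PSA", lambda s: s.replace("lord", "LORD"))
```

Because the pieces are kept in their original order, joining them restores the whole file with only the Psalms piece altered.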
-
- Posts: 988
- Joined: Sun Dec 09, 2007 2:49 am
- Location: UK
Re: Restrict while variable=value ?
A worked example would be helpful.
Assigning file names to split files (other than numbering them) is something I found especially difficult.
To meet the restrict to filename requirement, the file names would need to include the value of a captured variable.
To ensure the split files are merged in the correct order, the file names would also need to be numbered.
David
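One naming scheme that would satisfy both requirements is a zero-padded sequence number followed by the captured book code. The Python sketch below only illustrates the scheme; the pattern 01_GEN.txt and the function numbered_names are assumptions, not anything TextPipe prescribes. Two digits are enough for 66 books, and a plain alphabetical sort then matches the original order.

```python
import re

# Finds every USFM book code following an "\id" tag at the start of a line.
ID_TAG = re.compile(r"^\\id\s+(\w+)", re.MULTILINE)

def numbered_names(text):
    """Build one file name per \\id tag: sequence number plus book code."""
    codes = ID_TAG.findall(text)
    return [f"{n:02d}_{code}.txt" for n, code in enumerate(codes, start=1)]

names = numbered_names("\\id GEN\n...\n\\id EXO\n...\n\\id LEV\n...\n")

# A file name restriction can then select on the book code alone.
exodus_only = [name for name in names if name.endswith("_EXO.txt")]
```

The number fixes the merge order, while the book code is what a restrict on filename would test.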