Restrict while variable=value ?

dfhtextpipe
Posts: 986
Joined: Sun Dec 09, 2007 2:49 am
Location: UK

Restrict while variable=value ?

Post by dfhtextpipe »

Suggestion for a new restrict filter type.

If certain values are being captured from the text into a global variable, and the variable changes value several times while an input file is being processed, would it be feasible to implement a restrict filter based on the current value of that variable?

The available alternative method is Restrict to Pattern, which requires a prior estimate of the number of characters in the section of the input file where the restriction should apply. This is an awkward requirement, and one which, when overlooked, can all too easily lead to errors of omission.

David
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: Restrict while variable=value ?

Post by DataMystic Support »

Hi David,

The problem here is that depending on where the variable is being tested in the filter tree, it may only have one value - that of the final occurrence in the file.

To prevent this, the best option is to add a subfilter under the pattern match expression that captures the value into a global variable. Then use a script filter to test the various global values and, if needed, set a new global value that can then be output.
dfhtextpipe
Posts: 986
Joined: Sun Dec 09, 2007 2:49 am
Location: UK

Re: Restrict while variable=value ?

Post by dfhtextpipe »

OK, let's give a practical example.

A whole Bible file in which book identifications are (USFM) tagged with

\id GEN
...
\id REV

Unless we know in advance how many characters there are in each book,
we can't set the Maximum match size parameter for the restrict filter's Perl pattern.
For example, Restrict to Perl pattern \\id PSA(.+)\\id PRO would cause sub-filters to ignore matches outside the book of Psalms.
I suppose we could overestimate, and set it to 250000, but that might be a bit silly!
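For comparison, outside TextPipe an ordinary regular expression can delimit a book by the neighbouring \id markers rather than by a character count. The Python sketch below only illustrates that regular-expression idea; it is not TextPipe's restrict filter, and the sample text and the lazy match with a lookahead are assumptions chosen for the example.

```python
import re

# A tiny stand-in for a whole-Bible USFM file (hypothetical sample text).
usfm = (
    "\\id GEN\n\\c 1\n\\v 1 In the beginning...\n"
    "\\id PSA\n\\c 1\n\\v 1 Blessed is the man...\n"
    "\\id PRO\n\\c 1\n\\v 1 The proverbs of Solomon...\n"
)

# Match from "\id PSA" up to, but not including, the next "\id " marker.
# re.DOTALL lets "." cross newlines; the lazy ".*?" stops at the first
# following "\id ", so no maximum match size has to be guessed.
# The "\Z" alternative also covers a book that ends the file.
PSALMS = re.compile(r"\\id PSA\b(.*?)(?=\\id |\Z)", re.DOTALL)

def psalms_section(text: str) -> str:
    """Return the body of the Psalms book, or "" if it is absent."""
    m = PSALMS.search(text)
    return m.group(1) if m else ""

print(psalms_section(usfm))
```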

However, if we capture the values of the \id tags to a global variable, we should get 66 different values as we process the file, each line in turn.
Each \id <CODE> value can be captured to a variable called Book.

Then restrict while @Book=PSA would be the sort of method I'm looking for.
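The intended behaviour of "restrict while @Book=PSA" can be modelled as a line-by-line pass that updates a Book value at every \id tag and applies the sub-filter only while that value equals PSA. This is a hypothetical Python sketch of the requested feature, not an existing TextPipe filter; the name `restrict_while` is invented, and it assumes each \id tag starts its own line.

```python
import re
from typing import Callable, Iterable, Iterator, Optional

# Matches a USFM book identification line such as "\id PSA".
ID_TAG = re.compile(r"\\id\s+(\w+)")

def restrict_while(lines: Iterable[str], target: str,
                   transform: Callable[[str], str]) -> Iterator[str]:
    """Yield each line, applying `transform` only while the current book is `target`."""
    book: Optional[str] = None  # plays the role of the global variable Book
    for line in lines:
        m = ID_TAG.match(line)
        if m:
            book = m.group(1)  # Book takes a new value at every \id tag
            yield line         # the \id line itself passes through unchanged
            continue
        yield transform(line) if book == target else line

# Hypothetical sample: three books with one placeholder verse each.
sample = ["\\id GEN", "\\v 1 word", "\\id PSA", "\\v 1 word", "\\id PRO", "\\v 1 word"]
result = list(restrict_while(sample, "PSA", lambda s: s.replace("word", "WORD")))
print(result)
```

Because the value is read as each line is processed, the restriction follows every change of Book rather than only its final value.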

Does this make the requirement clearer?
David
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: Restrict while variable=value ?

Post by DataMystic Support »

I understand the problem. A better approach might be to split the file at each \id BOOKNAME tag, and give each new file the same name as its book.

Then you can use the filename restriction.

And as the last step, join all the files back again.

Using this approach, no extra files should appear on disk - they are split and joined virtually.
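A minimal in-memory sketch of the split, restrict and join idea, assuming each book starts with an \id tag at the beginning of a line. It does not use TextPipe's own split or join filters; `split_books` and `apply_to_book` are hypothetical names, the chunk name stands in for the per-book filename, and the zero-width split needs Python 3.7 or later.

```python
import re
from typing import Callable, List, Tuple

# Zero-width split point at the start of each line that begins with "\id ".
BOOK_START = re.compile(r"^(?=\\id )", re.MULTILINE)
BOOK_CODE = re.compile(r"\\id\s+(\w+)")

def split_books(text: str) -> List[Tuple[str, str]]:
    """Split text into (book_code, chunk) pairs, keeping file order."""
    pieces: List[Tuple[str, str]] = []
    for chunk in BOOK_START.split(text):
        if not chunk:
            continue
        m = BOOK_CODE.match(chunk)
        # Any text before the first \id tag gets an empty name.
        pieces.append((m.group(1) if m else "", chunk))
    return pieces

def apply_to_book(text: str, book: str, transform: Callable[[str], str]) -> str:
    """Transform only the chunk named `book`, then rejoin every chunk in order."""
    return "".join(transform(chunk) if name == book else chunk
                   for name, chunk in split_books(text))

usfm = "\\id GEN\n\\v 1 word\n\\id PSA\n\\v 1 word\n\\id PRO\n\\v 1 word\n"
print(apply_to_book(usfm, "PSA", lambda s: s.replace("word", "WORD")))
```

Since the chunks stay in a list in their original order, joining them back needs no extra ordering information.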
dfhtextpipe
Posts: 986
Joined: Sun Dec 09, 2007 2:49 am
Location: UK

Re: Restrict while variable=value ?

Post by dfhtextpipe »

A worked example would be helpful.

Assigning file names to split files (other than numbering them) is something I found especially difficult.

To meet the restrict to filename requirement, the file names would need to include the value of a captured variable.
To ensure the split files are merged in the correct order, the file names would also need to be numbered.
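One possible naming scheme covers both needs: a zero-padded sequence number first, so a plain alphabetical sort restores the original order, then the captured book code, so a filename mask can select one book. The sketch below only builds and filters such names in Python; the `NN_CODE.usfm` pattern and the `*_LEV.*` mask are assumptions, and whether TextPipe's split filter can build names like these from a captured variable is not covered in this thread.

```python
import fnmatch
from typing import Iterable, List

def split_file_names(books: Iterable[str], ext: str = ".usfm") -> List[str]:
    """Name each split file NN_CODE.ext, numbered from 01 in file order."""
    # Two digits are enough for 66 books and keep alphabetical order == file order.
    return [f"{i:02d}_{code}{ext}" for i, code in enumerate(books, start=1)]

names = split_file_names(["GEN", "EXO", "LEV", "NUM"])
print(names)  # ['01_GEN.usfm', '02_EXO.usfm', '03_LEV.usfm', '04_NUM.usfm']

# A filename mask picks out one book; sorting the names restores the join order.
leviticus = [n for n in names if fnmatch.fnmatchcase(n, "*_LEV.*")]
print(leviticus)  # ['03_LEV.usfm']
print(sorted(reversed(names)) == names)  # True
```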


David