Split at Pattern - After this many
Moderators: DataMystic Support, Moderators
-
- Posts: 988
- Joined: Sun Dec 09, 2007 2:49 am
- Location: UK
Split at Pattern - After this many
The behavior of the After this many control for Split at Pattern was not quite how I understood it.
The filter I am designing is to convert a single whole Bible text file into 66 separate books.
When I changed the value of After this many from 1 to 2, instead of obtaining 67 output files, I got only 34.
What I was looking for was a means to ignore the first instance of the pattern only, but to split at all the other occurrences.
This would avoid outputting an empty file before the first pattern that indicates the start of the book of Genesis.
It was a simple task to create earlier sub-filters to generate a suitable pattern for the start of each book.
I am using Split position = Before pattern. I do not wish to remove the pattern for Genesis.
I therefore suggest that the Split at Pattern filter be further enhanced to provide an additional control value Ignore this many.
Please review and advise.
David
-
- Posts: 988
- Joined: Sun Dec 09, 2007 2:49 am
- Location: UK
Re: Split at Pattern - After this many
The help for this feature is somewhat ambiguous. It states,
"If the pattern specifies a single character only, such as a form feed (\f), you can split after this many occurrences of the pattern have been found in one file. The maximum is 2147483647."
Observations:
- The phrase "a single character only" seems to be superfluous. The feature works for any pattern, not just a single character.
- The phrase "found in one file" is ambiguous. The actual behavior is that this many occurrences must be found for each split.
David
- DataMystic Support
- Site Admin
- Posts: 2227
- Joined: Mon Jun 30, 2003 12:32 pm
- Location: Melbourne, Australia
Re: Split at Pattern - After this many
Thanks David,
We've altered the text and added an example for clarity:
If 'After this many' is set to 2, splits will occur after the 2nd, 4th, 6th, 8th, 10th, ... matches.
Ignore this many - we'd prefer not to add this at this stage. Can you use a search/replace with 'Match first only' to prevent the initial file being created?
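To make the counting concrete, here is a minimal Python sketch (not TextPipe itself) that models 'After this many' as described above: with Split position = Before pattern, a new output file is started before every Nth match. The function name split_before_pattern and the synthetic 66-book input are assumptions made purely for illustration.

```python
import re


def split_before_pattern(lines, pattern, after_this_many=1):
    """Model of 'split before pattern' where only every Nth match splits.

    This is an illustrative model, not TextPipe's actual implementation.
    """
    regex = re.compile(pattern)
    chunks = [[]]  # the first chunk holds whatever precedes the first split
    matches = 0
    for line in lines:
        if regex.search(line):
            matches += 1
            if matches % after_this_many == 0:
                chunks.append([])  # start a new output file before this line
        chunks[-1].append(line)
    return chunks


# Synthetic input: 66 "books", each starting with an \id line.
lines = []
for n in range(1, 67):
    lines.append(f"\\id B{n:02d}")
    lines.append(f"text of book {n}")

pattern = r"^\\id (\w+)$"
print(len(split_before_pattern(lines, pattern, 1)))  # 67 (first chunk is empty)
print(len(split_before_pattern(lines, pattern, 2)))  # 34
```

With a count of 1, the first match sits on line 1, so the chunk before it is empty. With a count of 2, there are only 33 split points, hence 34 files of mostly two books each.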
-
- Posts: 988
- Joined: Sun Dec 09, 2007 2:49 am
- Location: UK
Re: Split at Pattern - After this many
Simon,
You asked, "Can you use a search/replace with 'Match first only' to prevent the initial file being created?".
I don't think this would have any effect on a split filter.
I tried using a 'Restrict to lines 2 to END-0' and making the split filter a sub-filter, but it made no difference.
Maybe I am misunderstanding something. Please enlighten me.
David
- DataMystic Support
- Site Admin
- Posts: 2227
- Joined: Mon Jun 30, 2003 12:32 pm
- Location: Melbourne, Australia
Re: Split at Pattern - After this many
You're splitting on a pattern, right?
So why not use a search/replace prior to the split filter to remove the first occurrence of that pattern?
-
- Posts: 988
- Joined: Sun Dec 09, 2007 2:49 am
- Location: UK
Re: Split at Pattern - After this many
Because the split filter is set to split before the pattern:
Code:
Split before pattern ^\\id (\w+)$, filename %2.2d_%f.usfm, number 0 count 1
All the split files must contain the pattern.
If I remove the first pattern, then the first required split file would not contain the pattern.
I don't want a trivial zero-length file being created just because the first pattern is on the first line of the file being processed.
Any better suggestions?
David
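The zero-length file and the suggested 'Ignore this many' control can be illustrated with a short Python sketch. It is only a model of the idea: ignore_this_many is a hypothetical parameter, not an existing TextPipe option, and split_before and the two-book sample are likewise made up for the example.

```python
import re


def split_before(text, pattern, ignore_this_many=0):
    """Split text before each match, skipping the first few matches.

    ignore_this_many is a hypothetical option used for illustration only.
    """
    regex = re.compile(pattern, re.MULTILINE)
    starts = [m.start() for m in regex.finditer(text)][ignore_this_many:]
    pieces = []
    previous = 0
    for start in starts:
        pieces.append(text[previous:start])  # text up to the split point
        previous = start
    pieces.append(text[previous:])  # remainder after the last split
    return pieces


sample = "\\id GEN\nIn the beginning\n\\id EXO\nNow these are the names\n"
pattern = r"^\\id (\w+)$"

plain = split_before(sample, pattern)
skipped = split_before(sample, pattern, ignore_this_many=1)
print(len(plain), repr(plain[0]))  # 3 pieces, the first one is empty
print(len(skipped))                # 2 pieces, each starting with \id
```

Because the first match is at offset 0, splitting before it leaves nothing for the first piece. Skipping that one match keeps the \id line in the Genesis piece and avoids the empty output.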
-
- Posts: 988
- Joined: Sun Dec 09, 2007 2:49 am
- Location: UK
Re: Split at Pattern - After this many
I have solved the problem as follows:
The split filter is now simply
Code:
Comment...
| Notes on Split:
|
| 2010-02-25
| Added a formfeed (\f) before each \id (on the same line)
| Removed the first formfeed (\f) using restrict to line range 1 with a subfilter
| Changed the split pattern to \f and "At pattern"
| Changed the split counter to start at 1
|
| This solved the following problem:
| A zero length file was created with filename 00_*.usfm
|
| Possible further enhancement:
| How to include the book ID in the split filename?
|
|--Restrict lines:Line 1 .. line 1
| |
| +--Perl pattern [^\f] with []
| [ ] Match case
| [ ] Whole words only
| [ ] Case sensitive replace
| [ ] Prompt on replace
| [ ] Skip prompt if identical
| [ ] First only
| [ ] Extract matches
| Maximum text buffer size 4096
| [ ] Maximum match (greedy)
| [ ] Allow comments
| [ ] '.' matches newline
| [X] UTF-8 Support
|
+--Split at pattern ^\f, filename %2.2d_%f.usfm, number 1 count 1
David
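For readers without TextPipe, the same workaround can be sketched in Python: mark each \id line with a form feed, drop the leading form feed, then split at the form feed. The function name split_books, the sample text, and the file naming (a two-digit counter starting at 1 plus the book ID) are assumptions for illustration; the naming in particular is just one possible answer to the "further enhancement" question in the notes, not something the filter above does.

```python
import re


def split_books(text):
    """Return (filename, content) pairs, one per \\id section.

    A model of the form feed workaround, not TextPipe's implementation.
    """
    # Step 1: add a form feed before each \id marker, on the same line.
    marked = re.sub(
        r"^(\\id \w+)$",
        lambda m: "\f" + m.group(1),
        text,
        flags=re.MULTILINE,
    )
    # Step 2: remove the form feed at the very start, if present,
    # so that no empty piece is produced before the first book.
    if marked.startswith("\f"):
        marked = marked[1:]
    # Step 3: split at the form feed; the form feed itself is discarded.
    books = []
    for number, content in enumerate(marked.split("\f"), start=1):
        match = re.match(r"\\id (\w+)", content)
        book_id = match.group(1) if match else "UNKNOWN"
        # Hypothetical naming scheme: counter starting at 1, then book ID.
        books.append((f"{number:02d}_{book_id}.usfm", content))
    return books


sample = "\\id GEN\nIn the beginning\n\\id EXO\nNow these are the names\n\\id LEV\nAnd the LORD called\n"
for name, content in split_books(sample):
    print(name, len(content))
```

Each piece still begins with its \id line, because the split pattern is now the form feed rather than the \id marker itself.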