I have a list, for example:
1 Corinthians
2 Corinthians
1 Timothy
1 John
2 John
1 Kings
I want to turn it into:
1Corinthians
2Corinthians
1Timothy
1John
2John
1Kings
The regex filter I tried to use (built in RegexBuddy, Perl flavour) was:
Find Pattern (Perl Style)
([123])\ ([SKCTPJ][aihoe]\w*\ ?\d{0,3}:?)|([12])\ (Thessalonians|THESSALONIANS ?\d{0,3})
Replace with:
\1\2\3\4
It gave me an error in TextPipe, although it worked in RegexBuddy and EditPad Pro.
It only worked after I read this forum and discovered that I had to use:
Replace with:
$1$$2$$3$$4$
What puzzles me is that I couldn't find an example or any documentation (extract below) indicating that I had to use $1$$2$$3$$4$.
What am I missing?
Does anyone on this forum also use RegexBuddy with TextPipe, and can you offer any insight into why \1\2\3\4 worked in RegexBuddy but not in TextPipe?
Thanks for your help.
Extract from help file:
After \0 up to two further octal digits are read. In both cases, if there are fewer than two digits, just those that are present are used. Thus the sequence \0\x\07 specifies two binary zeros followed by a BEL character (code value 7). Make sure you supply two digits after the initial zero if the character that follows is itself an octal digit.
The handling of a backslash followed by a digit other than 0 is complicated. Outside a character class, PCRE reads it and any following digits as a decimal number. If the number is less than 10, or if there have been at least that many previous capturing left parentheses in the expression, the entire sequence is taken as a back reference. A description of how this works is given later, following the discussion of parenthesized subpatterns.
Inside a character class, or if the decimal number is greater than 9 and there have not been that many capturing subpatterns, PCRE re-reads up to three octal digits following the backslash, and generates a single byte from the least significant 8 bits of the value. Any subsequent digits stand for themselves. For example:
\040   is another way of writing a space
\40    is the same, provided there are fewer than 40 previous capturing subpatterns
\7     is always a back reference
\11    might be a back reference, or another way of writing a tab
\011   is always a tab
\0113  is a tab followed by the character "3"
\113   might be a back reference, otherwise the character with octal code 113
\377   might be a back reference, otherwise the byte consisting entirely of 1 bits
\81    is either a back reference, or a binary zero followed by the two characters "8" and "1"
Note that octal values of 100 or greater must not be introduced by a leading zero, because no more than three octal digits are ever read.
All the sequences that define a single byte value or a single UTF-8 character (in UTF-8 mode) can be used both inside and outside character classes. In addition, inside a character class, the sequence \b is interpreted as the backspace character (hex 08). Outside a character class it has a different meaning (see below).
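For anyone who wants to try the digit rules from the extract outside TextPipe, here is a small sketch in Python. It uses Python's re module, not PCRE, so it only covers cases where I am confident the two behave the same way; the EXAMPLES list and the all_match helper are names made up for illustration.

```python
import re

# Each entry pairs a pattern with a string it should match in full.
# These are cases where Python's re follows the same reading as the
# PCRE text quoted above; Python is used here only as a stand-in.
EXAMPLES = [
    (r"\040", " "),        # leading zero: always an octal escape (space)
    (r"\011", "\t"),       # always a tab
    (r"\0113", "\t3"),     # tab followed by the literal character "3"
    (r"\113", "K"),        # no groups exist, so octal 113 (the letter K)
    (r"(ab)\1", "abab"),   # single digit after a group: back reference
]


def all_match(examples=EXAMPLES):
    """Return True if every pattern fully matches its paired string."""
    return all(re.fullmatch(pattern, text) is not None
               for pattern, text in examples)


if __name__ == "__main__":
    print(all_match())
```

Note that all of this concerns the search pattern only. It says nothing about how a replacement string refers to captured groups, which is a separate matter in each tool.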
Back References
Moderators: DataMystic Support, Moderators
- Posts: 991
- Joined: Sun Dec 09, 2007 2:49 am
- Location: UK
Re: Back References
I use TextPipe regularly in connection with work on Biblical texts. I have 9 years' experience in this field.
You didn't indicate whether the English Bible book names you gave as examples are part of free text or part of a structured document.
If they are in a structured document, it would be much simpler to use a restrict filter to govern the replacements.
The actual replacement then becomes much simpler.
So the more important question is what kind of structure does your input file have?
Best regards,
David Haslam
An active volunteer for the CrossWire Bible Society
PS. I don't use RegexBuddy.
My two favourite Unicode text editors are Notepad++ and BabelPad.
On rare occasions I have used EditPad Lite for file format conversions.
btw. When quoting from something such as the TextPipe Help file, it's sensible to use the Quote feature of phpBB.
Code:
Perl pattern [(\d) (\w+)] with [$1$$2]
[X] Match case
[X] Whole words only
[ ] Case sensitive replace
[ ] Prompt on replace
[ ] Skip prompt if identical
[ ] First only
[ ] Extract matches Maximum text buffer size 4096
[X] Maximum match (greedy)
[ ] Allow comments
[ ] '.' matches newline
[X] UTF-8 Support
[ ] Process longest strings first
[ ] Simultaneous search
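For readers without TextPipe to hand, the same idea can be sketched in Python. This is only an illustration of the pattern (\d) (\w+), not TextPipe's own behaviour; the BOOKS list and the join_book_number function are hypothetical names, and the \b is my rough stand-in for the "Whole words only" option.

```python
import re

# Sample input taken from the list in the original question.
BOOKS = ["1 Corinthians", "2 Corinthians", "1 Timothy",
         "1 John", "2 John", "1 Kings"]


def join_book_number(line):
    """Remove the space between a single digit and the word after it."""
    # Group 1 is the digit, group 2 is the following word.
    # Python writes group references in the replacement as \1 and \2.
    return re.sub(r"\b(\d) (\w+)", r"\1\2", line)


if __name__ == "__main__":
    for book in BOOKS:
        print(join_book_number(book))
```

A pattern this simple joins any single digit to whatever word follows it, so a reference such as "John 3 16" would be changed as well. That is one reason to limit where the replacement is applied, as suggested above with a restrict filter.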
Best regards,
David Haslam
An active volunteer for the CrossWire Bible Society
David
- DataMystic Support
- Site Admin
- Posts: 2228
- Joined: Mon Jun 30, 2003 12:32 pm
- Location: Melbourne, Australia
Re: Back References
Back references are used inside the search pattern, not inside the replace pattern.
Different tools encode this in different ways: some use %, some $, some \, some @, some [, and so on. There is no standard.
We use \ for regex escape sequences (\r\n\t etc), $ for captured variables ($1, $2 etc, or $1$ when it is hard-up against the next variable as in $1$$2$), @ for macros (e.g. @fullInputFilename) or named captured variables (@phonenumber), and % for environment variables (e.g. %PATH).
The help file is quite clear on this!
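The need for a closing delimiter when a captured variable sits hard up against what follows is not unique to TextPipe. As a rough analogy only, here is how the same problem looks in Python, where the explicit form is \g<1> rather than $1$. The function names are made up for illustration.

```python
import re

PATTERN = r"(\d) (\w+)"


def append_zero_to_group(text):
    """Insert a literal 0 right after group 1, then append group 2."""
    # \g<1> ends the group reference explicitly, so the 0 that follows
    # is a literal character and not part of the group number.
    return re.sub(PATTERN, r"\g<1>0\g<2>", text)


def ambiguous_reference_fails(text):
    """Return True if the undelimited form is rejected by Python."""
    try:
        # Here \10 is read as a reference to group 10, which does not exist.
        re.sub(PATTERN, r"\10\2", text)
    except re.error:
        return True
    return False


if __name__ == "__main__":
    print(append_zero_to_group("1 Kings"))
    print(ambiguous_reference_fails("1 Kings"))
```

The point of the sketch is simply that each tool has its own replacement syntax, and a form that works in one (such as \1\2) cannot be assumed to work in another.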