Search/replace option Simultaneous search bug?
Moderators: DataMystic Support, Moderators
- Posts: 988
- Joined: Sun Dec 09, 2007 2:49 am
- Location: UK
Search/replace option Simultaneous search bug?
If the search type is Perl, and the search pattern contains a \x{hhh..} escape (character with hex code hhh.., UTF-8 mode only), then ticking the Simultaneous search option produces a popup dialog with an error message like this:
"Error on line #: character value in \x{...} sequence is too large."
I see no fundamental reason why this should be invalid, so I guess this must be a software bug.
Best regards,
David
- DataMystic Support
- Site Admin
- Posts: 2227
- Joined: Mon Jun 30, 2003 12:32 pm
- Location: Melbourne, Australia
Re: Search/replace option Simultaneous search bug?
Hi David,
Here is what I found when searching for PCRE material:
Note that, despite the name, in UTF-8 mode \x{...} does not match UTF-8 byte sequences but "real" Unicode codepoints.
Code:
<?php
// Cyrillic letter "ю" is codepoint U+044E;
// its UTF-8 representation is the two bytes D1 8E.
$u = "Hello \xd1\x8e"; // this string is in UTF-8
echo $u, "<br>";
echo preg_replace('~\x{44e}~u', '*', $u); // with the u modifier, preg matches the codepoint
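For comparison, here is a minimal Python sketch of the same point. It is an illustration only, not TextPipe or PCRE code; note that Python's `re` module spells the codepoint escape `\u044e` rather than PCRE's `\x{44e}`.

```python
import re

# U+044E CYRILLIC SMALL LETTER YU ("ю")
ch = chr(0x044E)

# Its UTF-8 representation is the two bytes D1 8E
utf8_bytes = ch.encode("utf-8")
print(utf8_bytes.hex(" "))  # d1 8e

# Decode the raw UTF-8 bytes to text, then match the codepoint itself
text = b"Hello \xd1\x8e".decode("utf-8")
replaced = re.sub(r"\u044e", "*", text)
print(replaced)  # Hello *

# The byte values D1 and 8E, read as two separate characters, do not occur in the decoded text
print(re.search("\xd1\x8e", text))  # None
```

The pattern names a single codepoint; how that codepoint is stored as bytes only matters when the text is decoded, which is what UTF-8 mode does for PCRE.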
- DataMystic Support
- Site Admin
- Posts: 2227
- Joined: Mon Jun 30, 2003 12:32 pm
- Location: Melbourne, Australia
Re: Search/replace option Simultaneous search bug?
Are you specifying a code point, or a hex value?
What value are you specifying? It looks like a bug in the PCRE regex engine, but I can't confirm that without knowing the value.
- Posts: 988
- Joined: Sun Dec 09, 2007 2:49 am
- Location: UK
Re: Search/replace option Simultaneous search bug?
Here's the sub-filter that fails when I tick Simultaneous search:
Code:
Perl pattern [\x{05BE}] with [#]
[X] Match case
[ ] Whole words only
[ ] Case sensitive replace
[ ] Prompt on replace
[ ] Skip prompt if identical
[ ] First only
[ ] Extract matches
Maximum text buffer size 4096
[ ] Maximum match (greedy)
[ ] Allow comments
[ ] '.' matches newline
[X] UTF-8 Support
Further search/replace list phrases (CSV format):
\x{05C0},#
\x{05C3},#
\x{05C6},#
Here's what it looks like when that option is not ticked:
Code:
Perl pattern [\x{05BE}] with [#]
[X] Match case
[ ] Whole words only
[ ] Case sensitive replace
[ ] Prompt on replace
[ ] Skip prompt if identical
[ ] First only
[ ] Extract matches
Maximum text buffer size 4096
[ ] Maximum match (greedy)
[ ] Allow comments
[ ] '.' matches newline
[X] UTF-8 Support
Further search/replace list phrases (CSV format):
\x{05C0},#
\x{05C3},#
\x{05C6},#
The two listings are identical, so it would seem that the tick box option has no counterpart in the visual representation of the replace list filter.
This in itself is also a cause for concern.
David
- DataMystic Support
- Site Admin
- Posts: 2227
- Joined: Mon Jun 30, 2003 12:32 pm
- Location: Melbourne, Australia
Re: Search/replace option Simultaneous search bug?
Hi David,
For the next release, the Display text now also outputs the 'Simultaneous search' and 'Process longest strings first' options.
According to the PCRE spec:
\x{hhh..} - character with hex code hhh.. (non-JavaScript mode)
By default, after \x, from zero to two hexadecimal digits are read (letters can be in upper or lower case). Any number of hexadecimal digits may appear between \x{ and }, but the character code is constrained as follows:
8-bit non-UTF mode: less than 0x100
8-bit UTF-8 mode: less than 0x10ffff and a valid codepoint
16-bit non-UTF mode: less than 0x10000
16-bit UTF-16 mode: less than 0x10ffff and a valid codepoint
32-bit non-UTF mode: less than 0x80000000
32-bit UTF-32 mode: less than 0x10ffff and a valid codepoint
Invalid Unicode codepoints are the range 0xd800 to 0xdfff (the so-called "surrogate" codepoints), and 0xffef.
If characters other than hexadecimal digits appear between \x{ and }, or if there is no terminating }, this form of escape is not recognized. Instead, the initial \x will be interpreted as a basic hexadecimal escape, with no following digits, giving a character whose value is zero.
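The constraints quoted above can be sketched in Python for the two 8-bit modes. The sketch hints at one possible explanation, which nobody in the thread has confirmed: every value in David's filter (U+05BE, U+05C0, U+05C3, U+05C6) is above 0xFF, so each is valid in UTF-8 mode but would trigger exactly this "too large" error if the pattern were compiled without UTF-8 support. The mode names, `LIMITS`, `parse_braced_hex` and `check` are hypothetical helpers, not part of PCRE or TextPipe. The sketch treats U+10FFFF, the highest Unicode codepoint, as the upper bound, and checks only the surrogate range, not the other invalid codepoint listed above.

```python
import re
from typing import Optional

# Exclusive upper limits for \x{...} in the two 8-bit modes (hypothetical mode names).
LIMITS = {
    "8-bit non-UTF": 0x100,
    "8-bit UTF-8": 0x110000,  # allows values up to and including U+10FFFF
}


def parse_braced_hex(escape: str) -> int:
    """Return the character value of an escape that starts with \\x{.

    If the braces hold non-hex characters, or the closing brace is missing,
    the braced form is not recognised and \\x alone gives character 0.
    """
    m = re.fullmatch(r"\\x\{([0-9A-Fa-f]+)\}", escape)
    return int(m.group(1), 16) if m else 0


def check(codepoint: int, mode: str) -> Optional[str]:
    """Return None if the value is allowed in `mode`, else a short reason."""
    if codepoint >= LIMITS[mode]:
        # Message text taken from the error reported at the top of this thread
        return "character value in \\x{...} sequence is too large"
    if mode == "8-bit UTF-8" and 0xD800 <= codepoint <= 0xDFFF:
        return "surrogate codepoint is not a valid character"
    return None


# Escapes taken from the sub-filter posted earlier in this thread
for esc in [r"\x{05BE}", r"\x{05C0}", r"\x{05C3}", r"\x{05C6}"]:
    cp = parse_braced_hex(esc)
    for mode in LIMITS:
        print(f"{esc} = U+{cp:04X} in {mode}: {check(cp, mode) or 'OK'}")
```

Running it prints "OK" for every escape in 8-bit UTF-8 mode and the "too large" message for every escape in 8-bit non-UTF mode.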
- Posts: 988
- Joined: Sun Dec 09, 2007 2:49 am
- Location: UK
Re: Search/replace option Simultaneous search bug?
I will try again with the next release, though I suspect that the things you've fixed for filter display do not address the underlying issue.
David