
Search/replace option Simultaneous search bug?

Posted: Sun Mar 23, 2014 12:14 am
by dfhtextpipe
If the search type is Perl, and the search pattern contains a \x{hhh..} escape (character with hex code hhh..; UTF-8 mode only),
then ticking the Simultaneous search option produces a popup dialog with this kind of error message:
"Error on line #: character value in \x{...} sequence is too large."
Popup screenshot
I see no fundamental reason why this should be invalid, so I guess this must be a software bug.

Best regards,

David

Re: Search/replace option Simultaneous search bug?

Posted: Mon Mar 24, 2014 12:21 pm
by DataMystic Support
Hi David,

Here is what I found when searching PCRE resources:

Note that despite its name, in UTF-8 mode \x{...} does not match UTF-8 byte sequences but rather "real" Unicode code points.

Code:

<?php
# Cyrillic letter "ю" is the code point U+044E;
# its UTF-8 representation is the two bytes D1 8E.

$u = "Hello \xd1\x8e"; // this string is in UTF-8
echo $u, "<br>";
echo preg_replace('~\x{44e}~u', '*', $u); // with /u, the pattern matches the code point: outputs "Hello *"
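For comparison, the same distinction can be sketched in Python. Python's `re` module has no `\x{...}` syntax, so this sketch uses the `\u044e` escape, which likewise names a code point. It only illustrates the idea and says nothing about how TextPipe or PCRE are implemented.

```python
import re

# U+044E is the Cyrillic letter "ю".
ch = "\u044e"

# Its UTF-8 representation is the two bytes D1 8E.
utf8_bytes = ch.encode("utf-8")
print(utf8_bytes.hex())  # d18e

# Decoding those two bytes yields a single code point, 0x44E.
text = "Hello " + b"\xd1\x8e".decode("utf-8")
print(hex(ord(text[-1])))  # 0x44e

# The regex escape names the code point, not the individual bytes.
replaced = re.sub(r"\u044e", "*", text)
print(replaced)  # Hello *
```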

Re: Search/replace option Simultaneous search bug?

Posted: Tue Mar 25, 2014 2:47 am
by dfhtextpipe
How does that answer my issue?

Re: Search/replace option Simultaneous search bug?

Posted: Tue Mar 25, 2014 6:21 am
by DataMystic Support
Are you specifying a code point, or a hex value?

What value are you specifying? It looks like a bug in the PCRE regex engine, but I can't confirm that without knowing the value.

Re: Search/replace option Simultaneous search bug?

Posted: Fri Mar 28, 2014 1:56 am
by dfhtextpipe
Here's the sub-filter that fails when I tick Simultaneous search:

Code:

Perl pattern [\x{05BE}] with [#]
   [X] Match case
   [ ] Whole words only
   [ ] Case sensitive replace
   [ ] Prompt on replace
   [ ] Skip prompt if identical
   [ ] First only
   [ ] Extract matches
   Maximum text buffer size 4096
   [ ] Maximum match (greedy)
   [ ] Allow comments
   [ ] '.' matches newline
   [X] UTF-8 Support

 Further search/replace list phrases (CSV format):
 \x{05C0},#
 \x{05C3},#
 \x{05C6},#
 
Here's what it looks like when that option is not ticked:

Code:

Perl pattern [\x{05BE}] with [#]
   [X] Match case
   [ ] Whole words only
   [ ] Case sensitive replace
   [ ] Prompt on replace
   [ ] Skip prompt if identical
   [ ] First only
   [ ] Extract matches
   Maximum text buffer size 4096
   [ ] Maximum match (greedy)
   [ ] Allow comments
   [ ] '.' matches newline
   [X] UTF-8 Support

 Further search/replace list phrases (CSV format):
 \x{05C0},#
 \x{05C3},#
 \x{05C6},#
 
It seems that the Simultaneous search tick box has no counterpart in the text display of the replace list filter: the two listings above are identical.
That in itself is a cause for concern.
Screenshot of my replace list filter.
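The thread does not say how TextPipe implements Simultaneous search. One common way to do simultaneous replacement is to join all search phrases into a single alternation and substitute in one pass, so text produced by one replacement is never rescanned by another phrase. The following Python sketch models David's filter that way. `simultaneous_replace` and `REPLACEMENTS` are hypothetical names, and the `\x{...}` phrases are rewritten as Python `\u` escapes because Python's `re` does not accept PCRE's `\x{...}` form.

```python
import re

# The four search phrases from the filter above; every one maps to "#".
REPLACEMENTS = {
    "\u05be": "#",  # U+05BE HEBREW PUNCTUATION MAQAF
    "\u05c0": "#",  # U+05C0 HEBREW PUNCTUATION PASEQ
    "\u05c3": "#",  # U+05C3 HEBREW PUNCTUATION SOF PASUQ
    "\u05c6": "#",  # U+05C6 HEBREW PUNCTUATION NUN HAFUKHA
}


def simultaneous_replace(text: str, table: dict[str, str]) -> str:
    """Replace every key of `table` in one pass (hypothetical model)."""
    # Longer phrases first, so a phrase wins over any prefix of itself.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    # Each match is looked up once; replacement text is never rescanned.
    return pattern.sub(lambda m: table[m.group(0)], text)


print(simultaneous_replace("a\u05beb\u05c0c\u05c3d\u05c6e", REPLACEMENTS))  # a#b#c#d#e
# One pass means a swap works: "a" and "b" trade places.
print(simultaneous_replace("ab", {"a": "b", "b": "a"}))  # ba
```

The swap example is where single-pass replacement differs from running the phrases one after another, which would turn "ab" into "aa".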

Re: Search/replace option Simultaneous search bug?

Posted: Mon Jun 30, 2014 1:04 pm
by DataMystic Support
Hi David,

For the next release, the filter's display text now includes the 'Simultaneous search' and 'Process longest strings first' options.

According to the PCRE spec:
\x{hhh..} - character with hex code hhh.. (non-JavaScript mode)

By default, after \x, from zero to two hexadecimal digits are read (letters can be in upper or lower case). Any number of hexadecimal digits may appear between \x{ and }, but the character code is constrained as follows:

8-bit non-UTF mode: less than 0x100
8-bit UTF-8 mode: less than 0x10ffff and a valid codepoint
16-bit non-UTF mode: less than 0x10000
16-bit UTF-16 mode: less than 0x10ffff and a valid codepoint
32-bit non-UTF mode: less than 0x80000000
32-bit UTF-32 mode: less than 0x10ffff and a valid codepoint

Invalid Unicode codepoints are the range 0xd800 to 0xdfff (the so-called "surrogate" codepoints), and 0xffef.
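The constraints quoted above can be sketched as a small checker. This is a hypothetical Python transcription of the quoted wording only (`LIMITS`, `check_hex_escape` and the mode labels are this sketch's own names, not PCRE's). It applies "less than" literally as a strict bound, and the PCRE source may differ at the edges.

```python
# Upper bounds for \x{...} values, transcribed from the quoted table
# ("less than N"). Mode labels are this sketch's own names.
LIMITS = {
    "8-bit non-UTF": 0x100,
    "8-bit UTF-8": 0x10FFFF,
    "16-bit non-UTF": 0x10000,
    "16-bit UTF-16": 0x10FFFF,
    "32-bit non-UTF": 0x80000000,
    "32-bit UTF-32": 0x10FFFF,
}


def is_invalid_codepoint(value: int) -> bool:
    """Invalid code points as listed in the quoted text."""
    return 0xD800 <= value <= 0xDFFF or value == 0xFFEF


def check_hex_escape(value: int, mode: str) -> str:
    r"""Classify a \x{...} value for a mode: 'ok', 'too large' or 'invalid'."""
    if value >= LIMITS[mode]:
        return "too large"
    # Only the UTF modes also require a valid code point.
    if "non-UTF" not in mode and is_invalid_codepoint(value):
        return "invalid"
    return "ok"


# The four values from David's replace list.
for value in (0x05BE, 0x05C0, 0x05C3, 0x05C6):
    print(f"{value:#06x}",
          check_hex_escape(value, "8-bit UTF-8"),
          check_hex_escape(value, "8-bit non-UTF"))
# e.g. the first line printed is: 0x05be ok too large
```

All four values are well within the UTF-8 mode limit, but each exceeds the 8-bit non-UTF limit of 0x100. That is consistent with, though it does not prove, the patterns being compiled without UTF-8 mode when Simultaneous search is ticked. It is one hypothesis worth checking.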

If characters other than hexadecimal digits appear between \x{ and }, or if there is no terminating }, this form of escape is not recognized. Instead, the initial \x will be interpreted as a basic hexadecimal escape, with no following digits, giving a character whose value is zero.
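The parsing rules in the last two paragraphs (up to two digits after a bare \x, any number inside braces, fallback to a zero-valued escape) can be sketched as a small parser. This is a hypothetical Python model of the quoted description, and `parse_hex_escape` is not a PCRE function. It assumes that after an unrecognized `\x{` parsing resumes at the `{`, which is then read as an ordinary character. It also treats empty braces as unrecognized, a case the quoted text does not cover.

```python
import string

HEX_DIGITS = set(string.hexdigits)


def parse_hex_escape(pattern: str, i: int) -> tuple[int, int]:
    r"""Parse the \x escape at pattern[i]; return (value, index after it).

    Models the quoted PCRE description only; not PCRE's actual code.
    """
    if pattern[i:i + 2] != "\\x":
        raise ValueError("no \\x escape at this position")
    j = i + 2
    if j < len(pattern) and pattern[j] == "{":
        close = pattern.find("}", j + 1)
        digits = pattern[j + 1:close] if close != -1 else ""
        # Braced form: only hex digits between "{" and "}".
        if digits and all(c in HEX_DIGITS for c in digits):
            return int(digits, 16), close + 1
        # Not recognized: "\x" with no digits gives value zero, and
        # parsing resumes at the "{" (an assumption of this sketch).
        return 0, j
    # Basic form: read zero to two hex digits.
    k = j
    while k < len(pattern) and k - j < 2 and pattern[k] in HEX_DIGITS:
        k += 1
    value = int(pattern[j:k], 16) if k > j else 0
    return value, k


print(parse_hex_escape(r"\x{05BE}", 0))  # (1470, 8)
print(parse_hex_escape(r"\x{05BG}", 0))  # (0, 2)
print(parse_hex_escape(r"\x41B", 0))     # (65, 4)
```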

Re: Search/replace option Simultaneous search bug?

Posted: Sun Jul 27, 2014 4:50 am
by dfhtextpipe
I will try again with the next release, though I suspect that the things you've fixed for filter display do not address the underlying issue.

David