
Perl Find and Replace Regular expression child filter

Posted: Tue Jul 18, 2017 7:38 am
by slouw
This is related to an attempt to solve this challenge here:
http://www.datamystic.com/forums/viewtopic.php?f=17&t=2398
This is the desired outcome for a sample input (D1 below)

Image

However, the output I am getting is shown in D2 below.

Image

The parent filter is working perfectly.
The child filter is invoked on the 3 occasions when the SEARCH_STRING is found in the parent filter output. This corresponds to paragraphs 5, 7 and 8.
However, on these 3 occasions the entire paragraph is not extracted.

The screenshot D3 below shows the parent filter having correctly identified paragraph 4. The parent filter behaves in this way for all paragraphs. In the case of the non-interesting paragraphs (1, 2, 3, 4, 6 and 9) the child filter is never shown in this fantastic "Prompt on replace" feature. I wish I had known about this years ago.
Image

The screenshot D4 below shows the parent filter finding paragraph 5. This is the first interesting paragraph (i.e. one for which an output is desired as this contains the target SEARCH_STRING)

Image

Because the replacement text in the parent filter execution (marker 2 in D4 above) produces non-empty input for the child filter, the next screen shows the action of the child filter for the first time (D5 below). The found text (marker 3 in D5 below) is exactly the output of the parent filter (marker 2 in D4 above). As I understand it, this is the text being fed into the child filter. The child filter search string (marker 5 in D5 below), as you can see, is

Code:

(.*SEARCH_STRING.*)
I would expect this regex search string to match the entire text. Instead, it replaces the found text with text truncated at the end of the target SEARCH_STRING, as you can see.

Image

Note that for both parent and child filters the Perl matching options are
- Non greedy shortest default match; and
- "." matches new line
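
One possible explanation for the truncation, sketched in Python rather than in TextPipe itself: if the non-greedy option makes `.*` behave like Perl's `.*?` (an assumption about how the option maps onto standard syntax), the trailing `.*` is allowed to match nothing, so the match ends right after SEARCH_STRING. The sample paragraph is invented for illustration.

```python
import re

# Hypothetical paragraph standing in for the parent filter's output.
paragraph = "first line\nhas SEARCH_STRING here\nlast line"

# Non-greedy behaviour written out explicitly: every .* becomes .*?
# re.DOTALL mirrors the '"." matches new line' option.
lazy = re.search(r"(.*?SEARCH_STRING.*?)", paragraph, re.DOTALL)

# The same pattern with ordinary greedy quantifiers.
greedy = re.search(r"(.*SEARCH_STRING.*)", paragraph, re.DOTALL)

lazy_text = lazy.group(1)      # stops at the end of SEARCH_STRING
greedy_text = greedy.group(1)  # runs to the end of the paragraph

print(repr(lazy_text))
print(repr(greedy_text))
```

In this sketch the lazy version returns everything up to and including SEARCH_STRING and nothing after it, which looks like the symptom in D5. The greedy version returns the whole paragraph.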
The parent filter, which seems to work perfectly, is shown in D6 below:

Image

The child filter, which seems to be matching unexpectedly, is shown in D7 below:

Image

What am I doing wrong with the child filter?
Very grateful for any help in this matter

Making the child greedy helps but...

Posted: Tue Jul 18, 2017 12:23 pm
by slouw
I thought I had fixed things by changing the child filter to greedy.
It started working in my test environment (why?).
I also tested it with an OR pattern (STRINGA|STRINGB), and again it was good in testing.

Alas, when I tried production data I hit the same issue described above for some paragraphs.
For example, see the child filter below.
The parent has passed the correct text to the child (top box).
But the child is selecting only up to the point where it finds one of the search patterns (2909 in this case) and not selecting the rest (bottom box).
I am hoping my efforts to illustrate this matter will be rewarded....:)

Image
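
Without the production data this is only a guess, but two things can each cut a match short in a Perl-style engine: a quantifier that is still lazy, or a "." that does not cross line breaks. The Python sketch below shows both, plus an anchored variant that is forced to cover the whole input. The text around 2909 is invented, STRINGB is a placeholder, and `\A` / `\Z` are Python's start and end of string anchors (Perl spells the strict end anchor `\z`).

```python
import re

# Hypothetical production-style paragraph; only "2909" comes from the post.
text = "item 2909 shipped\nsecond line\nthird line"
terms = r"(?:2909|STRINGB)"

# Lazy quantifiers, unanchored: the match stops right after the term.
unanchored = re.search(r"(.*?" + terms + r".*?)", text, re.DOTALL).group(1)

# Same lazy quantifiers, anchored to the start and end of the input, so the
# trailing .*? has to consume the rest of the paragraph to reach \Z.
anchored = re.search(r"\A(.*?" + terms + r".*?)\Z", text, re.DOTALL).group(1)

# Greedy quantifiers without DOTALL: "." stops at the first line break.
no_dotall = re.search(r"(.*" + terms + r".*)", text).group(1)

print(repr(unanchored))
print(repr(anchored))
print(repr(no_dotall))
```

If the anchored form behaves the same way in the child filter, it would avoid depending on the greedy setting at all, since the anchors force the match to cover everything the parent passes in.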

Re: Perl Find and Replace Regular expression child filter

Posted: Tue Jul 18, 2017 9:37 pm
by DataMystic Support
Wow, this is really heavy duty. I have posted a simple solution in the other thread.

But to debug this further, can I suggest that you insert a Debug filter at various points in the filter list so that you can see what text is passing through each point?

Also, right-click on the filter list and choose 'Trial run up to and including this point'. This often solves problems where the filtered text looks different from what you expect.
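
The same idea of tapping the text between stages can be sketched outside TextPipe. In the Python below, the `debug` helper, the blank-line paragraph split and the sample document are all hypothetical stand-ins. The real parent filter pattern is only visible in the screenshots.

```python
import re

def debug(label, text):
    # Stand-in for a Debug filter: show exactly what reaches this point.
    print(f"[{label}] {text!r}")
    return text

document = (
    "para one\n\n"
    "para two has SEARCH_STRING inside\nand a second line\n\n"
    "para three"
)

kept = []
# Hypothetical parent stage: split the input into paragraphs on blank lines.
for paragraph in re.split(r"\n{2,}", document):
    paragraph = debug("parent out", paragraph)
    # Child stage: keep the whole paragraph if it contains the search term.
    match = re.search(r"\A.*SEARCH_STRING.*\Z", paragraph, re.DOTALL)
    if match:
        kept.append(debug("child out", match.group(0)))
```

Printing with `repr` makes line breaks and trailing spaces visible, which is often where the text differs from what the pattern was written for.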