Perl Find and Replace Regular expression child filter

Get help with installation and running here.

Moderators: DataMystic Support, Moderators

slouw
Posts: 8
Joined: Wed Dec 18, 2013 4:48 pm

Post by slouw »

This is related to an attempt to solve this challenge here:
http://www.datamystic.com/forums/viewtopic.php?f=17&t=2398
This is the desired outcome for a sample input (D1 below)

Image

However, the output I am getting is shown in D2 below.

Image

The parent filter is working perfectly.
The child filter is invoked on the 3 occasions when the SEARCH_STRING is found in the parent filter output. These correspond to paragraphs 5, 7 and 8.
However, on these 3 occasions the entire paragraph is not extracted.

The screenshot D3 below shows the parent filter having correctly identified paragraph 4. The parent filter behaves in this way for all paragraphs. For the uninteresting paragraphs (1, 2, 3, 4, 6 and 9) the child filter is never shown in this fantastic "Prompt on replace" feature. I wish I had known about it years ago.
Image

The screenshot D4 below shows the parent filter finding paragraph 5. This is the first interesting paragraph (i.e. one for which an output is desired, because it contains the target SEARCH_STRING).

Image

Because the replacement text in the parent filter execution (marker 2 in D4 above) has a non-empty output for the child filter, the next screen shows the action of the child filter for the first time (D5 below). The found text (marker 3 in D5 below) is the accurate output of the parent filter (marker 2 in D4 above). As I understand it, this is the text being fed into the child filter. The child filter search string (marker 5 in D5 below) is, as you can see:

Code: Select all

(.*SEARCH_STRING.*)
I would expect this regex search string to match the entire text. Instead, it replaces the found text with text truncated at the end of the target SEARCH_STRING, as you can see.

Image

Note that for both parent and child filters the Perl matching options are
- Non greedy shortest default match; and
- "." matches new line
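
The effect of these two options on the child pattern can be reproduced outside TextPipe. What follows is a minimal sketch in Python's re module, used only as a stand-in for TextPipe's Perl-style engine (an assumption), with a hypothetical sample paragraph. Lazy quantifiers (.*?) emulate the non-greedy default and re.DOTALL emulates "." matching new lines. With a non-greedy default the trailing .* is satisfied by zero characters, so the match ends right after SEARCH_STRING, which is consistent with the truncation described above.

```python
import re

# Hypothetical sample paragraph; SEARCH_STRING stands in for the real target text.
paragraph = "first line\nhas SEARCH_STRING in it\nlast line"

# Lazy quantifiers emulate the "non greedy shortest default match" option;
# re.DOTALL emulates the '"." matches new line' option.
lazy = re.search(r"(.*?SEARCH_STRING.*?)", paragraph, re.DOTALL)

# Greedy quantifiers: the trailing .* runs to the end of the text.
greedy = re.search(r"(.*SEARCH_STRING.*)", paragraph, re.DOTALL)

print(repr(lazy.group(1)))    # ends immediately after SEARCH_STRING
print(repr(greedy.group(1)))  # covers the whole paragraph
```

In this sketch only the greedy form returns the whole paragraph; whether TextPipe's option maps onto the pattern in exactly this way is not confirmed by the thread.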
The parent filter, which seems to work perfectly, is shown in D6 below:

Image

The child filter, which seems to be matching unexpectedly, is shown in D7 below:

Image

What am I doing wrong with the child filter?
I would be very grateful for any help in this matter.

Making the child greedy helps but...

Post by slouw »

I thought I had fixed things by changing the child filter to greedy.
It started working in my test environment (why?).
I also tested it with an OR expression (STRINGA|STRINGB), and again it worked in testing.

Alas, when I tried production data I hit the same issue described above for some paragraphs.
See the example below, which shows the child filter.
The parent has passed the correct text to the child (top box).
But the child selects only until it finds one of the search patterns (2909 in this case) and does not select the rest (bottom box).
I am hoping my efforts to illustrate this matter will be rewarded... :)

Image
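
One way to make a pattern like this independent of the greedy setting is to anchor it to both ends of the text it receives. The following is a minimal sketch in Python's re module, used as a stand-in for TextPipe's Perl-style engine (an assumption). The paragraph is hypothetical, and 2909 and 3050 are placeholder search patterns. It also assumes the child filter sees the parent's output as one subject string, which the thread does not confirm. Python spells the end-of-text anchor \Z, whereas Perl-style engines usually write the strict form as \z.

```python
import re

# Hypothetical paragraph; 2909 and 3050 stand in for the real search patterns.
paragraph = "header line\ncode 2909 appears here\ntrailing line"

# \A and \Z pin the match to the very start and very end of the text,
# so even lazy quantifiers are forced to stretch across the whole paragraph.
lazy_anchored = re.search(r"\A(.*?(?:2909|3050).*?)\Z", paragraph, re.DOTALL)
greedy_anchored = re.search(r"\A(.*(?:2909|3050).*)\Z", paragraph, re.DOTALL)

print(lazy_anchored.group(1) == paragraph)
print(greedy_anchored.group(1) == paragraph)
```

Because the anchors leave only one possible match length, both forms return the full paragraph in this sketch, and a paragraph without either pattern produces no match at all.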
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: Perl Find and Replace Regular expression child filter

Post by DataMystic Support »

Wow, this is really heavy duty. I have posted a simple solution in the other thread.

But to debug this further, can I suggest that you insert a Debug filter at various points in the filter list so that you can see what text is passing through each point?

Also, right-click on the filter list and choose 'Trial run up to and including this point'. This often solves problems where the filtered text looks different to what you expect.
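
The idea of inspecting the text at each point can be illustrated outside TextPipe. This is a rough sketch in Python, not TextPipe's Debug filter: debug_tap is a hypothetical helper, the sample text is made up, and the regex stands in for a child filter. It records repr() of the text at each stage so that invisible characters such as carriage returns become visible.

```python
import re

def debug_tap(label, text, log):
    """Record exactly what reaches this point; repr() exposes newlines
    and other invisible characters. Returns the text unchanged."""
    log.append((label, repr(text)))
    return text

log = []
# Hypothetical input with Windows line endings.
source = "para one\r\nhas SEARCH_STRING here\r\n"

after_parent = debug_tap("after parent", source, log)

# Stand-in for a child filter: greedy, and "." does not match "\n" here.
child = re.search(r"(.*SEARCH_STRING.*)", after_parent)
after_child = debug_tap("after child", child.group(1), log)

for label, shown in log:
    print(label, shown)
```

In this sketch the second log entry shows a trailing \r that would not be visible in a normal text view, which is the kind of difference a tap between stages can reveal.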