Relocating strings from one place to another

Posted: Sat Mar 29, 2008 9:46 pm
by gerd
I am still struggling with the movement of strings from one place to another in a file. Therefore I constructed the following simple example.
My target is: I want to move the content which is between
<beginstring01> and </beginstring01> to the place where <newstring01> is located
and
<beginstring02> and </beginstring02> to the place where <newstring02> is located

Here is the example file:

<html>

<beginstring01>
this is just "example text" 01 which I want to move to another place which is located further down.
</beginstring01>

<beginstring02>
this is just example text 02 Which I want to move to another place which is located further down.
</beginstring02>

Here is just other text to fill the gap. And here are the new locations I want the strings to appear:

<newstring01>

and here <newstring02>

</html>
It drives me crazy, but I cannot get it to work, so I am asking for help.
gerd

Posted: Sun Mar 30, 2008 7:27 pm
by DataMystic Support
Search for perl pattern:

<beginstring01>(.*)</beginstring01>(.*)<newstring01>

Replace with

$2$$1$
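
For readers who want to try the same move outside TextPipe, here is a minimal sketch in Python. It assumes a shortened version of the example file, uses `re.DOTALL` so that `.` can cross line breaks, and writes group references in Python's `\g<N>` form rather than TextPipe's `$N$`. The variable names are illustrative only.

```python
import re

# Shortened stand-in for the example file (hypothetical content).
text = (
    "<html>\n"
    "<beginstring01>\n"
    "example text 01\n"
    "</beginstring01>\n"
    "filler\n"
    "<newstring01>\n"
    "</html>\n"
)

# Same shape as the TextPipe pattern; DOTALL lets '.' match newlines.
pattern = re.compile(
    r"<beginstring01>(.*)</beginstring01>(.*)<newstring01>", re.DOTALL
)

# Emit the text that sat between the two markers first, then the captured
# block, so the block ends up where <newstring01> used to be.
moved = pattern.sub(r"\g<2>\g<1>", text)
print(moved)
```

Because the source and target markers sit in one pattern, a single replacement can reorder everything between them.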

Posted: Sun Mar 30, 2008 9:36 pm
by gerd
Simon,
thanks a lot, it works fine. My mistake was that I did not put "(.*)<newstring01>" in the SAME find pattern. Now I know how to move strings around.

Can you also show me how to copy (instead of move) the contents of <beginstring01>(.*)</beginstring01> to the place of <newstring01>? I would like to keep the content between the <beginstring01> tags and also have it appear at the place of <newstring01>.

I looked through the filters and played a little with this and that (e.g. send to subfilter), but I guess I always need an example I can refer to. Or is it not possible with TextPipe?
gerd

Posted: Mon Mar 31, 2008 8:18 am
by DataMystic Support
Hi Gerd,

It's easy :-) To copy the string instead of moving it, search for perl pattern:

(<beginstring01>(.*)</beginstring01>)(.*)<newstring01>

Replace with

$1$$3$$2$
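
The copy variant can be sketched the same way in Python. This is an illustration under assumptions (a one-line sample string, Python's `\g<N>` group syntax in place of TextPipe's `$N$`), not TextPipe's own behaviour.

```python
import re

# Hypothetical one-line sample.
text = "<beginstring01>alpha</beginstring01> filler <newstring01> tail"

# Group 1 = the whole tagged block, group 2 = its content,
# group 3 = everything between the block and the target marker.
pattern = re.compile(
    r"(<beginstring01>(.*)</beginstring01>)(.*)<newstring01>", re.DOTALL
)

# Keep the tagged block (1), keep the gap (3), then repeat the content (2)
# where <newstring01> used to be.
copied = pattern.sub(r"\g<1>\g<3>\g<2>", text)
print(copied)
```

The only difference from the move is the extra outer group, which lets the replacement put the original block back unchanged.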

What is the extra $ for? Is this documented in help?

Posted: Mon Mar 31, 2008 5:14 pm
by dfhtextpipe
Simon,

This is the first time I have seen replacements like

$1$$3$$2$
I had wondered how you replace a pattern with more than one subpattern when there is nothing in between them.
  • Is this what the extra $ is for in each subpattern?
    Is this aspect documented in the help file?
Best regards,
David Haslam

Posted: Tue Apr 01, 2008 9:17 am
by DataMystic Support
The extra $ is to disambiguate the two possibilities of

$1 followed by the literal '1', and
$11.

That is, the trailing $ marks the end of the captured variable name.

It is not required when there is other text around it, e.g.
here is my $1 separate to my $2.

If you had written
$1$3$2
this would be interpreted as
$1$ literal '3' $2
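
The same kind of ambiguity exists in other regex engines, each with its own terminator syntax. As an analogy only (this is Python's `re` module, not TextPipe), `\g<1>` plays the role that `$1$` plays above. The sample pattern and strings are made up for illustration.

```python
import re

pair = re.compile(r"(\w+)-(\w+)")

# \g<1> ends the group reference explicitly, so the "1" after it is literal.
explicit = pair.sub(r"\g<1>1", "ab-cd")

# With other text between the references, the short form is unambiguous.
spaced = pair.sub(r"\2 then \1", "ab-cd")

# Without a terminator, \11 is read as a reference to group 11, which this
# pattern does not have, so Python rejects the replacement string.
try:
    pair.sub(r"\11", "ab-cd")
    ambiguous_ok = True
except re.error:
    ambiguous_ok = False

print(explicit, "|", spaced, "|", ambiguous_ok)
```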

Sorry - I didn't understand your other question - can you please give an example?

Posted: Wed Mar 10, 2010 10:11 am
by simoninsing
I couldn't quite understand how to move/copy strings within a line. This function is probably what I need. I have say 800,000 lines of text, each with geographical place names and other bits and pieces within them. My ultimate aim is to determine the relevant Chinese province for each line. Here's an example:
"Changchun / China Life Insurance Company Limited, Changchun City, Chaoyang Branch Company"
For 90% of these 800,000 lines the province is explicitly stated, and these are not the problem. The problem is the 100,000 or so where the province is not stated, such as the above example. The province in that case is Jilin, but because of conflicting place names in China (e.g. Chaoyang can be a district in Beijing or a city in Jilin) I need to develop some "rules" that will derive the correct province. In the above example, the relevant "rule" is "if you see Changchun and Chaoyang in the same line, then the province is Jilin".
There are two obvious ways to do this. Either I set up a multiple character string search for each line, which looks for two character strings and, if both are present, sticks some unique character string at the end of the line (not sure if TP will do this for me?). Or I simply ask TP to look for any of a list of character strings (Changchun, Chaoyang, and a hundred others) and then copy or move them to the end of the line, preferably with another character like "^" preceding them, so I can take the output and dump it into Excel to run some IF statements to see whether my predetermined string pairs are present in any lines.
Assistance greatly appreciated.

Simon D

Posted: Wed Mar 10, 2010 10:42 am
by DataMystic Support
Why not restrict to lines matching perl pattern
Changchun(.*)Chaoyang|Chaoyang(.*)Changchun
ie either ordering, then add a subfilter to add a right margin of
Jilin
?
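
Outside TextPipe, the same rule can be sketched in a few lines of Python. This is only an illustration of the idea (test each line for both names in either order, then append the province). The function name `tag_province`, the " ^Jilin" marker, and the second sample line are assumptions made for the example.

```python
import re

# Either ordering of the two place names on the same line.
both_names = re.compile(r"Changchun.*Chaoyang|Chaoyang.*Changchun")

def tag_province(line: str, marker: str = " ^Jilin") -> str:
    """Append the marker to lines naming both places; leave others unchanged."""
    return line + marker if both_names.search(line) else line

lines = [
    "Changchun / China Life Insurance Company Limited, Changchun City, Chaoyang Branch Company",
    "Beijing, Chaoyang District office",  # hypothetical non-matching line
]
tagged = [tag_province(line) for line in lines]
for line in tagged:
    print(line)
```

One rule per province pair keeps each pattern short. A longer rule list could be held as (pattern, province) pairs and applied in a loop.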

Posted: Wed Mar 10, 2010 3:29 pm
by simoninsing
This looks highly promising and I think will have application to a few other challenges I face ... Thanks in anticipation.

Posted: Fri Mar 12, 2010 3:06 pm
by simoninsing
...and that has indeed helped enormously, and has bumped me into the Perl world which is what I needed.

However sending the Replace command output to the end of the line is not working for me. Right margin as a sub-filter is not doing anything, and I can't find anything in the Perl documentation which tells you how to throw Replace command output somewhere specific (although plenty on ^ and $ used in the Search input side of the equation). Is there some easy way I can get the text that has been replaced dumped somewhere useful (i.e. end or beginning of the line ?)

Posted: Fri Mar 12, 2010 3:55 pm
by DataMystic Support
Ensure the pattern matches the remainder of the line so that Add Right Margin gets to place it in the right spot. Use a perl pattern of:

(Changchun(.*)Chaoyang|Chaoyang(.*)Changchun).*$
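
One thing to watch with patterns of this shape: alternation (`|`) binds more loosely than concatenation, so a trailing `.*$` belongs only to the last alternative unless the alternatives are grouped. The sketch below uses Python's `re` module as a stand-in for a Perl-style engine, with a shortened sample line, to show the difference. How TextPipe's Add Right Margin treats each match is not covered here.

```python
import re

line = "Changchun City, Chaoyang Branch Company"

# Without grouping, ".*$" is part of the second alternative only.
ungrouped = re.compile(r"Changchun(.*)Chaoyang|Chaoyang(.*)Changchun.*$")

# With grouping, ".*$" applies whichever alternative matched.
grouped = re.compile(r"(?:Changchun.*Chaoyang|Chaoyang.*Changchun).*$")

ungrouped_match = ungrouped.search(line).group(0)
grouped_match = grouped.search(line).group(0)

print(repr(ungrouped_match))  # stops at "Chaoyang"
print(repr(grouped_match))    # runs to the end of the line
```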