Relocating strings from one place to another

Post by **gerd** » Sat Mar 29, 2008 9:46 pm

I am still struggling with moving strings from one place to another in a file, so I constructed the following simple example.
My goal is to move the content between
<beginstring01> and </beginstring01> to the place where <newstring01> is located,
and the content between
<beginstring02> and </beginstring02> to the place where <newstring02> is located.

Here is the example file:

Code:

<html>

<beginstring01>
this is just "example text" 01 which I want to move to another place which is located further down.
</beginstring01>

<beginstring02>
this is just example text 02 Which I want to move to another place which is located further down.
</beginstring02>

Here is just other text to fill the gap. And here are the new locations I want the strings to appear:

<newstring01>

and here <newstring02>

</html>

It drives me crazy, but I cannot get it to work, so I am asking for help.
gerd

Post by **DataMystic Support** » Sun Mar 30, 2008 7:27 pm

Search for perl pattern:

<beginstring01>(.*)</beginstring01>(.*)<newstring01>

Replace with

$2$$1$
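For readers who want to try the same move outside TextPipe, here is a minimal Python sketch using the standard `re` module. It assumes `.` must be allowed to match line breaks (Python's `re.DOTALL` flag), because the text between the tags spans several lines. Python writes the backreferences as `\g<2>\g<1>` rather than TextPipe's `$2$$1$`; the sample text and variable names are illustrative only.

```python
import re

# Shortened sample modelled on gerd's example file.
text = """<html>
<beginstring01>
example text 01
</beginstring01>
Other text to fill the gap.
<newstring01>
</html>"""

# Group 1: the content to move; group 2: everything up to the target tag.
# re.DOTALL lets "." match newlines, since the content spans several lines.
pattern = re.compile(r"<beginstring01>(.*)</beginstring01>(.*)<newstring01>", re.DOTALL)

# Python equivalent of TextPipe's "$2$$1$": the gap text first, then the moved content.
# The three tags are consumed by the match and do not reappear in the output.
moved = pattern.sub(r"\g<2>\g<1>", text)
print(moved)
```

Because `(.*)` is greedy, this assumes each tag appears once; with repeated tags, the non-greedy `(.*?)` is the usual alternative.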

Post by **gerd** » Sun Mar 30, 2008 9:36 pm

Simon,
thanks a lot, it works fine. My mistake was that I did not put "(.*)<newstring01>" in the SAME find pattern. Now I know how to move strings around.

Can you also show me how to copy (instead of move) the contents of <beginstring01>(.*)</beginstring01> to the place of <newstring01>? I would like to keep the content between the <beginstring01> tags and also have it appear at the place of <newstring01>.

I looked through the filters and experimented a little (e.g. with Send to subfilter), but I always need an example I can refer to. Or is this not possible with TextPipe?
gerd

Post by **DataMystic Support** » Mon Mar 31, 2008 8:18 am

Hi Gerd,

It's easy.

To copy the string instead of moving it, search for perl pattern:

(<beginstring01>(.*)</beginstring01>)(.*)<newstring01>

Replace with

$1$$3$$2$
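The same copy can be tried in Python as a minimal sketch, again assuming `re.DOTALL` so that `.` crosses line breaks. Groups are numbered by their opening parenthesis, so the outer group is 1, the inner content is 2 and the gap is 3, matching TextPipe's `$1$$3$$2$`; the sample text is illustrative only.

```python
import re

# Shortened sample: one tagged block, some gap text, then the target tag.
text = """<beginstring01>
example text 01
</beginstring01>
Other text.
<newstring01>"""

# Group 1: the whole tagged block (kept where it is);
# group 2: the content inside the tags (the part that gets copied);
# group 3: the text between the block and the target tag.
pattern = re.compile(r"(<beginstring01>(.*)</beginstring01>)(.*)<newstring01>", re.DOTALL)

# Python equivalent of "$1$$3$$2$": original block, gap text, then a copy of the content.
copied = pattern.sub(r"\g<1>\g<3>\g<2>", text)
print(copied)
```

Only the `<newstring01>` placeholder disappears; the original block stays intact.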

Post by **dfhtextpipe** » Mon Mar 31, 2008 5:14 pm

Simon,

This is the first time I have seen replacements like

Code:

$1$$3$$2$

I had wondered how to replace a pattern with several subpatterns placed directly next to each other, with nothing in between.

Is this what the extra $ after each subpattern number is for?
Is this aspect documented in the help file?

Best regards,
David Haslam

Post by **DataMystic Support** » Tue Apr 01, 2008 9:17 am

The extra $ disambiguates between the two possible readings of $11:

$1 followed by the literal '1', and
capture group 11.

- $ marks the end of the captured variable name.

It is not required when there is other text around it, e.g.
here is my $1 separate to my $2.

If you had written
$1$3$2
this would be interpreted as
$1$ literal '3' $2

Sorry - I didn't understand your other question - can you please give an example?
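For comparison, Python's `re` module faces the same ambiguity and resolves it with a different syntax: `\g<1>` closes the group number explicitly, much like TextPipe's trailing `$`, while `\11` is read as group 11. This minimal sketch illustrates Python's rules only, not TextPipe's.

```python
import re

# Three groups, so that groups 1, 2 and 3 all exist.
pattern = re.compile(r"(a)(b)(c)")

# "\g<1>1": group 1 followed by a literal "1" (compare TextPipe "$1$1").
print(pattern.sub(r"\g<1>1", "abc"))       # a1

# "\g<1>3\g<2>": group 1, a literal "3", then group 2 (compare TextPipe "$1$3$2").
print(pattern.sub(r"\g<1>3\g<2>", "abc"))  # a3b

# "\11" is parsed as a reference to group 11, which does not exist here.
try:
    pattern.sub(r"\11", "abc")
except re.error as exc:
    print("error:", exc)
```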

Post by **simoninsing** » Wed Mar 10, 2010 10:11 am

I couldn't quite understand how to move or copy strings within a line, but this function is probably what I need. I have, say, 800,000 lines of text, each containing geographical place names and other bits and pieces. My ultimate aim is to determine the relevant Chinese province for each line. Here's an example:
"Changchun / China Life Insurance Company Limited, Changchun City, Chaoyang Branch Company"
For 90% of these 800,000 lines the province is explicitly stated, and those are not the problem. The problem is the 100,000 or so lines where the province is not actually stated, such as the example above. In that case the province is in fact Jilin, but because place names in China conflict (e.g. Chaoyang can be a district in Beijing or a city in Jilin), I need to develop some "rules" that will derive the correct province. In the example above, the relevant rule is "if you see Changchun and Chaoyang in the same line, then the province is Jilin".

There are two obvious ways to do this. One is to set up a multiple-string search on each line that looks for two strings and, if both are present, appends some unique string to the end of the line (I'm not sure whether TP can do this for me). The other is simply to ask TP to look for any of a list of strings (Changchun, Chaoyang, and a hundred others) and copy or move them to the end of the line, preferably preceded by a character like "^", so that I can take the output, dump it into Excel, and run some IF statements to see whether my predetermined string pairs are present in any line.
Assistance greatly appreciated.

Simon D
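Simon's second idea, appending every recognised place name to the end of the line with a `^` marker so the pairs can be checked later in Excel, can be sketched in Python as follows. The place-name list and the helper name `tag_line` are hypothetical and deliberately tiny; a real list would hold the hundred or so names mentioned.

```python
import re

# Hypothetical, abbreviated list of place names to look for.
PLACE_NAMES = ["Changchun", "Chaoyang", "Beijing"]

# One alternation of all names; \b stops a name matching inside a longer word.
NAME_PATTERN = re.compile(r"\b(?:" + "|".join(map(re.escape, PLACE_NAMES)) + r")\b")

def tag_line(line: str) -> str:
    """Append each distinct place name found in the line, preceded by '^'."""
    found = []
    for name in NAME_PATTERN.findall(line):
        if name not in found:  # keep first-seen order and skip repeats
            found.append(name)
    return line + "".join("^" + name for name in found)

line = "Changchun / China Life Insurance Company Limited, Changchun City, Chaoyang Branch Company"
print(tag_line(line))
```

Lines with no known names come back unchanged, so every line still maps one-to-one onto a spreadsheet row.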

Post by **DataMystic Support** » Wed Mar 10, 2010 10:42 am

Why not restrict to lines matching the perl pattern
Changchun(.*)Chaoyang|Chaoyang(.*)Changchun
i.e. either ordering, then add a subfilter that adds a right margin of
Jilin
?
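The rule-based approach (act only on lines where both names occur in either order, and append the province) can be mimicked in Python as a minimal sketch. The rule table `RULES`, the helper `add_province` and the space before the province are assumptions for illustration; in this sketch, lines that match no rule pass through unchanged.

```python
import re

# Hypothetical rule table: if the pattern matches (either ordering), append the province.
RULES = [
    (re.compile(r"Changchun.*Chaoyang|Chaoyang.*Changchun"), "Jilin"),
]

def add_province(line: str) -> str:
    """Return the line with a province appended when a rule's pattern matches."""
    for pattern, province in RULES:
        if pattern.search(line):
            return line + " " + province  # the space separator is an arbitrary choice
    return line

lines = [
    "Changchun / China Life Insurance Company Limited, Changchun City, Chaoyang Branch Company",
    "Chaoyang District Office, Beijing",
]
for line in lines:
    print(add_province(line))
```

Further rules for other ambiguous names would be extra entries in `RULES`, checked in order.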

Post by **simoninsing** » Wed Mar 10, 2010 3:29 pm

This looks highly promising, and I think it will apply to a few other challenges I face... Thanks in anticipation.

Post by **simoninsing** » Fri Mar 12, 2010 3:06 pm

...and that has indeed helped enormously, and has bumped me into the Perl world, which is what I needed.

However, sending the Replace command output to the end of the line is not working for me. Right margin as a subfilter is not doing anything, and I can't find anything in the Perl documentation that tells you how to send Replace command output somewhere specific (although there is plenty on ^ and $ on the Search side of the equation). Is there some easy way to get the replaced text placed somewhere useful (i.e. the end or beginning of the line)?

Post by **DataMystic Support** » Fri Mar 12, 2010 3:55 pm

Ensure the pattern matches the remainder of the line so that Add Right Margin gets to place it in the right spot. Use a perl pattern of:

Code:

(Changchun.*Chaoyang|Chaoyang.*Changchun).*$

DataMystic
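Whether `.*$` really reaches the end of the line depends on how it combines with `|`: alternation binds more loosely than anything else in a regex, so in `A|B.*$` the `.*$` belongs only to `B`. This Python sketch (Python's `re` follows the same precedence as Perl here) compares an ungrouped pattern with one that groups both orderings before `.*$`, using Simon's sample line.

```python
import re

line = "Changchun / China Life Insurance Company Limited, Changchun City, Chaoyang Branch Company"

# Ungrouped: ".*$" belongs only to the second alternative (Chaoyang ... Changchun).
ungrouped = re.search(r"Changchun(.*)Chaoyang|Chaoyang(.*)Changchun.*$", line)

# Grouped: both orderings are followed by ".*$", so the match runs to the end of the line.
grouped = re.search(r"(Changchun.*Chaoyang|Chaoyang.*Changchun).*$", line)

print(ungrouped.group(0).endswith("Chaoyang"))  # True: stops before " Branch Company"
print(grouped.group(0) == line)                 # True: covers the whole line
```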
