Match line but replace only part?
Posted: Fri May 26, 2006 9:23 am
I'm moving and cleaning up a website (about 40k pages) that I took over maintenance of, and it has several problems. Some people on a forum recommended I check out Text Pipe for some of the things I need to do. So far many of the problems appear to be correctable with Text Pipe, but I don't see how to do one of the really major cleanup tasks.
The site has tons of internal page links that currently appear like:
../www.website.com/section/Reference/Entry ... x.htm#e955
../www.website.com/section/Reference/Entry ... x.htm#e968
../www.website.com/section/Reference/Entry ... .htm#e1425
../www.website.com/Definition/a-c/655/index.htm#d655
../www.website.com/Definition/a-c/2265/index.htm#d2265
The ../www.website.com/ and index.htm parts of the URLs are constant, but the rest is variable, and there are several thousand different variations of the path that falls between them. What I want to do is drop everything but the /####/index.htm#xxxx to make simple relative links. So the examples above would eventually become something like the following (I can easily adjust the paths and such after the main URL cleanup is done):
../955/index.htm#e955
../968/index.htm#e968
../1425/index.htm#e1425
../655/index.htm#d655
../2265/index.htm#d2265
But I don't see a way for Text Pipe to do it. I can get it to match the full URLs easily enough with:
../www.website.com/[ longest 1 to 40 letters or digits or <!"#$%&'()*+,-./\:;=?@[]^_`{}~|> ]/[ longest 1 to 4 digits ]/index.htm[ longest 1 to 5 digits or letters or <!"#$%&'()*+,-./\:;=?@[]^_`{}~|> ]
But I don't see any way to get it to remove the preceding path and leave the part I want to keep, nor any way to match only the offending text and remove it without touching the part I want to keep.
Is it possible to do this with Text Pipe?
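To make the goal concrete, here is a rough Python sketch of the transformation I'm after. It is not a Text Pipe filter, only an illustration of "match the whole link, keep part of it" using regex capture groups. The names `LINK_RE` and `simplify_links` are made up for this example, and the pattern assumes the 1 to 4 digit ID is always the last path segment before index.htm.

```python
import re

# Hypothetical pattern (my assumption about the link shape):
#   constant prefix, any number of throwaway path segments,
#   the 1-4 digit ID (group 1), the constant index.htm, the anchor (group 2).
LINK_RE = re.compile(
    r"""\.\./www\.website\.com/(?:[^/\s"'<>]+/)*?(\d{1,4})/index\.htm(#\w+)"""
)


def simplify_links(text: str) -> str:
    """Rewrite each matched link, keeping only the captured ID and anchor."""
    return LINK_RE.sub(r"../\1/index.htm\2", text)


if __name__ == "__main__":
    sample = "../www.website.com/Definition/a-c/655/index.htm#d655"
    print(simplify_links(sample))  # ../655/index.htm#d655
```

The idea is that the parenthesized parts are remembered during the match and written back in the replacement, while everything else in the matched link is dropped. Anything that does not fit the pattern is left untouched.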