Help Deleting Portions of URLs
Posted: Sun May 09, 2004 11:39 am
by Trieste
Hi,
I need to delete or replace a portion of a URL which reads as follows:
http://www.mysite.com/directory/directo ... ename.html
I want to delete everything except the filename. Directory and Directory2 have ever-changing names. I want to keep filename.html, which also changes with every occurrence. I can't seem to get it to work. Can anyone help?
Posted: Wed May 12, 2004 10:09 am
by DataMystic Support
Try using this Perl pattern:
http://(.*)/([^/]*\.html)
Replace with
$2
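For anyone who wants to check the pattern outside TextPipe, here is a minimal Python sketch of the same search and replace. The function name and the sample URL are made up for illustration; only the pattern itself comes from the reply above.

```python
import re

# Same pattern as in the reply above. Group 2 is the file name:
# everything after the last "/" that ends in ".html".
PATTERN = re.compile(r"http://(.*)/([^/]*\.html)")

def keep_filename(text: str) -> str:
    # r"\2" in Python's re.sub plays the role of $2 in the replacement.
    return PATTERN.sub(r"\2", text)

# Hypothetical URL, not taken from the thread
print(keep_filename("http://www.mysite.com/dir1/dir2/page.html"))  # page.html
```

Note that (.*) is greedy, so if one line holds several URLs the match runs from the first http:// to the last file name on that line.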
extracting different file type names using the same filter
Posted: Sun Jun 06, 2004 11:45 pm
by Ian
Hi,
I have a situation where an HTML line will usually have three JPEG file names to extract, so I'm trying this, and it seems to work well using TextPipe Pro:
Restrict to lines matching:
TheSameDomainName.com(.*?)/([^/]*\.jpg)
Subfilter - replace with: $2
But sometimes that HTML line will only have two JPEG files and one PNG file. Any ideas on how to work that into the above filter and extract all three file names, whether they are three JPEGs, or two JPEGs and a PNG?
Ian @ Ashiya, Japan
Posted: Mon Jun 07, 2004 10:04 am
by DataMystic Support
Is this the same question as above? Or a new topic altogether?
What pattern is the 'Restrict to lines matching:' actually matching?
Posted: Mon Jun 07, 2004 12:07 pm
by Ian
Hi,
Sorry Simon, maybe it should be another topic, but I am trying to work with your suggestion to Trieste as a possible way to do this.
Restrict to lines matching: TheSameDomainName.com
Subfilter Perl pattern: TheSameDomainName.com(.*?)/([^/]*\.jpg)
replace with: $2
Below is an example of the code from which I'd like to get all three image file names, i.e.
5906123078F.jpg 5906123078O.png 5906123078R.jpg
but sometimes the PNG file will be a JPEG.
-------------
<table border='0' width='100%'><tr><td align='center'><a href='cardetail_il.asp@cor=59&exh=06123078&typ=1'><img src='../../../../TheSameDomainName.com/Photos/59/5906123078F.jpg' width='300' height='225' name='g1' border='0' alt='????'></a></td><td rowspan='3' align='center'><a href='cardetail_oh.asp@cor=59&exh=06123078'><img src='../../../../TheSameDomainName.com/Photos/59/5906123078O.png' width='470' height='470' name='t1' border='0' alt='?????'></a></td></tr><tr><td align='center'><a href='cardetail_il.asp@cor=59&exh=06123078&typ=2'><img src='../../../../TheSameDomainName.com/Photos/59/5906123078R.jpg' width='300' height='225' name='g2' border='0' alt='????'></a></td></tr><a href='cardetail_il.asp@cor=59&exh=06123078&typ=1'><img src='../images/push.gif' border='0' CLASS='posabs' ID='push1' WIDTH='25' HEIGHT='25'></a><a href='cardetail_il.asp@cor=59&exh=06123078&typ=2'><img src='../images/push.gif' border='0' CLASS='posabs' ID='push2' WIDTH='25' HEIGHT='25'></a><a href='cardetail_oh.asp@cor=59&exh=06123078'><img src='../images/push.gif' border='0' CLASS='posabs' ID='push7' WIDTH='25' HEIGHT='25'></a></table>
-------------
Posted: Mon Jun 07, 2004 12:22 pm
by DataMystic Support
Ok Ian,
Try
Subfilter Perl pattern: TheSameDomainName.com(.*?)/([^/]*\.(jpg|png))
replace with: $2
For interest, you could also use the following EasyPattern:
TheSameDomainName.com[ capture(longest 1+ char), '/',
capture(1+ not '/', ( '.jpg' or '.png' ) ) ]
That might be a little clearer.
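To see what the (jpg|png) alternation picks up, here is a rough Python sketch that runs the same Perl pattern over a shortened version of Ian's example line. The function name is hypothetical, and Python's re module stands in for TextPipe's subfilter here, so treat it as an illustration only.

```python
import re

# The Perl pattern from the reply above, unchanged.
# Group 2 is the file name; group 3 is just the extension.
PATTERN = re.compile(r"TheSameDomainName.com(.*?)/([^/]*\.(jpg|png))")

def image_names(line: str) -> list:
    # Collect group 2 from every match on the line.
    return [m.group(2) for m in PATTERN.finditer(line)]

# Shortened from the example posted earlier in the thread
sample = (
    "<img src='../../../../TheSameDomainName.com/Photos/59/5906123078F.jpg' width='300'>"
    "<img src='../../../../TheSameDomainName.com/Photos/59/5906123078O.png' width='470'>"
    "<img src='../../../../TheSameDomainName.com/Photos/59/5906123078R.jpg' width='300'>"
    "<img src='../images/push.gif' border='0'>"
)
print(image_names(sample))
# ['5906123078F.jpg', '5906123078O.png', '5906123078R.jpg']
```

The push.gif image is skipped because its path does not contain the domain name and its extension is not in the alternation.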
Posted: Mon Jun 07, 2004 2:21 pm
by Ian
Thank you Simon, that works great!
Posted: Fri Jun 18, 2004 11:55 am
by Guest
Thank you all for your help and comments. I have a related question that I hope you can help with. How could I change a file name (but not the directories) to lower case?
The format is as follows:
"http://www.mysite.com/directory/directo ... name.shtml"
or
"http://www.mysite.com/directory/directo ... name.shtml"
or
"File_name.shtml"
In other words, sometimes it is a relative URL and sometimes not. Sometimes there is more than one directory.
The common factor is the .shtml extension. I do not wish to alter .html files.
Thank you
Posted: Fri Jun 18, 2004 12:05 pm
by DataMystic Support
Use a subfilter of the search/replace to convert it to lowercase.
Your search pattern might be
Replace with
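As a rough sketch of the same idea outside TextPipe (match only the file name, then lowercase the match), here is a Python version. The pattern and the function name are assumptions made for illustration; they are not the patterns from the reply above.

```python
import re

# Hypothetical pattern: a run of characters with no "/", quotes or
# whitespace, ending in ".shtml". This is the last path segment only,
# so the directories are left alone, and ".html" files do not match.
SHTML_NAME = re.compile(r"[^/\"'\s]+\.shtml\b")

def lowercase_shtml_names(text: str) -> str:
    # Replace each matched file name with its lowercase form.
    return SHTML_NAME.sub(lambda m: m.group(0).lower(), text)

# Hypothetical inputs, covering an absolute and a relative URL
print(lowercase_shtml_names('"http://www.mysite.com/Dir/Sub/File_Name.shtml"'))
print(lowercase_shtml_names('"File_name.shtml"'))
```

The same pattern handles absolute and relative URLs because it never looks at anything before the last "/".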