Help Deleting Portions of URLs

Get help with installation and running here.

Moderators: DataMystic Support, Moderators

Trieste

Help Deleting Portions of URLs

Post by Trieste »

Hi,

I need to delete or replace part of a URL which reads as follows:

http://www.mysite.com/directory/directo ... ename.html

I want to delete everything except the filename. Directory and Directory2 have ever-changing names. I want to keep filename.html, which also changes with every occurrence. I can't seem to get it to work. Can anyone help?
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Post by DataMystic Support »

Try using this Perl pattern:

http://(.*)/([^/]*\.html)

Replace with

$2
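
For anyone who wants to try the same replacement outside TextPipe, here is a minimal Python sketch of the pattern above. It assumes Python's `re` module treats this pattern the same way the Perl engine does; the function name `keep_filename` and the sample URL are made up for illustration. Note that Python writes the replacement as `\2` rather than `$2`.

```python
import re

# Same pattern as suggested above: the greedy (.*) swallows the host and all
# directories, and group 2 captures the last path segment ending in .html.
URL_PATTERN = re.compile(r"http://(.*)/([^/]*\.html)")


def keep_filename(text: str) -> str:
    """Replace each matching URL with just its filename (group 2)."""
    return URL_PATTERN.sub(r"\2", text)


if __name__ == "__main__":
    # Hypothetical URL, not taken from the original post.
    sample = "http://www.mysite.com/news/archive/report.html"
    print(keep_filename(sample))
```

Because `(.*)` is greedy, a line holding several URLs would be reduced to the last filename only, so this sketch suits input with one URL per line.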
Ian

extracting different file type names using the same filter

Post by Ian »

Hi,

I have a situation where an HTML line usually has three JPEG file names to extract, so I'm trying this in TextPipe Pro and it seems to work well:
Restrict to lines matching:
TheSameDomainName.com(.*?)/([^/]*\.jpg)
Subfilter - replace with: $2

But sometimes that HTML line will have only two JPEG files and one PNG file. Any ideas on how to work that into the above filter and extract all three file names, whether they are three JPEGs or two JPEGs and a PNG?

Ian @ Ashiya, Japan
DataMystic Support

Post by DataMystic Support »

Is this the same question as above? Or a new topic altogether?

What pattern is the 'Restrict to lines matching:' actually matching?
Ian

Post by Ian »

Hi,

Sorry Simon, maybe it should be another topic, but I am trying to work with your suggestion to Trieste as a possible way to do this.

Restrict to lines matching: TheSameDomainName.com
Subfilter Perl pattern: TheSameDomainName.com(.*?)/([^/]*\.jpg)
replace with: $2

Below is an example of the code from which I'd like to get all three image file names, i.e.

5906123078F.jpg 5906123078O.png 5906123078R.jpg

but sometimes the PNG file will be a JPEG.


-------------
<table border='0' width='100%'><tr><td align='center'><a href='cardetail_il.asp@cor=59&exh=06123078&typ=1'><img src='../../../../TheSameDomainName.com/Photos/59/5906123078F.jpg' width='300' height='225' name='g1' border='0' alt='????'></a></td><td rowspan='3' align='center'><a href='cardetail_oh.asp@cor=59&exh=06123078'><img src='../../../../TheSameDomainName.com/Photos/59/5906123078O.png' width='470' height='470' name='t1' border='0' alt='?????'></a></td></tr><tr><td align='center'><a href='cardetail_il.asp@cor=59&exh=06123078&typ=2'><img src='../../../../TheSameDomainName.com/Photos/59/5906123078R.jpg' width='300' height='225' name='g2' border='0' alt='????'></a></td></tr><a href='cardetail_il.asp@cor=59&exh=06123078&typ=1'><img src='../images/push.gif' border='0' CLASS='posabs' ID='push1' WIDTH='25' HEIGHT='25'></a><a href='cardetail_il.asp@cor=59&exh=06123078&typ=2'><img src='../images/push.gif' border='0' CLASS='posabs' ID='push2' WIDTH='25' HEIGHT='25'></a><a href='cardetail_oh.asp@cor=59&exh=06123078'><img src='../images/push.gif' border='0' CLASS='posabs' ID='push7' WIDTH='25' HEIGHT='25'></a></table>
-------------
DataMystic Support

Post by DataMystic Support »

Ok Ian,

Try

Subfilter Perl pattern: TheSameDomainName.com(.*?)/([^/]*\.(jpg|png)) 
replace with: $2 
For interest, you could also use the following EasyPattern:

TheSameDomainName.com[ capture(longest 1+ char), '/', 
capture(1+ not '/', ( '.jpg' or '.png' ) ) ]
That might be a little clearer.
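
To check the Perl pattern outside TextPipe, a rough Python equivalent is sketched below. It assumes Python's `re` module handles the pattern the same way; `extract_image_names` is a hypothetical helper, and the sample line is a shortened version of the HTML Ian posted (only the `img` tags are kept).

```python
import re

# The Perl pattern from this post, unchanged. Group 2 is the file name,
# group 3 is the extension without the dot.
IMAGE_PATTERN = re.compile(r"TheSameDomainName.com(.*?)/([^/]*\.(jpg|png))")


def extract_image_names(line: str) -> list:
    """Return every .jpg/.png file name found after the domain, in order."""
    return [match.group(2) for match in IMAGE_PATTERN.finditer(line)]


if __name__ == "__main__":
    # Shortened sample based on the HTML above.
    html_line = (
        "<img src='../../../../TheSameDomainName.com/Photos/59/5906123078F.jpg' width='300'>"
        "<img src='../../../../TheSameDomainName.com/Photos/59/5906123078O.png' width='470'>"
        "<img src='../../../../TheSameDomainName.com/Photos/59/5906123078R.jpg' width='300'>"
    )
    print(" ".join(extract_image_names(html_line)))
```

The `../images/push.gif` images are skipped because they neither follow the domain name nor end in `.jpg` or `.png`.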
Ian

Post by Ian »

Thank you Simon, that works great! :D
Guest

Post by Guest »

Thank you all for your help and comments. I have a related question that I hope you can help with: how can I change a file name (but not the directories) to lower case?


The format is as follows:

"http://www.mysite.com/directory/directo ... name.shtml"

or

"http://www.mysite.com/directory/directo ... name.shtml"

or

"File_name.shtml"


In other words, sometimes it is a relative URL and sometimes not. Sometimes there is more than one directory.

The common factor is the .shtml extension. I do not wish to alter .html files.

Thank you
DataMystic Support

Post by DataMystic Support »

Use a subfilter of the search/replace to convert it to lowercase.

Your search pattern might be

[^/]*\.shtml

Replace with

$0
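
As a rough illustration of the same idea outside TextPipe, the Python sketch below lowercases only the `.shtml` file name and leaves directories and `.html` files alone. It is an assumption-laden sketch, not the TextPipe filter itself: the function name `lowercase_shtml_names` is hypothetical, the dot is escaped, and the character class also excludes quotes and whitespace (a change from the pattern above) so that text before a quoted relative URL is not lowercased as well.

```python
import re

# Last path segment ending in .shtml. Quotes and whitespace are excluded
# (an adjustment for this sketch) so the match stays inside the URL.
SHTML_NAME = re.compile(r"[^/\"\s]*\.shtml")


def lowercase_shtml_names(text: str) -> str:
    """Lowercase each .shtml file name; everything else is left unchanged."""
    return SHTML_NAME.sub(lambda match: match.group(0).lower(), text)


if __name__ == "__main__":
    # Hypothetical inputs; the URLs in the question above are truncated.
    print(lowercase_shtml_names('"http://www.mysite.com/Dir/Dir2/File_Name.shtml"'))
    print(lowercase_shtml_names('"File_Name.shtml"'))
    print(lowercase_shtml_names('"Index.html"'))
```

The callback plays the role of the lowercase subfilter: the search finds the file name, and the conversion is applied only to that match.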