Replacing Font Tags in HTML Pages with Dreamweaver Templates

Get help with installation and running here.

Moderators: DataMystic Support, Moderators

Post by Guest »

I have an eval version of TextPipe Web which will be purchased IMMEDIATELY if I can use it to solve the following problem :D

I have inherited 5-6,000 web pages built using Dreamweaver templates and lots of horrible <FONT tags. As part of a site revamp, we will be updating the templates to use CSS and reapplying them to the pages in the site. This will fix the template-driven parts of each page (such as the menus), but will not change the content sections of each page - "Editable Areas" in Dreamweaver parlance.

Dreamweaver marks each Editable Area with a pair of HTML comment tags like this

<!-- #BeginEditable "body text" -->
<p>The HTML for the body content</p>
<!-- #EndEditable -->
Within the content there is a variety of (mostly) standard <FONT tags that I want to transform to normal HTML tags and style using CSS.

For example, I have thousands of pages containing HTML like the following

<p><font face="Arial, Helvetica, sans-serif" size="3" color="#000000"><a href="media/pressrel/41027p.htm"><font size="2" color="#FF0000">Paper gives church theological reasons to tackle HIV</font></a></font><font face="Arial, Helvetica, sans-serif" size="2" color="#000000"></font><font face="Arial, Helvetica, sans-serif" size="2" color="#000000">Christian Aid has written a paper, <i>Theology and the HIV/AIDS epidemic, </i>that offers churches a theological model on which to base their support for people living with HIV.<b> <font color="#666666">/27.10.04</font></b></font></p>
Note:
  1. Empty tag pairs that need to be eliminated
  2. Nested font tags, often three or more deep
  3. "Standard" groups such as <b> <font color="#666666"> ... </font></b>
Some tasks, like point 1, seem easy to implement with regular expressions.
Others, where the tags are nested, are causing me concern.

For example, I want to convert

<font face="Arial, Helvetica, sans-serif" size="2" color="#000000">Christian Aid has written a paper, <i>Theology and the HIV/AIDS epidemic, </i>that offers churches a theological model on which to base their support for people living with HIV.<b> <font color="#666666">/27.10.04</font></b></font>
into

<p>Christian Aid has written a paper, <i>Theology and the HIV/AIDS epidemic, </i>that offers churches a theological model on which to base their support for people living with HIV.<span class="date"> /27.10.04</span></p>
Note that some font tags will be converted to <p>, others to <span> and still others eliminated altogether.

Question
  1. How do I do a replace that will match up opening and closing HTML tags (font tags in this case)? The only thing I could find was a restrict rule that seemed to know about HTML tags, and it didn't like tags with many attributes, such as <font face="Arial, Helvetica, sans-serif" size="2" color="#000000">
I would also be VERY grateful for any other advice you may have regarding this project.
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Post by DataMystic Support »

Hi Rob,

I think you should first eliminate empty groups, then the standard groups, then nested groups.

Standard groups can be removed like this:

<b>\s*<font color="#666666">(.*)</font>\s*</b>

Replace with $1.
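
For readers who want to try the pattern outside TextPipe, here is a minimal sketch in Python. Python is only a stand-in here, and the function name is hypothetical. One assumption differs from the pattern above: the group is made non-greedy, so that two date groups in the same block of text are not swallowed by a single match.

```python
import re

# The pattern from the reply above, with (.*) changed to (.*?) as an assumption.
STANDARD_GROUP = re.compile(
    r'<b>\s*<font color="#666666">(.*?)</font>\s*</b>',
    re.IGNORECASE | re.DOTALL,
)

def strip_standard_group(html: str) -> str:
    """Replace each matched group with its inner text (\\1 is Python's $1)."""
    return STANDARD_GROUP.sub(r"\1", html)

sample = 'living with HIV.<b> <font color="#666666">/27.10.04</font></b></font></p>'
print(strip_standard_group(sample))
```

To produce the <span class="date"> markup asked for in the first post, the replacement string could be changed to something like <span class="date">\1</span> instead.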

With the nested groups, you can use a search/replace as a special kind of restriction: just set the replacement text to $0, and then add subfilters to search multiple layers of tags within tags. There is no limit on this nesting.
Guest

Post by Guest »

Simon,

Thanks for your reply. It takes me a step or three closer, but I am still unclear on the issue of nested HTML/XML tags.

By the way, the example code is an example of the TYPE of HTML I am trying to clean and shows the various types of structures. They were created using the FONT-styles feature of pre-MX versions of Dreamweaver. While there are a few "standard" constructs (such as the one you gave an example for), there is generally no fixed order, so I need a generic way to handle them. The HTML often has cases where an editor has tried to enforce a particular appearance and so has repeatedly applied a font style, resulting in six or more nested (and redundant) sets of font tags. You can imagine the mess :roll: ...

Back to nested tags. If you were trying to fix the following

<p>
<font face="Arial, Helvetica, sans-serif" size="3" color="#000000"><a href="media/pressrel/41027p.htm"><font size="2" color="#FF0000">Paper gives church theological reasons to tackle HIV</font></a></font>
<font face="Arial, Helvetica, sans-serif" size="2" color="#000000"></font>
<font face="Arial, Helvetica, sans-serif" size="2" color="#000000">Christian Aid has written a paper, <i>Theology and the HIV/AIDS epidemic, </i>that offers churches a theological model on which to base their support for people living with HIV.<b> <font color="#666666">/27.10.04</font></b></font>
</p>
I gather you recommend the following sequence:
  1. Remove all empty font tags. The Perl pattern <font[^>]*>\s*</font> seems to work nicely.
  2. Replace standard tag groups. Your suggested expression also seems to work, as long as the groups contain no other font tags.
  3. However, how do you remove standard groups that (may) include other (nested) font tags? If you switch greedy matching off, the closing </font> in the pattern will match the first </font> tag it encounters, which could be from one of the nested <font> pairs. If you switch greedy matching on, it should match the last </font> tag it finds, which could be the closing tag for the opening <font but could just as likely be the closing </font of a later <font group. Instead of choosing between greedy and non-greedy, is there some way to force checking of nesting?
  4. A related point: it occurs to me that you should replace "standard groups" from the "inside out", i.e. in the first lines of the example given, first replace the font group inside the <a anchor, then the outer font group. In the example as given, you could make it work if you explicitly matched on font size and color, but if your HTML example was actually
    <font face="Arial, Helvetica, sans-serif" size="2" color="#FF0000"><a href="media/pressrel/41027p.htm"><font size="2" color="#FF0000">Paper gives church theological reasons to tackle HIV</font></a></font>
    and both font tags had the same attributes, you would be stumped for the reason explained in point 3 above.
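
The "inside out" idea in point 4 can be sketched outside TextPipe as follows. This is a Python illustration under stated assumptions, not a TextPipe filter: the pattern only matches a font pair whose content contains no other font tag, so greedy versus non-greedy no longer matters, and the pass is repeated until nothing is left to match. The function names and the rule that maps color="#666666" to a "date" span are hypothetical, and the pattern assumes no attribute value contains a ">" character.

```python
import re

# An innermost pair: <font ...>, then content in which no <font or </font
# starts (checked by the negative lookahead), then the closing </font>.
INNERMOST_FONT = re.compile(
    r"<font\b([^>]*)>((?:(?!</?font\b).)*)</font>",
    re.IGNORECASE | re.DOTALL,
)

def rewrite_pair(match: re.Match) -> str:
    attrs, content = match.group(1), match.group(2)
    if not content.strip():
        return ""  # empty pair: drop it entirely (point 1)
    if 'color="#666666"' in attrs:
        # Hypothetical mapping of the grey date style to a CSS class.
        return f'<span class="date">{content}</span>'
    return content  # any other font pair is simply unwrapped

def unwrap_fonts(html: str) -> str:
    """Peel font pairs from the inside out, one nesting level per pass."""
    while True:
        html, count = INNERMOST_FONT.subn(rewrite_pair, html)
        if count == 0:
            return html

nested = ('<font face="Arial, Helvetica, sans-serif" size="2" color="#FF0000">'
          '<a href="media/pressrel/41027p.htm"><font size="2" color="#FF0000">'
          'Paper gives church theological reasons to tackle HIV</font></a></font>')
print(unwrap_fonts(nested))
```

Because every pass removes at least one font pair and never adds one, the loop always terminates, even when both font tags carry identical attributes.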
Hope this is clear - I've found that trying to explain regular expressions is one of the more interesting ways to stretch my linguistic skills :wink:

Rob

Post by DataMystic Support »

Hi Rob,

There is no way to make a pattern recognise nested font tags. You may be forced to remove all font tags and start again. The best option might be to replace font tags with an intermediate code such as <font1> or <font2> (using a VBScript filter to match each one), and then try to match each one to the next </font> tag. It's a tough one.
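
One way to picture the intermediate-code idea is the sketch below. It uses Python rather than a VBScript filter, purely as an illustration, and every name in it is hypothetical. It assumes the numbering is by nesting depth and that the closing tags are renumbered too, which is a small variation on the suggestion above: once both ends of a pair carry the same number, an ordinary back-reference can pair them up.

```python
import re

FONT_TAG = re.compile(r"<(/?)font\b([^>]*)>", re.IGNORECASE)

def number_font_tags(html: str) -> str:
    """Rename <font ...> and </font> to <fontN ...> and </fontN>, N = depth."""
    depth = 0

    def rename(match: re.Match) -> str:
        nonlocal depth
        closing, attrs = match.group(1), match.group(2)
        if closing:
            if depth == 0:
                return match.group(0)  # stray closing tag: leave untouched
            tag = f"</font{depth}>"
            depth -= 1
            return tag
        depth += 1
        return f"<font{depth}{attrs}>"

    return FONT_TAG.sub(rename, html)

# After numbering, \1 ties each closing tag to the opening tag of the same depth.
NUMBERED_PAIR = re.compile(r"<font(\d+)\b[^>]*>(.*?)</font\1>", re.DOTALL)

numbered = number_font_tags(
    '<font size="3"><a><font color="#FF0000">x</font></a></font><font>y</font>'
)
print(numbered)
```

Two tags at the same depth can never be nested inside each other, so a non-greedy match between <fontN and </fontN> always finds the correct partner.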