Replacing bad HTML

Get help with installation and running here.

Moderators: DataMystic Support, Moderators

bpobiak
Posts: 7
Joined: Wed Oct 13, 2004 9:37 am
Location: New York City

Replacing bad HTML

Post by bpobiak »

I have a number of HTML pages converted from Word. Variations of bad paragraph endings are peppered throughout them, and they affect the spacing between paragraphs:

<br>
&nbsp;&nbsp;&nbsp; <br>

which should be replaced with

<p>

An exact match works, of course, but I don't trust the exact layout of this example to be universal. I want to code an inclusive search that matches any pair of <br> tags separated only by whitespace and one or more forced spaces ('&nbsp;').

I've tried a number of EZ Pattern variations but am stumped; my trial runs always miss the pattern.

Here is the trial data:

<p>That means that from this great State of Michigan we want that part of the leadership. After all, you have the Senator who is the head of the Republican Policy Committee in the Senate body. By all means you must send him back and support him with the big delegation that you are capable of sending.<br>
&nbsp;&nbsp;&nbsp; <br>
You have nominated great State and national tickets, your Governor,<br>
your Senators, your Congressmen, your State officers.<br>
&nbsp;&nbsp;&nbsp; <br>

Thanks in advance. TextPipe is a miracle worker! :D
-Regards

Bernie Pobiak
Pubcomm Group NYC
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Post by DataMystic Support »

Thanks Bernie,

Just use

<br>[ 0+ whitespace or '&nbsp;' or cr or lf ]<br>
and replace with

<p>

Almost there...

Post by bpobiak »

That makes sense, but I tried it, and for the sample below it inserts many <p> tags instead of a single one (see the new result below).

How can it be limited to acting between the <br> tags only once?

Thanx, Simon!

b

New Result:

<p>That means that from this great State of Michigan we want that part of the leadership. After all, you have the Senator who is the head of the Republican Policy Committee in the Senate body. By all means you must send him back and support him with the big delegation that you are capable of sending.<p>
<p><p><p> <p>
You have nominated great State and national tickets, your Governor,<p>
your Senators, your Congressmen, your State officers.<p>
<p><p><p> <p>


Sample:

<p>That means that from this great State of Michigan we want that part of the leadership. After all, you have the Senator who is the head of the Republican Policy Committee in the Senate body. By all means you must send him back and support him with the big delegation that you are capable of sending.<br>
&nbsp;&nbsp;&nbsp; <br>
You have nominated great State and national tickets, your Governor,<br>
your Senators, your Congressmen, your State officers.<br>
&nbsp;&nbsp;&nbsp; <br>
-Regards

Bernie Pobiak
Pubcomm Group NYC

Post by DataMystic Support »

Sorry, it should be:

<br>[ longest 0+ whitespace or '&nbsp;' or cr or lf ]<br>

Post by bpobiak »

Hmmm... still multiple <p> tags in the result (see the new result below). Is it possible to apply a OneOrMore to the occurrences of &nbsp;?

New Result:

<p>That means that from this great State of Michigan we want that part of the leadership. After all, you have the Senator who is the head of the Republican Policy Committee in the Senate body. By all means you must send him back and support him with the big delegation that you are capable of sending.<p><p><p><p> <p>You have nominated grea
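
Both results so far are consistent with `or` binding more loosely than the repeat count, so that the ungrouped pattern is read as four separate alternatives rather than one repeated set. The Python sketch below tests that reading on a shortened copy of the sample. The two regular expressions are hand-written approximations and an assumption about how the EZ Pattern is parsed, not documented TextPipe behavior, and the sample is assumed to use LF-only line endings.

```python
import re

# Shortened copy of the sample data from the thread (LF line endings assumed).
SAMPLE = (
    "sending.<br>\n"
    "&nbsp;&nbsp;&nbsp; <br>\n"
    "You have nominated"
)

# Assumed reading of the first attempt: four alternatives, with the
# repeat applying only to whitespace and matching as little as possible.
FIRST_ATTEMPT = re.compile(r"<br>\s*?|&nbsp;|\r|\n<br>")

# Assumed reading of the second attempt: same alternatives, but the
# repeat now takes the longest run of whitespace.
SECOND_ATTEMPT = re.compile(r"<br>\s*|&nbsp;|\r|\n<br>")

first_result = FIRST_ATTEMPT.sub("<p>", SAMPLE)
second_result = SECOND_ATTEMPT.sub("<p>", SAMPLE)

print(repr(first_result))
print(repr(second_result))
```

Under this reading, each <br> and each &nbsp; is replaced on its own, which is the shape of both results posted so far. Grouping the alternatives so that the repeat covers all of them would avoid that.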

Thanx.
-Regards

Bernie Pobiak
Pubcomm Group NYC

Post by DataMystic Support »

Sorry, second mistake.

It should be:

<br>[ longest 0+ (whitespace or '&nbsp;' or cr or lf) ]<br>

Bingo!

Post by bpobiak »

Perfect! That works exactly right! Thank you Simon!

So that I learn from the experience, let me try to break down the easy pattern:

<br>[ longest 0+ (whitespace or '&nbsp;' or cr or lf) ]<br>

means

Find occurrences where there are two <br> tags with nothing between them except the longest possible run of zero or more repetitions of whitespace, '&nbsp;', cr, or lf.
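
For readers more used to regular expressions, the same breakdown can be written as an annotated Python regex. This is a rough equivalent only, not TextPipe's own syntax; the names `BREAK_RUN` and `collapse_break_runs` are made up for illustration, and `\s` is assumed to cover the whitespace, cr, and lf cases together.

```python
import re

BREAK_RUN = re.compile(
    r"""
    <br>            # opening <br> tag
    (?:             # group, so the repeat applies to every alternative
        \s          # one whitespace character (space, tab, cr, lf)
      | &nbsp;      # or one literal forced-space entity
    )*              # zero or more repetitions, longest run first
    <br>            # closing <br> tag
    """,
    re.VERBOSE,
)


def collapse_break_runs(html: str) -> str:
    """Replace each <br> ... <br> run of blank filler with a single <p>."""
    return BREAK_RUN.sub("<p>", html)


sample = "sending.<br>\n&nbsp;&nbsp;&nbsp; <br>\nYou have nominated"
print(collapse_break_runs(sample))
```

A lone <br> followed by ordinary text is left untouched, because the pattern requires a second <br> with only filler in between.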

I think I get it. Thanx again!
-Regards

Bernie Pobiak
Pubcomm Group NYC

Post by DataMystic Support »

Yep - that's correct!