Replacing bad HTML

Posted: Thu Jan 25, 2007 4:04 am
by bpobiak
I have a number of html pages converted from Word that have variations of bad paragraph endings peppered throughout that affect the space between paragraphs:

<br>
&nbsp;&nbsp;&nbsp; <br>

which should be replaced with

<p>

An exact match works, of course, but I don't trust the exact layout of this example to be universal, so I want to code an inclusive search that matches any pair of <br> tags separated only by whitespace and one or more forced spaces ('&nbsp;').

I've tried a number of EZ Pattern variations but am stumped and my trial runs always miss the pattern.

Here is the trial data:

<p>That means that from this great State of Michigan we want that part of the leadership. After all, you have the Senator who is the head of the Republican Policy Committee in the Senate body. By all means you must send him back and support him with the big delegation that you are capable of sending.<br>
&nbsp;&nbsp;&nbsp; <br>
You have nominated great State and national tickets, your Governor,<br>
your Senators, your Congressmen, your State officers.<br>
&nbsp;&nbsp;&nbsp; <br>

Thanx in Advance. Textpipe is a miracle worker! :D

Posted: Thu Jan 25, 2007 2:15 pm
by DataMystic Support
Thanks Bernie,

Just use

<br>[ 0+ whitespace or '&nbsp;' or cr or lf ]<br>
and replace with

<p>

Almost there...

Posted: Fri Jan 26, 2007 6:57 am
by bpobiak
That makes sense, but I tried it and, for the example below, it inserts many <p> tags instead of a single one (see result below).

How can it be limited to acting between the <br> tags only once?

Thanx, Simon!

b

New Result:

<p>That means that from this great State of Michigan we want that part of the leadership. After all, you have the Senator who is the head of the Republican Policy Committee in the Senate body. By all means you must send him back and support him with the big delegation that you are capable of sending.<p>
<p><p><p> <p>
You have nominated great State and national tickets, your Governor,<p>
your Senators, your Congressmen, your State officers.<p>
<p><p><p> <p>


Sample:

<p>That means that from this great State of Michigan we want that part of the leadership. After all, you have the Senator who is the head of the Republican Policy Committee in the Senate body. By all means you must send him back and support him with the big delegation that you are capable of sending.<br>
&nbsp;&nbsp;&nbsp; <br>
You have nominated great State and national tickets, your Governor,<br>
your Senators, your Congressmen, your State officers.<br>
&nbsp;&nbsp;&nbsp; <br>

Posted: Fri Jan 26, 2007 2:21 pm
by DataMystic Support
Sorry, it should be:

<br>[ longest 0+ whitespace or '&nbsp;' or cr or lf ]<br>

Posted: Fri Jan 26, 2007 10:49 pm
by bpobiak
Hmmm... I still get multiple <p> tags (see new results below). Is it possible to apply a OneOrMore to the occurrences of &nbsp;?

New Result:

<p>That means that from this great State of Michigan we want that part of the leadership. After all, you have the Senator who is the head of the Republican Policy Committee in the Senate body. By all means you must send him back and support him with the big delegation that you are capable of sending.<p><p><p><p> <p>You have nominated grea

Thanx.

Posted: Mon Jan 29, 2007 10:20 am
by DataMystic Support
Sorry, second mistake.

It should be:

<br>[ longest 0+ (whitespace or '&nbsp;' or cr or lf) ]<br>
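
For readers who think in regular expressions, here is a minimal Python sketch of what the pattern above appears to do. The regex is my own rough analogue of the EasyPattern, not TextPipe's translation of it, and the names BLANK_BREAK and fix_breaks are made up for illustration. The sample is a shortened copy of the trial data from this thread.

```python
import re

# Assumed regex analogue of the EasyPattern above (not TextPipe's own output):
# a <br>, then the longest run of whitespace or '&nbsp;' (possibly empty),
# then another <br>. Python's \s already covers space, tab, cr and lf.
BLANK_BREAK = re.compile(r"<br>(?:\s|&nbsp;)*<br>")


def fix_breaks(html: str) -> str:
    """Replace each <br> ... <br> pair that encloses only blank filler with <p>."""
    return BLANK_BREAK.sub("<p>", html)


sample = (
    "capable of sending.<br>\n"
    "&nbsp;&nbsp;&nbsp; <br>\n"
    "You have nominated great State and national tickets, your Governor,<br>\n"
    "your Senators, your Congressmen, your State officers.<br>\n"
    "&nbsp;&nbsp;&nbsp; <br>\n"
)

fixed = fix_breaks(sample)
print(fixed)
```

The single <br> after "Governor," is left alone because real text follows it before the next <br>. Because the quantifier is zero or more, a bare `<br><br>` pair would also be replaced; requiring at least one &nbsp; would need a stricter pattern.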

Bingo!

Posted: Mon Jan 29, 2007 11:14 am
by bpobiak
Perfect! That works exactly right! Thank you Simon!

So that I learn from the experience, let me try to break down the easy pattern:

<br>[ longest 0+ (whitespace or '&nbsp;' or cr or lf) ]<br>

means

Find occurrences where there are two <br> tags with nothing between them except the longest possible run of zero or more repetitions of whitespace, '&nbsp;', cr, or lf

I think I get it. Thanx again!

Posted: Wed Jan 31, 2007 8:33 am
by DataMystic Support
Yep - that's correct!
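
A note on why the parentheses in the final pattern matter, sketched with Python regular expressions rather than EasyPattern. In a regex, an ungrouped "or" splits the whole expression into separate alternatives, so each piece matches and gets replaced on its own. Whether EasyPattern parses the ungrouped version the same way is an assumption on my part, but the ungrouped output below has the same shape as the multiple <p> result reported earlier in this thread.

```python
import re

sample = "sending.<br>\n&nbsp;&nbsp;&nbsp; <br>\nYou have"

# Ungrouped: '|' has the lowest precedence, so this is really four separate
# alternatives: '<br>\s*', '&nbsp;', '\r', and '\n<br>'.
ungrouped = re.compile(r"<br>\s*|&nbsp;|\r|\n<br>")

# Grouped: the alternation is confined to the filler between the two <br> tags,
# and the '*' repeats the whole group.
grouped = re.compile(r"<br>(?:\s|&nbsp;|\r|\n)*<br>")

ungrouped_result = ungrouped.sub("<p>", sample)
grouped_result = grouped.sub("<p>", sample)

print(ungrouped_result)  # sending.<p><p><p><p> <p>You have
print(grouped_result)    # sending.<p> followed by a newline and "You have"
```

With the group in place, the two <br> tags and everything between them form one match, so the whole span collapses to a single <p>.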