How can I remove duplicate tables where the structure of the table and the content of each cell are the same, while ignoring any attributes, id numbers, etc. within the tags?
Here are five tables [input]. Four have the same 'content' (not in order), and I want to remove all but one of them. However, one has a <div> tag that makes its structure non-identical, so only three are identical in content AND structure.
I would like to retain the table order after the duplicates are removed, keeping the first table of each set of duplicates [output].
Note: I need to retain one duplicate table with all its tag attributes. Otherwise I could remove these attributes, put everything on a single line, sort, and remove the duplicates. Maybe there is a way to capture the attributes and re-insert them at the end?
Any ideas on how to accomplish this?
Thanks,
Brent
Input:
<table>
<tr id="1">
<td id="1">
<content>XXX</content>
</td>
</tr>
<tr id="2">
<td id="1">
<content>XXX</content>
</td>
</tr>
</table>
<table>
<tr id="3">
<td id="2">
<content>XXX</content>
</td>
</tr>
<tr id="4">
<td id="2">
<content>XXX</content>
</td>
</tr>
</table>
<table>
<tr id="5">
<td id="3">
<content>X</content>
</td>
</tr>
<tr id="6">
<td id="3">
<content>X</content>
</td>
</tr>
</table>
<table>
<tr id="7">
<td id="4">
<content>XXX</content>
</td>
</tr>
<tr id="8">
<td id="4">
<content>XXX</content>
</td>
</tr>
</table>
<table>
<div>
<tr id="9">
<td id="5">
<content>XXX</content>
</td>
</tr>
</div>
<tr id="10">
<td id="5">
<content>XXX</content>
</td>
</tr>
</table>
Output:
<table>
<tr id="1">
<td id="1">
<content>XXX</content>
</td>
</tr>
<tr id="2">
<td id="1">
<content>XXX</content>
</td>
</tr>
</table>
<table>
<tr id="5">
<td id="3">
<content>X</content>
</td>
</tr>
<tr id="6">
<td id="3">
<content>X</content>
</td>
</tr>
</table>
<table>
<div>
<tr id="9">
<td id="5">
<content>XXX</content>
</td>
</tr>
</div>
<tr id="10">
<td id="5">
<content>XXX</content>
</td>
</tr>
</table>
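One possible way to approach this, sketched in Python (the language and the names `signature` and `dedupe_tables` are assumptions for illustration, not something from the post): build a comparison key for each table from its tag names and text only, ignoring every attribute, then keep the original text of the first table seen for each key. It assumes tables are not nested and that each one starts with `<table` and ends with `</table>`.

```python
import re
from html.parser import HTMLParser


class _SignatureBuilder(HTMLParser):
    """Record tag names and text in document order, ignoring all attributes."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # attrs (id="..." etc.) are deliberately discarded
        self.parts.append(("open", tag))

    def handle_endtag(self, tag):
        self.parts.append(("close", tag))

    def handle_data(self, data):
        # Collapse whitespace so indentation and line breaks do not matter
        text = " ".join(data.split())
        if text:
            self.parts.append(("text", text))


def signature(table_html):
    """Return a hashable key describing structure and cell content only."""
    builder = _SignatureBuilder()
    builder.feed(table_html)
    builder.close()
    return tuple(builder.parts)


# Assumption: tables are not nested, so a non-greedy match finds each one.
# The trailing newline is consumed so removed tables leave no blank line.
TABLE_RE = re.compile(r"<table\b.*?</table>[ \t]*\n?", re.DOTALL | re.IGNORECASE)


def dedupe_tables(document):
    """Drop later tables whose signature was already seen; keep order."""
    seen = set()

    def keep_or_drop(match):
        key = signature(match.group(0))
        if key in seen:
            return ""  # duplicate: remove it
        seen.add(key)
        return match.group(0)  # first occurrence: keep the text untouched

    return TABLE_RE.sub(keep_or_drop, document)
```

Because the key is used only for comparison and the kept table is copied back verbatim, the surviving table retains all of its attributes. The `<div>` in the fifth table changes its key, so it is not treated as a duplicate of the first.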