Extract text between tags
Posted: Sun Mar 15, 2009 12:47 am
I have a lot of HTML files with a similar structure:
...
<td>47:28:...numbers</td>
<td>text</td>
<td align=right>numbers</td>
...
How can I extract everything between the tags into different files?
At the moment I only know how to extract the numbers between the first pair of tags:
Input
Extract lines matching [<td>47:28]
Remove HTML and XML
Remove blanks from Start of Line
Remove blanks from End of Line
Output
I can't figure out how to extract the text between the other pairs. Can anybody help?
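
For comparison, here is a rough sketch of the same task outside the filter list, in Python. It assumes that every record consists of exactly three <td> cells in the order shown above and that the files contain no other <td> cells; if that does not hold, the grouping will be off. The function names (extract_cells, split_columns, write_columns) and the output file names (column1.txt and so on) are made up for illustration.

```python
import re
from pathlib import Path

# One <td ...>content</td> cell; attributes such as align=right are allowed.
TD_CELL = re.compile(r"<td\b[^>]*>(.*?)</td>", re.IGNORECASE | re.DOTALL)
# Any remaining tag inside a cell, stripped like "Remove HTML and XML".
TAG = re.compile(r"<[^>]+>")


def extract_cells(html):
    """Return the trimmed text of every <td> cell, in document order."""
    return [TAG.sub("", cell).strip() for cell in TD_CELL.findall(html)]


def split_columns(cells, columns=3):
    """Group cells by their position within each run of `columns` cells.

    Assumption: each record has exactly `columns` cells.
    """
    return [cells[i::columns] for i in range(columns)]


def write_columns(html, out_dir, columns=3):
    """Write each column to its own file (hypothetical names column1.txt, ...)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    grouped = split_columns(extract_cells(html), columns)
    for index, values in enumerate(grouped, start=1):
        target = out / f"column{index}.txt"
        target.write_text("\n".join(values) + "\n", encoding="utf-8")
```

Reading one file into a string and calling write_columns(html, "out") would then put the first cells into out/column1.txt, the text cells into out/column2.txt and the right-aligned numbers into out/column3.txt. A regular expression is enough only because the structure is this regular; for messier HTML a real parser would be the safer choice.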