Extraction from HTML Page for inclusion in CSV
Posted: Mon Mar 13, 2006 12:47 pm
Hi:
I am evaluating TextPipe and WebPipe. I have 1300 articles written and published in HTML that I want to move into a Drupal site, so I need to load them into MySQL. I have tried marking portions of the source code of each page with comment pairs like <!-- item --> // <!-- enditem --> and <!-- content --> // <!-- endcontent -->.
I confess that I am not bright but at least I'm lazy. I want to be able to process the web pages, pull out the text (HTML code) between these ad hoc section markers, and put it into a CSV file with one column per section, headed by the section's description. All of the content for a given article would go in the column "Content", the title for the same article would go under "Title", etc.
Is there a way to do this? I appreciate any help at all. The two products have blown me away with their stability and power. Outstanding!
BTW, I could avoid all of this if there were a way to convert my web pages into RSS for import into Drupal.
Thank you in advance for any thoughts or suggestions!
Tim
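
For anyone reading this later, the task described above (pull the text between paired comment markers and write one CSV row per page) can be sketched outside TextPipe as well. The following is only a minimal illustration in Python, not how TextPipe or WebPipe do it. The marker name `title`, the mapping of markers to column headers, and the function names are all assumptions made for the example; only the `<!-- content -->` / `<!-- endcontent -->` style of marker comes from the post.

```python
import csv
import re
from pathlib import Path

# Hypothetical mapping of marker name -> CSV column header.
# "content" follows the post; "title" is an assumed marker name.
SECTIONS = {"title": "Title", "content": "Content"}


def extract_sections(html, sections=SECTIONS):
    """Return {column: text between <!-- name --> and <!-- endname -->}."""
    row = {}
    for name, column in sections.items():
        pattern = re.compile(
            r"<!--\s*" + re.escape(name) + r"\s*-->"
            r"(.*?)"
            r"<!--\s*end" + re.escape(name) + r"\s*-->",
            re.DOTALL | re.IGNORECASE,
        )
        match = pattern.search(html)
        # A page without the marker pair gets an empty cell.
        row[column] = match.group(1).strip() if match else ""
    return row


def pages_to_csv(html_dir, csv_path, sections=SECTIONS):
    """Write one CSV row per .html file in html_dir; return the row count."""
    count = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.DictWriter(out, fieldnames=list(sections.values()))
        writer.writeheader()
        for page in sorted(Path(html_dir).glob("*.html")):
            html = page.read_text(encoding="utf-8", errors="replace")
            writer.writerow(extract_sections(html, sections))
            count += 1
    return count
```

Calling `pages_to_csv("articles", "articles.csv")` (both paths are placeholders) would produce a file with a "Title" and a "Content" column. The `csv` module quotes embedded commas, quotes, and line breaks, so raw HTML can sit in a cell, though whatever imports the CSV into MySQL has to accept multi-line quoted fields.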