Please help me with this text extraction

Get help with installation and running here.

Moderators: DataMystic Support, Moderators

tiler
Posts: 3
Joined: Sat Jun 11, 2011 2:06 am

Please help me with this text extraction

Post by tiler »

Hi

I am really new to all this TextPipe stuff; it's far too clever for me, and I was wondering if someone could lend a hand.

I have worked out how to extract individual items from some files I have converted from HTML to text (conversion not done in TextPipe), but wondered if the following was possible.

I have a large number of pages, all with the same structure. The detail I need is within the top 50 or so lines of each page. Below is a cut and paste, with the details I need in bold.

I can extract the telephone and website using individual filters, but is there a way to combine these filters to extract all the text I want?


Code: Select all

Tru and Grand - Uk- For all your Ashes


,




,,,,,,,,,,,



,,

,,
,



Tru and Grand 


Ashes specialists Wales and Australia





Company,[b]Tru and Grand [/b],

Click For Website

Contact,[b]Mr Ash[/b](
Address,[b]Unit 1/
Top Road
On a hill
Wales
UK
ABC 123[/b] (MAP)

Telephone,[b]12345 456 234[/b]
Fax,[b]321 8566 999[/b]
Email,[b].......[/b]
Website,[b]wwww.oops.com[/b]

Tru and Grand  was founded in 1650 and is very sorry but a specialist in Ashes
The email is not visible due to a script in the HTML. I have Offline Explorer here, so maybe I can get it by mining that way, unless there is a better idea. The (MAP) reference is a Google Maps link.


Your help would be gratefully received.

Tiler
Last edited by tiler on Mon Jun 13, 2011 10:11 pm, edited 1 time in total.
tiler
Posts: 3
Joined: Sat Jun 11, 2011 2:06 am

Re: Please help me with this text extraction

Post by tiler »

Hi

I am still struggling with the above. I have got this far:

website,[ 1 + chars ] > website,wwww.oops.com (I would like just the web address, but I can remove the rest in Excel, I guess.)

Telephone,[ 1 + digits ] > Telephone,12345 456 234 (As above, really.)

Address,[ 1 + chars ] > Address,Unit 1/ (I can't get this at all; it stops on the first line of the address.)
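
The address match stops on the first line because, in most regex flavors, "." does not match a newline unless a "dot matches newline" option is enabled. As a rough illustration outside TextPipe, here is a minimal Python sketch that captures everything from "Address," up to the "(MAP)" marker. The sample text is copied from the plain listing in this thread; the function name extract_address is made up for the example, and this is not a TextPipe filter.

```python
import re

# Sample text copied from the plain (non-bold) listing in this thread.
SAMPLE = """Contact,Mr Ash(
Address,Unit 1/
Top Road
On a hill
Wales
UK
ABC 123
(MAP)

Telephone,12345 456 234
"""

# re.DOTALL lets "." match newlines, so the lazy group (.+?) can span
# several lines; it stops at the first "(MAP)" marker that follows.
# re.MULTILINE lets "^" match at the start of any line.
ADDRESS_RE = re.compile(r"^Address,(.+?)\s*\(MAP\)", re.DOTALL | re.MULTILINE)


def extract_address(text):
    """Return the address as one comma-separated line, or None if absent."""
    match = ADDRESS_RE.search(text)
    if match is None:
        return None
    lines = [line.strip() for line in match.group(1).splitlines()]
    return ", ".join(line for line in lines if line)


print(extract_address(SAMPLE))
```

Joining the lines with ", " also keeps the whole address in a single cell once it reaches a spreadsheet.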


The email is a problem, as it does not show up in the text page. In the coded page, however, it shows up as follows (well, at least I think this is it):

Code: Select all

 {
      s=s + t.charAt(l-i);
  }
  
  document.write('<A href=\'mailto:' + s + '?subject=Enquiry from ashes.co.uk\'>' + s + '</a>');
}
</SCRIPT>

Can anyone help me move on with any of the above, please?

Tiler
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: Please help me with this text extraction

Post by DataMystic Support »

You can't easily extract email addresses from script.

You would have to attach some script to each web page that runs at the end of the page rendering, which then extracts it.
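
For the narrow case where the script only reverses a string, there may be a shortcut. The fragment quoted above builds s from t.charAt(l-i), which suggests (but does not prove) that t holds the address written backwards. The Python sketch below rests entirely on that assumption: the "var t = '...'" line, the address in it, and the function name decode_reversed_email are all hypothetical, since the thread never shows the start of the script. If the real pages obfuscate the address differently, this will not work.

```python
import re

# HYPOTHETICAL page source. The thread only shows the tail of the script,
# so the "var t = ..." line and the address inside it are invented here.
PAGE = "<SCRIPT>\nvar t = 'moc.elpmaxe@ofni';\n</SCRIPT>"

# Assumes the obfuscated address is assigned to a variable named t as a
# quoted string literal. (['"]) captures the quote; \1 matches its partner.
T_LITERAL_RE = re.compile(r"""\bt\s*=\s*(['"])(.*?)\1""")


def decode_reversed_email(page_source):
    """Return the string assigned to t, reversed, or None if not found."""
    match = T_LITERAL_RE.search(page_source)
    if match is None:
        return None
    return match.group(2)[::-1]  # [::-1] reverses the string


print(decode_reversed_email(PAGE))
```

This has to run on the original HTML rather than the converted text, because the script is dropped during conversion.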
tiler
Posts: 3
Joined: Sat Jun 11, 2011 2:06 am

Re: Please help me with this text extraction

Post by tiler »

Hi

Thank you for that. We can safely say, then, that I won't be getting the email addresses, as I don't have a clue about that.


I have managed to get the website out using

Code: Select all

(?:website)(?:.+)(?:
)

I can get all the details out together as below,

Company,Tru and Grand ,
Contact,Mr Ash(
Address,Unit 1/
Top Road
On a hill
Wales
UK
ABC 123
(MAP)

Telephone,12345 456 234
Fax,321 8566 999
Email,.......
Website,wwww.oops.com


Using :

Code: Select all

(?:company)(?:.+)(?:
)(?:website)(?:.+)(?:
)
I have also made some adjustments using other filters.


What I cannot do, and maybe you would be willing to help with, is:

Get just company and website out together?

When I get the above into Excel, the values list vertically. I want them to list horizontally across the page.


It has taken me 3 days to get this far, as I know nothing of code whatsoever, and I am truly stuck now.

Thank you

Tiler
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: Please help me with this text extraction

Post by DataMystic Support »

In the 'replace with' box, put

Code: Select all

$1,$2,$3,$4 
or

Code: Select all

@company,@website,@etc
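
One detail worth noting: in most regex flavors, a group written as (?:...) is non-capturing, so $1, $2 and so on only have something to refer to when the wanted text sits inside plain capturing parentheses (...). The Python sketch below illustrates the same idea outside TextPipe: capture just the company and website, then write them side by side as one CSV row so they list horizontally in Excel. The sample is abridged from the listing in this thread; the names RECORD_RE and records_to_csv are made up for the example, and Python's named groups are only a loose analogue of the @company style, not a statement about how TextPipe handles it.

```python
import csv
import io
import re

# Abridged from the listing posted earlier in this thread.
SAMPLE = """Company,Tru and Grand ,
Contact,Mr Ash(
Telephone,12345 456 234
Fax,321 8566 999
Email,.......
Website,wwww.oops.com
"""

# Only the two wanted fields are inside capturing groups. The lazy ".*?"
# skips the lines in between (re.DOTALL lets "." cross line breaks).
RECORD_RE = re.compile(
    r"^Company,(?P<company>[^\n]*?)[ ,]*$"
    r".*?"
    r"^Website,(?P<website>[^\n]*?)[ \t]*$",
    re.DOTALL | re.MULTILINE | re.IGNORECASE,
)


def records_to_csv(text):
    """Return one 'company,website' CSV line per record found in text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for match in RECORD_RE.finditer(text):
        writer.writerow([match.group("company"), match.group("website")])
    return out.getvalue()


print(records_to_csv(SAMPLE), end="")
```

Running the same function over the text of many pages joined together would give one row per company, which opens in Excel as two columns.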