How do I Extract Text Between Two Fields from HTML files?

Get help with installation and running here.

Moderators: DataMystic Support, Moderators

pheagila
Posts: 9
Joined: Mon Aug 18, 2008 9:04 pm

How do I Extract Text Between Two Fields from HTML files?

Post by pheagila »

Hi all,

I would like to extract the text between two marker comments from many HTML files,

i.e. all the text between:

<!-- Start Results Section -->
.... Data .....
<!-- End Results Section -->

I would like all the extracted text combined and written to abc.txt.

Below are my current settings, but they are NOT working: the output also includes a lot of data from outside the markers.

Can anyone tell me what I am doing wrong? (Yes, I am new to TextPipe.)

Code: Select all

Restrict to between tags <<!-- Start Results Section -->>...<<!-- End Results Section -->>
|  [X] Include text
|  [X] Match case
| Max size: 65536
|
+--Merge output to file C:\1\abc.txt  
Fixer
Posts: 25
Joined: Thu Jul 31, 2008 6:39 am
Location: European Union > Poland

Re: How do I Extract Text Between Two Fields from HTML files?

Post by Fixer »

I almost always use a Perl pattern for this:

Code: Select all

Filter List
-----------
Filter options
|  [ ] Log to file
|  [X] Append to logfile
|  Log filename: textpipe.log
|  Threshold 500
|
|--Input from file(s)
|     [ ] Confirm before processing each file
|     [ ] Confirm before processing read/only files
|     [ ] Delete input files after processing
|     Process binary files
|   
|--Perl pattern [<\!-- Start Results Section -->\r\n(.*)\r\n<\!-- End Results Section -->\r\n] with [$1\r\n]
|     [ ] Match case
|     [ ] Whole words only
|     [ ] Case sensitive replace
|     [ ] Prompt on replace
|     [ ] Skip prompt if identical
|     [ ] First only
|     [ ] Extract matches
|     Maximum text buffer size 99999
|     [ ] Maximum match (greedy)
|     [ ] Allow comments
|     [X] '.' matches newline
|     [X] UTF-8 Support
|   
|--Remove blank lines
|   
+--Merge output to file c:\mergefilename.txt
    

Files List
----------
pheagila
Posts: 9
Joined: Mon Aug 18, 2008 9:04 pm

Re: How do I Extract Text Between Two Fields from HTML files?

Post by pheagila »

Thanks Fixer

How do I import what you have typed above directly into TextPipe Pro?

Cheers
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: How do I Extract Text Between Two Fields from HTML files?

Post by DataMystic Support »

What is shown above is just a clipboard export and can't be imported directly. Soon we will have an XML export/import facility.

The key point is that you are using a restriction, and extraction is not what restrictions are intended for. Please read the help on restrictions.

You just need to use a search/replace filter with the 'Extract matches' option.
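
For readers who want to see the idea outside TextPipe, here is a minimal Python sketch of "extract only what matches" between the two markers. It is an illustration, not TextPipe's implementation: the helper name `extract_sections` is hypothetical, `re.DOTALL` stands in for "'.' matches newline", and the lazy `(.*?)` stands in for leaving "Maximum match (greedy)" unchecked.

```python
import re
from typing import List

# Marker text copied from the question; everything else here is illustrative.
START = "<!-- Start Results Section -->"
END = "<!-- End Results Section -->"

# DOTALL lets '.' cross line breaks; (.*?) stops at the first END marker.
SECTION_RE = re.compile(re.escape(START) + r"(.*?)" + re.escape(END), re.DOTALL)


def extract_sections(html: str) -> List[str]:
    """Return only the text captured between each START/END pair."""
    return [section.strip() for section in SECTION_RE.findall(html)]


sample = (
    "<html><body>header\n"
    "<!-- Start Results Section -->\nrow 1\nrow 2\n<!-- End Results Section -->\n"
    "footer</body></html>"
)
print(extract_sections(sample))
```

Text outside the markers ("header", "footer") never reaches the result, because only the captured group is kept and non-matching text is discarded.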
pheagila
Posts: 9
Joined: Mon Aug 18, 2008 9:04 pm

Re: How do I Extract Text Between Two Fields from HTML files?

Post by pheagila »

Thanks, DataMystic Support.

Can you give me a clipboard export example, like Fixer's, for "use a search/replace filter with the 'Extract matches' option"?
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: How do I Extract Text Between Two Fields from HTML files?

Post by DataMystic Support »

Sure:

Code: Select all

|--Perl pattern [<!-- Start Results Section -->(.*)<!-- End Results Section -->] with [$1\r\n]
|     [ ] Match case
|     [ ] Whole words only
|     [ ] Case sensitive replace
|     [ ] Prompt on replace
|     [ ] Skip prompt if identical
|     [ ] First only
|     [X] Extract matches
|     Maximum text buffer size 64096
|     [ ] Maximum match (greedy)
|     [ ] Allow comments
|     [X] '.' matches newline
|     [ ] UTF-8 Support
|   
+--Merge output to file C:\1\abc.txt
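
As a rough cross-check of what this filter chain is meant to produce, the same "extract from every file, merge into one output" flow can be sketched in Python. This is an assumption-laden illustration rather than a TextPipe feature: `merge_sections`, the `*.html` glob, and the UTF-8 decoding are all hypothetical choices.

```python
import re
from pathlib import Path

# Same markers as the filter above; lazy match with '.' crossing newlines.
SECTION_RE = re.compile(
    r"<!-- Start Results Section -->(.*?)<!-- End Results Section -->",
    re.DOTALL,
)


def merge_sections(input_dir, output_file, pattern="*.html"):
    """Write every extracted section from every matching file to output_file.

    Returns the number of sections written. Files are read in sorted name order.
    """
    count = 0
    with open(output_file, "w", encoding="utf-8") as out:
        for path in sorted(Path(input_dir).glob(pattern)):
            text = path.read_text(encoding="utf-8", errors="replace")
            for section in SECTION_RE.findall(text):
                out.write(section.strip() + "\n")
                count += 1
    return count
```

A call such as `merge_sections(r"C:\1\html", r"C:\1\abc.txt")` would then collect all sections into one file; the input folder name in that call is made up for the example.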
    
pheagila
Posts: 9
Joined: Mon Aug 18, 2008 9:04 pm

Re: How do I Extract Text Between Two Fields from HTML files?

Post by pheagila »

Thanks, Support, but your Perl pattern filter returns 0 bytes:

Code: Select all

|--Perl pattern [<!-- Start Results Section -->(.*)<!-- End Results Section -->] with [$1\r\n]
|     [ ] Match case
|     [ ] Whole words only
|     [ ] Case sensitive replace
|     [ ] Prompt on replace
|     [ ] Skip prompt if identical
|     [ ] First only
|     [X] Extract matches
|     Maximum text buffer size 99999
|     [ ] Maximum match (greedy)
|     [ ] Allow comments
|     [X] '.' matches newline
|     [ ] UTF-8 Support
This filter seems to work:

Code: Select all

|--Extract [<!-- Start Auction Results Section -->(.*)<!-- End Auction Results Section -->]
|     [ ] Include line numbers
|     [ ] Include filename
|     [X] Match case
|     [ ] Count matches
|     Pattern type: 0
What do I need to change to get your Perl pattern filter to work?

Cheers
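
One thing that can be checked outside TextPipe: the two filters quoted in this post use different marker text ("Results Section" in the Perl pattern, "Auction Results Section" in the Extract filter). Whether that explains the 0-byte result here is not confirmed in the thread, but a pattern whose marker text does not occur in the file will match nothing. A small Python sketch for testing a file against a pair of markers (the `diagnose` helper is hypothetical, and the comparison is case-sensitive):

```python
def diagnose(text, start, end):
    """Report whether a start/end marker pair can be found, in order, in text."""
    s = text.find(start)
    if s == -1:
        return "start marker not found"
    body_begin = s + len(start)
    e = text.find(end, body_begin)
    if e == -1:
        return "end marker not found after start marker"
    return f"ok: {e - body_begin} characters between markers"


page = "<!-- Start Auction Results Section -->data<!-- End Auction Results Section -->"

print(diagnose(page, "<!-- Start Results Section -->", "<!-- End Results Section -->"))
print(diagnose(page, "<!-- Start Auction Results Section -->",
               "<!-- End Auction Results Section -->"))
```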
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: How do I Extract Text Between Two Fields from HTML files?

Post by DataMystic Support »

Please just send us an email referencing this discussion and we can send you a filter.