Extract paragraphs/multiple lines of text



slouw
Posts: 8
Joined: Wed Dec 18, 2013 4:48 pm

Extract paragraphs/multiple lines of text

Post by slouw »

Hi there

I have Cisco debug files and Cisco Call Manager log files that consist of multi-line entries, with a timestamp at the start of each paragraph. The timestamp can act as a SEPARATOR for the PARAGRAPHS that make up the input text. The input text has the form:
4579958: Jul 11 16:00:27.027 AEST:
<0 or more lines of text>
4579959: Jul 11 16:00:27.027 AEST:
<0 or more lines of text>
4579960: Jul 11 16:00:27.027 AEST:
<0 or more lines of text>
4579961: Jul 11 16:00:27.027 AEST:
<0 or more lines of text>
4579962: Jul 11 16:00:27.027 AEST:
<0 or more lines of text>
4579963: Jul 11 16:00:27.027 AEST:
etc.

Lexicon:
SEPARATOR = The timestamp line shown above. It acts as the separator between paragraphs.
PARAGRAPH = All the text, starting at the beginning of the SEPARATOR defined above, ending at the beginning of the next SEPARATOR.
SEARCH_STRING = A string of text to search for within the text file(s)
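To make the SEPARATOR/PARAGRAPH definitions concrete, here is a minimal Python sketch (not a TPP filter) that splits the text at each timestamp and keeps the paragraphs containing SEARCH_STRING. It assumes every SEPARATOR looks exactly like the sample lines (sequence number, month, day, HH:MM:SS.mmm, time zone, colon) at the start of a line; the pattern and the function names `split_paragraphs` and `paragraphs_containing` are hypothetical. Any text before the first timestamp is ignored.

```python
import re
from typing import List

# Assumed SEPARATOR pattern, based only on the sample lines in the post:
# "<sequence no.>: <Mon> <day> <HH:MM:SS.mmm> <TZ>:" at the start of a line.
SEPARATOR = re.compile(
    r"^\d+: [A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}\.\d{3} [A-Z]+:",
    re.MULTILINE,
)


def split_paragraphs(text: str) -> List[str]:
    """Each PARAGRAPH runs from one SEPARATOR up to (not including) the next."""
    starts = [m.start() for m in SEPARATOR.finditer(text)]
    ends = starts[1:] + [len(text)]
    return [text[s:e] for s, e in zip(starts, ends)]


def paragraphs_containing(text: str, search_string: str) -> List[str]:
    """Keep only the PARAGRAPHs that contain SEARCH_STRING (plain substring match)."""
    return [p for p in split_paragraphs(text) if search_string in p]
```

To run it over a real log you would read the file first, e.g. `open(path, encoding="utf-8", errors="replace").read()`; the encoding is an assumption about the log files.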

I want to set up TPP so that it searches the input file(s) and outputs every PARAGRAPH that contains the SEARCH_STRING.

My preference is a Perl regex.
I would be grateful for any ideas on how I might tackle this.
Many thanks
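Since a Perl regex is the preferred tool, the same result can be sketched as a single pattern built around a "tempered" dot, written here with Python's `re` module. The constructs it relies on (`^` in multi-line mode, `.` matching newlines, negative lookahead) also exist in Perl as `(?ms)` and `(?!...)`, but whether TPP's regex engine accepts this exact pattern is an assumption. The SEPARATOR pattern and the names `paragraph_regex` and `extract` are hypothetical. In this version only the lines after the timestamp are searched, and SEARCH_STRING is assumed not to contain a line break.

```python
import re
from typing import List

# Assumed SEPARATOR pattern, derived only from the sample timestamp lines.
SEP = r"\d+: [A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}\.\d{3} [A-Z]+:"

# One character that does not begin a new SEPARATOR line (the "tempered" dot).
NOT_SEP = r"(?:(?!^" + SEP + r").)"


def paragraph_regex(search_string: str) -> "re.Pattern[str]":
    """Build one regex that matches a whole PARAGRAPH whose body contains search_string."""
    return re.compile(
        r"^" + SEP                  # paragraph starts at a timestamp line
        + NOT_SEP + r"*?"           # lazily walk through the paragraph body
        + r"(?!^" + SEP + r")"      # never let the search start on the next timestamp
        + re.escape(search_string)  # the literal SEARCH_STRING
        + NOT_SEP + r"*",           # then run on up to the next timestamp (or end of text)
        re.MULTILINE | re.DOTALL,   # ^ = start of any line; . also matches newline
    )


def extract(text: str, search_string: str) -> List[str]:
    """Return every matching PARAGRAPH, including its timestamp line."""
    return [m.group(0) for m in paragraph_regex(search_string).finditer(text)]
```

The lookahead is re-tested at every character, so on very large logs this can be slower than splitting the text at the timestamps first and filtering the pieces.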
slouw

Re: Extract paragraphs/multiple lines of text

Post by slouw »

I have worked out an approach to this problem and my attempt is detailed here:
http://www.datamystic.com/forums/viewtopic.php?f=17&t=2399
It is not without issues, as you will see if you care to look.
For those new to TPP, I have discovered a VERY useful way to debug.
I cannot believe that I have only just found it.
It is called "Prompt on replace"; see the other post.
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: Extract paragraphs/multiple lines of text

Post by DataMystic Support »

I have attached a solution here.

The general approach I used is:
  • Mark the start of each section with a single special character, \xff, and make sure UTF-8 mode is off for the search/replace
  • Match and output each \xff-bounded section that contains the search text
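The two steps above can be sketched in Python on raw bytes, which mirrors running the search with UTF-8 mode off: the byte 0xFF never occurs in valid UTF-8, so the marker cannot collide with real text (it could in a Latin-1 file that contains ÿ). This is only an illustration of the marker idea, not the contents of the attached TPP filter; the SEPARATOR pattern and the names `mark_sections` and `extract_marked` are assumptions.

```python
import re
from typing import List

# Assumed SEPARATOR pattern (bytes), taken from the sample timestamp lines.
SEP = rb"\d+: [A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}\.\d{3} [A-Z]+:"
MARK = b"\xff"  # byte 0xFF never appears in valid UTF-8, so it is a safe boundary


def mark_sections(data: bytes) -> bytes:
    """Step 1: insert one 0xFF byte at the start of every SEPARATOR line."""
    return re.sub(rb"^(?=" + SEP + rb")", MARK, data, flags=re.MULTILINE)


def extract_marked(data: bytes, search: bytes) -> List[bytes]:
    """Step 2: return marker-bounded sections containing `search`, minus the marker."""
    marked = mark_sections(data)
    # [^\xff] cannot cross a marker, so each match stays inside one section.
    section = re.compile(MARK + rb"[^\xff]*?" + re.escape(search) + rb"[^\xff]*")
    return [m.group(0)[len(MARK):] for m in section.finditer(marked)]
```

Working on bytes sidesteps the question of how a UTF-8-aware engine would interpret \xff (as the character ÿ rather than a single byte), which appears to be why UTF-8 mode has to be off.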
Attachments
extract Cisco call manager log files.zip
(897 Bytes) Downloaded 735 times