Extract paragraphs/multiple lines of text

Posted: Mon Jul 17, 2017 12:33 pm
by slouw
Hi there

I have Cisco debug files and Cisco call manager log files, which consist of multi-line entries, each beginning with a timestamp. The timestamp can act as a SEPARATOR for the PARAGRAPHS that make up the input text. The input text has the form:
4579958: Jul 11 16:00:27.027 AEST:
<0 or more lines of text>
4579959: Jul 11 16:00:27.027 AEST:
<0 or more lines of text>
4579960: Jul 11 16:00:27.027 AEST:
<0 or more lines of text>
4579961: Jul 11 16:00:27.027 AEST:
<0 or more lines of text>
4579962: Jul 11 16:00:27.027 AEST:
<0 or more lines of text>
4579963: Jul 11 16:00:27.027 AEST:
etc.

Lexicon:
SEPARATOR = The timestamp above. This can act as a separator between paragraphs
PARAGRAPH = All the text, starting at the beginning of the SEPARATOR defined above, ending at the beginning of the next SEPARATOR.
SEARCH_STRING = A string of text to search for within the text file(s)

I want to set up TPP so that it searches the input file(s) and outputs every PARAGRAPH containing the SEARCH_STRING.
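
To make the requirement concrete, here is a minimal sketch of the same logic outside TPP, written in Python. The separator regex is an assumption inferred from the sample lines above (sequence number, month, day, time with milliseconds, time zone), and the function name, sample log lines and search string are made up for illustration.

```python
import re

# Assumed separator shape: "<sequence>: Mon DD HH:MM:SS.mmm TZ:" at the start of a line.
SEPARATOR = re.compile(
    r"^\d+: [A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}\.\d{3} [A-Z]+:",
    re.MULTILINE,
)

def paragraphs_containing(text, search_string):
    """Return each PARAGRAPH (separator line plus following lines) that contains search_string."""
    starts = [m.start() for m in SEPARATOR.finditer(text)]
    ends = starts[1:] + [len(text)]  # a paragraph ends where the next separator begins
    paragraphs = [text[s:e] for s, e in zip(starts, ends)]
    return [p for p in paragraphs if search_string in p]

# Hypothetical sample input; the message lines are invented.
sample = (
    "4579958: Jul 11 16:00:27.027 AEST:\n"
    "SIP INVITE sent\n"
    "4579959: Jul 11 16:00:27.027 AEST:\n"
    "keepalive\n"
    "4579960: Jul 11 16:00:27.027 AEST:\n"
    "SIP 200 OK received\n"
    "codec negotiated\n"
)

for paragraph in paragraphs_containing(sample, "SIP"):
    print(paragraph, end="")
```

Note that the search is a plain substring test, so a SEARCH_STRING that happens to appear in the timestamp line itself would also match.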

My preference is perl regex.
I would be grateful for any ideas on how I might tackle this.
Many thanks

Re: Extract paragraphs/multiple lines of text

Posted: Tue Jul 18, 2017 7:42 am
by slouw
I have worked out an approach to this problem and my attempt is detailed here:
http://www.datamystic.com/forums/viewtopic.php?f=17&t=2399
It is not without issue as you will see if you care to look.
For those new to TPP, I have discovered a VERY useful way to debug.
I cannot believe that I have only found it now.
It is called "Prompt on replace"; see the other post...

Re: Extract paragraphs/multiple lines of text

Posted: Tue Jul 18, 2017 9:35 pm
by DataMystic Support
I have attached a solution here.

The general approach I used is:
  • Mark each section with a single special character, \xff, and ensure UTF-8 mode is off for search/replace
  • Match and output the search text inside \xff boundaries
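
The two steps above can be illustrated with a rough Python sketch. This is not the attached TextPipe filter, only an emulation of the same idea on raw bytes, which mirrors keeping UTF-8 mode off so that \xff stays a single byte. The separator regex, the function name and the sample data are assumptions, and the sketch also assumes the input does not already contain a \xff byte.

```python
import re

MARK = b"\xff"  # single marker byte placed in front of every section

# Zero-width match at the start of each assumed timestamp line.
SECTION_START = re.compile(
    rb"^(?=\d+: [A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}\.\d{3} [A-Z]+:)",
    re.MULTILINE,
)

def extract_with_markers(data, search):
    """Step 1: mark each section with \\xff. Step 2: match the search text inside \\xff boundaries."""
    marked = SECTION_START.sub(MARK, data)
    # [^\xff]* cannot cross a marker, so each match stays within one section.
    inside = re.compile(MARK + rb"([^\xff]*" + re.escape(search) + rb"[^\xff]*)")
    return inside.findall(marked)

# Hypothetical sample input; the message lines are invented.
sample = (
    b"4579958: Jul 11 16:00:27.027 AEST:\n"
    b"SIP INVITE sent\n"
    b"4579959: Jul 11 16:00:27.027 AEST:\n"
    b"keepalive\n"
    b"4579960: Jul 11 16:00:27.027 AEST:\n"
    b"SIP 200 OK received\n"
)

for section in extract_with_markers(sample, b"SIP"):
    print(section.decode("ascii"), end="")
```

The marker works because a negated character class such as [^\xff]* is a simple way to say "anything up to the next section", which is hard to express when the boundary is a multi-character timestamp.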