Removing all lines between 2 unique strings (not inclusive)

Get help with installation and running here.

Moderators: DataMystic Support, Moderators

APJ
Posts: 4
Joined: Fri May 11, 2007 12:43 pm
Location: Perth - Western Australia

Removing all lines between 2 unique strings (not inclusive)

Post by APJ »

Hi guys,
I have a simple text file with, say, 100 lines of data (the size varies each time).
Say line 5 starts with lineA: yadda yadda....
and line 40 starts with lineB: yadda yadda...
I want to delete all of lines 6 through 39 inclusive,
just leaving lines 1-5 and 40-100 in the file.
Now, the next file may have different data and line numbers, so a simple line count won't cut it. There will always be a lineA: and a lineB: to identify the line positions, though!
How can I trim this using TextPipe?
thanks in advance
AJ
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Post by DataMystic Support »

Just use a Perl pattern:

(lineA: [^\r\n]*?).*(lineB: )

Replace with

$1$$2
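
For readers without TextPipe, the same idea can be sketched in Python. This is only an illustration, not the TextPipe filter itself: the function name `trim_between` is made up for the example, and it assumes that lineA: and lineB: each appear at the start of a line. Note the quantifiers: the lineA: line is matched greedily to its end and the stretch in between lazily, so the rest of the lineA: line is kept and the match stops at the first lineB: line.

```python
import re


def trim_between(text: str, start: str = "lineA:", end: str = "lineB:") -> str:
    """Delete every line strictly between the start line and the end line.

    Both marker lines are kept. `start` and `end` are assumed to appear
    at the beginning of a line.
    """
    pattern = re.compile(
        # Group 1: the whole start line, including its line break.
        r"(^" + re.escape(start) + r"[^\r\n]*\r?\n)"
        # The lines to drop, matched lazily so we stop at the first end line.
        r".*?"
        # Lookahead: the end line itself is not consumed, so it survives.
        r"(?=^" + re.escape(end) + r")",
        re.DOTALL | re.MULTILINE,
    )
    return pattern.sub(r"\1", text)
```

Calling `trim_between(open("input.txt").read())` (the file name is a placeholder) returns the text with only the lines between the two markers removed.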
APJ
Posts: 4
Joined: Fri May 11, 2007 12:43 pm
Location: Perth - Western Australia

You certainly are clever.. but another 1 if I may...

Post by APJ »

I have a txt file with the following sample..

Details: ssssssss
whatever...
Fault: xxxxxx
xxxxxxxxxxx
xxxxxxxxxx
<<yadda yadda>>
yyyy
yyy
<<yippee yippee>>
yyyyyyyyyy
yy
<< more yadda>>
<< even more>>
zzzzzzzzz
zzzzzzzz
zzzz
Details:
more whatever...
more...
Fault:
next lot of txt of the same format!
all I want to keep is:

Details: ssssss
whatever...
Fault: xxxxxx
xxxxxxxxxxx
xxxxxxxxxx
zzzzzzzzz
zzzzzzzz
zzzz

Details: ssssss
whatever...
Fault: xx
xxxxxxxxxxxxxxxxxx
xxxxxx
xxx
zz
zzzzzzz

Details: ssssss
whatever...
Fault: xxxxxxxx
and so on...

So I want to get rid of everything in between, and do the same for each subsequent Details: block and the data following it.
There are sometimes several loads of yyyy data separated by << yadda>> lines, but there is always a << yadda>> on the line directly preceding the zzzzzz data, which may be 1 or several lines of txt.
xxxxx may be 1 or several lines of txt too, but it always starts on the Fault: line and has a <<yadda yadda>> on the line directly after the last line of xxxx (also, the yadda yadda is different in each <<>>).

What I can see is to:
locate the line with Fault: on it,
KEEP that line and any subsequent lines down to the first <<,
and
locate the next line with Details: on it,
keep every line above it up to the first line with a >> on it (going upwards, that is),
and delete all the lines between these two lots.

hope this is not too cryptic to follow..
thanks
AJ
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Post by DataMystic Support »

Please show exactly what the output should be.
APJ
Posts: 4
Joined: Fri May 11, 2007 12:43 pm
Location: Perth - Western Australia

I'll put in a close example of original and desired output..

Post by APJ »

Here tiz...
Sample Original file:

Details: BC884164 33600036 19/03/07 09:19 223181
Serial unit: dggherhertyere
rtyrtyrtytyyrrty
Resolution: frhgrtyhrtyerthyrethyer
eryrtyretyertyertyeryt
Fault: ghdfhdfhgdfhghdhdhdfhgdhdfhgdfhdfh
<<19/03/2007 ST 11:39 fgfdgdf>>
fgsdfjghofdghothperth
okfghoithir
<<19/03/2007 ST 11:40 fgfdgdf>>
rthretherherht
rterrtrtyeryt
<<20/03/2007 ST 12:39 fgfdgdf>>
Required last work done on this call
Maybe 1 or several lines long though?

Details: AC884164 33600136 19/03/07 09:19 223182
Serial unit: dggherhertyere
rtyrtyrtytyyrrty
Resolution: frhgrtyhrtyerthyrethyer
...
and so on for dozens of logs.

A call starts at Details: and finishes before the next Details:, but I need to capture the lines after the last <<xxx>> of the ticket.
There are varying numbers of detail lines and lots of other guff that I have already stripped out. I'm just down to this detail now, but can't work out how to strip out the lines I don't need.
The desired output for this example is below:


Details: BC884164 33600036 19/03/07 09:19 223181
Fault: ghdfhdfhgdfhghdhdhdfhgdhdfhgdfhdfh
Required last work done on this call
Maybe 1 or several lines long though?

Details: AC884164 33600136 19/03/07 09:19 223182
Fault: 2222222222hjhgjhg2jh22h2jh22
Required last work done on this call
Maybe 1 or several lines long though?
3 in this ticket!!
...
and so on for dozens of logs.

So far I have stripped out lots of other fields and put 'Details:' at the bottom of the file, as I can see that I need a starting and a finishing delimiter.
These would be the LAST '>>' of the ticket and the 'Details:' of the following one.
Problem is, there are varying numbers of <<xxxxxxx>> in each ticket, so pinpointing the last one is going to be tricky, eh.
Hope this is clearer?
thanks
AJ
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Post by DataMystic Support »

The steps below illustrate the general technique. You will need your thinking cap on!

1. Find EasyPattern:
[ capture( 'Details: ', 1+ not cr or lf ), cr, lf, 1+ chars, capture('Fault:') ]
Replace with
\x00$1
$2\x01

This inserts \x00 and \x01 as markers which need to be removed later.


2. Now we need to mark the last << >> section after each heading:

Find EasyPattern:
[ '>>', longest 1+ not ascii(00) ]
Replace with
\x02$0

3. Now remove the preceding unwanted << >>:

Find EasyPattern:
[ ascii(01), 1+ not ascii(02) ]
Replace with nothing.
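
The marker idea in the three steps above can also be tried outside TextPipe. Below is a rough Python sketch of it, not a translation of the EasyPatterns: the function name `strip_history` is invented for the example, and it assumes every ticket has a Details: line followed by a Fault: line, that every << >> stamp sits on a line of its own, and that the text contains no \x00 or \x02 characters to begin with. To find the last stamp of a ticket, it looks for a stamp line with no further "<<" line before the next ticket marker.

```python
import re

TICKET_START = "\x00"  # temporary marker placed in front of every ticket
LAST_STAMP = "\x02"    # temporary marker placed after the last <<...>> line

# Details line (group 1), anything up to the Fault line, then "Fault:" (group 2).
HEADER = re.compile(r"^(Details: [^\r\n]*\r?\n).*?^(Fault:)", re.M | re.S)
# A stamp line with no other "<<" line between it and the next ticket (or the end).
FINAL_STAMP = re.compile(
    r"^<<[^\r\n]*>>\r?\n(?=(?:(?!^<<)[^\x00])*(?:\x00|\Z))", re.M
)
# From the first stamp of a ticket up to and including the LAST_STAMP marker.
HISTORY = re.compile(r"^<<[^\x00\x02]*\x02", re.M)


def strip_history(text: str) -> str:
    # Step 1: mark each ticket and drop the lines between Details: and Fault:.
    text = HEADER.sub(lambda m: TICKET_START + m.group(1) + m.group(2), text)
    # Step 2: mark the end of the last <<...>> line in each ticket.
    text = FINAL_STAMP.sub(lambda m: m.group(0) + LAST_STAMP, text)
    # Step 3: remove everything from the first stamp through that marker.
    text = HISTORY.sub("", text)
    # Step 4: remove the ticket markers that are still left.
    return text.replace(TICKET_START, "")
```

What remains of each ticket is the Details: line, the Fault: line(s), and whatever follows the last stamp.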
APJ
Posts: 4
Joined: Fri May 11, 2007 12:43 pm
Location: Perth - Western Australia

no go.. maybe if we look at it differently

Post by APJ »

Sample output...
Fault Notes : archiving problems with main system
<<19/03/2007 ST 11:19 TZ 11:19 ab99665>>
contact caller
<<19/03/2007 ST 11:39 TZ 4WA 09:39 ld76340>>

The second step seems to have put the control char after the first occurrence of >> and not the last.
Anyway...
Maybe...
if we could have some code to do just this:
locate "Fault Notes:"
locate the blank line

Now,
I want to keep the line with "Fault Notes:" on it and the subsequent lines down to the first line starting with a "<<".
I also want to keep all lines from after the last line ending with a ">>" up to the blank line.

eg:
Fault Details:
terte
erert
Fault Notes: dfdfff
xxxxxxx
<<grethert>>
rthr
rtr
<<dfg>>
<<fgt>>
yyyyyyy
yyyyyyy

Fault Details:

result is

Fault Details:
terte
erert
Fault Notes: dfdfff
xxxxxxx
yyyyyyy
yyyyyyy

Fault Details: next ticket here...

This needs to be done for each ticket in the txt file.
Is this maybe easier to do?
AJ
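
The rule described in this post is simple enough to sketch line by line rather than with patterns. The Python below is only an illustration under some assumptions: tickets are separated by blank lines, the helper names `trim_ticket` and `trim_tickets` are made up for the example, and a ticket without any "<<" line is left untouched.

```python
def trim_ticket(lines: list[str]) -> list[str]:
    """Drop everything from the first '<<' line through the last '>>' line."""
    # Start looking at the "Fault Notes" line if there is one, else at the top.
    notes = next(
        (i for i, line in enumerate(lines) if line.startswith("Fault Notes")), 0
    )
    first = next(
        (i for i in range(notes, len(lines)) if lines[i].startswith("<<")), None
    )
    if first is None:
        return lines  # nothing to remove in this ticket
    closing = [
        i for i in range(first, len(lines)) if lines[i].rstrip().endswith(">>")
    ]
    last = closing[-1] if closing else first
    return lines[:first] + lines[last + 1:]


def trim_tickets(text: str) -> str:
    """Apply trim_ticket to every blank-line-separated ticket in the text."""
    out: list[str] = []
    block: list[str] = []
    for line in text.splitlines():
        if line.strip() == "":
            out.extend(trim_ticket(block))  # ticket finished at the blank line
            out.append(line)
            block = []
        else:
            block.append(line)
    out.extend(trim_ticket(block))  # last ticket may not end with a blank line
    return "\n".join(out)
```

Run on the example in this post, `trim_tickets` keeps the lines down to xxxxxxx, drops the three << >> lines and the text between them, and keeps the two yyyyyyy lines. A trailing line break at the end of the file is not preserved.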
DataMystic Support
Site Admin
Posts: 2227
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Post by DataMystic Support »

Have you purchased? If not, we can provide help - please drop us an email.