Removing all lines between 2 unique strings (not inclusive)
Posted: Fri May 11, 2007 1:10 pm
by APJ
Hi guys,
I have a simple text file with, say, 100 lines of data (the size varies each time)
say line 5 starts with lineA: yadda yadda....
and line 40 starts with lineB: yadda yadda...
I want to delete all of lines 6 through 39 inclusive,
just leaving lines 1-5 and 40-100 in the file...
Now, the next file may have different data and line numbers, so a simple line count won't cut it. There will always be lineA: and lineB: to identify the line positions, though!
How can I trim this using TextPipe?
thanks in advance
AJ
Posted: Mon May 14, 2007 7:16 am
by DataMystic Support
Just use a Perl pattern:
(lineA: [^\r\n]*?).*(lineB: )
Replace with
$1$$2
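For readers working outside TextPipe, the same idea can be sketched line by line in Python. This is only an illustration and not part of the reply above: the function name drop_between and the sample data are made up, and it assumes each marker sits at the very start of its line, as in the question.

```python
def drop_between(lines, start="lineA:", end="lineB:"):
    """Remove every line strictly between a start-marker line and the next end-marker line."""
    out = []
    skipping = False
    for line in lines:
        if skipping:
            # Inside the unwanted region: only the end marker brings us back out.
            if line.startswith(end):
                skipping = False
                out.append(line)
            continue
        out.append(line)
        if line.startswith(start):
            skipping = True
    return out


# Hypothetical sample data, shaped like the file described in the question.
sample = ["one", "lineA: yadda", "junk 1", "junk 2", "lineB: yadda", "tail"]
print(drop_between(sample))
```

Both marker lines are kept and only the lines between them go. If the end marker never turns up, everything after the start marker is dropped, so it is worth checking the input first.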
You certainly are clever.. but another 1 if I may...
Posted: Mon May 14, 2007 4:45 pm
by APJ
I have a txt file with the following sample..
details: ssssssss
whatever...
Fault: xxxxxx
xxxxxxxxxxx
xxxxxxxxxx
<<yadda yadda>>
yyyy
yyy
<<yippee yippee>>
yyyyyyyyyy
yy
<< more yadda>>
<< even more>>
zzzzzzzzz
zzzzzzzz
zzzz
Details:
more whatever...
more...
Fault:
next lot of txt of the same format!
all I want to keep is:
Details: ssssss
whatever...
Fault: xxxxxx
xxxxxxxxxxx
xxxxxxxxxx
zzzzzzzzz
zzzzzzzz
zzzz
Details: ssssss
whatever...
Fault: xx
xxxxxxxxxxxxxxxxxx
xxxxxx
xxx
zz
zzzzzzz
Details: ssssss
whatever...
Fault: xxxxxxxx
and so on...
So I want to get rid of everything in between, and do the same for each subsequent Details: and its following data...
There are sometimes several loads of yyyy data separated by
<< yadda>> lines, but there is always a << yadda>> on the line preceding the zzzzzz data, which may be 1 or several lines of txt.
xxxxx may be 1 or several lines of txt too, but it always starts on the Fault: line and has a <<yadda yadda>> on the line directly after the last line of xxxx (also, yadda yadda is different in each <<>>)
What I can see is to:
locate the line with Fault: on it,
KEEP that line and any subsequent lines down to the first <<
and
locate the line with Details: on it,
and keep every line above it up to the first line with a >> on it (going upwards, that is),
then delete all the lines between this lot.
Hope this is not too cryptic to follow..
thanks
AJ
Posted: Mon May 14, 2007 5:44 pm
by DataMystic Support
Please show exactly what the output should be.
I'll put in a close example of original and desired output..
Posted: Tue May 15, 2007 12:21 pm
by APJ
Here tiz...
Sample Original file:
Details: BC884164 33600036 19/03/07 09:19 223181
Serial unit: dggherhertyere
rtyrtyrtytyyrrty
Resolution: frhgrtyhrtyerthyrethyer
eryrtyretyertyertyeryt
Fault: ghdfhdfhgdfhghdhdhdfhgdhdfhgdfhdfh
<<19/03/2007 ST 11:39 fgfdgdf>>
fgsdfjghofdghothperth
okfghoithir
<<19/03/2007 ST 11:40 fgfdgdf>>
rthretherherht
rterrtrtyeryt
<<20/03/2007 ST 12:39 fgfdgdf>>
Required last work done on this call
Maybe 1 or several lines long though?
Details: AC884164 33600136 19/03/07 09:19 223182
Serial unit: dggherhertyere
rtyrtyrtytyyrrty
Resolution: frhgrtyhrtyerthyrethyer
...
and so on for dozens of logs.
A call starts at Details: and finishes before the next Details:, but I need to capture the lines after the last <<xxx>> of the ticket..
There are varying lines of detail and lots of other guff that I have already stripped out. It's just down to this detail now, but I can't work out how to strip out the unwanted lines.
desired output for this example is below:
Details: BC884164 33600036 19/03/07 09:19 223181
Fault: ghdfhdfhgdfhghdhdhdfhgdhdfhgdfhdfh
Required last work done on this call
Maybe 1 or several lines long though?
Details: AC884164 33600136 19/03/07 09:19 223182
Fault: 2222222222hjhgjhg2jh22h2jh22
Required last work done on this call
Maybe 1 or several lines long though?
3 in this ticket!!
...
and so on for dozens of logs.
So far I have stripped out lots of other fields and put 'Details:' at the bottom of the file, as I can see that I need a starting and a finishing delimiter.
These would be the LAST '>>' of the ticket and the 'Details:' of the following one.
The problem is that there are varying numbers of <<xxxxxxx>> in each ticket, so pinpointing the last one is going to be tricky, eh.
Hope this is clearer?
thanks
AJ
Posted: Tue May 15, 2007 2:21 pm
by DataMystic Support
The steps below illustrate the general technique. You will need your thinking cap on!
1. Find EasyPattern:
[ capture( 'Details: ', 1+ not cr or lf ), cr, lf, 1+ chars, capture('Fault:') ]
Replace with
\x00$1
$2\x01
This inserts \x00 and \x01 as markers which need to be removed later.
2. Now we need to mark the last << >> section after each heading:
Find EasyPattern:
[ '>>', longest 1+ not ascii(00) ]
Replace with
\x02$0
3. Now remove the preceding unwanted << >>:
Find EasyPattern:
[ ascii(01), 1+ not ascii(02) ]
Replace with nothing.
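Outside TextPipe, the same result can be sketched line by line in Python instead of with marker characters. This is a different technique from the EasyPattern steps above, shown only for comparison. The name trim_tickets and the sample lines are hypothetical, and the sketch assumes every ticket starts with a line beginning "Details:", the fault text starts on a line beginning "Fault:", and each << >> note header starts its own line.

```python
def trim_tickets(lines):
    """Keep the Details line, the Fault text, and the lines after the last <<...>> header."""
    out = []
    tail = []       # lines seen since the most recent <<...>> header
    state = "skip"  # "skip" = drop lines, "fault" = keep lines, "notes" = buffer lines
    for line in lines:
        if line.startswith("Details:"):
            out.extend(tail)  # flush the last-work lines of the previous ticket
            tail = []
            out.append(line)
            state = "skip"
        elif line.startswith("Fault:"):
            out.append(line)
            state = "fault"
        elif line.startswith("<<"):
            tail = []         # a newer header: forget the earlier notes
            state = "notes"
        elif state == "fault":
            out.append(line)  # continuation of the fault text
        elif state == "notes":
            tail.append(line)
    out.extend(tail)          # last-work lines of the final ticket
    return out


# Hypothetical sample, shaped like the ticket log in the question.
sample = [
    "Details: A1", "Serial unit: x", "Resolution: y",
    "Fault: broken", "more fault",
    "<<n1>>", "junk", "<<n2>>", "last work", "second line",
    "Details: B2", "Fault: other", "<<n3>>", "done",
]
for kept in trim_tickets(sample):
    print(kept)
```

Because the buffer is emptied at every << >> header, only the lines after the last header survive, however many headers a ticket has. That also means no closing 'Details:' needs to be added at the bottom of the file.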
no go.. maybe if we look at it differently
Posted: Tue May 15, 2007 3:21 pm
by APJ
Sample output...
Fault Notes : archiving problems with main system
<<19/03/2007 ST 11:19 TZ 11:19 ab99665>>
contact caller
<<19/03/2007 ST 11:39 TZ 4WA 09:39 ld76340>>
The second step seems to have put the control char after the first occurrence of >> and not the last.
anyway...
maybe...
If we could have some code to do just this..
locate "Fault Notes:"
locate "blank line"
now
I want to keep the line with "Fault Notes:" on it and subsequent lines down to the first line starting with a "<<"
I also want to keep all lines from the last line ending with a ">>" up to the blank line.
eg:
Fault Details:
terte
erert
Fault Notes: dfdfff
xxxxxxx
<<grethert>>
rthr
rtr
<<dfg>>
<<fgt>>
yyyyyyy
yyyyyyy
Fault Details:
result is
Fault Details:
terte
erert
Fault Notes: dfdfff
xxxxxxx
yyyyyyy
yyyyyyy
Fault Details: next ticket here...
This needs to be done for each ticket in the txt file.
is this easier to do maybe?
AJ
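The keep/delete rule described in this post can be sketched as a small Python function. It is an illustration only and not a TextPipe filter: strip_note_history is a made-up name, and the sketch assumes a ticket ends at a blank line or at the next line beginning "Fault Details:", with every << >> header starting its own line.

```python
def strip_note_history(lines):
    """Drop everything from the first << line through the last >> line of each ticket."""
    out = []
    pending = None  # None = no << >> header seen yet in this ticket;
                    # otherwise the lines seen since the most recent header
    for line in lines:
        boundary = line.strip() == "" or line.startswith("Fault Details:")
        if boundary:
            if pending is not None:
                out.extend(pending)  # lines after the last header are kept
                pending = None
            out.append(line)
        elif line.startswith("<<") or (pending is not None and line.rstrip().endswith(">>")):
            pending = []             # newer header: discard what was buffered
        elif pending is not None:
            pending.append(line)
        else:
            out.append(line)         # before the first header: keep as-is
    if pending is not None:
        out.extend(pending)
    return out


# The example from the post above.
sample = [
    "Fault Details:", "terte", "erert",
    "Fault Notes: dfdfff", "xxxxxxx",
    "<<grethert>>", "rthr", "rtr", "<<dfg>>", "<<fgt>>",
    "yyyyyyy", "yyyyyyy",
    "Fault Details:",
]
for kept in strip_note_history(sample):
    print(kept)
```

Lines before the first header pass straight through, so the "Fault Details:" block and the "Fault Notes:" text are kept without being matched explicitly.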
Posted: Tue May 15, 2007 3:26 pm
by DataMystic Support
Have you purchased? If not, we can provide help - please drop us an email.