
Reorder Lines Within Line Groups with no consistency

Posted: Fri Oct 08, 2010 4:16 am
by garcle
If you extract raw text from a PowerPoint file that has multiple text blocks per slide, the lines often come out in a different order depending on how the slide was created or modified. Is there a way to identify a logical group of lines and reorder the lines within that group, when they could appear in any order? Example:

Header: Header 1
Subheader: Sub Header 1
P2:Paragraph2
P1:Paragraph1
Page 1

P1:Paragraph3
Header: header 2
P2:Paragraph4
Subheader: Sub Header 2
Page 2


The Page line delimits each slide's text block; that is the only consistency. The output from the above should be:

Header: Header 1
Subheader: Sub Header 1
P1:Paragraph1
P2:Paragraph2
Page 1

Header: header 2
Subheader: Sub Header 2
P1:Paragraph3
P2:Paragraph4
Page 2

I'm struggling with how to do this, or even whether it is possible. I have tried capturing all the text down to Page and sending it to a sub filter, then using a series of sub-sub filters which look for Header, Subheader, P1 and P2 and simply output them in order, but I end up just duplicating the file as it is.
Any advice would be much appreciated.
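
For reference, the grouping and reordering described above can be sketched outside the filter tool in a few lines of Python. This is only an illustration under assumptions, not a filter definition: it assumes every line really starts with one of the literal prefixes from the example (Header:, Subheader:, P1:, P2:), that each slide ends with a line of the form "Page N", and the names PREFIX_ORDER, PAGE_RE, sort_key and reorder_slides are made up for this sketch.

```python
import re

# Assumed order of line prefixes within a slide, taken from the example above.
PREFIX_ORDER = ["Header:", "Subheader:", "P1:", "P2:"]

# Assumed delimiter: a line consisting of "Page" followed by a number.
PAGE_RE = re.compile(r"^Page \d+$")


def sort_key(line):
    """Return the rank of the line's prefix; unknown lines sort last."""
    for rank, prefix in enumerate(PREFIX_ORDER):
        if line.startswith(prefix):
            return rank
    return len(PREFIX_ORDER)


def reorder_slides(text):
    """Collect lines up to each Page delimiter and sort them within the group."""
    blocks, group = [], []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank lines are dropped and re-added between blocks
        if PAGE_RE.match(stripped):
            # End of a slide: emit the sorted group followed by its delimiter.
            blocks.append(sorted(group, key=sort_key) + [stripped])
            group = []
        else:
            group.append(stripped)
    if group:
        # Trailing lines with no Page delimiter are still sorted and kept.
        blocks.append(sorted(group, key=sort_key))
    return "\n\n".join("\n".join(block) for block in blocks)


if __name__ == "__main__":
    sample = (
        "Header: Header 1\nSubheader: Sub Header 1\nP2:Paragraph2\n"
        "P1:Paragraph1\nPage 1\n\n"
        "P1:Paragraph3\nHeader: header 2\nP2:Paragraph4\n"
        "Subheader: Sub Header 2\nPage 2\n"
    )
    print(reorder_slides(sample))
```

The key point is that the whole group has to be buffered until the Page line is seen, and only then written out in the wanted order. Python's sorted() is stable, so lines with an unrecognised prefix keep their relative order at the end of the group.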

Re: Reorder Lines Within Line Groups with no consistency

Posted: Tue Oct 12, 2010 8:00 am
by DataMystic Support
Can you please zip and upload your filter, and be sure to include text in the trial run area?