Process inside compressed files
Posted: Tue Mar 29, 2011 8:40 pm
by dfhtextpipe
Process inside compressed files currently supports only these file types: ZIP, DOCX, XLSX and PPTX.
How about adding support for OpenDocument, e.g. ODT files?
See
http://en.wikipedia.org/wiki/OpenDocument
It would then be feasible to process the content.xml file inside an OpenDocument word processing file.
David
Re: Process inside compressed files
Posted: Wed Mar 30, 2011 2:52 pm
by DataMystic Support
Thanks David,
But OpenDocument files can also take the form of a ZIP-compressed archive containing a number of files and directories.
So does this mean that the formats
# .odt for word processing (text) documents
# .ods for spreadsheets
# .odp for presentations
could be either plain XML or a .zip file? Or are they always in ZIP format these days?
Re: Process inside compressed files
Posted: Wed Mar 30, 2011 9:12 pm
by dfhtextpipe
Simon,
If I save a file from Word 2007 in OpenDocument format (file extension .odt),
I can readily examine the contents of the saved file using an archive manager such as 7-Zip.
The compressed file contains a content.xml file along with other files, etc.
See attached image that illustrates this.
David
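For anyone who wants to try this outside an archive manager, here is a minimal Python sketch of the same check, using only the standard zipfile module. It assumes the package stores its main document as content.xml encoded in UTF-8, as described above; the function name read_content_xml is made up for illustration and is not part of any product discussed here.

```python
import zipfile


def read_content_xml(odt_source):
    """Return the text of content.xml from a ZIP-packaged OpenDocument file.

    odt_source may be a file path or a binary file-like object.
    read_content_xml is a hypothetical helper name used for this sketch.
    """
    # A flat (non-ZIP) XML file is rejected here rather than guessed at.
    if not zipfile.is_zipfile(odt_source):
        raise ValueError("not a ZIP-packaged OpenDocument file")
    with zipfile.ZipFile(odt_source) as package:
        # Assumption: content.xml is stored as UTF-8.
        return package.read("content.xml").decode("utf-8")
```

Calling read_content_xml("report.odt") (a hypothetical file name) would return the linearized XML as one string, ready for further processing.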
Re: Process inside compressed files
Posted: Wed Mar 30, 2011 9:20 pm
by dfhtextpipe
Further note:
content.xml is "linearized", in that all the XML (after the schema) is on a single line of text.
However, it can be made more legible using the "Pretty-print" feature of XML Copy Editor.
See attached image. See also
http://xml-copy-editor.sourceforge.net/
David
Re: Process inside compressed files
Posted: Wed Mar 30, 2011 9:37 pm
by dfhtextpipe
The Notepad++ plugin called XML Tools is another means to "Pretty-print" an XML file, and it also has a "Linearize" option.
Just a further suggestion for your developers....
Maybe it would be nice if TextPipe could be enhanced to also include such methods by means of various XML sub-filters.
David
Re: Process inside compressed files
Posted: Fri Apr 01, 2011 2:11 pm
by DataMystic Support
Thanks David - that is very detailed and very helpful.
We have added .ODT for the next release of TP.
Also - I have attached a sample XML Linearize filter.
Re: Process inside compressed files
Posted: Fri Apr 08, 2011 12:56 am
by dfhtextpipe
Thanks Simon.
I think you may have the "XML linearize" terminology flipped!
A linearized XML file is one with everything (except the schema) as a single (very long) line.
A "Pretty Print" XML file is one where the XML is "de-linearized" and intelligently indented.
Examining the rudimentary XML Linearize filter, I observe that (as yet) it does not also apply any indenting.
Something to think about for the future, perhaps. Not urgent - I can still use XML Copy Editor.
Also, it would be sensible to tick Enable UTF-8 support in the Perl sub-filters.
David
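To make the terminology concrete, here is a small Python sketch of "Pretty Print" in the sense used above: linearized XML goes in, indented multi-line XML comes out. It uses xml.dom.minidom from the standard library and is only an illustration, not how XML Copy Editor or the attached filter works. The function name pretty_print is hypothetical.

```python
from xml.dom import minidom


def pretty_print(xml_text, indent="  "):
    """Return an indented, multi-line version of linearized XML.

    pretty_print is a hypothetical helper name. The input is assumed to be
    well-formed and already linearized; whitespace between tags in the input
    would be kept as text nodes and show up as extra blank lines.
    """
    document = minidom.parseString(xml_text)
    # toprettyxml puts each element on its own line and indents by depth.
    return document.toprettyxml(indent=indent)


if __name__ == "__main__":
    print(pretty_print("<a><b>x</b></a>"))
```

Because minidom parses the whole document, this approach also rejects XML that is not well-formed, which a plain search-and-replace filter would not notice.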
Re: Process inside compressed files
Posted: Fri Apr 08, 2011 9:36 am
by DataMystic Support
Thanks David - I will create a new 'XML pretty print' filter, and a new XML Linearize filter that simply replaces all CR/LFs with a space and optionally compresses spaces.
I will also enable UTF-8 support in those filters.
Re: Process inside compressed files
Posted: Fri Apr 08, 2011 10:54 pm
by dfhtextpipe
Simon,
When linearizing XML, please take care over the XML schema.
Normally this should be on the first line of text.
Sometimes the definition lookup is spread over more than the first line of text.
Some XML validation tools fail when this is the case.
David
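One way to handle this is sketched below in Python, assuming that "the schema" here means the leading <?xml ... ?> declaration. The sketch replaces line breaks with spaces and optionally compresses runs of spaces, as proposed earlier in the thread, but pulls the declaration onto a first line of its own. The name linearize and its compress_spaces parameter are made up for illustration; this is not the TextPipe filter.

```python
import re


def linearize(xml_text, compress_spaces=True):
    """Collapse XML onto one line, with any XML declaration on its own first line.

    Hypothetical helper. Caveat: this is plain text substitution, so line
    breaks inside text nodes, comments and CDATA sections are replaced too.
    """
    # Assumption: the prolog to protect is a leading <?xml ... ?> declaration,
    # which may itself be spread over several lines.
    match = re.match(r"\s*(<\?xml.*?\?>)", xml_text, flags=re.DOTALL)
    declaration, body = "", xml_text
    if match:
        declaration = match.group(1)
        body = xml_text[match.end():]

    def flatten(text):
        # Each run of CR/LF characters becomes a single space.
        text = re.sub(r"[\r\n]+", " ", text)
        if compress_spaces:
            # Optionally squeeze runs of spaces and tabs down to one space.
            text = re.sub(r"[ \t]{2,}", " ", text)
        return text.strip()

    parts = [flatten(declaration), flatten(body)]
    return "\n".join(part for part in parts if part)
```

With this split, a declaration that was spread over several lines ends up on line one, and everything else follows as a single long second line.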
Re: Process inside compressed files
Posted: Sat Apr 09, 2011 10:25 am
by DataMystic Support
I have attached updated filters. BTW, I know the pretty printer is far from pretty.
Would you like to check the schema linearizing? If it is just between < and >, then it should be put on one line anyway.