Process inside compressed files
Moderators: DataMystic Support, Moderators
- dfhtextpipe
- Posts: 988
- Joined: Sun Dec 09, 2007 2:49 am
- Location: UK
Process inside compressed files
The "Process inside compressed files" option currently supports only these file types: ZIP, DOCX, XLSX and PPTX.
How about adding support for OpenDocument formats, e.g. ODT files?
See http://en.wikipedia.org/wiki/OpenDocument
It would then be feasible to process the file content.xml inside an OpenDocument word processing file.
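For readers wondering what processing content.xml involves outside TextPipe, here is a minimal Python sketch using the standard zipfile module. It assumes a ZIP-packaged .odt; build_sample_odt and read_content_xml are hypothetical helpers, and the in-memory archive is a tiny stand-in rather than a real word processing document.

```python
import io
import zipfile

ODT_MIMETYPE = "application/vnd.oasis.opendocument.text"
SAMPLE_CONTENT = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<office:document-content '
    'xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0">'
    "<office:body/></office:document-content>"
)


def build_sample_odt() -> bytes:
    """Build a tiny, hypothetical ODT-style package in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # ODF packaging puts an uncompressed "mimetype" entry first
        archive.writestr("mimetype", ODT_MIMETYPE, compress_type=zipfile.ZIP_STORED)
        archive.writestr("content.xml", SAMPLE_CONTENT, compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


def read_content_xml(odt_bytes: bytes) -> str:
    """Return the text of content.xml from a ZIP-packaged OpenDocument file."""
    with zipfile.ZipFile(io.BytesIO(odt_bytes)) as archive:
        # content.xml sits at the root of the package and holds the document body
        return archive.read("content.xml").decode("utf-8")


if __name__ == "__main__":
    print(read_content_xml(build_sample_odt()))
```

Once the text is extracted, any text filter can work on it; writing it back means replacing that one entry in the archive.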
David
Last edited by dfhtextpipe on Wed Mar 30, 2011 9:22 pm, edited 1 time in total.
- DataMystic Support
- Site Admin
- Posts: 2227
- Joined: Mon Jun 30, 2003 12:32 pm
- Location: Melbourne, Australia
Re: Process inside compressed files
Thanks David,
But:
"OpenDocument files can also take the format of a ZIP compressed archive containing a number of files and directories."
So does this mean that these formats
- .odt for word processing (text) documents
- .ods for spreadsheets
- .odp for presentations
could be plain XML, or optionally a .zip file? Or are they always ZIP format these days?
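Whether a given file is zipped or plain XML can be checked from its first bytes. A minimal Python sketch, assuming only two packagings matter: a ZIP package, which begins with the local file header signature PK\x03\x04, and the single-file ("flat") XML form of OpenDocument, which LibreOffice saves under extensions such as .fodt. The function odf_packaging is hypothetical, not part of any TextPipe API.

```python
import io
import zipfile

ZIP_SIGNATURE = b"PK\x03\x04"  # local file header at the start of a ZIP archive
UTF8_BOM = b"\xef\xbb\xbf"


def odf_packaging(data: bytes) -> str:
    """Return 'zip', 'flat-xml' or 'unknown' for the bytes of an OpenDocument file."""
    if data.startswith(ZIP_SIGNATURE):
        return "zip"
    # A flat (single-file) OpenDocument is an ordinary XML document
    if data.startswith(UTF8_BOM):
        data = data[len(UTF8_BOM):]
    if data.lstrip().startswith(b"<"):
        return "flat-xml"
    return "unknown"


if __name__ == "__main__":
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("mimetype", "application/vnd.oasis.opendocument.text")
    print(odf_packaging(buffer.getvalue()))                            # zip
    print(odf_packaging(b'<?xml version="1.0"?><office:document/>'))  # flat-xml
```

Files saved with the usual .odt, .ods and .odp extensions are typically the ZIP form; the flat form is normally a separate, explicitly chosen save format.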
- dfhtextpipe
- Posts: 988
- Joined: Sun Dec 09, 2007 2:49 am
- Location: UK
Re: Process inside compressed files
Simon,
If I save a file from Word 2007 in OpenDocument format (file extension .odt),
I can readily examine its contents using an archive manager such as 7-Zip.
The archive contains a content.xml file along with other files.
See attached image that illustrates this.
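The same kind of listing an archive manager such as 7-Zip shows can be produced with Python's zipfile module. A minimal sketch; list_package is a hypothetical helper, and the three-entry archive in the demo is an assumed stand-in (a real .odt usually also contains further entries such as styles.xml and meta.xml).

```python
import io
import zipfile
from typing import List, Tuple


def list_package(odt_bytes: bytes) -> List[Tuple[str, int, str]]:
    """List (name, uncompressed size, storage method) for each entry in the archive."""
    methods = {zipfile.ZIP_STORED: "stored", zipfile.ZIP_DEFLATED: "deflated"}
    with zipfile.ZipFile(io.BytesIO(odt_bytes)) as archive:
        return [
            (info.filename, info.file_size, methods.get(info.compress_type, "other"))
            for info in archive.infolist()
        ]


if __name__ == "__main__":
    # Hypothetical stand-in for a saved .odt
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("mimetype", "application/vnd.oasis.opendocument.text",
                         compress_type=zipfile.ZIP_STORED)
        archive.writestr("content.xml", "<office:document-content/>",
                         compress_type=zipfile.ZIP_DEFLATED)
        archive.writestr("META-INF/manifest.xml", "<manifest:manifest/>",
                         compress_type=zipfile.ZIP_DEFLATED)
    for name, size, method in list_package(buffer.getvalue()):
        print(f"{name:24} {size:6} {method}")
```

Under the ODF packaging rules the mimetype entry comes first and is stored uncompressed, which is why it is listed as "stored".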
David
Attachments:
- InsideODT.png (70.15 KiB): Inside an OpenDocument file.
- dfhtextpipe
- Posts: 988
- Joined: Sun Dec 09, 2007 2:49 am
- Location: UK
Re: Process inside compressed files
Further note:
content.xml is "linearized": all the XML after the XML declaration is on a single line of text.
However, it can be made more legible using the "Pretty-print" feature of XML Copy Editor.
See attached image. See also http://xml-copy-editor.sourceforge.net/
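Python's standard library can do a similar pretty-print. A minimal sketch with xml.dom.minidom, chosen here as an assumption (it is not what XML Copy Editor uses internally) because it keeps namespace prefixes such as office: and text: unchanged; pretty_print is a hypothetical helper name.

```python
from xml.dom import minidom


def pretty_print(xml_text: str, indent: str = "  ") -> str:
    """Re-indent linearized XML so each element starts on its own line."""
    document = minidom.parseString(xml_text)
    # toprettyxml emits an XML declaration, then one indented element per line
    return document.toprettyxml(indent=indent)


if __name__ == "__main__":
    linear = '<?xml version="1.0" encoding="UTF-8"?>\n<root><item>one</item><item>two</item></root>'
    print(pretty_print(linear))
```

Pretty-printing inserts whitespace text nodes. In mixed content such as ODF paragraphs that whitespace can become visible, so the indented copy is best used for reading rather than written back into the .odt.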
David
Attachments:
- PrettyContentXML.png (105.8 KiB): Pretty view of extracted content.xml
- dfhtextpipe
- Posts: 988
- Joined: Sun Dec 09, 2007 2:49 am
- Location: UK
Re: Process inside compressed files
The Notepad++ plugin called XML Tools is another means to "Pretty-print" an XML file, and it also has a "Linearize" option.
Just a further suggestion for your developers:
it would be nice if TextPipe could be enhanced to offer such methods as XML sub-filters.
David
- DataMystic Support
- Site Admin
- Posts: 2227
- Joined: Mon Jun 30, 2003 12:32 pm
- Location: Melbourne, Australia
Re: Process inside compressed files
Thanks David - that is very detailed and very helpful.
We have added .ODT for the next release of TP.
Also - I have attached a sample XML Linearize filter.
Attachments:
- xml linearize.zip (804 Bytes): Linearize XML files
- dfhtextpipe
- Posts: 988
- Joined: Sun Dec 09, 2007 2:49 am
- Location: UK
Re: Process inside compressed files
Thanks Simon.
I think you may have the "XML linearize" terminology flipped!
A linearized XML file is one with everything (except the XML declaration) on a single, very long line.
A "Pretty Print" XML file is one where the XML is "de-linearized" and intelligently indented.
Examining the rudimentary XML Linearize filter, I observe that (as yet) it does not also apply any indenting.
Something to think about for the future, perhaps. Not urgent - I can still use XML Copy Editor.
Also, it would be sensible to tick "Enable UTF-8 support" in the Perl sub-filters.
David
- DataMystic Support
- Site Admin
- Posts: 2227
- Joined: Mon Jun 30, 2003 12:32 pm
- Location: Melbourne, Australia
Re: Process inside compressed files
Thanks David - I will create a new 'XML pretty print' filter, and a new XML Linearize filter that simply replaces all CR/LFs with spaces and can optionally compress runs of spaces.
I will also enable UTF-8 support in those filters.
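The simple linearizer described here can be sketched in Python as follows. This is a guess at the behaviour from the one-line description, not the actual TextPipe filter: linearize_simple is a hypothetical name, a line break is taken to be CR/LF, a lone CR or a lone LF, and "compress spaces" is read as squeezing runs of spaces (not tabs) into one.

```python
import re


def linearize_simple(xml_text: str, compress_spaces: bool = False) -> str:
    """Replace each line break (CR/LF, CR or LF) with a space; optionally squeeze spaces."""
    flat = re.sub(r"\r\n|\r|\n", " ", xml_text)
    if compress_spaces:
        # Collapse two or more consecutive spaces into a single space
        flat = re.sub(r" {2,}", " ", flat)
    return flat


if __name__ == "__main__":
    print(linearize_simple("<a>\r\n  <b>x</b>\n</a>", compress_spaces=True))
```

Because every line break becomes a space, the XML declaration ends up on the same line as the root element, and a trailing newline becomes a trailing space.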
- dfhtextpipe
- Posts: 988
- Joined: Sun Dec 09, 2007 2:49 am
- Location: UK
Re: Process inside compressed files
Simon,
When linearizing XML, please take care over the XML declaration at the top of the file.
Normally this should be on the first line of text.
Sometimes this definition lookup is spread over more than one line of text.
Some XML validation tools fail when this is the case.
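A linearizer that honours this request could keep the XML declaration alone on the first line (folding it onto one line if it was split) and put everything else on the second. A minimal Python sketch; linearize_keep_declaration is hypothetical, and it assumes whitespace-only gaps between tags are insignificant, which does not hold for mixed content such as <text:span>a</text:span> <text:span>b</text:span>.

```python
import re

# An XML declaration at the very start of the text, even if it spans several lines
XML_DECLARATION = re.compile(r"\s*(<\?xml\s[^?]*\?>)\s*")


def linearize_keep_declaration(xml_text: str) -> str:
    """Declaration alone on line 1, everything else on line 2 (one line if no declaration)."""
    declaration = ""
    body = xml_text
    match = XML_DECLARATION.match(xml_text)
    if match:
        # Fold any line breaks inside the declaration itself into single spaces
        declaration = " ".join(match.group(1).split())
        body = xml_text[match.end():]
    # Assumption: whitespace-only gaps between tags can be dropped
    body = re.sub(r">\s+<", "><", body)
    # Remaining line breaks (inside text, or a multi-line DOCTYPE) become one space
    body = re.sub(r"\s*[\r\n]+\s*", " ", body).strip()
    return f"{declaration}\n{body}" if declaration else body


if __name__ == "__main__":
    source = '<?xml version="1.0"\n   encoding="UTF-8"?>\n<root>\n  <a>x</a>\n</root>\n'
    print(linearize_keep_declaration(source))
```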
David
- DataMystic Support
- Site Admin
- Posts: 2227
- Joined: Mon Jun 30, 2003 12:32 pm
- Location: Melbourne, Australia
Re: Process inside compressed files
I have attached updated filters - BTW, I know the pretty printer is far from pretty.
Would you like to check how the XML declaration is linearized? If it is just between <> then it should be put on one line anyway.
Attachments:
- xml linearize and pretty print2.zip (1.54 KiB)