Hi folks, these began as my personal notes from the Standards NZ meeting. I've now fleshed them out with references and links to make them easier to follow.
Each heading is about a topic or question we had, with some background and my suggested action for Standards NZ.
They may not make sense without knowing the context of the meeting and/or what we talked about. If you have any questions please ask and I'll try to post an explanation and update the article with corrections.
I think the end goal should be to merge the two formats, ODF and OOXML.
It was suggested that this would create a 3rd format. It wouldn't, because of course the harmonized format would become the new ODF. It would be about removing unnecessary and pointless differences between ODF and OOXML and making it easier for me (or anyone) to develop with office suites.
My presentation* explained Ecma's stated reasons for believing that the formats are too different to merge, but I hope it was clear to everyone that page breaks, table handling, and cell styles aren't significant technical problems. (Gray also mentioned the "mixed content model" as a reason why they can't be merged. As I understand it, this is a data modelling issue unrelated to any feature set, so it doesn't affect harmonizing the formats.)
Also there'd be a lot more software to choose from when it's not such a divisive market.
Technically it can be done, and many others in the XML and document community think so too. The co-creator of XML itself, Tim Bray, says so, and so do people from Microsoft such as Alan Yates. We even heard from Gray that it could take 2 years. It could easily take that amount of time to fix the existing problems in OOXML.
Office suite formats do not move as quickly as other parts of the computing industry. We still have to deal with files from Office 2003, or Office '97, for example. I'm happy with around 2-4 years to harmonize the formats considering the benefits it would bring.
Please see the Presentation on Ecma Points in OpenDocument [0.5MB] or Presentation on Ecma Points in PDF [0.4MB].
(SVG). For more information on these see this blog post from a guy who writes conversion software (who's not me!).
OOXML and ODF both support Unicode (ISO/IEC 10646), so they can represent all the letters and characters needed for Māori, many Polynesian, Asian, and Aboriginal languages, English, and many more.
So characters themselves are well dealt with, but cultural expressiveness in an office suite would of course also mean allowing cultural styles, calendar holidays for the Māori New Year, etc.
OOXML contains mostly American and/or Christian symbols (Christmas Trees, Easter Eggs, etc.) in a fixed and non-extensible list. This means that we can't add a border style of Matariki (Māori New Year stars), Koru Borders, Taniwha, or other Kiwi styles.
These fixed lists are detailed in sections 2.18.4 (p. 2414, "Border Styles"), 5.1.12.56 (p. 4557, "Preset Shape Types"), and 5.1.12.76 (p. 4645, "Preset Text Shape Types").
It should instead be changed to allow either arbitrary images, or arbitrary images as well as the existing fixed list, in order to allow multi-cultural styles.
(By the way, ODF solved this with the first suggested solution: arbitrary images.)
There was a comparison of <cell> (ODF) vs <c> (OOXML) in spreadsheets and which was faster. The goals of XML say,
Terse names like <c> clearly contradict the goals of XML and best practices.
The designers of XML knew what they were doing. While we can remember what "c" means in this one case, it becomes problematic when we get hundreds or thousands of these shorthand references. HTML, the web page language, has some shorthand references like this, but there are only around 20 things to memorize, so in practice it's not a problem. OOXML has hundreds of these cryptic names.
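To make the <cell> vs <c> comparison a bit more concrete, here's a rough sketch in Python. The two snippets are made-up markup (neither is real ODF or OOXML, and the tag and attribute names are my own assumptions); they only differ in naming style. The point is that both carry exactly the same data, and the only thing the terse names buy is a few bytes.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: not real ODF or OOXML, only the naming style differs.
TERSE = '<r><c t="n"><v>42</v></c><c t="s"><v>hello</v></c></r>'
DESCRIPTIVE = (
    '<row>'
    '<cell type="number"><value>42</value></cell>'
    '<cell type="string"><value>hello</value></cell>'
    '</row>'
)


def cell_values(xml_text, cell_tag, value_tag):
    """Return the text of the value element inside every cell element."""
    root = ET.fromstring(xml_text)
    return [cell.findtext(value_tag) for cell in root.iter(cell_tag)]


terse_values = cell_values(TERSE, "c", "v")
descriptive_values = cell_values(DESCRIPTIVE, "cell", "value")

# Same data either way; only the byte count and the readability change.
print(terse_values == descriptive_values)
print(len(TERSE), len(DESCRIPTIVE))
```

A reader who has never seen either format can guess what the second snippet means. The first one needs a lookup table.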
This kind of naming convention stuff happens all the time in technical documents, so it's good to see that ISO Directives, Part 2 "Rules for the Structure and Drafting of International Standards", section 4.3, sets out requirements for avoiding this type of inconsistent use of terminology.
Here's an example of identical wording referring to different things in OOXML,
It should be changed to follow the goals and best practices of XML by using human-legible terms and distinct terminology.
See this blog post for more: Open Malaysia: OOXML has poor XML Element names
Within OOXML they have ways of dealing with some historical bugs (e.g., autoSpaceLikeWord95). When a future revision of OOXML defines what autoSpaceLikeWord95 means, OOXML implementors will be able to distinguish the historical bug from the intended behaviour. This is a good approach.
OpenFormula has a similar approach of adding additional flags to be compatible with historical bugs while preventing bug propagation in future documents. See this information on the OpenFormula CEILING function.
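As a rough illustration of the "compatibility flag" idea, here's a minimal sketch in Python. It is not the OpenFormula definition of CEILING; the function signature and the `legacy_mode` flag are my own assumptions, loosely modelled on the idea of an optional mode argument. The default behaviour is the mathematically expected one, and the old quirk only appears when a document explicitly asks for it.

```python
import math


def ceiling(number, significance=1.0, legacy_mode=False):
    """Round number up to a multiple of significance.

    legacy_mode is a hypothetical compatibility flag: when set, negative
    numbers are rounded away from zero, imitating an older application's
    behaviour. Without it, rounding is always toward positive infinity.
    """
    if significance <= 0:
        raise ValueError("this sketch only handles a positive significance")
    steps = number / significance
    if legacy_mode and number < 0:
        return math.floor(steps) * significance
    return math.ceil(steps) * significance


print(ceiling(-2.5))                    # correct behaviour by default
print(ceiling(-2.5, legacy_mode=True))  # quirk only when explicitly requested
```

New documents would never set the flag, so the quirk stays in old files instead of spreading.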
However, this technique is used only selectively within OOXML, e.g.,
Microsoft correctly stated that the 1900 leap year bug in Microsoft Excel was introduced to emulate a bug in Lotus 1-2-3. Correct blame is good, but we should still squash this bug now.
I talked quite a bit about techniques for achieving this, and nobody at the meeting argued that it wasn't possible. They should fix the 1900 leap year bug and the numerous mathematical bugs in SpreadsheetML functions such as CEILING, AVEDEV, ZTEST, CONFIDENCE, CONVERT, and NETWORKDAYS; at least another 30 are affected. See these resources,
OOXML should fix the formulas and dates so that it remains compatible with existing documents whilst not propagating their quirks to newly created documents.
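For anyone who hasn't run into the 1900 leap year bug: in the legacy 1900-based date numbering, day 1 is 1 January 1900 and day 60 is treated as 29 February 1900, a date that never existed (1900 was not a leap year). Here's a minimal sketch in Python of reading such serials correctly, assuming whole-day serial numbers; the function name is my own and this is not code from any spec.

```python
from datetime import date, timedelta


def serial_to_date(serial):
    """Convert a 1900-based spreadsheet day serial to a real calendar date.

    In the legacy numbering, serial 1 is 1 January 1900 and serial 60 is the
    nonexistent 29 February 1900, so every later serial is one day too high.
    """
    if serial < 1:
        raise ValueError("serial must be 1 or greater")
    if serial == 60:
        raise ValueError("serial 60 is 29 February 1900, which never existed")
    if serial < 60:
        # Before the phantom day the numbering is still correct.
        return date(1899, 12, 31) + timedelta(days=serial)
    # After the phantom day, shift the epoch back by one to compensate.
    return date(1899, 12, 30) + timedelta(days=serial)


print(serial_to_date(59))  # last real day of February 1900
print(serial_to_date(61))  # the day straight after it
```

A reader can apply this kind of correction to old files, while a fixed format would simply never write the phantom day into new ones.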
The term Intellectual Property is of course not a specific thing but an umbrella term for Copyright, Patents, Trademarks, etc... Due diligence...
The clip art mentioned in sections 2.18.4, 5.1.12.56 and 5.1.12.76 should be checked for copyright issues.
The Microsoft Open Specification Promise has this line about patent grant... which does grant
The Covenant not to Sue has similar wording to do with a required subset of features:
While I Am Not A Lawyer, these plainly spoken sentences suggest to me that patents are granted only for a subset of features in OOXML, the required or necessary ones. It would be great to check what licensing the non-required parts are available under, whether they meet the ISO RAND criteria, and indeed whether my interpretation is even slightly correct!
Internationally, Microsoft commissioned analysis from London legal firm Baker & McKenzie into the various patent grants. This analysis, which tries to be a human-friendly version of the various grants, does not discuss the "required portions" distinction.
It would be good to check whether OOXML implementations could use trademarked terms from the spec in their advertising material, websites, etc. I don't know whether this is commonly done in ISO standards though -- is this too much to ask?
My main point here, in response to Chris Auld, was that file formats such as OOXML do affect accessibility: although there are screen readers, there is also accessibility software that deals with files directly. One example is "Blynx" (great name for accessibility software, that). Another is the planned software project that analyses ODF files for accessibility problems.
A lot of accessibility benefits come from reusing existing tech (building upon existing standards which have accessibility software available). Other people have described the problems of accessibility in OOXML better than me, so here's a link
My Docvert software is currently used by disabled people to derive structure from poorly made word processing files, which helps them navigate documents (e.g., because it understands headings and such, the section titles can be read out to them and they can then narrow in on content without having to read the document linearly).
I was on the working group for the NZ E-government Web Guidelines by SSC as an accessibility and web standards guy. I also developed two versions of the E-government website, which of course had to include accessibility features.
It's of note that ODF 1.0 had some minor accessibility problems itself but these were addressed in ODF 1.1, and here's some info on how minor these problems were,
This one wasn't in my notes -- it's new information that's come to light since Thursday about Microsoft-specific technologies in OOXML.
These are some of the undocumented Microsoft technologies present in OOXML,
I think it's important to note that on Windows many non-Microsoft office suites (such as OpenOffice, WordPerfect, etc.) have reverse engineered these, but the point is that in OOXML it's still undefined and it's still Windows-only tech that doesn't work on other platforms.
And this one wasn't in my notes either but it's new information from Stéphane Rodriguez,
It's a bit too emotionally written for me personally but you can't deny the evidence. Mr Rodriguez is a brilliant techy.
Comments are appreciated and please keep it civil :)
[*] Presentation on Ecma Points in OpenDocument [0.5MB] and Presentation on Ecma Points in PDF [0.4MB]