Following the public announcement of the chem4word project, I am now allowed to be a bit more open about what I am up to, so I hope to publish more regularly about day-to-day (probably more like week-to-week) progress and thoughts.
I am currently preparing a set of exemplars and use cases for the first phase of the project. These provide a good source of example molecules and chemical concepts so that we (those with a chemical background) can explain to them (everyone else) what on earth we are talking about. It is all too easy to forget that when we say something we know the implicit semantics, but others may not. Preparing this corpus has involved creating high-quality CML documents which conform to CMLLite (a subset of CML, effectively the part required to represent chemistry in print).
CML uses dictionaries (via the dictRef attribute) liberally. This means that the schema can specify a single element which can be processed the same way each time but can hold different information. For example, the property element can hold both a melting point and a molecular weight.
<cml version="3" convention="CMLLite"
<scalar dataType="xs:double" units="unitsDict:dalton">247.3</scalar>
<scalar dataType="xs:double" units="unitsDict:c" min="202" max="205" />
The document above should be familiar to anyone who has seen any CML before. However, there may be a difference: each of the dictionary items (the URIs in the dictRef attributes) actually has a definition. I promised myself at the start of the project that I would never hand over any CML document which contained an undefined dictionary reference.
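A check along the following lines could catch an undefined reference before a document is handed over. It is only a minimal sketch in Python, not the project's tooling: the fragment, the cmlDict prefix with its molwt, mpt and colour terms, and the in-memory DEFINED_TERMS table are all assumptions invented for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the dictRef values are illustrative only and are
# not taken from the real chem4word dictionaries.
CML_FRAGMENT = """
<molecule id="m1">
  <property dictRef="cmlDict:molwt">
    <scalar dataType="xs:double" units="unitsDict:dalton">247.3</scalar>
  </property>
  <property dictRef="cmlDict:mpt">
    <scalar dataType="xs:double" units="unitsDict:c" min="202" max="205" />
  </property>
  <property dictRef="cmlDict:colour">
    <scalar dataType="xs:string">white</scalar>
  </property>
</molecule>
"""

# Hypothetical stand-in for the dictionaries: prefix -> set of defined term ids.
DEFINED_TERMS = {
    "cmlDict": {"molwt", "mpt"},
}


def undefined_dict_refs(xml_text, defined_terms):
    """Return every dictRef value in xml_text that has no definition."""
    root = ET.fromstring(xml_text)
    missing = []
    # iter() walks the root and all descendants in document order.
    for element in root.iter():
        ref = element.get("dictRef")
        if ref is None:
            continue
        # Split "prefix:term"; a reference without a colon yields an empty term.
        prefix, _, term = ref.partition(":")
        if term not in defined_terms.get(prefix, set()):
            missing.append(ref)
    return missing


print(undefined_dict_refs(CML_FRAGMENT, DEFINED_TERMS))
```

Here the same property element carries three different kinds of information, and only the dictRef says which is which. The colour entry is reported because the made-up dictionary does not define it.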
We will be making these dictionaries available, together with examples, during the project. I am also pushing for the dictionary items to be URLs for ease of use.
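One way to read "dictionary items as URLs" is that a prefixed reference such as unitsDict:dalton expands to an address which can be looked up directly. The sketch below shows that expansion in Python. The prefix-to-namespace table and the example.org addresses are placeholders I have assumed for illustration; they are not the real dictionary locations.

```python
# Hypothetical prefix-to-namespace mapping; the example.org addresses are
# placeholders, not the real dictionary locations.
NAMESPACES = {
    "unitsDict": "http://example.org/dictionary/units#",
    "cmlDict": "http://example.org/dictionary/cml#",
}


def expand_dict_ref(ref, namespaces):
    """Expand a 'prefix:term' reference into a full URL."""
    prefix, sep, term = ref.partition(":")
    # Reject references with no colon or with a prefix we do not know.
    if not sep or prefix not in namespaces:
        raise KeyError(f"unknown dictionary prefix in {ref!r}")
    return namespaces[prefix] + term


print(expand_dict_ref("unitsDict:dalton", NAMESPACES))
```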
Oh, and I have also been learning C# and loving it…