Dictionaries in CML

Following the public announcement of the chem4word project, I am now allowed to be a bit more open about what I am up to. I hope to publish more regularly about day-to-day (probably more like week-to-week) progress and thoughts.

I am currently preparing a set of exemplars and use cases for the first phase of the project. These provide a good source of example molecules and chemical concepts so that we (those with a chemical background) can explain to them (everyone else) what on earth we are talking about. It is all too easy to forget that when we say something, we know the implicit semantics but others may not. Preparing this corpus has involved creating high-quality CML documents which conform to CMLLite (a subset of CML, effectively the part required to represent chemistry in print).

CML uses dictionaries (via the dictRef attribute) liberally. This means that the schema can specify a single element which is processed the same way each time but can hold different information. For example, the property element can hold both a melting point and a molecular weight.

<cml version="3" convention="CMLLite"
     xmlns="http://www.xml-cml.org/schema"
     xmlns:xs="http://www.w3.org/2001/XMLSchema"
     xmlns:cmlDict="http://www.xml-cml.org/dictionary/cml/"
     xmlns:unitsDict="http://www.xml-cml.org/dictionary/units/">
  <property dictRef="cmlDict:mw">
    <scalar dataType="xs:double" units="unitsDict:dalton">247.3</scalar>
  </property>
  <property dictRef="cmlDict:mpt">
    <scalar dataType="xs:double" units="unitsDict:c" min="202" max="205" />
  </property>
</cml>

The document above should be familiar to anyone who has seen any CML before. However, there may be a difference: each of the dictionary items (the URIs in the dictRef attributes) actually has a definition. I promised myself at the start of the project that I would never hand over any CML document which contained an undefined dictionary reference.
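To make the "URIs in the dictRef" point concrete, here is a minimal Python sketch that expands each dictRef value (a prefix:name pair) into a full URI using the namespace declarations in the document. The function name expand_dict_refs is made up for illustration, and joining the namespace URI directly to the local name is an assumption about how a dictionary item would be addressed, not something the CML schema is claimed to mandate.

```python
import io
import xml.etree.ElementTree as ET

# The example document from this post, embedded so the sketch is self-contained.
CML = """<cml version="3" convention="CMLLite"
     xmlns="http://www.xml-cml.org/schema"
     xmlns:xs="http://www.w3.org/2001/XMLSchema"
     xmlns:cmlDict="http://www.xml-cml.org/dictionary/cml/"
     xmlns:unitsDict="http://www.xml-cml.org/dictionary/units/">
  <property dictRef="cmlDict:mw">
    <scalar dataType="xs:double" units="unitsDict:dalton">247.3</scalar>
  </property>
  <property dictRef="cmlDict:mpt">
    <scalar dataType="xs:double" units="unitsDict:c" min="202" max="205" />
  </property>
</cml>"""


def expand_dict_refs(cml_text):
    """Return the expanded URI for every dictRef, in document order.

    Hypothetical helper. Assumption: a dictionary item is addressed by
    appending the local name to the namespace URI bound to the prefix.
    """
    prefixes = {}   # prefix -> namespace URI (flat map; ignores nested scoping)
    expanded = []
    source = io.BytesIO(cml_text.encode("utf-8"))
    # "start-ns" events expose the prefix declarations, which ElementTree
    # otherwise discards once the tree has been built.
    for event, payload in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            prefix, uri = payload
            prefixes[prefix] = uri
        else:
            ref = payload.get("dictRef")
            if ref is not None:
                prefix, _, local = ref.partition(":")
                # An undeclared prefix raises KeyError, which flags an
                # unresolvable dictionary reference.
                expanded.append(prefixes[prefix] + local)
    return expanded


if __name__ == "__main__":
    for uri in expand_dict_refs(CML):
        print(uri)
```

Run against the example above, this lists one URI for the molecular weight entry and one for the melting point entry. Checking that each of those URIs actually has a definition behind it would be a separate step, which this sketch does not attempt.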

We will be making these dictionaries available, together with examples, during the project. I am also pushing for the dictionary items to be URLs for ease of use.

Oh, and I have also been learning C# and loving it…
