Unchecked Conversion Warnings

November 2, 2009

This post is taken from an email sent round by Jim Downing following a PMR group code review meeting. The topic that caused the most trouble was how to remove unchecked conversion warnings from Eclipse, which were greatly upsetting a few members of the group (not me).

Primarily I just wanted to capture this for reference rather than keeping it in my inbox.

Had a bit of a dig around the unchecked conversion issue.
The solutions to the problem, in my opinion and in Miss World order: –

  1. I think the _worst_ thing to do is the default Eclipse behaviour: adding @SuppressWarnings("unchecked") to the method declaration, since this will mask every other unchecked warning in that method too, some of which can be much more severe than this one.
  2. The next worst thing to do is to suppress warnings by annotating the individual line. I dislike this one because the annotation is just distracting noisy cruft.
  3. Live with it. Stop your IDE whinging about it so much – the code will still compile.
  4. The best solution is given in the answer at


The main point in the answer given above is that doing an unchecked conversion results in the ClassCastException coming from the guts of the compiled code somewhere, rather than from your code, which is a Bad Thing. So the best thing to do is: –

Rather than : –
List<String> whatYoudLike = foo.getUntypedList(); // Exception gets thrown from the guts of whatever the compiler generates to do this.

for (String s : whatYoudLike) {
    // ...
}

the best way to do it would be: –

for (Object o : foo.getUntypedList()) {
    String s = (String) o; // Exception gets thrown here if at all.
    // ...
}

This approach can get a bit more verbose, but that is, I’m afraid, tough luck.
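If the verbosity grates, the element-by-element cast can be wrapped up once and reused. What follows is only a sketch of that idea, not something from Jim's email: the class name CheckedCopy, the helper checkedCopy and the stand-in getUntypedList are all made up for illustration, and the stand-in returns List<?> rather than a raw List so that the example itself compiles without warnings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CheckedCopy {

    // Hypothetical stand-in for foo.getUntypedList(): a legacy API that
    // hands back a list with no useful element type.
    static List<?> getUntypedList() {
        return Arrays.asList("alpha", "beta");
    }

    // Copies an untyped collection into a typed list, casting one element
    // at a time so that a bad element fails in this method rather than in
    // compiler-generated code somewhere else.
    static <T> List<T> checkedCopy(Iterable<?> untyped, Class<T> type) {
        List<T> typed = new ArrayList<>();
        for (Object o : untyped) {
            typed.add(type.cast(o)); // ClassCastException gets thrown here if at all.
        }
        return typed;
    }

    public static void main(String[] args) {
        List<String> strings = checkedCopy(getUntypedList(), String.class);
        for (String s : strings) {
            System.out.println(s);
        }
    }
}
```

The trade-off is an extra copy of the list, in exchange for no unchecked warning and a stack trace that points at the conversion itself.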

First tentative steps in ClojureCLR

August 27, 2009

Jim and Nick (Day) have been using Clojure for a while on various projects and all too frequently I have heard exclamations of delight when they find that what would have been 100s of lines of Java can be done in two or three of Clojure. I have spent most of the past year writing in C# and just haven’t been able to join in, although it has been good to have a break from Java.

One of the things to come out of the Chem4Word project has been the idea of performing chemical changes via a stateless interface (CID – the Chemistry Interface Definition). This approach was strongly pushed by Savas and Jim (possibly as an excuse to learn yet another language). A stateless system lends itself beautifully to a functional language and as a bonus Clojure has both a CLR and JVM implementation so we can have a single definition which we can use on both platforms.

As a CompSci at Cambridge the first programming language you see is ML – probably because it makes it fairly easy to work out the big O. So I have some fond memories of functional programming (though quite what aroma it would require to really take me back does not bear thinking about) and (conveniently) a need for it.

The CLR implementation is still fairly bleeding edge and the installation process reminded me of the bad old days of Open Source software. Still, three hours into the process I had my REPL up and running and a function which would say Hello! not just to the world, but to whoever happened to be passed to it. Tomorrow I shall be dusting off the ML part 1a handout by Larry Paulson (who broke off in lectures to teach us how to make bread the traditional way) and attempting any of the exercises I can find.

I’m looking forward to the Cambridge Clojure user group meeting and all this ML and functional talk means that a sneaky peek at F# over the long weekend is probably coming up too.

Chem4Word logo and my own unit

July 30, 2009

I realise that I have been quiet for a long time on here, which is annoying because I have so much that I should be telling people – especially about Chem4Word. Luckily, PMR has been keeping the interest ticking over on his blog and there is always Twitter for those quick messages. Speaking of PMR – he and I have had many “robust discussion” sessions over the last few weeks (months) as we have been trying to define what semantic chemistry really is, how much is possible before release and what is absolutely necessary. It has sometimes felt like smacking a puppy.

As we near a release date (we have function freeze and are now bug fixing) I was thinking about the branding and realised that we don’t have a logo, favicon or anything of the sort, so I was wondering if anyone has any suggestions.

I have to get back to writing a paper but first of all I was pleased to discover (thanks to Helen) that there is a Townsend unit (Td) but slightly less pleased that it is most important in gas discharge physics.

Dictionaries in CML

September 18, 2008

I am now allowed to be a bit more open about what I am up to following the public announcement of the Chem4Word project, so I hope to be publishing more regularly about day-to-day (probably more like week-to-week) progress and thoughts.

I am currently preparing a set of exemplars and use cases for the first phase of the project. These provide a good source of example molecules and chemical concepts so that we (those with chemical background) can explain to them (everyone else) what on earth we are talking about. It is all too easy to forget that when we say something we know the implicit semantics but others may not. The preparation of this corpus has involved creating high-quality CML documents which conform to CMLLite (a subset of CML – effectively that required to represent chemistry in print).

CML uses dictionaries (via the dictRef attribute) liberally; this means that the schema can specify a single element which can be processed the same way each time but can hold different information. For example, the property element can hold both a melting point and a molecular weight.

<cml version="3" convention="CMLLite">
  <property dictRef="cmlDict:mw">
    <scalar dataType="xs:double" units="unitsDict:dalton">247.3</scalar>
  </property>
  <property dictRef="cmlDict:mpt">
    <scalar dataType="xs:double" units="unitsDict:c" min="202" max="205" />
  </property>
</cml>

The document above should be familiar to anyone who has seen any CML before. However, there may be a difference. Each of the dictionary items (URIs in the dictRef) actually has a definition. I promised myself at the start of the project that I would never hand over any CML document which contained an undefined dictionary reference.

We will be making these dictionaries available, together with examples, during the project. I am also pushing for the dictionary items to be URLs for ease of use.
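To make the "one element, processed the same way each time" point concrete, here is a rough Java sketch that reads every property element with the same code and keys the result on its dictRef. It is an illustration only: the class and method names (DictRefReader, readProperties) are invented here, the parsing uses the JDK's built-in DOM parser rather than any CML library, and the "min-max" string used for ranges is just a display choice made for this example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class DictRefReader {

    // The example document from this post, as a string.
    static final String CML =
        "<cml version=\"3\" convention=\"CMLLite\">"
        + "<property dictRef=\"cmlDict:mw\">"
        + "<scalar dataType=\"xs:double\" units=\"unitsDict:dalton\">247.3</scalar>"
        + "</property>"
        + "<property dictRef=\"cmlDict:mpt\">"
        + "<scalar dataType=\"xs:double\" units=\"unitsDict:c\" min=\"202\" max=\"205\" />"
        + "</property>"
        + "</cml>";

    // Returns dictRef -> "value units" for every property element.
    // The same loop body handles a molecular weight and a melting point;
    // only the dictRef says which is which.
    static Map<String, String> readProperties(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse CML", e);
        }
        Map<String, String> values = new LinkedHashMap<>();
        NodeList properties = doc.getElementsByTagName("property");
        for (int i = 0; i < properties.getLength(); i++) {
            Element property = (Element) properties.item(i);
            Element scalar = (Element) property.getElementsByTagName("scalar").item(0);
            if (scalar == null) {
                continue; // Nothing to read for this property.
            }
            // A scalar carries either a single value as text or a min/max range.
            String value = scalar.hasAttribute("min")
                ? scalar.getAttribute("min") + "-" + scalar.getAttribute("max")
                : scalar.getTextContent().trim();
            values.put(property.getAttribute("dictRef"),
                       value + " " + scalar.getAttribute("units"));
        }
        return values;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : readProperties(CML).entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

A real consumer would go one step further and look each dictRef up in its dictionary to find out what the value means, which is exactly why every reference needs a definition behind it.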

Oh! and I have also been learning C# and loving it…

Getting a license just got easier

July 28, 2008

I don’t normally like to repost but I am quite happy to do so for this as I think that it is a wonderful idea. Now I just hope people use it.

From Savas’s blog:

When I joined Technical Computing, now part of External Research, we wanted to create an ecosystem of tools and services to support researchers worldwide. Today we announced the results of some of our efforts; there is still more going on.

A tool that was discussed was the Creative Commons addin for Microsoft Office XP/2003. We got feedback from researchers that they really liked the functionality but were very surprised that Microsoft didn’t release an updated version for Microsoft Office 2007. Well, we contacted the team responsible for it and found out that they had no plans to update it so we requested and got ownership of its future.

I started prototyping some new ideas around a ribbon-based interface, allowing you to create Creative Commons licenses that can be shared between Word, PowerPoint, and Excel. The plugin uses the Creative Commons web service when generating new licenses. Finally, we wanted to make the license machine readable so we are including the RDF representation of the license in the OOXML package.*

Download the Creative Commons plugin for Microsoft Office 2007. The updated version for XP/2003 (fixing some reported bugs) will be released very soon.

* Unfortunately, due to timing constraints we didn’t get around to avoiding a feature of Office where document properties are URL-encoded. This is mentioned in the documentation that comes with the plugin so you can build crawlers/indexers.

Cool huh?

A challenge for Chemists and OOXML

July 27, 2008

Not all that long ago there was a series of competitions (of the new BBC version in that you could win kudos and little else) on various blogs (PMR, chemspiderman) to identify the number of chemicals in a paragraph of text. These focused largely on the difficulty of deciding what is and what is not a chemical – and consequently there was not necessarily a right answer.

Now I would like to propose a new challenge… and there is a right answer this time. I have randomly selected a preparation from an organic chemistry article, a version of which is shown below. There is also a DOCX version available for download, which is fully and correctly formatted – to get the right answer I strongly suggest that you use the DOCX (although the latest version of Microsoft Office Word is not required).

So now for the challenge: how many chemicals do I think there are in the preparation – and for a further bonus point, which chemical did I get wrong?

((4S,5S)-5-Ethynyl-2,2-dimethyl-1,3-dioxolan-4-yl)methanol (14)

To a stirring mixture of 13 (10.0 g, 62.5 mmol) and anhydrous K2CO3 (11.37 g, 81.25 mmol) in dry MeOH (240 mL) at 65 °C was added a solution of Bestmann–Ohira reagent (15.6 g, 81.25 mmol) in dry MeOH (80 mL) dropwise over a period of 6 h under an argon atmosphere. After neutralization with acetic acid, the solvent was removed in vacuo, water was added and the mixture extracted with ethyl acetate (2 × 100 mL). The combined organic extracts were dried over anhydrous Na2SO4, concentrated under reduced pressure and purified by column chromatography (pet. ether–ethyl acetate, 4 : 1) to obtain 14 (6.82 g, 70%) as a colorless liquid. [α]27D -8.6 (c 1.0, MeOH); anal. calcd for C8H12O3: C, 61.52; H, 7.74; found: C, 61.79; H, 7.84; IR (neat) ν max/cm-1, 3452, 3284, 2121, 848, 665; 1H NMR (200 MHz, CDCl3, D2O exchange), δ 1.42 (s, 3H), 1.48 (s, 3H), 2.53 (d, 1H, J = 2.15 Hz), 3.64 (dd, 1H, J = 12.25, 3.67 Hz), 3.87 (dd, 1H, J = 12.25, 3.03 Hz), 4.16 (ddd, 1H, J = 7.58, 3.67, 3.03 Hz), 4.56 (dd, 1H, J = 7.57, 2.15 Hz).

A quick round of MS bashing

May 22, 2008

I read yesterday on Doug Mahugh’s blog about the new support for ODF in Word 2007. I was excited and pleased about this and eager to see what it would mean for programs such as Peter Sefton’s ICE. Then I saw the view of Georg Greve, president of the Free Software Foundation Europe, who said:

“Support for ODF indicates there are problems with OpenXML that Microsoft cannot resolve easily and quickly.”

Similarly, Peter M-R received a fair amount of criticism when he supported OOXML. I can’t tell you how dispirited this made me. I was thinking of all the positives – people could use all the functionality of ICE and other technology developed against the ODT specification without having to use OpenOffice or similar. Because that is the problem. I have spent far more hours than I would like under the bonnet of both ODT and DOCX documents as part of the SPECTRa-T project. Both are horrible. It is not their fault (or not entirely): some of the horridness is because they chose to do things one way (and I would have chosen the opposite), and some comes from support for legacy items, Unicode characters and so on …

The thing is, I was able to get DOCX documents; all I had to do was ask people to send me a copy of their Word file. I had to hunt long and hard before I could lay my hands on an ODT file because nobody was using anything that created one. And I work in an office full of people who spend most of their day using computers and writing programs. Do the users care that ODT is better (allegedly) than OOXML? Do they know? Are the users simply using the document creation software that is easiest to use and fundamentally works? I would say yes. There is nothing stopping me downloading OpenOffice now and using it. But there is also nothing making me do so. Why in God’s name would I? What can I do with it that I can’t do with MS Word, and can I do any of those things more easily?

I suspect that the reason Office 2007 has not been welcomed with open arms is that people can no longer use it as easily as they used to be able to. Or at least not at first – that ribbon does hide things pretty effectively as well as taking up all that expensive screen real estate – but eventually you learn your way around.

I have been working in the text/data extraction realm for a while now and Word files used to be a dead end, then along came OOXML and suddenly I had a whole new area to work in. All this time ODT has been hanging around being open and accessible and I could data mine it – except that there wasn’t any of it. So now Microsoft are going to add ODT support to Word – this means that users can now use a decent authoring tool and people can get the results in ODT. Maybe this means that we should start caring about ODT but until I see evidence of people using it (and an appreciable fraction of the published documents being in the format) I will continue to concentrate on OOXML and other products that people actually use.