Software Testing vs Scientific Rigour

The CoViD-19 global pandemic has thrust modelling and analysis into the public eye in a way rarely seen before. One particular example is the CoViD model developed by MRC-IDE, which contributed to the UK’s response to the pandemic. The research team behind this model published their code on GitHub. As might be expected when code is reviewed by new people, some new issues and concerns were raised. One particularly controversial issue, however, was this one, whose author was so concerned about the code that they demanded any research associated with it be retracted, arguing that it should not have been used to guide public policy. The flurry of comments on this ticket gives a view of how three different professions perceive quality and rigour: scientific researchers, software engineers and professional analysts.

At the time of writing, this issue has been frozen but is still available to read. I haven’t looked into the code, and wouldn’t know enough about C++ development to form my own opinion anyway. It’s also worth saying that software engineering and development is famous for its religious wars, so the opinions offered in the issue may be strongly held but are not universally accepted.

In this post I’ve tried to pick out some of the themes that each profession seemed to bring up. I’ve also tried to naively characterise, or generalise, each of the three professions to give some context for why they might be particularly concerned about certain areas of development.

Three Perspectives on Quality

Scientific Researcher

Are producers of research. Often work in collaboration with other professions. I know next to nothing about epidemiology, but I imagine this team may have worked in conjunction with social scientists.

Value published, peer-reviewed research more than code

Publish assumptions and context in articles rather than in-line with the code

See their results as one viewpoint among many that feed into policy advice

Regard some science in policy making as better than none

Many of the ‘tests’ they carry out are often not automated (and it wouldn’t always be easy to automate them): creating, analysing and reviewing visualisations; comparing implementations; cross-checking against theoretical expectations.

See the value of unit tests, but also their limitations in testing against the real world

Assume a high degree of specialist domain knowledge, so variable names can be more terse

On the whole, seem to prefer confirming that the results are consistent with what is expected over being completely confident in the implementation details.
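As an illustration of the ‘cross-checking against theoretical expectations’ style of test, here is a minimal sketch in Python. It is not taken from the MRC-IDE code: it assumes a textbook deterministic SIR model (a far simpler stand-in, chosen only for illustration) and checks the simulated final epidemic size against the classic final-size relation z = 1 − exp(−R₀z). The function names `simulate_sir`, `final_size` and `consistent_with_theory`, and all parameter values, are hypothetical.

```python
import math


def simulate_sir(r0, gamma=0.1, i0=1e-6, dt=0.01, t_max=1000.0):
    """Integrate the normalised SIR equations with forward Euler.

    Hypothetical toy model for illustration only. Returns the fraction of the
    population that has recovered by t_max, which approximates the final
    epidemic size once the outbreak has burnt out.
    """
    beta = r0 * gamma  # transmission rate implied by R0 = beta / gamma
    s, i, r = 1.0 - i0, i0, 0.0
    for _ in range(int(t_max / dt)):
        new_infections = beta * s * i * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
    return r


def final_size(r0, tol=1e-12, max_iter=10_000):
    """Solve the final-size relation z = 1 - exp(-r0 * z) by fixed-point iteration.

    For r0 > 1, starting from z = 0.5 converges to the non-trivial root.
    """
    z = 0.5
    for _ in range(max_iter):
        z_next = 1.0 - math.exp(-r0 * z)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    return z


def consistent_with_theory(r0, rel_tol=1e-2):
    """A 'scientific' check: does the simulation agree with the analytic result?"""
    return math.isclose(simulate_sir(r0), final_size(r0), rel_tol=rel_tol)
```

Forward Euler is used here purely for brevity. Agreement with the analytic result gives confidence that the outputs are consistent with theory, but on its own it says little about whether every line of a larger implementation is correct.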

Software Engineer

Are builders of complex systems which are often subject to change. A single mistake can instantly cause financial and reputational damage. Once a system is live, it is essential that it can be modified without introducing errors.

Concerned that there were only high-level, smoke-test-style assertions.

Concerned that, without unit tests, there is nothing to show that individual units of logic work correctly or that they integrate together correctly.
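To make the distinction between these two kinds of test concrete, here is a hedged sketch in Python using the standard-library `unittest` module. The model and the functions `infection_probability` and `run_model` are hypothetical stand-ins, not code from the MRC-IDE repository: the smoke test only checks that a whole run finishes with a plausible-looking output, while the unit tests pin down the behaviour of one small piece of logic.

```python
import math
import unittest


def infection_probability(beta, contacts, dt):
    """Hypothetical unit of model logic: chance that a susceptible person is
    infected in one time step of length dt, given `contacts` infectious
    contacts and a per-contact transmission rate `beta`."""
    return 1.0 - math.exp(-beta * contacts * dt)


def run_model(n_people, beta, contacts, dt, steps):
    """Hypothetical whole-model run: expected number ever infected after
    `steps` time steps (recovery is ignored to keep the sketch short)."""
    susceptible = float(n_people)
    for _ in range(steps):
        susceptible -= susceptible * infection_probability(beta, contacts, dt)
    return n_people - susceptible


class SmokeTest(unittest.TestCase):
    """High-level check: the model runs end to end and the output is plausible."""

    def test_model_runs_and_output_is_plausible(self):
        infected = run_model(1000, beta=0.3, contacts=2, dt=1.0, steps=30)
        self.assertTrue(0 <= infected <= 1000)


class InfectionProbabilityUnitTest(unittest.TestCase):
    """Unit checks: one small piece of logic behaves as specified."""

    def test_no_contacts_means_no_infection(self):
        self.assertEqual(infection_probability(0.3, 0, 1.0), 0.0)

    def test_probability_is_bounded(self):
        p = infection_probability(0.3, 1000, 1.0)
        self.assertGreaterEqual(p, 0.0)
        self.assertLessEqual(p, 1.0)

    def test_more_contacts_increase_risk(self):
        self.assertLess(
            infection_probability(0.3, 1, 1.0),
            infection_probability(0.3, 2, 1.0),
        )
```

The value of the unit tests shows up when something breaks: if `infection_probability` accidentally ignored `contacts`, the smoke test would still pass, but `test_more_contacts_increase_risk` would fail. Saved as, say, `test_model.py`, both suites can be run with `python -m unittest test_model`.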

Professional Analyst

Are advisors to decision makers.

Value keeping track of assumptions and data used to support a particular decision

Perspectives Not Considered

The comments on this GitHub issue come from only a handful of individuals within their respective professional communities. It is worth remembering that other perspectives on ‘quality’ exist, and I’ve attempted to speculate on some of them here. To be clear, I’ve not looked at the MRC code in any detail; these aren’t necessarily my views, but rather hypothetical examples of what different professions might take issue with.

Other possible perspectives might include:

Software Tester

“I’m concerned that load testing has not been carried out on this model. It looks like it doesn’t scale well for large populations.”

User Researcher

“I’m concerned that the simulation outputs result in a large amount of cognitive friction. How can a researcher, analyst or politician reasonably be expected to use the outputs from this simulation to make decisions?”

Ethicist

“I’m concerned about the lack of evidence of ethical approval for the use of individuals’ data.”

Conclusion

Something that stands out for me is how important it is to have multidisciplinary teams. I think you would struggle to claim that any of the individual arguments made are wrong, and if the goal is to achieve greater rigour and confidence in the results, then having these different perspectives can be invaluable.