String Comparison Techniques

String comparison is important for topics such as natural language processing and record linkage. This post gives a few examples of string comparison techniques that you may wish to consider. Each of these techniques makes different assumptions or simplifications. You may wish to try several techniques or use a hybrid…
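
As a rough illustration of how techniques can differ in their assumptions, here is a minimal sketch of three common measures. The excerpt does not say which techniques the post covers, so the choice of edit distance, token overlap, and `difflib` ratio is an assumption, and the function names are hypothetical.

```python
from difflib import SequenceMatcher


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum single-character inserts, deletes, or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete from a
                current[j - 1] + 1,      # insert into a
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def jaccard_tokens(a: str, b: str) -> float:
    """Token-set overlap: ignores word order and letter case."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # treat two empty strings as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def similarity_ratio(a: str, b: str) -> float:
    """Standard-library ratio in [0, 1] based on matching subsequences."""
    return SequenceMatcher(None, a, b).ratio()
```

Edit distance is sensitive to word order, while the token-set measure ignores it entirely, which is one reason a hybrid of several measures can be worth trying for record linkage.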

Get Started With PySpark

PySpark brings together the analytical power and popularity of Python with the distributed-computing capability of Spark. In this post I show how you can use a Docker container with PySpark and Spark pre-loaded to experiment with PySpark in a Jupyter notebook, rather than having to configure your own Spark cluster first. Use Jupyter…

Immutable Objects In Python

Immutable objects guarantee that the data they contain cannot be changed after they are created. This makes them useful for passing messages between components and when working with multiple threads. They can also be easier to work with and reason about, because their state never changes after creation.…
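
One way to get this behaviour in Python is a frozen dataclass. The excerpt does not show which approach the post takes, so the sketch below is an assumption, and the `Message` class and `try_mutate` helper are hypothetical names used only for illustration.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Message:
    """A small immutable message that could be passed between components."""
    sender: str
    body: str


def try_mutate(message: Message) -> bool:
    """Return True if assigning to a field succeeded, False if it was blocked."""
    try:
        message.body = "changed"  # frozen dataclasses reject attribute assignment
    except FrozenInstanceError:
        return False
    return True


original = Message(sender="worker-1", body="hello")
mutated = try_mutate(original)  # the assignment is blocked

# To "change" a value, build a new object; the original stays untouched.
updated = replace(original, body="goodbye")
```

Because a frozen dataclass with the default `eq=True` is also hashable, instances can be used as dictionary keys or set members. Note that `frozen=True` only blocks attribute assignment; a mutable field such as a list could still be modified in place.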

Python BDD

Behaviour Driven Development, or BDD, is a valuable collaboration technique for bridging the gap between developers and wider stakeholders. One part of BDD is the tooling: frameworks that convert BDD statements into running tests. This post goes through a simple example using the pytest-bdd plugin. Python BDD with pytest-bdd…