Tag Archives: data science

Monte Carlo Simulation

Monte Carlo simulation is a technique for approximating future behaviour based on randomly sampled numbers. By sampling from different probability distributions, it can be applied to a range of situations, including physical systems, computer games and finance. This post gives a simple example of Monte Carlo simulation to give some… Read More »
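
As a rough illustration of the idea in the excerpt above, here is a minimal sketch that simulates many possible futures for a value that grows by a randomly sampled return each period. It is not the example from the post itself: the function name `simulate_final_values`, the normal distribution for returns, and all parameter values (1% mean return, 5% volatility, 12 periods) are assumptions chosen purely for illustration.

```python
import random
import statistics


def simulate_final_values(start_value, mean_return, volatility,
                          n_steps, n_paths, seed=0):
    """Simulate n_paths random futures and return the final value of each.

    Assumption: each period's return is drawn independently from a
    normal distribution with the given mean and standard deviation.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    finals = []
    for _ in range(n_paths):
        value = start_value
        for _ in range(n_steps):
            # Sample one random return and apply it to the current value.
            value *= 1.0 + rng.gauss(mean_return, volatility)
        finals.append(value)
    return finals


# Hypothetical inputs: start at 100, 12 periods, 10,000 simulated paths.
finals = simulate_final_values(100.0, 0.01, 0.05, n_steps=12, n_paths=10_000)
mean_final = statistics.mean(finals)
spread = statistics.stdev(finals)

print(f"mean final value: {mean_final:.2f}")
print(f"standard deviation of final values: {spread:.2f}")
```

With independent returns, the average of the simulated final values should land close to 100 × 1.01^12, and the spread of the results gives a feel for the uncertainty around that figure. Swapping `rng.gauss` for a different sampler is how the same loop would be adapted to other distributions.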

Get Started With PySpark

PySpark brings together the analytical power and popularity of Python with the distributed-computing capability of Spark. In this post I show how you can use a Docker container with PySpark and Spark pre-loaded to let you play with PySpark in a Jupyter notebook, rather than having to configure your own Spark cluster first. Use Jupyter… Read More »

The ROC Curve

The ROC curve is a commonly used method for evaluating the performance of classification models. ROC curves combine the false positive rate (i.e. the proportion of actual negatives that were incorrectly predicted positive) and the true positive rate (i.e. the proportion of actual positives that were correctly predicted positive) to build up a summary picture of the classification performance. ROC… Read More »
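
To make the two rates in the excerpt above concrete, here is a minimal sketch that computes the (false positive rate, true positive rate) pairs for a tiny set of labels and scores, plus the area under the resulting curve using the trapezoid rule. The helper names `roc_points` and `auc` and the four-item toy dataset are assumptions made for illustration, not code from the post.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs, one per distinct score threshold.

    An item is predicted positive when its score is >= the threshold.
    Assumes y_true contains at least one 0 and at least one 1.
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve through the points, by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy data (hypothetical): true labels and the model's predicted scores.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

points = roc_points(y_true, scores)
area = auc(points)

for fpr, tpr in points:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
print(f"AUC={area:.2f}")
```

Lowering the threshold moves the curve from (0, 0) towards (1, 1): each step either catches more true positives (moving up) or lets in more false positives (moving right).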