Development

Optimising Go code with Assembler

In this blog post I will explore the steps to optimise a sparse vector dot product operation. We will start with a basic implementation in Go, convert it to assembler and then iteratively optimise it, measuring the effect of each change to check our progress. All the code from the post is available on GitHub and also forms part of the Golang Sparse matrix package. A vector dot product is a very common kernel, or basic building block, for many computations used within scientific computing and machine learning, e.g. matrix multiplication, cosine similarity, etc. The dot product simply multiplies together the respective elements of two vectors and then adds all the results together to give a scalar output. ...
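As a rough illustration of the dot product described in the excerpt above, here is a minimal sketch in Go. It is not the code from the post or from the Sparse matrix package: the `SparseVector` type and the `DenseDot` and `SparseDot` function names are hypothetical, chosen only for this example. The sparse version assumes the non-zero elements are stored as parallel slices of indices and values.

```go
package main

import "fmt"

// SparseVector is a hypothetical type used only for this illustration.
// Indices holds the positions of the non-zero elements and Values holds
// the corresponding element values.
type SparseVector struct {
	Indices []int
	Values  []float64
}

// DenseDot multiplies the respective elements of a and b and sums the
// results. It assumes len(a) == len(b).
func DenseDot(a, b []float64) float64 {
	var sum float64
	for i, v := range a {
		sum += v * b[i]
	}
	return sum
}

// SparseDot computes the dot product of a sparse vector x with a dense
// vector y, visiting only the non-zero elements of x. It assumes every
// index in x.Indices is a valid position in y.
func SparseDot(x SparseVector, y []float64) float64 {
	var sum float64
	for k, idx := range x.Indices {
		sum += x.Values[k] * y[idx]
	}
	return sum
}

func main() {
	a := []float64{1, 0, 0, 4}
	b := []float64{2, 3, 5, 6}
	// The same vector as a, stored with only its non-zero elements.
	x := SparseVector{Indices: []int{0, 3}, Values: []float64{1, 4}}

	fmt.Println(DenseDot(a, b))  // 1*2 + 0*3 + 0*5 + 4*6 = 26
	fmt.Println(SparseDot(x, b)) // 1*2 + 4*6 = 26
}
```

The sparse version does work proportional to the number of non-zero elements rather than the full vector length, which is why the operation is worth optimising for sparsely populated data.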

Optimising algorithms in Go for machine learning - Part 3: The hashing trick

This is the third in a series of blog posts sharing my experiences working with algorithms and data structures for machine learning. These experiences were gained whilst building out the nlp project for LSA (Latent Semantic Analysis) of text documents. In Part 2 of this series, I explored sparse matrix formats as a set of data structures for more efficiently storing and manipulating sparsely populated matrices (matrices where most elements contain zero values). We tested the impact of using sparse formats, over the originally implemented dense matrix formats, using Go’s inbuilt benchmark functionality and found that our optimisations led to a reduction in memory consumption and processing time from 1 GB to 150 MB and 3.3 seconds to 1.2 seconds respectively. ...

Optimising algorithms in Go for machine learning - Part 2: Sparse matrix formats

This is the second in a series of blog posts sharing my experiences working with algorithms and data structures for machine learning. These experiences were gained whilst building out the nlp project for LSA (Latent Semantic Analysis) of text documents. In Part 1 of this series, I explored alternative approaches for representing and applying TF-IDF transforms for weighting term frequencies across document corpora. We tested the approaches using Go’s inbuilt benchmark functionality and found that our optimisations materially improved not just memory consumption but also performance (reducing memory consumption and processing time from 7 GB and 41 seconds to 250 KB and 0.8 seconds respectively). In this blog post I shall explore other areas for optimisation, seeking to further reduce memory consumption and processing time. ...

Optimising algorithms in Go for machine learning

In my last blog post I walked through the use of machine learning algorithms in Golang to analyse the latent semantic meaning of documents. These algorithms, like many others in data science, rely on linear algebra and vector space analysis. By their nature, they often have to deal with large data sets, so any inefficiencies in the data structures used or the algorithms themselves can have a large impact on overall performance and/or memory usage. Inefficiencies that are negligible when working with small data sets can have a huge cost when applied across extremely large data sets. As memory is a constrained resource, this could end up limiting the size of data sets that may be processed (certainly without having to resort to persistent storage and/or alternative algorithms) or the types of algorithms used. To this end, I decided to see if I could optimise the algorithms I used to consume less memory and improve processing performance without sacrificing too much functionality or accuracy. This is the first in a series of articles sharing my experiences benchmarking and optimising the algorithms and data structures used whilst building out the nlp project. ...

Semantic analysis of webpages with machine learning in Go

I spend a lot of time reading articles on the internet and started wondering whether I could develop software to automatically discover and recommend articles relevant to my interests. There are various aspects to this problem but I have decided to concentrate first on the core part of the problem: the analysis and classification of the articles. To illustrate the problem, let’s consider the following string representing an article for the purpose of this example. ...

Socratic questions revisited [infographic]

A little over a year ago, I wrote a blog post examining Socratic Questions. Socratic Questions are a method of pull influencing that can be used to stimulate critical thinking. To help make the question types easier to understand and remember for use in practice, I have gone back and created an infographic illustrating the 6 types of questions. The infographic is shown below (click on the infographic for the full size version). ...

Continuous delivery tool landscape

I have been having a lot of discussions recently about tooling to support continuous delivery and DevOps practices. There is an incredible and ever-increasing array of tools available for these practices. Whilst a number of vendors have developed one-stop solutions or suites of integrated tools, many of the tools in the space tend to be tightly focused on addressing a particular problem. Unfortunately this can be confusing and overwhelming, especially to people starting out, making it difficult to know where to start and which tools to consider. This can also lead to particular tools being used to solve problems where other types of tools may be better suited. It is therefore important to consider tools within the context of the broader ecosystem and understand the role each one plays and the specific goal or problem(s) it aims to address. With this in mind, I thought it might be useful to visualise the broader CD/DevOps tool landscape to provide some context around the available tools and how they each fit within it. ...

Using data to identify the impact of Southern Rail industrial action

I, like many others, have been affected by the ongoing industrial dispute over Driver Only Operation (DOO) on Southern Railways. On some days this amounts to delayed or cancelled trains with extended journey times and the inconvenience of standing all the way into London and on others, like today, strikes leave no viable way of getting to work in London at all. There have been many attempts to measure and demonstrate the impact of the industrial action such as the use of the #todayimissed hashtag on Twitter/X (see below), a recent passenger survey conducted by The Association of British Commuters and even a tongue-in-cheek video game. Whilst certainly compelling, these have all largely been qualitative rather than quantitative. I have heard tales of people losing or missing out on jobs due to continued lateness or based on where they live and, more recently, quite a lot of people moving job or house so they avoid Southern Rail for their commute to/from work. This got me thinking and I started to wonder whether there was any correlation between the industrial action and property prices in the affected areas. ...

Standardisation in the Enterprise

In enterprises there is often a strong desire to standardise. The reasoning is simple: if we are all doing things the same way, using the same technology, then we can simplify our operations, benefit from economies of scale and make our people more fungible. So by extension, not standardising means duplicated effort, resources and expenditure. But are things really this clear cut? Perhaps we should begin by thinking about the meaning of the word standardisation and understanding the alternatives. Wikipedia defines standardisation as: ...

Remote pair programming

During a previous job I spent a lot of time working with delivery teams on other continents, helping them develop software. I was lucky enough to visit them on several occasions for a week at a time, and whilst I was there made lots of progress working with the on-site developers. Unfortunately I was not able to stay on-site for the duration of the project and so needed to find other ways of collaborating with the teams remotely from back in the UK. ...