In this blog post I will explore the steps to optimise a sparse vector dot product operation. We will start with a basic implementation in Go, convert it to assembler and then iteratively optimise it, measuring the effect of each change to check our progress. All the code from the post is available on Github and also forms part of the Golang Sparse matrix package.
A vector dot product is a very common kernel, or basic building block, for many computations used within scientific computing and machine learning e. »

This is the third in a series of blog posts sharing my experiences working with algorithms and data structures for machine learning. These experiences were gained whilst building out the nlp project for LSA (Latent Semantic Analysis) of text documents.
In Part 2 of this series, I explored sparse matrix formats as a set of data structures for more efficiently storing and manipulating sparsely populated matrices (matrices where most elements contain zero values). »

This is the second in a series of blog posts sharing my experiences working with algorithms and data structures for machine learning. These experiences were gained whilst building out the nlp project for LSA (Latent Semantic Analysis) of text documents.
In Part 1 of this series, I explored alternative approaches for representing and applying TF-IDF transforms for weighting term frequencies across document corpora. We tested the approaches using Go’s inbuilt benchmark functionality and found that our optimisations materially improved not just memory consumption but also performance (reducing memory consumption and processing time from 7 GB and 41 seconds to 250 KB and 0. »

In my last blog post I walked through the use of machine learning algorithms in Golang to analyse the latent semantic meaning of documents. These algorithms, like many others in data science, rely on linear algebra and vector space analysis. By their nature, they often have to deal with large data sets, so any inefficiencies in the data structures used or algorithms themselves can result in a large impact on overall performance and/or memory usage. »

I spend a lot of time reading articles on the internet and started wondering whether I could develop software to automatically discover and recommend articles relevant to my interests. There are various aspects to this problem but I have decided to concentrate first on the core part of the problem: the analysis and classification of the articles.
To illustrate the problem, lets consider the following string representing an article for the purpose of this example. »

_{
Cover photo: Working Late by Thomas Høyrup Christensen, available under a Attribution-Noncommercial-Share Alike 2.0 license
}