… What is Deep Learning?

Hello again! It’s time for another blog post. Reaching this great milestone of a second post means two things – I can call myself a blogger without lying (my first step towards fame, I’m certain) and that I have officially stuck to this project for longer than any hobby other than playing videogames. And people say I don’t take things seriously…

In this post, we’re going to talk a little about Deep Learning (DL) and, more specifically, Neural Networks (NNs). I’m aware that last time I promised a more technically focussed post looking at a specific architecture, and I did indeed have a post on Siamese Neural Networks (SNNs) mostly written, but it occurred to me that I was glossing over what was actually happening between the nodes. Since the core of this blog is helping those who are interested to understand the nitty-gritty details, I decided that we should examine neural networks in a more general context first and move on to some architectural nuances in later posts.

Another reason behind this post is that it’s actually reasonably difficult to find a clear and straightforward description of exactly what’s happening within a neural network. Nick Taylor, a researcher at the Edinburgh Centre for Robotics, recently talked briefly about this during a SICSA AI meet at Stirling University:

“Neural Networks are the focus of much recent work, but are precisely those whose results are most difficult to understand and explain.”

This isn’t just a problem for a single PhD student such as myself, but rather an issue which challenges the research community at large. With that in mind, the only way we’re going to get better at dealing with it is by practicing and developing more resources to help ourselves. So without further ado!

 

… A Quick Introduction to Deep Learning

Deep Learning (DL) is a member of a group of machine learning methods which learn which features to extract from input data in order to develop a new representation of that same data. In many instances, this new representation makes it easier or more efficient to apply traditional machine learning methods such as case-based reasoning and k-nearest neighbour [1]. Part of the reasoning behind this is that the newly learned representation is quite flexible, as it does not rely upon task-specific rules or algorithms, which can often cause a lot of trouble when one is using manually-coded features.

Artificial Neural Networks (ANNs) and Deep Neural Networks (DNNs) are particular types of DL algorithm. That said, they are far and away the most popular, and are usually broadly categorised under the single moniker of neural networks, as the difference between them (essentially, how many hidden layers the network contains) is minor.

What sets neural networks (and DL algorithms in general) apart from other ML techniques is the belief that a new data representation can be better learned through a ‘deep’ function than a series of shallow functions. Sounds a bit complex? Well, the easiest way to think about this is to think about a computational graph.

[Image: a simple computational graph]

Now for those that haven’t come across these before, computational graphs are just a method to visualise functions whose variables rely upon each other. In the simple example above, we can see that c‘s value is reliant upon a and b while e‘s value is reliant upon c and d.
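To make that concrete, here’s a tiny sketch in Python. The actual operations (an addition and a multiplication) are just my own choices for illustration; the point is simply that each value is computed from the values it depends upon.

```python
# A tiny computational graph: c depends on a and b, e depends on c and d.
# (Addition and multiplication are illustrative choices, not taken from the figure.)
a, b, d = 2.0, 3.0, 4.0

c = a + b    # c is computed from a and b
e = c * d    # e is computed from c and d

print(c, e)  # 5.0 20.0
```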

The easiest way to think about a neural network is to consider it as a special type of computational graph with two rules – (1) each node in one layer is connected to every node in the following layer (though this is not true in every architecture, we can take it as a rough generalisation for the time being), and (2) the output of each node must be differentiable. Rule (2) is very important in modern neural networks, and we’ll explain why in the next section.
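As a very rough sketch of what those two rules look like in code (the layer sizes, random weights and sigmoid activation are all my own illustrative choices, not anything prescribed above):

```python
import numpy as np

def sigmoid(z):
    # A differentiable activation function, satisfying rule (2)
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])      # outputs of one layer (3 nodes)
W = np.random.randn(4, 3) * 0.1     # rule (1): each of the 4 next-layer nodes
b = np.zeros(4)                     # is connected to all 3 nodes in this layer

next_layer = sigmoid(W @ x + b)     # outputs of the following layer (4 nodes)
print(next_layer.shape)             # (4,)
```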

 

… Training a Neural Network

Neural networks are really a supervised learning method, as their development requires a set of example data, known as the training set, where the desired output is already known. Training a neural network also makes use of a loss function, backpropagation and an optimization algorithm. Now, we’re not going to examine any of these concepts in too much detail here. Each of these three is more than deserving of its own post, and there are various loss functions and optimizers for various situations to boot, but we will set the scene by offering a quick description of each concept.

Loss Function: A loss function’s job is to calculate the difference between a piece of data’s estimated value and its true value to produce an error value. How does the network know the data’s true value? We tell it – which is why we require a training set where the desired output is already known. There are various loss functions that work better in different situations, but the primary objective always remains the same.
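To make that a little more concrete, here’s one of the simplest loss functions you could pick, mean squared error. It’s just one possible choice among many, used here purely for illustration.

```python
import numpy as np

def mse_loss(estimated, true):
    # Error value: the average squared difference between estimate and true value
    return np.mean((estimated - true) ** 2)

print(mse_loss(np.array([0.8, 0.1]), np.array([1.0, 0.0])))  # 0.025
```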

Backpropagation: This is one of those scary concepts that was invented because no one wanted to use the term reverse-mode differentiation – because that is exactly what it is. Now I’m not going to explain the full process (mostly because it is done superbly at this blog by Chris Olah [2]), but it is sufficient to say that backpropagation calculates how each and every node in the network contributes to the error value – and you cannot perform it if your network architecture doesn’t follow rule (2)!

Now, I’m cheating a little bit here – you don’t have to use backpropagation to do this calculation, and there are plenty of instances in older research where researchers didn’t. But in modern neural networks with potentially millions of nodes, the cost of doing this calculation without backpropagation would be enormous – enough to increase your training time by potentially years. See why rule (2) is so important now?
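To give a flavour of what reverse-mode differentiation actually does, here’s the tiny graph from earlier differentiated by hand: we walk backwards from the output, reusing each intermediate gradient via the chain rule. That reuse is exactly what keeps backpropagation cheap when a network has millions of nodes.

```python
# Reverse-mode differentiation on the graph c = a + b, e = c * d
a, b, d = 2.0, 3.0, 4.0
c = a + b
e = c * d

de_de = 1.0                 # gradient of e with respect to itself
de_dc = de_de * d           # e = c * d  ->  de/dc = d
de_dd = de_de * c           # e = c * d  ->  de/dd = c
de_da = de_dc * 1.0         # c = a + b  ->  dc/da = 1, chain rule reuses de_dc
de_db = de_dc * 1.0         # c = a + b  ->  dc/db = 1

print(de_da, de_db, de_dd)  # 4.0 4.0 5.0
```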

Optimization Algorithm: The optimization algorithm (or optimizer, as it is often called) does exactly what it says on the tin; it optimizes the weights and parameters of the nodes of a neural network to minimise the error value. When used in conjunction with backpropagation, gradient descent optimizers are the most common choice. Have a look here [3] for a more detailed examination of how they work.
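In its most bare-bones form, a single gradient descent update looks something like this (the learning rate here is a hypothetical value picked purely for illustration):

```python
import numpy as np

def gradient_descent_step(weights, gradients, learning_rate=0.01):
    # Nudge each weight a small step against its error gradient
    return weights - learning_rate * gradients

w = np.array([0.5, -0.3])
grad = np.array([0.2, -0.1])           # e.g. as computed by backpropagation
print(gradient_descent_step(w, grad))  # [ 0.498 -0.299]
```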

Okay, so let’s put these all together and walk through the process of training a neural network from start to finish.

Data is input to the neural network and is fed through in sequence, with each layer of the network reliant upon the output of the preceding layer. After a piece of data from the training set has been fed through the network, a loss function calculates the difference between the output of the neural network and the desired output. Backpropagation then calculates each node’s contribution to that error. The weights or parameters of each node are then updated appropriately by the optimizer.

This process is completed for every piece of data in the training set, and is usually repeated multiple times (each complete pass over the training set is often called an epoch) in order to optimise the nodes. The resultant network can then be applied to data where the desired output is not known, performing its task without supervision.
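To tie the whole process together, here’s a minimal end-to-end sketch: a tiny two-layer network trained on the XOR problem using mean squared error, hand-written backpropagation and plain gradient descent. Every specific choice in it (the task, the layer sizes, the sigmoid activations, the learning rate and the number of epochs) is my own assumption for illustration rather than anything prescribed above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Training set: inputs X and the desired outputs Y (the XOR function)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)

# Weights and biases of a two-layer network (2 inputs -> 8 hidden -> 1 output)
W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

learning_rate = 1.0
for epoch in range(10000):                  # repeat over the training set many times
    # Forward pass: each layer relies on the output of the preceding layer
    h = sigmoid(X @ W1 + b1)
    y_hat = sigmoid(h @ W2 + b2)

    # Loss function: difference between the network's output and the desired output
    loss = np.mean((y_hat - Y) ** 2)

    # Backpropagation: each weight's contribution to the error, via the chain rule
    d_out = 2 * (y_hat - Y) / len(X) * y_hat * (1 - y_hat)
    dW2, db2 = h.T @ d_out, d_out.sum(axis=0)
    d_hid = (d_out @ W2.T) * h * (1 - h)
    dW1, db1_grad = X.T @ d_hid, d_hid.sum(axis=0)

    # Optimizer: plain gradient descent update of every weight and bias
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1_grad
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2

# The trained network applied to the inputs; outputs should be close to [[0], [1], [1], [0]]
print(np.round(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2), 2))
```

In practice you would almost never write this by hand – libraries such as TensorFlow or PyTorch build the computational graph and perform backpropagation for you – but the underlying steps are the same as those described above.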

And that’s it!

 

Okay, so that’s a very brief (and hopefully clear) introduction to neural networks – below are a few further reads that helped inform this article, for those who are interested. Thank you very much for reading and I hope that HAL will let you all leave with your helmets.

[1] Representation Learning: A Review and New Perspectives – Bengio, Courville and Vincent. 2014.

[2] Calculus on Computational Graphs: Backpropagation – Colah’s Blog. 2015.

[3] An Overview of Gradient Descent – Sebastian Ruder’s Blog. 2016.

 

 

… a Warm Welcome and AI Explainability

Hi there! Welcome to RGU explAIns…, a new blog from the Robert Gordon University AI research team. In this blog we aim to explore many of the exciting facets of machine learning in a way that is accessible and understandable to all. From sexy new advances in neural networks and deep learning, to augmentations of tried-and-tested Case-Based Reasoning (CBR) techniques, we really hope that this site can act as a window into our current research and interests.

More than that though, we’re hoping that this site can help anyone who is interested in our ramblings to understand Artificial Intelligence (AI) and Machine Learning (ML) just a little bit better. These terms seem to have a constant presence in the news at the moment (and likely for the foreseeable future), but are never fully explained. Part of the reasoning behind this blog is to help us as researchers get better at explaining these things. After all, calling ourselves IT guys or gals just ain’t gonna cut it forever!

Contrary to much popular belief and fear-mongering (I’m looking at you, Daily Mail), AI research is not solely concerned with the development of the Terminator. The primary goal of AI research is actually a lot less apocalyptic: to enable computer systems to perform tasks that would normally require intelligent human behaviour, without being explicitly programmed for each one. So, we’re actually trying to create something a lot closer to HAL 9000 – feel better now?

Jokes aside, a thorough examination of what exactly constitutes AI is outwith the scope of this short introductory post. For anyone who is interested in the ins-and-outs, this excellent post from TechTarget does a great job of explaining it.

Now there’s one question that every researcher with extra-curricular aspirations is always asked when they start a project like this, and that question is…

 

… Why Bother?

Explainability is a growing concern across many different areas of computer science. It’s no longer okay just to say “because the computer said so” when faced with a question that we as IT people can’t answer. People have higher expectations of technology in general these days, and part of those expectations leads to new responsibilities for us as developers and researchers.

Now what do we mean when we say explainability? For a start, my spell checker doesn’t even believe it’s a real word (I assure you, it is). Explainability in computing is a big topic, and we’re only going to look at it with a limited scope here. In the context of ML and AI, we can broadly break down explainability into two topics:

Understandability: One of the things which the majority of AI and ML researchers (myself included) are most guilty of is failing to make the processes of our systems understandable. When it comes to explaining exactly how our algorithms or programs work in such a way that a non-expert can understand them, we tend to come dead last. It is a major aim of this blog to present some of these algorithms in a way that everyone (fledgling researcher or Joe Bloggs on the street) can understand exactly how they work.

[Image: It’s not easy to remember that how we as experts see the system does not always reflect how users see the system.]

Transparency: There are many times when we know that an intelligent system works, but we struggle to understand how it arrived at a certain result. A typical example is when YouTube recommends a rap music video after you’ve been listening to country music. Transparency can be seen as the capacity to explain the process which led to a certain result. This is especially relevant today, when it’s very realistic to expect that we may have to justify the decisions which an AI system we’ve designed has made.

Now at first glance, understandability and transparency may look identical. They’re not, but they are very similar. If we look at these concepts from the developer’s angle, then I think it becomes slightly clearer: understandability is your ability to explain exactly how your system works, while transparency is your capacity to justify or explain an action which your system has taken.

Being able to explain your AI system and justify its actions will soon become legal requirements in most European countries. The EU General Data Protection Regulation (GDPR) comes into force next year, and with it arrives a whole bunch of new regulations. Most notably for AI and ML system developers, transparency will become a requirement, not just an ethical consideration. Users of a system can, on demand, request the justification of any decision which an AI has made with regard to them, and can even request to be excluded from these decisions altogether.

These requirements obviously make it much more difficult for developers and researchers, but I think it’s important to understand it from a user’s point of view. Simply put, AI is scary to those who don’t understand it and the only way to allay those fears is to help people to understand it by getting better at explaining it.

Hence the birth of this blog.

Well, I feel I’ve rambled on long enough for what was meant to be a simple hello. Thank you for taking the time to read a long post from a new blog. I promise our next post will be a more focused discussion on a specific topic (either Memory Networks or Deep Learning – there’s a bit of a two-camp situation going on in the office just now). If you’ve got a specific topic you would like to see, please feel free to comment and we’ll do our best.

Until next time, may your computer be smart and John Connor smarter!