In this blog, we will dive into the exciting field of neural networks and build our very own neural network from scratch using PyTorch. We will not only build it but also implement loss functions, derive gradients, and train it! This blog assumes that you are familiar with the concept of neural networks and have a working knowledge of Python. Don't worry if you're new to PyTorch or to using it to build neural networks; that's what you are here to learn.
By the end of this blog, you will have an understanding of the building blocks of neural networks. Keep in mind that in practice we do not code neural networks from scratch; we use existing libraries such as PyTorch and TensorFlow to create custom models. This blog is a learning exercise (for me and you) to understand what is going on under the hood. We will not use any high-level modules to create the network. At the end, however, we will implement the same network using the torch.nn module so that you can fully connect the concepts.
Well, if you are reading this blog, it is highly likely that you have some idea of what a neural network is. A neural network is a machine learning model inspired by the functioning of biological neurons in the human brain. Like neurons in the human brain, artificial neurons have inputs, an output, and a bias (threshold). These neurons are organized into layers, with the input layer receiving the raw data and the output layer producing the model's predictions.
A single neuron is depicted in Fig. 1. It has n inputs, and each input has a weight associated with it. The inputs are linearly combined with the weights to generate an output. A bias term is also added to this combination, though it is not shown here. The result is then passed through an activation function (shown as a), which limits the output to a range that depends on the activation function used. It turns out that a single neuron with a sigmoid activation is equivalent to a multiple logistic regression model. However, when we have to model more complex functions, we prefer using multiple neurons arranged over multiple layers.
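To make this concrete, here is a minimal sketch of such a neuron in PyTorch (the input values are arbitrary placeholders of my own, not taken from the figure):

```python
import torch

# A single neuron: a weighted sum of n inputs plus a bias,
# squashed by a sigmoid activation.
x = torch.tensor([0.5, -1.2, 3.0])  # n = 3 inputs (arbitrary values)
w = torch.randn(3)                  # one weight per input
b = torch.randn(1)                  # bias (threshold) term

z = torch.dot(w, x) + b             # linear combination of inputs and weights
y = torch.sigmoid(z)                # sigmoid limits the output to (0, 1)
print(y)
```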
A multi-layer neural network is shown in Fig. 2. There are n inputs as before. Now, m neurons like the one in Fig. 1 are arranged in a layer, and there are L such layers, in which the number of neurons can vary depending on the network architecture. The final layer, termed layer L+1, is the output layer with c neurons. This network therefore has L hidden layers, one input layer, and one output layer. To simplify notation, the input layer is treated as the activation (output) of the 0th layer; we thus have activations a^(0) through a^(L+1), running from the input layer to the output layer. Each layer has a weight matrix associated with it, whose values describe how the layer's inputs are combined to form its output. A forward pass through the whole network is sketched after the figures below.
Fig. 1: A single neuron
Fig. 2: A multi-layer neural network
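To see how the layers chain together, here is a hedged sketch of a forward pass: each layer's activation is the previous layer's activation multiplied by the layer's weight matrix, shifted by the bias, and passed through the activation function (the function name and the sigmoid default are mine, used only for illustration):

```python
import torch

def forward(a0, weights, biases, activation=torch.sigmoid):
    # a0 plays the role of a^(0), the "activation" of the input layer.
    a = a0
    for W, b in zip(weights, biases):
        # a^(l) = f(a^(l-1) @ W^(l) + b^(l)) for l = 1, ..., L+1
        a = activation(a @ W + b)
    return a  # activation of the output layer, a^(L+1)
```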
There were multiple ways to go about explaining the concepts, but I preferred not to intersperse images of code with the text and my explanations, as that hinders learning. It does not let you copy the code and quickly run it on your own as you progress. Although I have provided the GitHub link for the notebook at the end, I feel that setting things up the way I have done helps. Do check out the entire code on GitHub.
So, here is how I have set everything up. The notebook code blocks are inserted into the blog, and I have documented the code very liberally so that you understand each and every step. Google Sites does not allow inserting native JS, so we cannot directly put in LaTeX code. That would have forced me to use images of the equations, which, let's be honest, don't look good. So the math is also inserted as a notebook.
Now that that is out of the way, let's get to it. The first snippet deals with importing the libraries, defining the goals of the notebook, and implementing a single-layer network with raw PyTorch.
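The snippet itself is embedded as a notebook; as a stand-in, here is a minimal sketch of what it sets up, assuming the 3-input, 2-hidden-neuron, 1-output network discussed below (the seed, example values, and variable names are my own placeholders):

```python
import torch

torch.manual_seed(0)  # for reproducibility

# One training example: 3 inputs and 1 target output (arbitrary values).
x = torch.tensor([[1.0, 2.0, -1.0]])  # shape (1, 3)
y = torch.tensor([[0.5]])             # shape (1, 1)

# Raw parameters: a hidden layer with 2 neurons and an output layer with 1.
W1 = torch.randn(3, 2)   # input -> hidden weights
b1 = torch.zeros(2)      # hidden-layer bias
W2 = torch.randn(2, 1)   # hidden -> output weights
b2 = torch.zeros(1)      # output-layer bias

# Forward pass: linear combination plus sigmoid at each layer.
a1 = torch.sigmoid(x @ W1 + b1)      # hidden activation, shape (1, 2)
y_hat = torch.sigmoid(a1 @ W2 + b2)  # prediction, shape (1, 1)
print(y_hat)
```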
I hope the code snippet above was easy to follow. So far, we have created a single-layer neural network model with 2 hidden neurons, 3 inputs, and 1 output. Now, you may wonder why I have not kept the size of the hidden layer at 3, the size of the input. We see it everywhere that the size of the first hidden layer matches the number of inputs, and it is very often the case. The reason I kept it different is to generalize your concepts and learning, as well as mine. Think of the input to the hidden layer as coming not from an input layer but from an arbitrary layer in a larger network. Do you think it would be the same size as the hidden layer? Unlikely, right? Thus, in general, the input can be seen as the output of a previous layer in a much larger network. This might seem simplistic, but it is a powerful way to think about it. Let's move ahead!
So far we have not seriously looked at any math or math symbols. Truth be told, I have been deliberately hiding it from you! But it is now or never, and I must walk you through some very important concepts that you need to fully grasp what is going on. It is all great to have created a neural network from scratch, but in order to train it we require a specific set of mathematical tools. So let's not delay any longer. Shall we?
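The full derivations live in the math notebook; as a taste, here is the core idea written out in LaTeX, assuming a mean-squared-error loss and sigmoid activations to match the network above:

```latex
% Mean-squared-error loss between prediction \hat{y} and target y:
L = \tfrac{1}{2}(\hat{y} - y)^2

% With z^{(2)} = a^{(1)} W^{(2)} + b^{(2)} and \hat{y} = \sigma(z^{(2)}),
% the chain rule gives the gradient for the output-layer weights:
\frac{\partial L}{\partial W^{(2)}}
  = \frac{\partial L}{\partial \hat{y}}\,
    \frac{\partial \hat{y}}{\partial z^{(2)}}\,
    \frac{\partial z^{(2)}}{\partial W^{(2)}}
  = (\hat{y} - y)\,\sigma'(z^{(2)})\,a^{(1)}

% Repeating this layer by layer, from the output back towards the input,
% is exactly backpropagation; gradient descent then nudges each weight
% opposite to its gradient.
```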
Now that we have covered the theoretical aspects, let us code it. It may be helpful to derive the gradients yourself with pen and paper and convince yourself that they are correct. Trust me, it really helps that way. It may take some time and you may get stuck, but only trying will tell. Leave a comment if you have any doubts about how to derive the gradients and I will try to address them as soon as I can. Now let's see everything we have learnt so far in action.
Pro tip: Always document your bugs while coding; it helps in the future if you encounter the same or a similar bug. Assuming, of course, that you remember where you documented that particular bug!
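The training snippet also lives in the notebook; continuing with the tensors from the sketch above, a hand-derived training loop could look like this (the learning rate, iteration count, and print interval are placeholders of mine):

```python
lr = 0.5  # learning rate, chosen arbitrarily for this sketch

for i in range(2000):
    # Forward pass.
    z1 = x @ W1 + b1
    a1 = torch.sigmoid(z1)
    z2 = a1 @ W2 + b2
    y_hat = torch.sigmoid(z2)
    loss = 0.5 * ((y_hat - y) ** 2).sum()  # mean-squared-error loss

    # Backward pass: gradients derived by hand via the chain rule,
    # using sigma'(z) = sigma(z) * (1 - sigma(z)).
    dz2 = (y_hat - y) * y_hat * (1 - y_hat)  # dL/dz2
    dW2 = a1.t() @ dz2
    db2 = dz2.sum(0)
    dz1 = (dz2 @ W2.t()) * a1 * (1 - a1)     # dL/dz1
    dW1 = x.t() @ dz1
    db1 = dz1.sum(0)

    # Gradient-descent update.
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2

    if i % 500 == 0:
        print(f"iter {i}: loss {loss.item():.6f}, y_hat {y_hat.item():.4f}")
```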
As we can see from the output, as the iterations increase, the predicted y gets closer and closer to the actual y. This tells us that the code is working and the model is indeed able to adjust its parameters to predict the desired value. If you were to run the loop for a few more iterations, you would see the values converge almost exactly.
We have come a long way, from defining a neural network using just raw PyTorch to putting the pieces together one by one. I hope it has helped you understand the concepts. I highly recommend copying the code, running it on your own, and tweaking and breaking things. But our journey is not complete yet.
Obviously, this is not how people train neural networks, let alone use them. High-level libraries like PyTorch and TensorFlow abstract away these intricacies, reducing our work to creating objects and calling functions. So, to complete this learning process, we will now implement the same network using the torch.nn module. Let's go.
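The notebook carries the full version; a minimal sketch of the same 3-2-1 network built with torch.nn might look like this (the hyperparameters are again my placeholders):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# The same 3-2-1 architecture, now using high-level building blocks.
model = nn.Sequential(
    nn.Linear(3, 2),  # input -> hidden; weights and bias are handled for us
    nn.Sigmoid(),
    nn.Linear(2, 1),  # hidden -> output
    nn.Sigmoid(),
)

loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.5)

x = torch.tensor([[1.0, 2.0, -1.0]])
y = torch.tensor([[0.5]])

for i in range(2000):
    optimizer.zero_grad()        # clear old gradients
    loss = loss_fn(model(x), y)  # forward pass and loss in one line
    loss.backward()              # autograd replaces our hand-derived gradients
    optimizer.step()             # gradient-descent update
```

A few lines of setup replace everything we built by hand, which is exactly why these abstractions exist.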
Hurray! We have finished what we started. I hope you enjoyed reading and learning from this post as much as I enjoyed writing it. Feel free to point out any changes, suggestions, or improvements in the code or the blog in general. Thank you for giving it a read. See you in the next blog.
Github link: https://github.com/Nirbhayr/neural-networks/tree/main/notebooks