
Feedforward Neural Network (FNN) Implementation from Scratch Using Python

8 min read · Jun 17, 2022

Hi! Welcome back if you are coming from the first part of the tutorial. If you are not, that’s totally OK too; please go through it first here, where I explained the theory behind a neural network. I will also provide a quick review of the first part in this article. Oh, don’t forget to open your notes and code editor, as we will get more mathematical and technical. Sit tight and let us start!

Quick Review

So, as you might already know, training a neural network consists of two processes: forward propagation and backward propagation. One forward pass and one backward pass together are called an epoch. Hence, in each epoch, the network updates all of its weights and biases to minimize the cost (or loss) function; recall that our objective is to minimize the loss.

The flow of one epoch is as follows (a compact pseudocode sketch follows the list):

1. The inputs of several samples (possibly just one) are fed into the network. Forward propagation is done when we have computed all the activations up to the last, or output, layer.

2. From the last layer, we compute the loss using the cost function.

3. After that, we compute the error of the last layer, update its weights and biases, and continue to walk backward.

4. We walk backward by updating the weights and biases of the l-th layer using the error of the (l+1)-th layer. We stop at the first hidden layer, since we define the first layer as the input layer. Remember not to confuse the first layer with the first hidden layer.
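To make the flow concrete, here is a compact sketch of one epoch using the variable names we will define later in the code. This is only an overview I am adding for orientation; the real implementation comes in the next sections.

# One epoch at a glance (high-level sketch; the actual code follows later):
#   forward:  a[0] = activation(w[0] @ x + b[0])
#             a[l] = activation(w[l] @ a[l-1] + b[l])      for the remaining layers
#   loss:     C = cost(a[last], y)
#   backward: delta[last] = cost'(a[last], y) * activation'(z[last])
#             delta[l]    = (w[l+1].T @ delta[l+1]) * activation'(z[l])
#   update:   w[l] = w[l] - lr * delta[l] @ a[l-1].T
#             b[l] = b[l] - lr * delta[l]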

If you are still confused with the flow, I recommend you watch this video. If you are good, let’s continue. In the next sections, we will try to solve a simple regression task using a feedforward neural network. Before we start to code, I think we should plan the “attack” to tackle the problem.

Plan of Attack

We will define several things here, such as the architecture of the neural network: how many layers we want in the network, how many neurons are in each layer, what the learning rate is, and so forth.

Okay, so, I have made several decisions about the attack and I think it is pretty good — I’ve tested it. Here’s my plan:

[Figure: Strategy sketch. The plan: layer sizes 2, 3, 2, 1 (input, two hidden layers, output), ReLU activation, learning rate 0.0001, 10,000 epochs.]

Here, as you can see, we will use the Rectified Linear Unit (ReLU) activation since our problem is a regression one. I will also use a learning rate of 0.0001 and train for ten thousand epochs.

Of course, you can try to design your own plan, but I suggest you finish the tutorial first to see whether your code works properly. Then, you can change the design here and there. Be careful in deciding these values, though, as you might break your network (it stops learning anything)! In deep learning terms, this is related to the vanishing gradient problem: the weights and biases barely change during learning, so the loss stays at a roughly constant value.
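As a quick illustration of how this can happen with ReLU (my own toy example, not part of the tutorial code): if every pre-activation in a layer becomes negative, the ReLU derivative is zero everywhere, so no error signal flows back through that layer and its weights stop updating.

import numpy as np

# Toy example: all pre-activations in a layer are negative.
z_dead = np.array([[-1.2, -0.3, -4.0, -0.7]])
relu_grad = np.array(z_dead > 0, dtype=np.float64)  # derivative of ReLU: all zeros here
upstream_error = np.array([[0.8, -0.5, 1.1, 0.3]])  # whatever error arrives from the next layer
delta = upstream_error * relu_grad                  # the element-wise product wipes the signal out
print(delta)  # [[0. 0. 0. 0.]] -> the gradients for this layer's weights and biases are zero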

Initial Setup: Data and Variables

If you recall, I defined some notation in the first article. We will use similar notation here; that means we will use z, a, b, and w. However, I will give you an early notice, since we will modify z’s formula from

z^(l) = (w^(l))^T a^(l-1) + b^(l)

to

z^(l) = w^(l) a^(l-1) + b^(l)

because with this, we avoid the computational load of transposing a matrix on every forward pass: the weights of layer l are stored directly with shape (neurons in layer l, neurons in layer l-1).
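To see what this buys us in NumPy terms, here is a small shape sketch. This is my own illustration; the exact storage convention used in the first article may differ, so treat the "old" version as an assumption.

import numpy as np

x = np.random.randn(2, 4)    # (inputs, samples): 2 features, 4 samples
b = np.ones((3, 1))          # (neurons, 1)

# If w were stored as (inputs, neurons), a transpose would be needed on every forward pass:
w_old = np.random.randn(2, 3)
z_old = w_old.T @ x + b      # shape (3, 4)

# Storing w as (neurons, inputs), as we do here, removes the transpose:
w_new = np.random.randn(3, 2)
z_new = w_new @ x + b        # shape (3, 4)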

Now, let’s make a file named “lib.py” to store our cost and activation functions. We will make two functions for each type: one for the function itself (used in forward propagation) and one for its derivative (used in backward propagation).

I will start by defining them in mathematical form:

ReLU(z) = max(0, z),  ReLU′(z) = 1 if z > 0, else 0

C(ŷ, y) = (1/2) Σ (y − ŷ)²,  ∂C/∂ŷ = ŷ − y

Then, in the Python file,

import numpy as np

# Activation
def RelU_forward(inputs):
    return np.maximum(0, inputs, dtype=np.float64)

def RelU_backward(inputs):
    return np.array(inputs > 0, dtype=np.float64)

# Cost
def cost_forward(ypred, ytrue):
    return 1 / 2 * (np.sum(np.square(ytrue - ypred)))

def cost_backward(ypred, ytrue):
    return ypred - ytrue
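If you want a quick sanity check of these functions (my own addition, not part of the original files), you can run something like this:

import numpy as np
from lib import RelU_forward, RelU_backward, cost_forward, cost_backward

z = np.array([[-1.0, 0.5, 2.0]])
print(RelU_forward(z))    # [[0.  0.5 2. ]]
print(RelU_backward(z))   # [[0. 1. 1.]]

ypred = np.array([1.0, 2.0])
ytrue = np.array([1.5, 2.0])
print(cost_forward(ypred, ytrue))   # 0.125
print(cost_backward(ypred, ytrue))  # [-0.5  0. ]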

After that, we will define our data. It will only be a simple linear relationship, since we just want to simulate the network (you can try a quadratic one too if you want).

[Figure: Example data. Four samples with two features each, following y = 2·x1 + 3·x2 − 3.]

We make a new file named “fnn.py” and then start by initializing some variables and parameters according to our plan. Remember to set the NumPy seed to keep the results consistent across runs.

import numpy as np
from lib import *
x = np.array([[1, 2, 3, 4],[5, 8, 2, 6]])
y = np.array([14, 25, 9, 23]) # 2x1 + 3x2 - 3
number_of_samples = len(x[0])
np.random.seed(0)
z = []
a = []
w = []
b = []
neurons = [2, 3, 2, 1]
number_of_layers = len(neurons) - 1
lr = 1e-4 # 0.0001 = 10^-4
epochs = 10000
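As an optional check (my own addition), you can verify that y really follows the linear rule in the comment:

# y should equal 2*x1 + 3*x2 - 3 for every sample
assert np.array_equal(y, 2 * x[0] + 3 * x[1] - 3)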

Forward Propagation

Now, we are ready to jump into the forward propagation part. If you have learned linear algebra, this part should be easy for you. Although most illustrations feed only one input sample at a time, we will not. Since we only have 4 samples, it should be safe to feed them all to the network at once. Before that, I want to share the shape analysis I have sketched.

[Figure: Shape analysis of w, b, z, and a in each layer.]

Don’t worry if you don’t get it; I just wanted to attach it here in case any of you are curious. If you are more comfortable learning by printing the variables or setting breakpoints, you are welcome to do so.
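For example, once the forward-pass code below has filled the lists, a small snippet like this (my own addition) prints the same shape information as the sketch:

# Print the shape of every parameter and activation, layer by layer
for l in range(number_of_layers):
    print(f"layer {l + 1}: w {w[l].shape}, b {b[l].shape}, z {z[l].shape}, a {a[l].shape}")
# With neurons = [2, 3, 2, 1] and 4 samples, you should see:
# layer 1: w (3, 2), b (3, 1), z (3, 4), a (3, 4)
# layer 2: w (2, 3), b (2, 1), z (2, 4), a (2, 4)
# layer 3: w (1, 2), b (1, 1), z (1, 4), a (1, 4)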

Okay, now we will continue our code. In each layer, we initialize the weights and biases first, then compute z and a. The weights can be initialized randomly; that’s okay since we will update them during backpropagation. Here’s the code.

# FIRST HIDDEN LAYER
w.append(np.random.randn(neurons[1], neurons[0]))
b.append(np.ones((neurons[1], 1)))
z.append(w[0] @ x + b[0])
a.append(RelU_forward(z[0]))
assert a[0].shape == (neurons[1], number_of_samples)

# SECOND HIDDEN LAYER
w.append(np.random.randn(neurons[2], neurons[1]))
b.append(np.ones((neurons[2], 1)))
z.append(w[1] @ a[0] + b[1])
a.append(RelU_forward(z[1]))
assert a[1].shape == (neurons[2], number_of_samples)

# OUTPUT LAYER
w.append(np.random.randn(neurons[3], neurons[2]))
b.append(np.ones((neurons[3], 1)))
z.append(w[2] @ a[1] + b[2])
a.append(RelU_forward(z[2]))
assert a[2].shape == (neurons[3], number_of_samples)
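Since the three blocks follow exactly the same pattern, they could also be written as a single loop. This is just an equivalent sketch of the code above (with the same seed it draws the same random weights), not something you need to change:

# Equivalent loop form of the three blocks above
for l in range(number_of_layers):
    w.append(np.random.randn(neurons[l + 1], neurons[l]))
    b.append(np.ones((neurons[l + 1], 1)))
    prev_a = x if l == 0 else a[l - 1]
    z.append(w[l] @ prev_a + b[l])
    a.append(RelU_forward(z[l]))
    assert a[l].shape == (neurons[l + 1], number_of_samples)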

The assert lines do not really matter; they just serve as sanity checks. You can erase them if you want.

Yay, congratulations, you have done half an epoch. Let’s move to a more challenging process: backward propagation. I believe you can do it too!

Backward Propagation

If you recall, I gave you 4 important equations in the first article. Basically, we will just implement those here. Let’s start from the last layer using this equation:

δ^L = (a^L − y) ⊙ ReLU′(z^L), where L denotes the last layer and ⊙ is element-wise multiplication.

The corresponding code is

# OUTPUT LAYER
delta = cost_backward(a[2][0], y) * RelU_backward(z[2])

Since we have the delta, we will also update the weights and bias of the last layer. This means we are also implementing gradient descent using these equations. The derivatives are

∂C/∂w^l = δ^l (a^(l-1))^T and ∂C/∂b^l = δ^l

and we update using

w^l ← w^l − lr · ∂C/∂w^l and b^l ← b^l − lr · ∂C/∂b^l
Respectively, the code is

w[2] = w[2] - lr * (delta @ a[1].T)
b[2] = b[2] - lr * (delta)

After that, we move one step backward, updating our second hidden layer. To compute this layer’s delta, we use this equation:

δ^l = ((w^(l+1))^T δ^(l+1)) ⊙ ReLU′(z^l)

Be cautious about the transpose in the weight term. This time, we need it; otherwise, we will get a shape mismatch. Here’s the code for computing the error and performing the update for the second hidden layer.

# SECOND HIDDEN LAYER
delta = (w[2].T @ delta) * RelU_backward(z[1])
w[1] = w[1] - lr * (delta @ a[0].T)
b[1] = b[1] - lr * (delta)
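As an optional check I like to add (not part of the original code), the new delta should have one row per neuron of the second hidden layer and one column per sample:

# delta for the second hidden layer: shape (2, 4) with our architecture
assert delta.shape == (neurons[2], number_of_samples)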

The same applies to the first hidden layer. The difference is in the activation of the (l-1)-th layer (see ∂C/∂w): when updating the weights, you use the input samples x, since there is no earlier activation to use. Thus, the code is

#  FIRST HIDDEN LAYER
delta = (w[1].T @ delta) * RelU_backward(z[0])
w[0] = w[0] - lr * (delta @ x.T)
b[0] = b[0] - lr * (delta)

Kudos, you just finished one epoch! You deserve a pat on the back. I’m proud of you! Now your network is a little bit better than before. If you want to see the progression, you can print the loss of the network epoch after epoch. Oh, I haven’t told you how to loop over the epochs yet, have I? Here it is.

for i in range(1, epochs):
    # Forward propagation
    for j in range(number_of_layers):
        if j == 0:
            z[j] = w[j] @ x + b[j]
        else:
            z[j] = w[j] @ a[j - 1] + b[j]
        a[j] = RelU_forward(z[j])

    print(f"Epoch ({i + 1}/{epochs}), loss = {cost_forward(a[2], y)}")

    # Backward propagation
    for j in range(number_of_layers - 1, -1, -1):
        if j == number_of_layers - 1:
            delta = cost_backward(a[j][0], y) * RelU_backward(z[j])
        else:
            delta = (w[j + 1].T @ delta) * RelU_backward(z[j])
        if j == 0:
            w[j] = w[j] - lr * (delta @ x.T)
        else:
            w[j] = w[j] - lr * (delta @ a[j - 1].T)
        b[j] = b[j] - lr * (delta)

Now your code is fully functional. You should see something like this in your console when you run “fnn.py”.

Epoch (10000/10000), loss = 0.00030291977689243294
last y_train predicted (a[2]) = [[14.02074786 24.98681486 8.99915625 23.001178 ]]
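The second line comes from printing the final activations after the training loop; the exact statement isn’t shown in the snippet above, but it is presumably something like this (my reconstruction):

# After the epoch loop, print the network's predictions for the training inputs
print(f"last y_train predicted (a[2]) = {a[2]}")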

It is actually pretty good, huh? Of course, this is not a realistic scenario, since we do not split our data into train and test sets, thereby making our model prone to overfitting. But that’s okay. The purpose of this tutorial is to give you the mathematical headache behind neural networks, haha!

Code Summary

Here, I will provide you with three versions of the code: the redundant one (the one we used in the tutorial above), a concise one, and a more modular one.
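In case the embeds don’t load for you, here is a sketch of what the concise, loop-based version might look like. This is my own condensed rewrite of the tutorial code above, not the original gist, so treat it as an assumption about that version.

import numpy as np
from lib import RelU_forward, RelU_backward, cost_forward, cost_backward

np.random.seed(0)
x = np.array([[1, 2, 3, 4], [5, 8, 2, 6]])
y = np.array([14, 25, 9, 23])  # y = 2*x1 + 3*x2 - 3
neurons = [2, 3, 2, 1]
number_of_layers = len(neurons) - 1
lr = 1e-4
epochs = 10000

# Initialize weights and biases for every layer
w = [np.random.randn(neurons[l + 1], neurons[l]) for l in range(number_of_layers)]
b = [np.ones((neurons[l + 1], 1)) for l in range(number_of_layers)]

for epoch in range(epochs):
    # Forward propagation
    z, a = [], []
    for l in range(number_of_layers):
        prev = x if l == 0 else a[l - 1]
        z.append(w[l] @ prev + b[l])
        a.append(RelU_forward(z[l]))

    # Backward propagation with gradient descent updates
    for l in range(number_of_layers - 1, -1, -1):
        if l == number_of_layers - 1:
            delta = cost_backward(a[l][0], y) * RelU_backward(z[l])
        else:
            delta = (w[l + 1].T @ delta) * RelU_backward(z[l])
        prev = x if l == 0 else a[l - 1]
        w[l] = w[l] - lr * (delta @ prev.T)
        b[l] = b[l] - lr * delta

print(f"loss = {cost_forward(a[-1], y)}, prediction = {a[-1]}")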

That’s all for this feedforward neural network tutorial. I hope that helps you. As usual, if you see any concept or typing mistake, please don’t hesitate to let me know. If you are still keen to learn, you can try to implement these codes in object-oriented programming, like the one in those famous libraries. Thank you for visiting and happy learning!

Written by Maria Khelli

Software Engineer and AI Enthusiast | CS @ ITB
