Deep Neural Networks

The emergence of ChatGPT has been one of the most transformative advancements in recent years, reshaping the landscape of artificial intelligence. Its impact has been profound, inspiring a new generation of engineers, myself included. Motivated by this groundbreaking technology, I’ve decided to write a series of blogs to help engineers dive deeper into the world of AI. Given that ChatGPT is powered by a deep neural network, specifically a transformer model, it’s fitting to begin this series with the fundamentals of deep neural networks. In this blog, I’ll explore these basics using a publicly available dataset and TensorFlow. I’ve also shared my code on GitHub. For those seeking a more formal course, I highly recommend MIT’s publicly available course, which is among the best in the field.

Perceptron

The perceptron was introduced by Frank Rosenblatt in 1957. It is one of the simplest types of artificial neural networks and serves as a fundamental building block for more complex models. Here’s a detailed breakdown of its components:

Input Layer

The input layer consists of one or more neurons, each of which receives input signals either from the external world or from previous layers in the network. These inputs are the features of your data.

Weights

Each input neuron is associated with a weight, a parameter that signifies the importance or strength of the connection between that input neuron and the perceptron’s output neuron.

Bias

A bias term is added to the perceptron to provide additional flexibility. It acts as a threshold that adjusts the weighted sum of the inputs before it is passed through the activation function. This helps the perceptron to model more complex patterns in the input data, as it can shift the activation function left or right.

Weighted Sum

The perceptron computes a weighted sum of the inputs plus the bias. This sum is calculated as follows:

z = w₁·x₁ + w₂·x₂ + … + wₙ·xₙ + b

where b is the bias, w₁…wₙ are the weights, and x₁…xₙ are the inputs.
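As a quick concrete example, here is the same weighted sum computed with NumPy; the inputs, weights, and bias below are made-up values chosen only for illustration:

import numpy as np

# Illustrative values: three inputs, three weights, one bias
x = np.array([0.5, 1.0, 2.0])   # inputs x1..x3
w = np.array([0.4, -0.2, 0.1])  # weights w1..w3
b = 0.3                          # bias

z = np.dot(w, x) + b             # weighted sum: w1*x1 + w2*x2 + w3*x3 + b
print(z)                         # ≈ 0.5 for these values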

Activation Function

The activation function determines the output of the perceptron based on the weighted sum of the inputs plus the bias. It introduces non-linearity into the model, enabling it to learn more complex patterns. Common activation functions include the sigmoid function and the ReLU (Rectified Linear Unit) function.
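For reference, here is a minimal sketch of those two activation functions applied to a weighted sum z:

import numpy as np

def sigmoid(z):
    # Squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # Passes positive values through unchanged, clips negatives to 0
    return np.maximum(0.0, z)

z = 0.5
print(sigmoid(z))  # ≈ 0.62
print(relu(z))     # 0.5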

Output

The output layer consists of the final neuron that generates the output of the perceptron.

Multi-layer perceptron

Building on the basic understanding of a perceptron, we can now delve into the multi-layer perceptron (MLP), a more complex and powerful neural network architecture. An MLP consists of multiple layers of perceptrons and can solve more complex problems by learning higher-level representations of the input data.

An MLP typically consists of three types of layers:

  • Input Layer
  • Hidden Layers
  • Output Layer

1. Input Layer

The input layer consists of neurons that receive the input features from the dataset. Each neuron in the input layer represents one feature of the input data.

2. Hidden Layers

Hidden layers are the core of MLPs, where multiple layers of neurons (perceptrons) are stacked. Each neuron in a hidden layer performs a weighted sum of its inputs, adds a bias, and applies an activation function to produce its output. These outputs are then fed to the next layer.

Key Aspects of Hidden Layers:
  • Non-Linearity: The activation functions in hidden layers introduce non-linearity, enabling the MLP to learn complex patterns.
  • Depth: The depth (number of hidden layers) and the width (number of neurons in each hidden layer) determine the model’s capacity to learn intricate data representations.
  • Activation Functions: Common activation functions used in hidden layers include ReLU, sigmoid, and tanh.

3. Output Layer

The output layer consists of neurons that produce the final output of the network. The number of neurons in the output layer depends on the nature of the problem being solved.

  • Regression: For regression tasks, the output layer typically has a single neuron with a linear activation function.
  • Binary Classification: For binary classification, the output layer usually has one neuron with a sigmoid activation function.
  • Multi-Class Classification: For multi-class classification, the output layer has as many neurons as there are classes, typically with a softmax activation function to produce probability distributions over classes.
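To make this concrete, here is a small Keras sketch showing the same hidden-layer body with each of the three kinds of output layer described above; the layer widths and the ten-class example are arbitrary choices for illustration:

import tensorflow as tf
from tensorflow.keras import layers

# Shared hidden-layer body: two dense layers with ReLU activations
def hidden_body():
    return [layers.Dense(64, activation='relu'),
            layers.Dense(32, activation='relu')]

# Regression: a single neuron with a linear (identity) activation
regression_model = tf.keras.Sequential(hidden_body() + [layers.Dense(1)])

# Binary classification: a single neuron with a sigmoid activation
binary_model = tf.keras.Sequential(hidden_body() + [layers.Dense(1, activation='sigmoid')])

# Multi-class classification: one neuron per class with a softmax activation (10 classes here)
multiclass_model = tf.keras.Sequential(hidden_body() + [layers.Dense(10, activation='softmax')])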

Deep neural network explained

Building on the concept of a multi-layer perceptron (MLP), we can now delve into the world of deep neural networks (DNNs). A DNN is essentially an MLP with a greater number of hidden layers, which allows it to model complex relationships and learn hierarchical representations of data. Like an MLP, a DNN therefore consists of the same three types of layers: an input layer, hidden layers, and an output layer.

Training deep neural networks

Before we get into the example, let’s understand the steps of training deep neural networks:

  1. Data Preparation: Clean the data by handling missing values, normalizing features, and performing any necessary transformations. Divide the dataset into training, validation, and test sets to evaluate the model’s performance. 
  2. Forward Propagation: Forward propagation is the process where input data is passed through the network to generate predictions.
  3. Loss Calculation: The loss function quantifies the difference between the predicted output and the actual target values. It guides the optimization process. Common Loss Functions are mean squared error (used for regression tasks) and binary cross-entropy (used for binary classification tasks).
  4. Backpropagation: Backpropagation is the process of computing the gradients of the loss function with respect to each weight in the network. These gradients indicate the direction and magnitude of the weight updates needed to minimize the loss. The error is propagated backward from the output layer to the input layer, and gradients are computed for each weight and bias.
  5. Optimization: Optimization algorithms use the gradients computed during backpropagation to update the weights and biases of the network, aiming to minimize the loss function. Gradient descent is the most common optimization algorithm; it updates the weights in the direction of the negative gradient of the loss function.
  6. Iteration and Convergence: Training a DNN involves iteratively performing forward propagation, loss calculation, backpropagation, and optimization until the model converges to a solution with minimal loss.
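The Keras fit API used later in this post wraps all of these steps for us, but the sketch below writes one training loop out by hand with tf.GradientTape, just to make forward propagation, loss calculation, backpropagation, and optimization explicit. The tiny model, the random data, and the learning rate are illustrative only:

import tensorflow as tf

# A tiny illustrative model and a random batch of regression data
model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation='relu'),
    tf.keras.layers.Dense(1)
])
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
loss_fn = tf.keras.losses.MeanSquaredError()

x = tf.random.normal((32, 5))   # 32 samples, 5 features
y = tf.random.normal((32, 1))   # 32 regression targets

for step in range(100):                                               # iterate toward convergence
    with tf.GradientTape() as tape:
        predictions = model(x, training=True)                         # 1) forward propagation
        loss = loss_fn(y, predictions)                                # 2) loss calculation
    grads = tape.gradient(loss, model.trainable_variables)            # 3) backpropagation
    optimizer.apply_gradients(zip(grads, model.trainable_variables))  # 4) optimization step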

Let us walk through an example

GitHub link

Prepare the data

import numpy as np
import pandas as pd

# Column names of the student performance dataset
column_names = ['school','sex','age','address','famsize','Pstatus','Medu','Fedu',
                'Mjob','Fjob','reason','guardian','traveltime','studytime','failures',
                'schoolsup','famsup','paid','activities','nursery','higher','internet',
                'romantic','famrel','freetime','goout','Dalc','Walc','health','absences','G1','G2','G3']

# `url` is assumed to point to the semicolon-separated student performance CSV
raw_dataset = pd.read_csv(url, names=column_names,
                          na_values='?', comment='\t',
                          sep=';', skipinitialspace=True, skiprows=1)

# Encode binary categorical columns as 0/1
raw_dataset["sex"] = np.where(raw_dataset["sex"] == "F", 0, 1)
raw_dataset["school"] = np.where(raw_dataset["school"] == "GP", 0, 1)
raw_dataset["address"] = np.where(raw_dataset["address"] == "R", 0, 1)
raw_dataset["famsize"] = np.where(raw_dataset["famsize"] == "GT3", 0, 1)
raw_dataset["Pstatus"] = np.where(raw_dataset["Pstatus"] == "A", 0, 1)
raw_dataset["schoolsup"] = np.where(raw_dataset["schoolsup"] == "yes", 0, 1)
raw_dataset["famsup"] = np.where(raw_dataset["famsup"] == "yes", 0, 1)

Divide the dataset into training and test sets to evaluate the model’s performance; a validation split can be carved out of the training data later, during fitting.

# 80/20 train/test split
train_dataset = raw_dataset.sample(frac=0.8, random_state=0)
test_dataset = raw_dataset.drop(train_dataset.index)

# G1 (the first-period grade) is the regression target
train_features = train_dataset.copy()
test_features = test_dataset.copy()
train_labels = train_features.pop('G1')
test_labels = test_features.pop('G1')

Build a deep neural network

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

def build_and_compile_model(norm):
  model = keras.Sequential([
      norm,                                 # normalization layer, adapted to the training data below
      layers.Dense(64, activation='relu'),  # first hidden layer
      layers.Dense(32, activation='relu'),  # second hidden layer
      layers.Dense(1)                       # single linear output for regression on G1
  ])
  model.compile(loss='mean_absolute_error',
                optimizer=tf.keras.optimizers.Adam(0.001))
  return model

The model summary (dnn_model.summary() below) lists the trainable and non-trainable parameters.

Calculate loss, optimize and train

TensorFlow makes this straightforward. The loss function (mean absolute error) and the optimizer (Adam, with a learning rate of 0.001) are specified in the model.compile call inside build_and_compile_model above; training itself is then a single fit call once the model has been built, as shown below.

# Create a normalization layer and adapt it to the statistics of the training features
normalizer = layers.Normalization(axis=-1)

train_features_tf = tf.convert_to_tensor(train_features, dtype=tf.float32)
normalizer.adapt(train_features_tf)

dnn_model = build_and_compile_model(normalizer)
dnn_model.summary()
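One step the snippets above do not show explicitly is the training call itself. A minimal version might look like the following; the number of epochs and the validation split are assumptions for illustration, not values from the original run:

# Train the model; 20% of the training data is held out for validation
history = dnn_model.fit(
    train_features_tf, train_labels,
    validation_split=0.2,
    epochs=100,
    verbose=0)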

Finally, look at the results; they don’t seem all that bad 🙂

import matplotlib.pyplot as plt

# Convert the test features to a tensor and predict G1 for the held-out students
test_features_tf = tf.convert_to_tensor(test_features, dtype=tf.float32)
test_predictions = dnn_model.predict(test_features_tf).flatten()

# Scatter plot of true vs. predicted grades; points on the diagonal are perfect predictions
a = plt.axes(aspect='equal')
plt.scatter(test_labels, test_predictions)
plt.xlabel('True Values [G1]')
plt.ylabel('Predictions [G1]')
lims = [0, 30]
plt.xlim(lims)
plt.ylim(lims)
_ = plt.plot(lims, lims)
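Beyond the scatter plot, the mean absolute error on the test set can also be checked directly; this is a small optional check, not part of the original snippet:

# Mean absolute error (the training loss) on the held-out test set
test_mae = dnn_model.evaluate(test_features_tf, test_labels, verbose=0)
print(f"Test MAE: {test_mae:.2f} grade points")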

Conclusion

Deep neural networks represent a transformative technology with vast potential across various domains. By grasping the fundamentals and staying abreast of new developments, practitioners and enthusiasts can effectively harness the capabilities of deep neural networks to solve complex problems and drive technological progress.

References

MIT course: link

Diagram from: link
