Compared to artificial neural networks, the human brain is a lot more modular. The authors believe this is because the loss terms of ANN regularizers like weight decay do not depend on how neurons are permuted within each layer.
The cost of connecting two biological neurons that are far apart is much higher than connecting nearby ones, but ANNs pay no such cost.
To impose a similar constraint on ANNs, the authors propose the following steps:
Step 2 aims to keep neurons that need to communicate as close together as possible.
There is a difference between weight layers and neuron layers.
```python
import torch.nn as nn

model = nn.Sequential(
    ## ...there can be more layers over here
    # neuron layer with 10 neurons
    nn.Linear(10, 7),  ## weight layer
    # neuron layer with 7 neurons
    nn.Linear(7, 5),   ## weight layer
    # neuron layer with 5 neurons
    nn.Linear(5, 3),   ## weight layer
    # neuron layer with 3 neurons
)
```
Weight layers: the actual `nn.Linear` modules, where `num_inputs` and `num_outputs` are the number of neurons coming into and going out of a single layer. A model with L linear layers has L weight layers.
Neuron layers: the activations flowing between the weight layers, i.e. the input to the first `nn.Linear` plus the output of each `nn.Linear`.
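As a quick sketch (reusing the toy model above; the list names below are mine, not from the paper), the weight layers and neuron-layer sizes can be enumerated like this:

```python
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 7), nn.Linear(7, 5), nn.Linear(5, 3))

# Weight layers: one per nn.Linear module (L = 3 here).
weight_layers = [m.weight for m in model if isinstance(m, nn.Linear)]

# Neuron layers: the input plus the output of every weight layer (L + 1 = 4 here).
neuron_layer_sizes = [model[0].in_features] + [m.out_features for m in model if isinstance(m, nn.Linear)]
print(neuron_layer_sizes)  # [10, 7, 5, 3]
```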
To represent the distance between neurons, we arrange all neurons in a 2D plane such that x is a neuron's index within its layer and y is the index of the layer it belongs to.
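A minimal sketch of this layout (the `neuron_positions` helper below is hypothetical, just to illustrate the coordinate scheme):

```python
def neuron_positions(neuron_layer_sizes):
    """Assign each neuron an (x, y) coordinate: x = neuron index, y = layer index."""
    positions = []
    for y, size in enumerate(neuron_layer_sizes):
        positions.append([(x, y) for x in range(size)])
    return positions

# For the toy model above: layer 0 has 10 neurons at y=0, layer 1 has 7 at y=1, ...
positions = neuron_positions([10, 7, 5, 3])
print(positions[1][2])  # (2, 1): neuron index 2 in neuron layer 1
```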
Between every two consecutive neuron layers lies a weight layer with weights W, a matrix of size (num_input_neurons, num_output_neurons).
Returning to the simple PyTorch model above: the weight that connects neuron index 4 of the first neuron layer to neuron index 2 of the second neuron layer is W[4, 2].
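To make the indexing concrete, here is a small sketch. Note that PyTorch's `nn.Linear` stores its weight as (num_output_neurons, num_input_neurons), so the conceptual W used here is the transpose of `linear.weight`:

```python
import torch.nn as nn

linear = nn.Linear(10, 7)  # weight layer between a 10-neuron and a 7-neuron layer
W = linear.weight.T        # conceptual (num_input_neurons, num_output_neurons) matrix
w_4_to_2 = W[4, 2]         # weight from neuron 4 (first layer) to neuron 2 (second layer)
assert w_4_to_2 == linear.weight[2, 4]
```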
Given two neurons n1: (x1, y1) and n2: (x2, y2), where (x, y) is a neuron's position in the 2D plane, the distance between them can be either the L1 distance or the L2 distance:
L1: abs(x1 - x2) + abs(y1 - y2)
L2: sqrt((x1 - x2)**2 + (y1 - y2)**2)
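A plain-Python sketch of the two options (function names are mine):

```python
def l1_distance(n1, n2):
    (x1, y1), (x2, y2) = n1, n2
    return abs(x1 - x2) + abs(y1 - y2)

def l2_distance(n1, n2):
    (x1, y1), (x2, y2) = n1, n2
    return ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5

# Distance between neuron 4 of layer 0 and neuron 2 of layer 1:
print(l1_distance((4, 0), (2, 1)))  # 3
print(l2_distance((4, 0), (2, 1)))  # ~2.236
```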
WIP