We can define convolutions for time signals, convolutions for images, and convolutions on graphs. The latter is the one that is of interest to us and is defined as
\begin{align}\label{eqn_conv_gsp}
\bby = \sum_{k} h_k \bbS^k\bbx \, ,
\end{align}
where $\bbx$ is an input signal, $\bby$ is an output signal, the $h_k$ are filter coefficients, and $\bbS$ is a matrix representation of the graph. In this post, we explore connections between \eqref{eqn_conv_gsp} and the usual definitions of convolutions for time signals and images.
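As a concrete sketch of \eqref{eqn_conv_gsp}, the sum $\sum_k h_k \bbS^k\bbx$ can be evaluated by repeatedly applying $\bbS$, without ever forming the powers $\bbS^k$ explicitly. The following Python snippet assumes numpy; the function name `graph_convolution` is our own, introduced for illustration:

```python
import numpy as np

def graph_convolution(h, S, x):
    """Evaluate the graph convolution y = sum_k h_k S^k x
    for coefficients h, graph matrix S, and input signal x."""
    y = np.zeros_like(x, dtype=float)
    Skx = x.astype(float)      # S^0 x = x
    for hk in h:
        y += hk * Skx          # accumulate the term h_k S^k x
        Skx = S @ Skx          # advance to S^(k+1) x
    return y
```

Note that each iteration costs one matrix-vector product, so a filter with $K$ coefficients costs $K$ applications of $\bbS$.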
Convolutions in Time and on Graphs
The convolution in time, as usually defined in signal processing courses, is equivalent to the graph convolution when particularized to a line graph. To see that this is true, let $\bbx=[x_0; \ldots; x_{N-1}]$ be an input signal and $\bby= [y_0; \ldots; y_{N-1}]$ the output of a convolutional filter with coefficients $h_k$ applied to input $\bbx$. As per the usual definition of convolution, the output signal $\bby$ is the one with entries
\begin{align}\label{eqn_conv_usual}
y_n = \sum_{k} h_k x_{n-k} \, ,
\end{align}
where we adopt the convention $x_{n-k}=0$ if the index $n-k$ is outside the range of valid indices $\{0,\ldots,N-1\}$.
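A direct numerical evaluation of \eqref{eqn_conv_usual}, with the zero-padding convention just stated, can be sketched as follows. This assumes numpy; the function name `time_convolution` is ours:

```python
import numpy as np

def time_convolution(h, x):
    """Direct evaluation of y_n = sum_k h_k x_{n-k},
    treating x_{n-k} as zero outside the valid index range."""
    N = len(x)
    y = np.zeros(N)
    for n in range(N):
        for k, hk in enumerate(h):
            if 0 <= n - k < N:
                y[n] += hk * x[n - k]
    return y
```

This matches the first $N$ samples of the standard full convolution, e.g. `np.convolve(h, x)[:N]`.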
An alternative way of writing \eqref{eqn_conv_usual} is to define a time shift operator $\ccalS$. This is a linear operator that when applied to signal $\bbx$ produces the signal $\ccalS\bbx$ with entries
\begin{align}\label{eqn_shift_time}
(\ccalS\bbx)_{n} = x_{n-1} \, .
\end{align}
Adopting the convention that $\ccalS^k$ denotes the $k$-fold composition of the operator $\ccalS$ with itself, we can write $x_{n-k} = (\ccalS^k\bbx)_n$. Substituting this fact into
\eqref{eqn_conv_usual} yields
\begin{align}\label{eqn_conv_usual_operator_pointwise}
y_n = \sum_{k} h_k (\ccalS^k\bbx)_n \, .
\end{align}
The beauty of \eqref{eqn_conv_usual_operator_pointwise} is that it is an elementwise relationship. The $n$th component of the vector $\bby$ is related to the $n$th component of the vector $\ccalS^k\bbx$. We can therefore just write
\begin{align}\label{eqn_conv_usual_operator}
\bby = \sum_{k} h_k \ccalS^k\bbx \,.
\end{align}
To go from \eqref{eqn_conv_usual_operator} to the definition of convolutions on graphs, we just need to instantiate the operator $\ccalS$ as the adjacency matrix $\bbS$ of a directed line graph. That is, we just have to note that multiplying $\bbx$ by the adjacency matrix of a line graph produces a signal with components $[\bbS\bbx]_{n} = x_{n-1}$. Thus, we can equivalently write \eqref{eqn_conv_usual_operator} as
\begin{align}\label{eqn_conv_gsp_time}
\bby = \sum_{k} h_k \bbS^k\bbx \,,
\end{align}
which is the definition of a graph convolution.
All of this is a little pedantic. It is, for the most part, a matter of definitions. At the end of the day, the only point that matters in showing the equivalence of \eqref{eqn_conv_usual} and \eqref{eqn_conv_gsp_time} is that the product $\bbS\bbx$ satisfies $[\bbS\bbx]_{n} = x_{n-1}$ when $\bbS$ is the adjacency matrix of a line graph.
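We can also verify this equivalence numerically. The sketch below (assuming numpy; all names are ours) builds the adjacency matrix of a directed line graph and checks that $\sum_k h_k \bbS^k\bbx$ matches the standard time convolution:

```python
import numpy as np

# Adjacency matrix of a directed line graph on N nodes:
# S[n, n-1] = 1, so (S x)_n = x_{n-1} with x_{-1} = 0.
N = 8
S = np.zeros((N, N))
S[np.arange(1, N), np.arange(N - 1)] = 1

x = np.random.default_rng(0).standard_normal(N)
h = [0.5, -1.0, 0.25]

# Graph convolution: y = sum_k h_k S^k x
y_graph = sum(hk * np.linalg.matrix_power(S, k) @ x for k, hk in enumerate(h))

# Standard time convolution, truncated to the first N samples
y_time = np.convolve(h, x)[:N]

assert np.allclose(y_graph, y_time)
```

The truncation to the first $N$ samples is exactly the zero-padding convention adopted after \eqref{eqn_conv_usual}.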
For those of you who know representation theory, this pedantry should sound interesting. It tells you that graph signal processing is deeply related to representation theory. The expressions in \eqref{eqn_conv_usual_operator} and \eqref{eqn_conv_gsp_time} are different instantiations of the algebra of polynomials acting on the vector space of signals.
Convolutions on Images and Graphs
The convolution on images, as usually defined, is {\it not} equivalent to the graph convolution particularized to a grid graph. There is an interesting connection, however. Let $\bbx$ be an input image with pixels $x_{mn}$ and $\bby$ be the output of a convolution whose pixels we denote as $y_{mn}$. An image convolution requires filter coefficients $h_{kl}$ with two indexes and is given by
\begin{align}\label{eqn_conv_usual_images}
y_{mn} = \sum_{k,l} h_{kl} x_{(m-k)(n-l)} \,,
\end{align}
where we adopt the convention $x_{(m-k)(n-l)}=0$ if either the index $m-k$ or the index $n-l$ is outside the range of valid indices $\{0,\ldots,N-1\}$. This is a straightforward generalization of \eqref{eqn_conv_usual} to two dimensions. Given that we can move in the vertical and horizontal directions, we incorporate shifts in both directions.
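A direct evaluation of \eqref{eqn_conv_usual_images} under this zero-padding convention can be sketched as follows (Python with numpy; the function name is ours):

```python
import numpy as np

def image_convolution_direct(h, x):
    """Direct evaluation of y_{mn} = sum_{k,l} h_{kl} x_{(m-k)(n-l)},
    treating out-of-range samples as zero."""
    M, N = x.shape
    y = np.zeros((M, N))
    for m in range(M):
        for n in range(N):
            for k in range(h.shape[0]):
                for l in range(h.shape[1]):
                    if 0 <= m - k < M and 0 <= n - l < N:
                        y[m, n] += h[k, l] * x[m - k, n - l]
    return y
```

The quadruple loop mirrors the double sum in \eqref{eqn_conv_usual_images}: two indices sweep the output pixel and two sweep the filter coefficients.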
It is easy to see that we can repeat the steps in \eqref{eqn_shift_time}-\eqref{eqn_conv_usual_operator} if we define two shift operators instead of one. To make this clear, let $\ccalQ$ be a vertical shift operator and $\ccalR$ a horizontal shift operator. These operators produce images whose pixels are shifted in the respective directions:
\begin{align}\label{eqn_shift_operators_images}
(\ccalQ\bbx)_{mn} = x_{(m-1)n} \, , \qquad
(\ccalR\bbx)_{mn} = x_{m(n-1)} \, .
\end{align}
We can now rewrite shifting using these operators and conclude that the image convolution in \eqref{eqn_conv_usual_images} is equivalent to
\begin{align}\label{eqn_conv_images_operators}
\bby = \sum_{k,l} h_{kl} \ccalQ^k \ccalR^l \bbx \, .
\end{align}
This form of the image convolution is not much different from \eqref{eqn_conv_usual_operator}. The difference is that \eqref{eqn_conv_usual_operator} is a polynomial in a single variable, the operator $\ccalS$, whereas \eqref{eqn_conv_images_operators} is a polynomial in two variables, the operators $\ccalQ$ and $\ccalR$. This is as it should be, since we have two different ways of shifting images.
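To make the two-operator form concrete, the sketch below (assuming numpy; the function names are ours) implements $\ccalQ$ and $\ccalR$ as zero-padded shifts and evaluates the polynomial in \eqref{eqn_conv_images_operators}:

```python
import numpy as np

def shift_down(x):
    """Vertical shift Q: (Qx)_{mn} = x_{(m-1)n}, with zeros at the boundary."""
    out = np.zeros_like(x)
    out[1:, :] = x[:-1, :]
    return out

def shift_right(x):
    """Horizontal shift R: (Rx)_{mn} = x_{m(n-1)}, with zeros at the boundary."""
    out = np.zeros_like(x)
    out[:, 1:] = x[:, :-1]
    return out

def image_convolution(h, x):
    """Evaluate the two-variable polynomial y = sum_{k,l} h_{kl} Q^k R^l x."""
    y = np.zeros_like(x, dtype=float)
    for k in range(h.shape[0]):
        for l in range(h.shape[1]):
            z = x.astype(float)
            for _ in range(k):      # apply Q a total of k times
                z = shift_down(z)
            for _ in range(l):      # apply R a total of l times
                z = shift_right(z)
            y += h[k, l] * z
    return y
```

Since $\ccalQ$ and $\ccalR$ commute, the order in which the two shifts are applied does not matter.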
The challenge in drawing a connection to \eqref{eqn_conv_gsp} is that we cannot pigeonhole the 2-variable polynomial in \eqref{eqn_conv_images_operators} into the 1-variable polynomial in \eqref{eqn_conv_gsp}. Thus, we cannot write arbitrary image convolutions as convolutions on graphs. This is possible only if we restrict the filter coefficients $h_{kl}$ in \eqref{eqn_conv_usual_images} to a specific form, in which case we end up with a convolution on a grid graph. But this is beside the point here.
The interesting point we can make is that our three definitions of convolutions, namely, the graph convolution in \eqref{eqn_conv_gsp}, the time convolution in \eqref{eqn_conv_usual_operator}, and the image convolution in \eqref{eqn_conv_images_operators}, are all polynomials in linear operators. This suggests that there is a more abstract formulation of convolutional filters that encompasses all three. This is true, and it is called an Algebraic filter. For those of you versed in representation theory, convolutions in time and convolutions on graphs are generated by the algebra of single-variable polynomials. Convolutions on images are generated by the algebra of two-variable polynomials. The incompatibility of image convolutions with graph convolutions stems from the fact that the algebra of two-variable polynomials contains two generators (which we map to $\ccalQ$ and $\ccalR$), whereas the algebra of single-variable polynomials contains one generator (which we map to $\bbS$ or $\ccalS$, depending on the application).
We may get to talk about Algebraic convolutions and Algebraic neural networks at the end of the term. They are an interesting abstraction that encompasses some other architectures. Their practical applicability beyond time, graphs, and groups has not been explored much. Check “Algebraic Neural Networks: Stability to Deformations” to read about this.