# Graph Neural Networks

Graph Neural Networks (GNNs) are information processing architectures for signals supported on graphs. They have been developed, and are presented in this course, as generalizations of the convolutional neural networks (CNNs) that are used to process signals in time and space. Depending on how much you have heard about neural networks (NNs) and deep learning, this may sound like a strange sentence. Aren’t CNNs just particular cases of NNs? And isn’t the same true of GNNs? In a strict sense they are, but our focus in this course is on large-scale problems involving high-dimensional signals. In these settings NNs fail to scale. CNNs are the tool for enabling scalable learning for signals in time and space. GNNs are the tool for enabling scalable learning for signals supported on graphs.

## How to use this site (I am a student at Penn)

If you are a student at Penn, the instructors are looking forward to working with you. This award-winning course runs during fall terms. It is open to all graduate students (masters and doctoral). An undergraduate degree in any engineering discipline, computer science, math, statistics, or physics is sufficient preparation for the course. If you know calculus and linear algebra and can write 100 lines of computer code, you know enough. Undergraduate students who ask for and receive an instructor permit are welcome to participate as well. The permit is required not because you lack prerequisites but because graduate-level courses are more intense than undergraduate courses, and you have to convince me that you understand what this means. Let us know if you have any questions.

When the course runs, there are three different things going on: recorded video lectures, lab assignments, and interactive discussion meetings. In a typical week of class you watch video lectures on Monday and attend interactive meetings on Wednesday and Friday. In parallel, you work on lab assignments, each of which spans two weeks. Check the logistics page for more detailed information.

From time to time we also make blog posts. These are usually prompted by questions that are too long to answer during class. They are not required reading, but they are often interesting. Incidentally, we have written a post with links to the papers that have inspired this course. These papers are part of the work on graph neural networks going on at Alelab.

The catalog code for this course is ESE5140.

## How to use this site (I am not at Penn)

If you are not a student at Penn, the instructors are honored that you consider the materials worth checking. We wish we had the time to work with you; alas, we do not. However, we think there is quite a lot that you can learn by watching the recorded video lectures and by working on the lab assignments. We have designed the materials with the goal of making them usable in self-directed learning. How much we have succeeded at that is for you to decide.

When you check the video lectures you will see that they come with a handout and a script, which are designed to be used in conjunction with the videos. Each video is, in effect, the union of its handout and its script. In addition to video lectures, you will find lab assignments and their solutions accessible through the lab webpage. The labs are designed with a complexity progression in mind. Start from Lab 1 if you have never worked in machine learning. Start from Lab 2 if you have already encountered GNNs. Labs 3 and onward are serious applications of GNNs to practical problems.

If you are hardcore and would rather read papers, this post has links to the papers that have inspired this course. These papers are part of the work on graph neural networks going on at Alelab. They are not a comprehensive literature review. You can find a little bit of that in this tutorial article in the signal processing magazine and in this more comprehensive review in the Proceedings of the IEEE.

If there is something you think we could do to help, we are happy to hear suggestions.

## What we learn in this course

In this course we will cover graph convolutional filters and graph filter banks before moving on to the study of single-feature and multiple-feature GNNs. We will also cover related architectures such as recurrent GNNs. Particular emphasis will be placed on studying the equivariance to permutations and the stability to graph deformations of GNNs. These properties provide a measure of explanation for the good performance of GNNs that is observed empirically. We will also study GNNs in the limit of large numbers of nodes to explain the transferability of GNNs across networks with different numbers of nodes. To find out more about the papers on which this course is based, refer to the resources in this blog post.

### Machine Learning on Graphs

Graphs can represent product or customer similarities in recommendation systems, agent interactions in multiagent robotics, or transceivers in a wireless communication network. Although otherwise disparate, these application domains share the presence of signals associated with nodes (ratings, perception, or signal strength) out of which we want to extract some information (ratings of other products, control actions, or transmission opportunities). If data is available, we can formulate empirical risk minimization (ERM) problems to learn these data-to-information maps. However, it is a form of ERM in which a graph plays a central role in describing relationships between signal components, and therefore one in which the graph should be leveraged. Graph Neural Networks (GNNs) are parametrizations of learning problems in general, and ERM problems in particular, that achieve this goal.

In any ERM problem we are given input-output pairs in a training set and we want to find a function that best approximates the input-output map according to a given risk. This function is later used to estimate the outputs associated with inputs that were not part of the training set. We say that the function has been *trained* and that we have *learned* to estimate outputs. This simple statement hides the well-known fact that ERM problems are nonsensical unless we make assumptions on how the function *generalizes* from the training set to unobserved samples. We can, for instance, assume that the map is linear, or, to be in tune with the times, that the map is a neural network.
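To make this concrete, one standard way to write the ERM problem is the following (the notation here is ours, chosen for illustration). Given a training set $\mathcal{T} = \{(\mathbf{x}_i, \mathbf{y}_i)\}$, a loss function $\ell$, and a function class $\mathcal{C}$ encoding the generalization assumption, we solve

```latex
f^{\ast} \;=\; \operatorname*{argmin}_{f \in \mathcal{C}} \;
  \frac{1}{|\mathcal{T}|} \sum_{(\mathbf{x}_i, \mathbf{y}_i) \in \mathcal{T}}
  \ell \big( f(\mathbf{x}_i), \mathbf{y}_i \big) .
```

Choosing $\mathcal{C}$ to be the set of linear maps or the set of fully connected neural networks is precisely the generalization assumption discussed above; choosing it to be the set of graph filters or GNNs is how the graph is leveraged.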

If properly trained, the linear map and the neural network will make similar predictions for the samples they have observed. But they will make different predictions for unobserved samples. That is, they will generalize differently. An important empirical observation is that neither the linear transform nor the neural network will generalize particularly well unless we are considering problems with a small number of variables. This is something that we could call the fundamental problem of machine learning: How do we devise a method that can handle high-dimensional signals? GNNs and CNNs answer that question in the same way: with the use of convolutional filters.

### Graph Filters and Graph Neural Networks

We have seen that a characteristic shared by arbitrary linear and fully connected neural network parametrizations is that they do not scale well with the dimensionality of the input signals. This is best known in the case of signals in Euclidean space (time and images) where scalable linear processing is based on *convolutional* filters and scalable nonlinear processing is based on *convolutional* neural networks (CNNs). The reason for this is the ability of convolutions to exploit the *structure* of Euclidean space.

In this course we will describe *graph* filters and *graph* neural networks as analogues of convolutional filters and CNNs, but adapted to the processing of signals supported on graphs. Both of these concepts are simple. A graph filter is a polynomial on a matrix representation of the graph. Out of this definition we build a graph perceptron by adding a pointwise nonlinear function to process the output of a graph filter. Graph perceptrons are composed (or layered) to build a multilayer GNN. And individual layers are augmented from single filters to filter banks to build multiple-feature GNNs.
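These definitions are simple enough to sketch in a few lines of code. The following minimal NumPy sketch (function names and the choice of matrix representation are ours, for illustration only) implements a polynomial graph filter and a graph perceptron:

```python
import numpy as np

def graph_filter(S, x, h):
    """Polynomial graph filter: y = sum_k h[k] * S^k @ x,
    where S is a matrix representation of the graph (e.g., adjacency)."""
    y = np.zeros_like(x, dtype=float)
    z = x.astype(float)          # z starts as S^0 x = x
    for hk in h:
        y += hk * z              # accumulate the k-th filter tap
        z = S @ z                # advance to the next power of S
    return y

def graph_perceptron(S, x, h, sigma=np.tanh):
    """Graph perceptron: pointwise nonlinearity applied to a filter output."""
    return sigma(graph_filter(S, x, h))
```

A multilayer GNN then amounts to composing graph perceptrons, feeding the output of one layer as the input to the next; multiple-feature GNNs replace each single filter with a bank of filters.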

### Equivariance, Stability and Transferability

The relevant question at this juncture is whether graph filters and GNNs do for signals supported on graphs what convolutional filters and CNNs do for Euclidean data. To wit, do they enable scalable processing of signals supported on graphs? A growing body of empirical work shows that this is true to some extent, although results are not as impressive as in the case of voice and image processing. As an example that we can use to illustrate the advantages of graph filters and GNNs, consider a recommendation system in which we want to use past ratings that customers have given to products to predict future ratings. Collaborative filtering solutions build a graph of product similarities and interpret the ratings of separate customers as signals supported on the product similarity graph. We then use past ratings to construct a training set and learn to fill in the ratings that a given customer would give to products not yet rated. Empirical results do show that graph filters and GNNs work in recommendation systems with large numbers of products in which linear maps and fully connected neural networks fail. In fact, it is easy enough to arrive at three empirical observations that motivate this course:

**Observation (O1).** Graph filters produce better rating estimates than arbitrary linear parametrizations and GNNs produce better estimates than arbitrary (fully connected) neural networks.

**Observation (O2).** GNNs predict ratings better than graph filters.

**Observation (O3).** A GNN that is trained on a graph with a certain number of nodes can be executed in a graph with a larger number of nodes and still produce good rating estimates.

Observations (O1)-(O3) support advocacy for the use of GNNs, at least in recommendation systems. But they also spark three interesting questions: (Q1) Why do graph filters and GNNs outperform linear transformations and fully connected neural networks? (Q2) Why do GNNs outperform graph filters? (Q3) Why do GNNs transfer to networks with different numbers of nodes? In this course we present three theoretical analyses that help to answer these questions:

**Equivariance.** Graph filters and GNNs are equivariant to permutations of the graph.

**Stability.** GNNs provide a better tradeoff between discriminability and stability to graph perturbations.

**Transferability.** As graphs converge to a limit object, a graphon, GNN outputs converge to outputs of a corresponding limit object, a graphon neural network.

These properties show that GNNs have strong *generalization* potential. Equivariance to permutations implies that nodes with analogous neighbor sets making analogous observations perform the same operations. Thus, we can learn to, say, fill in the ratings of a product from the ratings of another product in another part of the network if the local structures of the graph are the same. This helps explain why graph filters outperform linear transforms and why GNNs outperform fully connected neural networks [cf. observation (O1)]. Stability to graph deformations affords a much stronger version of this statement: we can learn to generalize across different products if the local neighborhood structures are similar, not necessarily identical. Since GNNs possess better stability than graph filters for the same level of discriminability, this helps explain why GNNs outperform graph filters [cf. observation (O2)]. The convergence of GNNs towards graphon neural networks delineated under the transferability heading explains why GNNs can be trained and executed on graphs of different sizes [cf. observation (O3)]. It is germane to note that analogues of these properties hold for CNNs. They are equivariant to translations, stable to deformations of Euclidean space, and have well-defined continuous-time limits. For a more in-depth discussion of GNNs and their properties, refer to the resources in this blog post.
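The equivariance property is also easy to verify numerically. In the sketch below (our own illustrative code, not part of the course materials), filtering a permuted graph and signal yields exactly the permuted version of the original filter output:

```python
import numpy as np

rng = np.random.default_rng(0)

def graph_filter(S, x, h):
    """Polynomial graph filter: y = sum_k h[k] * S^k @ x."""
    y, z = np.zeros_like(x), x.copy()
    for hk in h:
        y += hk * z
        z = S @ z
    return y

# Random undirected graph (symmetric shift operator) and random signal.
n = 6
A = rng.random((n, n))
S = np.triu(A, 1)
S = S + S.T
x = rng.random(n)
h = [0.5, 1.0, -0.3]                 # arbitrary filter taps

# A random permutation matrix P (rows of the identity, shuffled).
P = np.eye(n)[rng.permutation(n)]

# Equivariance: filter(P S P^T, P x) == P @ filter(S, x),
# since (P S P^T)^k P x = P S^k x whenever P^T P = I.
lhs = graph_filter(P @ S @ P.T, P @ x, h)
rhs = P @ graph_filter(S, x, h)
print(np.allclose(lhs, rhs))  # True
```

The same check fails for a generic linear map in place of `graph_filter`, which is the algebraic reason graph filters can share what they learn across analogous parts of the graph.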