Graph Neural Networks (GNNs) have emerged as the tool of choice for machine learning on graphs and are rapidly growing into the next deep learning frontier. Indeed, as the 2010s were the time of convolutional neural networks (CNNs) applied to learning with images and time signals, the 2020s are shaping up to be the time of GNNs for learning on graphs. This is the right time for practitioners and researchers to learn about GNNs and their use in machine learning on graphs.
In this course, we present GNNs as generalizations of CNNs based on the generalization of convolutions in time and space to convolutions on graphs. The main focus of the course is to teach students how to formulate and solve machine learning problems with GNNs. We place emphasis on showing how the use of a convolutional architecture enables scalability to high-dimensional problems. We also explore three fundamental properties of GNNs: equivariance to label permutations, stability to deformations, and transferability across dimensions.
Course Description
The course is organized into four sets of two lectures. The first set describes machine learning on graphs and provides an introduction to learning parameterizations. The second set introduces graph filters and GNNs. The third set presents the equivariance and stability properties of graph filters and GNNs. The fourth set presents graphons, graphon signal processing, and their use in the study of GNNs in the limit of large-scale graphs.
Day 1 (Lectures 1 and 2): Machine Learning on Graphs
We learn that graphs can represent product similarities in recommendation systems, agent interactions in multi-agent robotics, or transceivers in a wireless communication network. Although otherwise disparate, these application domains share the presence of signals associated with nodes (ratings, perception, or signal strength) out of which we want to extract some information (ratings of other products, control actions, or transmission opportunities). If data is available, we can formulate empirical risk minimization (ERM) problems to learn these data-to-information maps. However, it is a form of ERM in which a graph plays a central role in describing relationships between signal components, and therefore one in which the graph should be leveraged. Graph Neural Networks (GNNs) are parameterizations of learning problems in general, and ERM problems in particular, that achieve this goal.
In any ERM problem, we are given input-output pairs in a training set and we want to find a function that best approximates the input-output map according to a given risk. This function is later used to estimate the outputs associated with inputs that were not part of the training set. We say that the function has been trained and that we have learned to estimate outputs. This simple statement hides the well-known fact that ERM problems are nonsensical unless we make assumptions on how the function generalizes from the training set to unobserved samples. We can, for instance, assume that the map is linear, or, to be in tune with the times, that the map is a neural network.
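For reference, and using notation of our own choosing rather than that of the lectures, the ERM problem just described can be written as

$$ f^\star \in \operatorname*{argmin}_{f \in \mathcal{F}} \; \frac{1}{|\mathcal{T}|} \sum_{(x_i, y_i) \in \mathcal{T}} \ell\big(f(x_i), y_i\big), $$

where \(\mathcal{T}\) is the training set, \(\ell\) is the loss that defines the risk, and \(\mathcal{F}\) is the family of admissible functions. The generalization assumptions mentioned above amount to the choice of \(\mathcal{F}\): linear maps, fully-connected neural networks, or, as developed in later lectures, graph filters and GNNs.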
If properly trained, the linear map and the neural network will make similar predictions for the samples they have observed. But they will make different predictions for unobserved samples. That is, they will generalize differently. An important empirical observation is that neither the linear transform nor the neural network will generalize particularly well unless we are considering problems with a small number of variables. This is something that we could call the fundamental problem of machine learning: How do we devise a method that can handle high-dimensional signals? GNNs and CNNs answer that question in the same way: with the use of convolutional filters.
Day 2 (Lectures 3 and 4): Graph Filters and Graph Neural Networks
We have seen that a characteristic shared by arbitrary linear and fully-connected neural network parameterizations is that they do not scale well with the dimensionality of the input signals. This is best known in the case of images and time signals, where scalable linear processing is based on convolutional filters and scalable nonlinear processing is based on convolutional neural networks (CNNs). The reason for this is the ability of convolutions to exploit the structure of Euclidean space.
In this set of lectures, we will describe graph filters and graph neural networks as analogues of convolutional filters and CNNs, adapted to the processing of signals supported on graphs. Both of these concepts are simple. A graph filter is a polynomial on a matrix representation of the graph. From this definition we build a graph perceptron by adding a pointwise nonlinear function that processes the output of a graph filter. Graph perceptrons are composed (or layered) to build a multilayer GNN. And individual layers are augmented from single filters to filter banks to build multiple-feature GNNs.
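As an illustrative sketch (not the course's reference implementation), a single-feature graph filter and graph perceptron can be written in PyTorch roughly as follows; the matrix representation S, the filter order, and the tensor shapes are assumptions of ours.

```python
import torch
import torch.nn as nn

class GraphFilter(nn.Module):
    """Polynomial graph filter y = sum_k h_k S^k x with learnable taps h_k."""
    def __init__(self, order):
        super().__init__()
        self.h = nn.Parameter(torch.randn(order + 1) / (order + 1))

    def forward(self, S, x):
        # S: (n, n) matrix representation of the graph (adjacency, Laplacian, ...)
        # x: (n,) graph signal with one value per node
        y = torch.zeros_like(x)
        z = x
        for hk in self.h:
            y = y + hk * z   # accumulate h_k * (S^k x)
            z = S @ z        # shift once more: S^(k+1) x
        return y


class GraphPerceptron(nn.Module):
    """A graph filter followed by a pointwise nonlinearity."""
    def __init__(self, order):
        super().__init__()
        self.filter = GraphFilter(order)

    def forward(self, S, x):
        return torch.relu(self.filter(S, x))
```

Stacking several such perceptrons yields a multilayer GNN, and replacing the single filter in each layer with a bank of filters that maps multiple input features to multiple output features yields the multiple-feature GNNs discussed in the lectures.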
Day 3 (Lectures 5 and 6): Equivariance and Stability to Deformations
The relevant question at this juncture is whether graph filters and GNNs do for signals supported on graphs what convolutional filters and CNNs do for Euclidean data. To wit, do they enable scalable processing of signals supported on graphs? A growing body of empirical work shows that this is true. Consider, for instance, a recommendation system in which we want to use past ratings that customers have given to products to predict future ratings. Collaborative filtering solutions build a graph of product similarities and interpret the ratings of separate customers as signals supported on the product similarity graph. We then use past ratings to construct a training set and learn to fill in the ratings that a given customer would give to products not yet rated. Empirical results do show that graph filters and GNNs work in recommendation systems with large numbers of products, where linear maps and fully-connected neural networks fail. In fact, it is easy enough to make two empirical observations:
- Observation (O1). Graph filters produce better rating estimates than arbitrary linear parametrizations and GNNs produce better estimates than arbitrary (fully-connected) neural networks.
- Observation (O2). GNNs predict ratings better than graph filters.
Observations (O1)-(O2) support advocacy for the use of GNNs, but they also spark two interesting questions: (Q1) Why do graph filters and GNNs outperform linear transformations and fully-connected neural networks? (Q2) Why do GNNs outperform graph filters? In this set of lectures, we present two analyses that help to answer these questions:
- Equivariance. Graph filters and GNNs are equivariant to permutations of the graph.
- Stability. GNNs provide a better tradeoff between discriminability and stability to graph deformations.
These properties show that GNNs have strong generalization potential. Equivariance to permutations implies that nodes with analogous neighbor sets making analogous observations perform the same operations. Thus, we can learn to, say, fill in the ratings of a product from the ratings of another product in another part of the network if the local structures of the graph are the same. This helps explain why graph filters outperform linear transforms and GNNs outperform fully-connected neural networks [cf. observation (O1)]. Stability to graph deformations affords a much stronger version of this statement. We can learn to generalize across different products if the local neighborhood structures are similar, not necessarily identical. Since GNNs possess better stability than graph filters for the same level of discriminability, this helps explain why GNNs outperform graph filters [cf. observation (O2)]. It is germane to note that analogues of these properties hold for CNNs. They are equivariant to translations and stable to deformations of Euclidean space.
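As a sanity check of the permutation equivariance property, here is a small numerical experiment with a toy graph, signal, filter taps, and permutation of our own choosing: relabeling the nodes of the graph and of the input signal simply relabels the output of the graph filter.

```python
import torch

torch.manual_seed(0)
n = 6
S = torch.rand(n, n, dtype=torch.double)
S = (S + S.T) / 2                       # a symmetric toy "graph" matrix
x = torch.rand(n, dtype=torch.double)   # a toy graph signal
h = torch.rand(4, dtype=torch.double)   # filter taps h_0, ..., h_3

def graph_filter(S, x, h):
    """y = sum_k h_k S^k x."""
    y, z = torch.zeros_like(x), x
    for hk in h:
        y, z = y + hk * z, S @ z
    return y

P = torch.eye(n, dtype=torch.double)[torch.randperm(n)]   # a random permutation matrix
# Filtering the permuted signal on the permuted graph...
y_perm = graph_filter(P @ S @ P.T, P @ x, h)
# ...gives the permutation of the output obtained on the original graph.
print(torch.allclose(y_perm, P @ graph_filter(S, x, h)))  # prints True
```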
Day 4 (Lectures 7 and 8): Graphon Neural Networks and Transferability at Scale
In this final set of lectures, we focus on a third empirical observation that we can make about GNNs:
- Observation (O3). A GNN that is trained on a graph with a certain number of nodes can be executed on a graph with a larger number of nodes and still produce good rating estimates.
As was the case with (O1) and (O2), Observation (O3) not only advocates for the use of GNNs but also raises an interesting question: (Q3) Why do GNNs transfer to networks with different numbers of nodes? To answer this question, we introduce the theory of graphon signal processing.
A graphon is a bounded function defined on the unit square that can be conceived as the limit of a sequence of graphs whose numbers of nodes and edges grow to infinity. This framework provides tools and insights that facilitate the understanding of structures like GNNs when the number of nodes in the graph is large. By means of graphons, we can translate graph signal processing into harmonic analysis on the unit interval, allowing us to exploit well-established tools from functional analysis. Indeed, this provides a beautiful connection between two apparently disconnected notions of Fourier decomposition.
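To make the limit object concrete, here is a small sketch (the graphon below is a toy choice of ours, not one from the course) of how an n-node graph is sampled from a graphon W: draw latent node positions uniformly on [0, 1] and connect nodes independently with probability W(u_i, u_j).

```python
import numpy as np

rng = np.random.default_rng(0)

def W(u, v):
    """A toy graphon on the unit square: edges are more likely between nearby latent positions."""
    return np.exp(-2.0 * np.abs(u - v))

def sample_graph(n):
    """Sample an n-node graph with independent edges drawn with probabilities W(u_i, u_j)."""
    u = np.sort(rng.uniform(size=n))            # latent node positions in [0, 1]
    probs = W(u[:, None], u[None, :])           # edge probability matrix
    A = (rng.uniform(size=(n, n)) < probs).astype(float)
    A = np.triu(A, 1)                           # keep one triangle, drop self-loops
    return A + A.T                              # symmetrize: undirected graph

A = sample_graph(200)                           # a 200-node graph drawn from the graphon
```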
We will further use the convergence of graph filters to establish the following property:
- Transferability. As graphs converge to a limit object, a graphon, GNN outputs converge to outputs of a corresponding limit object, a graphon neural network.
The convergence of GNNs towards graphon neural networks explains why GNNs can be trained and executed on graphs of different sizes [cf. observation (O3)].
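A rough empirical illustration of this property, again built from toy choices of our own (the graphon, filter taps, input signal, and adjacency normalization are assumptions, not course material): we apply the same filter taps, followed by a pointwise nonlinearity, to graphs of increasing size drawn from one graphon and observe that a simple summary of the output stabilizes as the graphs grow.

```python
import numpy as np

rng = np.random.default_rng(1)
h = np.array([1.0, 0.8, 0.5, 0.2])               # fixed filter taps shared across all graph sizes

def W(u, v):
    return np.exp(-2.0 * np.abs(u - v))           # the same toy graphon as above

def graph_perceptron(S, x, h):
    """Pointwise ReLU of a polynomial graph filter y = sum_k h_k S^k x."""
    y, z = np.zeros_like(x), x
    for hk in h:
        y, z = y + hk * z, S @ z
    return np.maximum(y, 0.0)

for n in (50, 200, 800, 3200):
    u = np.sort(rng.uniform(size=n))              # latent node positions
    A = (rng.uniform(size=(n, n)) < W(u[:, None], u[None, :])).astype(float)
    A = np.triu(A, 1)
    A = A + A.T
    S = A / n                                     # normalization under which the graphs converge to the graphon
    x = np.cos(2 * np.pi * u)                     # the same underlying signal sampled at the latent positions
    y = graph_perceptron(S, x, h)
    print(n, float(y.mean()))                     # this summary stabilizes as n grows (transferability)
```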
Graph Neural Networks Lab (Optional)
For students who want to further their practical understanding of GNNs, we will offer support to complete a lab on recommendation systems with GNNs. Students will have two weeks after the end of the course to complete the lab assignment. Completion is optional, but we will hold virtual office hours to support students who choose to complete it.
In a recommendation system, we want to predict the ratings that customers would give to a certain product using the product's rating history and the ratings that these customers have given to similar products. Collaborative filtering solutions build a graph of product similarities using past ratings and consider the ratings of individual customers as graph signals supported on the nodes of the product graph. The assumption is that there exists an underlying set of true ratings or scores, of which we only observe a subset, and that the unobserved scores can be estimated from the ones that have been observed. This problem can thus be seen as an empirical risk minimization problem in which we learn to fill in the missing ratings.
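As a small sketch of this construction (the cosine similarity measure, the nearest-neighbor sparsification, and all names below are illustrative choices of ours, not necessarily those used in the lab), the product graph can be built from a partially observed customer-by-product rating matrix as follows:

```python
import numpy as np

def product_similarity_graph(R, k=10):
    """Build a product-similarity graph from a ratings matrix R (customers x products, 0 = unrated).

    Edge weights are cosine similarities between the rating profiles of pairs of products;
    only the k strongest neighbors of each product are kept, a common sparsification choice.
    """
    norms = np.linalg.norm(R, axis=0, keepdims=True) + 1e-9
    sim = (R.T @ R) / (norms.T @ norms)          # cosine similarity between product rating columns
    np.fill_diagonal(sim, 0.0)                   # no self-similarity edges
    keep = np.zeros_like(sim)
    idx = np.argsort(-sim, axis=1)[:, :k]        # indices of the k most similar products per product
    np.put_along_axis(keep, idx, np.take_along_axis(sim, idx, axis=1), axis=1)
    return np.maximum(keep, keep.T)              # symmetrize the sparsified graph

# Toy usage: rows are customers, columns are products, entries are ratings (0 means "not rated").
R = np.random.default_rng(0).integers(0, 6, size=(100, 40)).astype(float)
S = product_similarity_graph(R, k=5)             # product-similarity graph used as the GNN support
```

Each row of R (one customer's ratings across all products) is then a graph signal on this product graph, and the unobserved entries are the ones the learned models are trained to fill in.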
Our goal in this lab is to compare the ability of several learning parametrizations to solve it: (1) An arbitrary linear parametrization. (2) A fully-connected neural network. (3) A graph filter. (4) A GNN. Besides gaining experience in formulating and training learned models, students will also gain an understanding of the value of choosing the learning parameterization properly. In particular, we will observe that: (A) Graph filters and GNNs work better than arbitrary linear transformations and fully-connected neural networks. (B) GNNs work better than graph filters. We will explain (A) and (B) in terms of the equivariance and stability properties of GNNs.
A solution to this lab using PyTorch is available here. The zip file includes the Jupyter Notebook of the lab and the data file required for execution.
Previous Delivery Experience
This course is based on ESE 5140: Graph Neural Networks which has been offered at the University of Pennsylvania (Penn) three times. The course is popular and well regarded. We fill two sessions capped at 50 students each and we receive average teaching evaluation scores of 3.84/4.00. The course at Penn consists of recorded video lectures and live discussion sessions. Video lectures are publicly available and have accumulated 70 thousand views and 4 thousand hours of watch time over the last 2 years.
Alejandro Ribeiro developed this course in 2020 and taught it in 2020 and 2021. Alejandro Parada-Mayorga and Luana Ruiz participated in the development of materials in 2020 and served as TAs in 2020 and 2021. Charilaos Kanatsoulis, Navid NaderiAlizadeh, and Alejandro Parada-Mayorga are lecturers in 2022.
Alejandro Ribeiro has given six tutorials at different conferences sponsored by the IEEE Signal Processing Society. He delivered three tutorials at ICASSP on Graph Neural Networks (2020), Graph Signal Processing (2017) and Optimal Design of Wireless Networks (2008). All were well attended.
Target Audience
This course is of interest to a broad audience and has a low entry barrier. The latter is true because the course builds GNNs from basic signal processing principles and is thus accessible to any student with knowledge of convolutions and frequency representations. The former is true because the course introduces the theory and practice of GNNs. Students will finish this course with experience in formulating and training learning models with GNNs and will have a good understanding of their fundamental properties.
The audience of the course at Penn is a mix of undergraduate, Master’s, and doctoral students.
Presenters’ Biographies
Alejandro Ribeiro (aribeiro@seas.upenn.edu) received the B.Sc. degree in electrical engineering from the Universidad de la República Oriental del Uruguay, Montevideo, Uruguay, in 1998, and the M.Sc. and Ph.D. degrees in electrical engineering from the University of Minnesota, Minneapolis, MN, USA, in 2005 and 2007, respectively. He joined the University of Pennsylvania, Philadelphia, PA, USA, in 2008, where he is currently a Professor of Electrical and Systems Engineering. His research interests include applications of statistical signal processing to the study of networks and networked phenomena, structured representations of networked data structures, graph signal processing, network optimization, robot teams, and networked control. Papers coauthored by Dr. Ribeiro received the 2022 IEEE Signal Processing Society Best Paper Award, the 2022 IEEE Brain Initiative Student Paper Award, the 2021 Cambridge Ring Publication of the Year Award, the 2020 IEEE Signal Processing Society Young Author Best Paper Award, the 2014 O. Hugo Schuck best paper award, and paper awards at EUSIPCO 2021, ICASSP 2020, EUSIPCO 2019, CDC 2017, SSP Workshop 2016, SAM Workshop 2016, Asilomar SSC Conference 2015, ACC 2013, ICASSP 2006, and ICASSP 2005. His teaching has been recognized with the 2017 Lindback award for distinguished teaching and the 2012 S. Reid Warren, Jr. Award presented by Penn's undergraduate student body for outstanding teaching.
Charilaos I. Kanatsoulis (kanac@seas.upenn.edu) is a postdoctoral researcher in the Department of Electrical and Systems Engineering at the University of Pennsylvania. He received his Diploma in electrical and computer engineering from the National Technical University of Athens, Greece, in 2014, and his Ph.D. degree in electrical and computer engineering from the University of Minnesota, Twin Cities, in 2020. His research studies the interplay between machine learning and signal processing. He is particularly interested in principled convolutional and graph neural network design, tensor and multi-view analysis, representation learning, and explainable artificial intelligence.
Navid NaderiAlizadeh (nnaderi@seas.upenn.edu) is a Postdoctoral Researcher in the Department of Electrical and Systems Engineering at the University of Pennsylvania. He received the B.S. degree in electrical engineering from Sharif University of Technology, Tehran, Iran, in 2011, the M.S. degree in electrical and computer engineering from Cornell University, Ithaca, NY, USA, in 2014, and the Ph.D. degree in electrical engineering from the University of Southern California, Los Angeles, CA, USA, in 2016. Navid spent more than four years as a Research Scientist at Intel Labs and HRL Laboratories. His research interests span machine learning, signal processing, and information theory, and their applications for resource allocation in wireless networks. In addition to serving as a TPC member of several IEEE conferences, Navid has served as an Associate Editor for the IEEE Journal on Selected Areas in Communications and as a Guest Editor for the IEEE IoT Magazine.
Alejandro Parada-Mayorga (alejopm@seas.upenn.edu) is a postdoctoral researcher in the Department of Electrical and Systems Engineering at the University of Pennsylvania. He received his B.Sc. and M.Sc. degrees in electrical engineering from Universidad Industrial de Santander, Colombia, in 2009 and 2012, respectively, and his Ph.D. degree in electrical engineering from the University of Delaware, Newark, DE, in 2019. His research focuses on the foundations of information processing and learning using generalized convolutional signal processing. His research interests include algebraic signal processing, applied representation theory of algebras, geometric deep learning, applied category theory, graph neural networks, and topological signal processing.
Luana Ruiz (ruizl@mit.edu) is a FODSI and METEOR postdoctoral fellow at MIT. She received the Ph.D. degree in electrical engineering from the University of Pennsylvania in 2022, and the M.Eng. and B.Eng. double degree in electrical engineering from the École Supérieure d'Électricité, France, and the University of São Paulo, Brazil, in 2017. Luana's work focuses on large-scale graph information processing and graph neural network architectures. She was awarded an Eiffel Excellence scholarship from the French Ministry for Europe and Foreign Affairs between 2013 and 2015; nominated an iREDEFINE fellow in 2019, an MIT EECS Rising Star in 2021, a Simons Research Fellow in 2022, and a METEOR fellow in 2023; and received best student paper awards at the 27th and 29th European Signal Processing Conferences. She currently serves as a member of the MLSP TC.