Node-wise Filtering in Graph Neural Networks: A Mixture of Experts Approach

グラフニューラルネットワークにおけるノード単位フィルタリング：Mixture of Expertsアプローチ

Haoyu Han, Juanhui Li, Wei Huang, Xianfeng Tang, Hanqing Lu, Chen Luo, Hui Liu, Jiliang Tang

Abstract

概要

Graph Neural Networks (GNNs) have proven to be highly effective for node classification tasks across diverse graph structural patterns.
グラフニューラルネットワーク（GNNs）は、多様なグラフ構造パターンにおけるノード分類タスクに対して非常に効果的であることが証明されている。

Traditionally, GNNs employ a uniform global filter—typically a low-pass filter for homophilic graphs and a high-pass filter for heterophilic graphs.
従来、GNNsは均一なグローバルフィルタ——典型的にはホモフィリックグラフにはローパスフィルタ、ヘテロフィリックグラフにはハイパスフィルタ——を採用している。

However, real-world graphs often exhibit a complex mix of homophilic and heterophilic patterns, rendering a single global filter approach suboptimal.
しかし、実世界のグラフはしばしばホモフィリックパターンとヘテロフィリックパターンの複雑な混合を示し、単一のグローバルフィルタアプローチを次善のものにしている。

In this work, we theoretically demonstrate that a global filter optimized for one pattern can adversely affect performance on nodes with differing patterns.
本研究では、一つのパターンに最適化されたグローバルフィルタが、異なるパターンを持つノードの性能に悪影響を及ぼし得ることを理論的に示す。

To address this, we introduce a novel GNN framework NODE-MOE that utilizes a mixture of experts to adaptively select the appropriate filters for different nodes.
この問題に対処するため、異なるノードに対して適切なフィルタを適応的に選択するためにmixture of expertsを利用する新しいGNNフレームワークNODE-MOEを導入する。

Extensive experiments demonstrate the effectiveness of NODE-MOE on both homophilic and heterophilic graphs.
広範な実験により、ホモフィリックグラフとヘテロフィリックグラフの両方におけるNODE-MOEの有効性が示される。

1 Introduction

1 はじめに

Graph Neural Networks (GNNs) [1, 2] have emerged as powerful tools in representation learning for graph structure data, and have achieved remarkable success on various graph learning tasks [3, 4], especially the node classification task.
グラフニューラルネットワーク（GNNs）[1, 2]は、グラフ構造データの表現学習における強力なツールとして台頭し、様々なグラフ学習タスク[3, 4]、特にノード分類タスクにおいて顕著な成功を収めている。

GNNs usually can be designed and viewed from two domains, i.e., spatial domain and spectral domain.
GNNsは通常、二つの領域、すなわち空間領域とスペクトル領域から設計・考察することができる。

In the spatial domain, GNNs [1, 5, 6] typically follow the message passing mechanism [7], which propagate messages between neighboring nodes.
空間領域において、GNNs [1, 5, 6]は典型的にメッセージパッシング機構[7]に従い、隣接ノード間でメッセージを伝播する。

In the spectral domain, GNNs [8, 9] apply different filters on the graph signals in the spectral domain of the graph Laplacian matrix.
スペクトル領域において、GNNs [8, 9]はグラフラプラシアン行列のスペクトル領域でグラフ信号に異なるフィルタを適用する。

Most GNNs have shown great effectiveness in the node classification task of homophilic graphs [2, 10, 6, 11], where connected nodes tend to share the same labels.
多くのGNNsは、接続されたノードが同じラベルを共有する傾向があるホモフィリックグラフ[2, 10, 6, 11]のノード分類タスクにおいて大きな有効性を示している。

These GNNs usually leverage the low-pass filters, where the smoothed signals are preserved.
これらのGNNsは通常、平滑化された信号が保持されるローパスフィルタを活用する。

However, the heterophilic graphs exhibit the heterophilic patterns, where the connected nodes tend to have different labels.
しかし、ヘテロフィリックグラフはヘテロフィリックパターンを示し、接続されたノードは異なるラベルを持つ傾向がある。

As a result, several GNNs [12–14] designed for heterophilic graphs introduce the high-pass filter to better handle such diversity.
その結果、ヘテロフィリックグラフ向けに設計されたいくつかのGNNs [12–14]は、そのような多様性をより良く扱うためにハイパスフィルタを導入している。

To adapt to both homophilic and heterophilic graphs, GNNs with learnable graph convolution [9, 15–17] can automatically learn different types of filters for different types of graphs.
ホモフィリックグラフとヘテロフィリックグラフの両方に適応するために、学習可能なグラフ畳み込みを持つGNNs [9, 15–17]は、異なるタイプのグラフに対して異なるタイプのフィルタを自動的に学習することができる。

Despite the great success, these GNNs usually apply a uniform global filter across all nodes.
大きな成功にもかかわらず、これらのGNNsは通常、全てのノードに対して均一なグローバルフィルタを適用する。

figure1(p.2)

However, real-world graphs often display a complex interplay of homophilic and heterophilic patterns [18, 19], challenging this one-size-fits-all filtering approach.
しかし、実世界のグラフはしばしばホモフィリックパターンとヘテロフィリックパターンの複雑な相互作用を示し[18, 19]、この画一的なフィルタリングアプローチに挑戦している。

Specifically, while some nodes tend to connect with others that share similar labels, reflecting homophilic patterns, others are more inclined to form connections with nodes that have differing labels, indicative of heterophilic patterns.
具体的には、一部のノードは類似のラベルを共有する他のノードと接続する傾向があり、ホモフィリックパターンを反映する一方で、他のノードは異なるラベルを持つノードとの接続を形成する傾向があり、ヘテロフィリックパターンを示している。

Applying a uniform type of filter, tailored for just one of these patterns, across all nodes may hurt the performance of other patterns.
これらのパターンの一つだけに合わせた均一なタイプのフィルタを全てのノードに適用することは、他のパターンの性能を損なう可能性がある。

To illustrate this, we provide an example as shown in Figure 1(a), where different colors represent distinct node features, and numbers indicate node labels.
これを説明するために、Figure 1(a)に示す例を提供する。ここで異なる色は異なるノード特徴を表し、数字はノードラベルを示す。

The nodes are marked as either solid or dotted circles to denote homophilic and heterophilic patterns, respectively.
ノードはそれぞれホモフィリックパターンとヘテロフィリックパターンを示すために、実線または点線の円として示されている。

Applying a global low-pass filter, such as the adjacency matrix A, uniformly across all nodes results in a scenario where nodes on the left possess the same feature, while those on the right possess another.
隣接行列Aのようなグローバルローパスフィルタを全てのノードに均一に適用すると、左側のノードが同じ特徴を持ち、右側のノードが別の特徴を持つシナリオが生じる。

Therefore, all the left nodes or the right nodes will have the same prediction.
したがって、左側の全てのノードまたは右側の全てのノードは同じ予測を持つことになる。

However, nodes on the left or right don’t share the same label.
しかし、左側または右側のノードは同じラベルを共有していない。

Consequently, this global filtering approach leads to misclassification of nodes.
結果として、このグローバルフィルタリングアプローチはノードの誤分類につながる。

This toy example clearly illustrates the limitations of a one-size-fits-all filtering strategy and motivates the need for a more tailored approach.
このトイ例は画一的なフィルタリング戦略の限界を明確に示し、より個別化されたアプローチの必要性を動機付ける。

To address this, we propose applying different filters to nodes based on their specific structural patterns.
これに対処するため、特定の構造パターンに基づいてノードに異なるフィルタを適用することを提案する。

Figure 1(b) provides an example that we apply a low-pass filter, such as A, to homophilic nodes, and a high-pass filter, such as $-A$, to heterophilic nodes.
Figure 1(b)は、ホモフィリックノードにはAのようなローパスフィルタを、ヘテロフィリックノードには$-A$のようなハイパスフィルタを適用する例を示している。

From the results, nodes in the same class would have the same features.
結果から、同じクラスのノードは同じ特徴を持つことになる。

Therefore, this node-wise filtering approach allows for the perfect classification of all nodes in this example.
したがって、このノード単位フィルタリングアプローチは、この例における全てのノードの完全な分類を可能にする。

Present work. In this work, we observe that nodes in many real-world graphs not only exhibit diverse structural patterns but also that these patterns vary significantly among different communities within the same graph.
本研究. 本研究では、多くの実世界グラフのノードが多様な構造パターンを示すだけでなく、これらのパターンが同一グラフ内の異なるコミュニティ間で大きく異なることを観察する。

Utilizing the CSBM model to generate graphs with mixed structural patterns, we theoretically demonstrate that a global filter optimized for one pattern may incur significant losses for nodes with other patterns, while node-wise filtering can achieve linear separability for all nodes under mild conditions.
混合構造パターンを持つグラフを生成するためにCSBMモデルを利用し、一つのパターンに最適化されたグローバルフィルタが他のパターンを持つノードに対して大きな損失を招く可能性がある一方、ノード単位フィルタリングが緩やかな条件下で全てのノードの線形分離可能性を達成できることを理論的に示す。

Building on these insights, we propose a Node-wise filtering method - NODE-MOE, which leverages a Mixture of Experts framework to adaptively select appropriate filters for different nodes.
これらの知見に基づき、異なるノードに対して適切なフィルタを適応的に選択するためにMixture of Expertsフレームワークを活用するノード単位フィルタリング手法NODE-MOEを提案する。

Extensive experiments validate the effectiveness of the proposed NODE-MOE on both homophilic and heterophilic graphs, illustrating significant performance improvement.
広範な実験により、ホモフィリックグラフとヘテロフィリックグラフの両方における提案手法NODE-MOEの有効性が検証され、顕著な性能向上が示される。

2 Preliminary

2 予備知識

In this section, we explore the structural patterns present in various graph datasets, which usually exhibit mixed homophilic and heterophilic patterns.
本節では、通常ホモフィリックパターンとヘテロフィリックパターンの混合を示す様々なグラフデータセットに存在する構造パターンを探求する。

Then, we theoretically demonstrate that a global filter often fails in graphs characterized by such mixed structural patterns.
次に、そのような混合構造パターンによって特徴づけられるグラフにおいて、グローバルフィルタがしばしば失敗することを理論的に示す。

In contrast, node-wise filtering can achieve linear separability under mild conditions.
対照的に、ノード単位フィルタリングは緩やかな条件下で線形分離可能性を達成できる。

Before we start, we first define the notations used in this paper and background knowledge.
開始する前に、まず本論文で使用する記法と背景知識を定義する。

Notations. We use bold upper-case letters such as $\mathbf{X}$ to denote matrices.
記法. $\mathbf{X}$のような太字大文字を行列の表記に使用する。

$\mathbf{X}i$ denotes its i-th row and $\mathbf{X}{ij}$ indicates the i-th row and j-th column element.
$\mathbf{X}i$はそのi番目の行を表し、$\mathbf{X}{ij}$はi番目の行とj番目の列の要素を示す。

We use bold lower-case letters such as $\mathbf{x}$ to denote vectors.
$\mathbf{x}$のような太字小文字をベクトルの表記に使用する。

Let $G = (V, E)$ be a graph, where $V$ is the node set, $E$ is the edge set, and $	V	= n$.
$G = (V, E)$をグラフとし、$V$はノード集合、$E$はエッジ集合、$	V	= n$とする。

$N_i$ denotes the neighborhood node set for node $v_i$.
$N_i$はノード$v_i$の近傍ノード集合を表す。

The graph can be represented by an adjacency matrix $\mathbf{A} \in \mathbb{R}^{n \times n}$, where $\mathbf{A}{ij} > 0$ indices that there exists an edge between nodes $v_i$ and $v_j$ in $G$, or otherwise $\mathbf{A}{ij} = 0$.
グラフは隣接行列$\mathbf{A} \in \mathbb{R}^{n \times n}$で表現でき、$\mathbf{A}{ij} > 0$は$G$においてノード$v_i$と$v_j$の間にエッジが存在することを示し、そうでなければ$\mathbf{A}{ij} = 0$である。

For a node $v_i$, we use $N(v_i) = {v_j : \mathbf{A}{ij} > 0}$ to denote its neighbors.
ノード$v_i$に対して、$N(v_i) = {v_j : \mathbf{A}{ij} > 0}$をその近傍を表すために使用する。

Let $\mathbf{D} = diag(d_1, d_2, \ldots, d_n)$ be the degree matrix, where $d_i = \sum_j \mathbf{A}{ij}$ is the degree of node $v_i$.
$\mathbf{D} = diag(d_1, d_2, \ldots, d_n)$を次数行列とし、$d_i = \sum_j \mathbf{A}{ij}$はノード$v_i$の次数である。

Furthermore, suppose that each node is associated with a $d$-dimensional feature $\mathbf{x}$ and we use $\mathbf{X} = [\mathbf{x}_1, \ldots, \mathbf{x}_n]^\top \in \mathbb{R}^{n \times d}$ to denote the feature matrix.
さらに、各ノードが$d$次元の特徴$\mathbf{x}$と関連付けられていると仮定し、$\mathbf{X} = [\mathbf{x}_1, \ldots, \mathbf{x}_n]^\top \in \mathbb{R}^{n \times d}$を特徴行列の表記に使用する。

Besides, the label matrix is $\mathbf{Y} \in \mathbb{R}^{n \times c}$, where $c$ is the number of classes.
また、ラベル行列は$\mathbf{Y} \in \mathbb{R}^{n \times c}$であり、$c$はクラス数である。

We use $y_u$ to denote the label of node $u$.
$y_u$をノード$u$のラベルの表記に使用する。

Graph Laplacian. The graph Laplacian matrix is defined as $\mathbf{L} = \mathbf{D} - \mathbf{A}$.
グラフラプラシアン. グラフラプラシアン行列は$\mathbf{L} = \mathbf{D} - \mathbf{A}$として定義される。

We define the normalized adjacency matrix as $\tilde{\mathbf{A}} = \mathbf{D}^{-\frac{1}{2}} \mathbf{A} \mathbf{D}^{-\frac{1}{2}}$ and the normalized Laplacian matrix as $\tilde{\mathbf{L}} = \mathbf{I} - \tilde{\mathbf{A}}$.
正規化隣接行列を$\tilde{\mathbf{A}} = \mathbf{D}^{-\frac{1}{2}} \mathbf{A} \mathbf{D}^{-\frac{1}{2}}$、正規化ラプラシアン行列を$\tilde{\mathbf{L}} = \mathbf{I} - \tilde{\mathbf{A}}$と定義する。

Its eigendecomposition can be represented by $\tilde{\mathbf{L}} = \mathbf{U} \mathbf{\Lambda} \mathbf{U}^\top$, where the $\mathbf{U} \in \mathbb{R}^{n \times n}$ is the eigenvector matrix and $\mathbf{\Lambda} = diag(\lambda_1, \lambda_2, \ldots, \lambda_n)$ is the eigenvalue matrix.
その固有分解は$\tilde{\mathbf{L}} = \mathbf{U} \mathbf{\Lambda} \mathbf{U}^\top$と表現でき、$\mathbf{U} \in \mathbb{R}^{n \times n}$は固有ベクトル行列、$\mathbf{\Lambda} = diag(\lambda_1, \lambda_2, \ldots, \lambda_n)$は固有値行列である。

Specifically, $0 \leq \lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n < 2$.
具体的には、$0 \leq \lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n < 2$である。

The filtered signals can be represented by $\hat{\mathbf{X}} = \mathbf{U} f(\mathbf{\Lambda}) \mathbf{U}^\top \mathbf{X}$, where $f$ is the filter function.
フィルタリングされた信号は$\hat{\mathbf{X}} = \mathbf{U} f(\mathbf{\Lambda}) \mathbf{U}^\top \mathbf{X}$と表現でき、$f$はフィルタ関数である。

As a result, the graph convolution $\tilde{\mathbf{A}} \mathbf{X}$ can be viewed as a low-pass filter, with the filter $f(\lambda_i) = 1 - \lambda_i$.
その結果、グラフ畳み込み$\tilde{\mathbf{A}} \mathbf{X}$はフィルタ$f(\lambda_i) = 1 - \lambda_i$を持つローパスフィルタと見なすことができる。

Similarly, the graph convolution $-\tilde{\mathbf{A}} \mathbf{X}$ is a high-pass filter with filter $f(\lambda_i) = \lambda_i - 1$.
同様に、グラフ畳み込み$-\tilde{\mathbf{A}} \mathbf{X}$はフィルタ$f(\lambda_i) = \lambda_i - 1$を持つハイパスフィルタである。

Homophily metrics. Homophily metrics in graphs measure the tendency of edges to connect nodes with similar labels [20].
ホモフィリー指標. グラフにおけるホモフィリー指標は、類似のラベルを持つノードを接続するエッジの傾向を測定する[20]。

There are several commonly used homophily metrics, such as edge homophily [21], node homophily [22], and class homophily [23].
エッジホモフィリー[21]、ノードホモフィリー[22]、クラスホモフィリー[23]など、いくつかの一般的に使用されるホモフィリー指標がある。

In this paper, we adopt the node homophily $H(\mathcal{G}) = \frac{1}{	V	} \sum_{v_i \in V} h(v_i)$, where $h(v_i) = \frac{	{u \in N(v_i) : y_u = y_v}	}{d_i}$ measures the label similarity between node $v_i$ with its neighbors.
本論文では、ノードホモフィリー$H(\mathcal{G}) = \frac{1}{	V	} \sum_{v_i \in V} h(v_i)$を採用する。ここで$h(v_i) = \frac{	{u \in N(v_i) : y_u = y_v}	}{d_i}$はノード$v_i$とその近傍間のラベル類似度を測定する。

A node with higher $h(v)$ exhibits a homophilic pattern while a low $h(v)$ indicates a heterophilic pattern.
高い$h(v)$を持つノードはホモフィリックパターンを示し、低い$h(v)$はヘテロフィリックパターンを示す。

2.1 Structural Patterns in Existing Graphs

2.1 既存グラフにおける構造パターン

In this subsection, we examine the structural patterns present in existing graph datasets.
本小節では、既存のグラフデータセットに存在する構造パターンを調べる。

Specifically, we select two widely used homophilic datasets, i.e., Cora and CiteSeer [24], and two heterophilic datasets, i.e., chameleon and squirrel [25].
具体的には、広く使用されている二つのホモフィリックデータセット、すなわちCoraとCiteSeer [24]、および二つのヘテロフィリックデータセット、すなわちchameleonとsquirrel [25]を選択する。

We first calculate the homophily distribution for all nodes in the graph.
まず、グラフ内の全てのノードに対するホモフィリー分布を計算する。

figure2(p.3)

figure3(p.3)

As shown in Figure 2, while the majority of nodes in homophilic graphs predominantly exhibit homophilic patterns, and those in heterophilic graphs display heterophilic patterns, exceptions are evident.
Figure 2に示されるように、ホモフィリックグラフのノードの大部分は主にホモフィリックパターンを示し、ヘテロフィリックグラフのノードはヘテロフィリックパターンを示すが、例外は明らかである。

Notably, some nodes in homophilic graphs show heterophilic tendencies, and conversely, some nodes in heterophilic graphs demonstrate homophilic patterns.
注目すべきことに、ホモフィリックグラフの一部のノードはヘテロフィリック傾向を示し、逆にヘテロフィリックグラフの一部のノードはホモフィリックパターンを示す。

Consequently, all these graphs exhibit a mixture of homophilic and heterophilic patterns, which aligns with the findings in the previous works [18, 19].
結果として、これら全てのグラフはホモフィリックパターンとヘテロフィリックパターンの混合を示しており、これは先行研究[18, 19]の知見と一致する。

We further analyze the position of nodes with different structural patterns within the graphs.
さらに、グラフ内の異なる構造パターンを持つノードの位置を分析する。

To do this, we divide each graph into several subgraphs using community detection algorithms [26].
これを行うために、コミュニティ検出アルゴリズム[26]を用いて各グラフを複数のサブグラフに分割する。

We focus on the largest 10 communities and calculate the homophily level for each subgraph.
最大の10コミュニティに注目し、各サブグラフのホモフィリーレベルを計算する。

The results, as shown in Figure 3, reveal significant variations in homophily across different communities.
Figure 3に示される結果は、異なるコミュニティ間でホモフィリーに大きな変動があることを明らかにする。

For instance, in the Cora dataset, homophily levels in some communities approach 1, indicating strong homophily, while in some communities it drops below 0.5.
例えば、Coraデータセットでは、一部のコミュニティのホモフィリーレベルが1に近づき、強いホモフィリーを示す一方、一部のコミュニティでは0.5を下回る。

Similarly, in the chameleon dataset, the lowest homophily levels are near 0, with the highest reaching above 0.6.
同様に、chameleonデータセットでは、最も低いホモフィリーレベルは0に近く、最も高いものは0.6を超える。

These findings highlight the considerable diversity in node interaction patterns, even within the same graph, underscoring the complexity of graph structures in real-world datasets.
これらの知見は、同一グラフ内であってもノード相互作用パターンにかなりの多様性があることを強調し、実世界データセットにおけるグラフ構造の複雑さを浮き彫りにする。

The variability in homophily levels clearly illustrates that nodes in various parts of the graph may require distinct processing approaches.
ホモフィリーレベルの変動性は、グラフの様々な部分のノードが異なる処理アプローチを必要とする可能性があることを明確に示している。

Therefore, applying the same global filter to all nodes may lead to suboptimal performance.
したがって、全てのノードに同じグローバルフィルタを適用することは次善の性能につながる可能性がある。

2.2 Analysis based on CSBM model

2.2 CSBMモデルに基づく分析

To further illustrate why applying a global filter may result in suboptimal performance, we utilize the Contextual Stochastic Block Model (CSBM) [27], which has been widely applied to graph analysis [28, 29], such as analyzing the behavior of GNNs [30, 11, 31].
グローバルフィルタの適用がなぜ次善の性能をもたらす可能性があるかをさらに説明するために、グラフ分析[28, 29]、例えばGNNsの挙動分析[30, 11, 31]に広く適用されているContextual Stochastic Block Model (CSBM) [27]を利用する。

The CSBM is a generative model, which is often used to generate graph structures and node features.
CSBMは生成モデルであり、グラフ構造とノード特徴の生成によく使用される。

Typically, CSBMs are based on the assumption that graphs are generated following a uniform patter, such as nodes with the same label are connected with probability $p$ while nodes with different labels are connected with probability $q$ [31].
典型的には、CSBMsは、同じラベルを持つノードが確率$p$で接続され、異なるラベルを持つノードが確率$q$で接続されるといった、均一なパターンに従ってグラフが生成されるという仮定に基づいている[31]。

However, the real-world complexity of graphs features a mixture of homophilic and heterophilic patterns, as illustrated in section 2.1.
しかし、セクション2.1で示したように、実世界のグラフの複雑さはホモフィリックパターンとヘテロフィリックパターンの混合を特徴としている。

We adapt the CSBM by mixing two CSBMs to generate one graph, mirroring the approach [19].
アプローチ[19]に倣い、二つのCSBMsを混合して一つのグラフを生成するようにCSBMを適応させる。

Definition 1. $CSBM(n, \boldsymbol{\mu}, \boldsymbol{\nu}, (p_0, q_0), (p_1, q_1), P)$.
定義1. $CSBM(n, \boldsymbol{\mu}, \boldsymbol{\nu}, (p_0, q_0), (p_1, q_1), P)$.

The generated nodes consist of two classes, $C_0 = {i \in [n] : y_i = 0}$ and $C_1 = {j \in [n] : y_j = 1}$.
生成されたノードは二つのクラス$C_0 = {i \in [n] : y_i = 0}$と$C_1 = {j \in [n] : y_j = 1}$から構成される。

For each node, consider $\mathbf{X} \in \mathbb{R}^{n \times d}$ to be the feature matrix such that each row $\mathbf{X}_i$ is an independent $d$-dimensional Gaussian random vectors with $\mathbf{X}_i \sim N(\boldsymbol{\mu}, \frac{1}{d}\mathbf{I})$ if $i \in C_0$ and $\mathbf{X}_j \sim N(\boldsymbol{\nu}, \frac{1}{d}\mathbf{I})$ if $j \in C_1$.
各ノードに対して、$\mathbf{X} \in \mathbb{R}^{n \times d}$を特徴行列とし、各行$\mathbf{X}_i$は独立な$d$次元ガウス確率ベクトルであり、$i \in C_0$の場合$\mathbf{X}_i \sim N(\boldsymbol{\mu}, \frac{1}{d}\mathbf{I})$、$j \in C_1$の場合$\mathbf{X}_j \sim N(\boldsymbol{\nu}, \frac{1}{d}\mathbf{I})$とする。

Here $\boldsymbol{\mu}, \boldsymbol{\nu}$ are the fixed class mean vector with $|\boldsymbol{\mu}|_2, |\boldsymbol{\nu}|_2 \leq 1$ and $\mathbf{I}$ is the identity matrix.
ここで$\boldsymbol{\mu}, \boldsymbol{\nu}$は$|\boldsymbol{\mu}|_2, |\boldsymbol{\nu}|_2 \leq 1$を満たす固定されたクラス平均ベクトルであり、$\mathbf{I}$は単位行列である。

Suppose there are two patterns of nodes in the adjacency matrix $\mathbf{A} = (a_{ij})$, i.e., the homophilic pattern: $H_0 = {i \in [n] : a_{ij} = \text{Ber}(p_0) \text{ if } y_i = y_j \text{ and } a_{ij} = \text{Ber}(q_0) \text{ if } y_i \neq y_j, p_0 > q_0}$ and the heterophilic pattern: $H_1 = {i \in [n] : a_{ij} = \text{Ber}(p_1) \text{ if } y_i = y_j \text{ and } a_{ij} = \text{Ber}(q_1) \text{ if } y_i \neq y_j, p_1 < q_1}$.
隣接行列$\mathbf{A} = (a_{ij})$にはノードの二つのパターンがあると仮定する。すなわち、ホモフィリックパターン：$H_0 = {i \in [n] : a_{ij} = \text{Ber}(p_0) \text{ if } y_i = y_j \text{ and } a_{ij} = \text{Ber}(q_0) \text{ if } y_i \neq y_j, p_0 > q_0}$、およびヘテロフィリックパターン：$H_1 = {i \in [n] : a_{ij} = \text{Ber}(p_1) \text{ if } y_i = y_j \text{ and } a_{ij} = \text{Ber}(q_1) \text{ if } y_i \neq y_j, p_1 < q_1}$である。

$P$ denotes the probability that a node is in the homophilic pattern.
$P$はノードがホモフィリックパターンに属する確率を表す。

We also assume the nodes follow the same degree distribution $p_0 + q_0 = p_1 + q_1$.
また、ノードは同じ次数分布$p_0 + q_0 = p_1 + q_1$に従うと仮定する。

For simplification, we consider a linear model with parameters $\mathbf{w} \in \mathbb{R}^d$ and $b \in \mathbb{R}$, following the approach [11].
簡略化のため、アプローチ[11]に従い、パラメータ$\mathbf{w} \in \mathbb{R}^d$および$b \in \mathbb{R}$を持つ線形モデルを考える。

The predicted label for nodes is given by $\hat{y} = \sigma(\tilde{\mathbf{X}}\mathbf{w} + b\mathbf{1})$, where $\sigma(x) = (1 + e^{-x})^{-1}$ is the sigmoid function, and $\tilde{\mathbf{X}}$ represents the features after filtering.
ノードの予測ラベルは$\hat{y} = \sigma(\tilde{\mathbf{X}}\mathbf{w} + b\mathbf{1})$で与えられ、$\sigma(x) = (1 + e^{-x})^{-1}$はシグモイド関数、$\tilde{\mathbf{X}}$はフィルタリング後の特徴を表す。

The binary cross-entropy loss over nodes $V$ is formulated as $L(V, \mathbf{w}, b) = -\frac{1}{	V	} \sum_{i \in V} y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i)$.
ノード$V$上の二値交差エントロピー損失は$L(V, \mathbf{w}, b) = -\frac{1}{	V	} \sum_{i \in V} y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i)$と定式化される。

Theorem 1. Suppose $n$ is relatively large, the graph is not too sparse with $p_i, q_i = \omega(\log^2(n)/n)$ and the feature center distance is not too small with $|\boldsymbol{\mu} - \boldsymbol{\nu}| = \omega\left(\frac{\log n}{\sqrt{dn(p_0 + q_0)}}\right)$ and $|\mathbf{w}| \leq R$.
定理1. $n$が十分大きく、グラフが$p_i, q_i = \omega(\log^2(n)/n)$で疎すぎず、特徴中心間距離が$|\boldsymbol{\mu} - \boldsymbol{\nu}| = \omega\left(\frac{\log n}{\sqrt{dn(p_0 + q_0)}}\right)$で小さすぎず、$|\mathbf{w}| \leq R$であると仮定する。

For the graph $G(V, E, X) \sim CSBM(n, \boldsymbol{\mu}, \boldsymbol{\nu}, (p_0, q_0), (p_1, q_1), P)$, we have the following:
グラフ$G(V, E, X) \sim CSBM(n, \boldsymbol{\mu}, \boldsymbol{\nu}, (p_0, q_0), (p_1, q_1), P)$に対して、以下が成り立つ：

If the low-pass global filter, i.e., $1 - \lambda$, is applied to the whole graph $G$, we can find a optimal $\mathbf{w}^, b^$ that achieve near linear separability for the homophilic node set $H_0$.
ローパスグローバルフィルタ、すなわち$1 - \lambda$がグラフ$G$全体に適用される場合、ホモフィリックノード集合$H_0$に対してほぼ線形分離可能性を達成する最適な$\mathbf{w}^, b^$を見つけることができる。

However, the loss for the heterophilic node set $H_1$ can be relatively large with:
しかし、ヘテロフィリックノード集合$H_1$に対する損失は比較的大きくなりうる：

\[L(H_1, \mathbf{w}^*, b^*) \geq \frac{R(q_1 - p_1)}{2(q_1 + p_1)} \|\boldsymbol{\mu} - \boldsymbol{\nu}\| (1 + o_d(1)).\]

If different filters are applied to homophilic and heterophilic sets separately, we can find an optimal $\mathbf{w}^, b^$ that all the nodes are linear separable with the probability:
異なるフィルタがホモフィリック集合とヘテロフィリック集合に別々に適用される場合、全てのノードが以下の確率で線形分離可能となる最適な$\mathbf{w}^, b^$を見つけることができる：

\[\mathbb{P}\left(\left(\tilde{\mathbf{X}}_i\right)_{i \in V} \text{ is linearly separable}\right) = 1 - o_d(1).\]

The proof of these results is detailed in Appendix A.
これらの結果の証明はAppendix Aに詳述されている。

Theorem 1 reveals critical insights into the filtering strategies for graphs with mixed homophilic and heterophilic patterns, as generated by the CSBM model.
定理1は、CSBMモデルによって生成されるホモフィリックパターンとヘテロフィリックパターンが混合したグラフに対するフィルタリング戦略に関する重要な知見を明らかにする。

The first part of the theorem illustrates that applying a global low-pass filter can create an optimal classifier for homophilic nodes, achieving near-linear separability.
定理の第一部は、グローバルローパスフィルタの適用がホモフィリックノードに対する最適な分類器を作成し、ほぼ線形分離可能性を達成できることを示している。

However, this classifier may result in a substantial loss for heterophilic nodes, highlighting the limitations of a uniform filtering strategy.
しかし、この分類器はヘテロフィリックノードに対して大きな損失をもたらす可能性があり、均一なフィルタリング戦略の限界を浮き彫りにしている。

Conversely, the second part of the theorem demonstrates that by applying different filters to different patterns of nodes separately, it is possible to achieve linear separability across all nodes.
逆に、定理の第二部は、異なるパターンのノードに別々に異なるフィルタを適用することで、全てのノードにわたる線形分離可能性を達成できることを示している。

These findings strongly motivate the exploration of a node-wise filtering method, which can automatically apply different filters to distinct nodes based on their specific patterns, to improve the overall performance.
これらの知見は、全体的な性能を向上させるために、特定のパターンに基づいて異なるノードに異なるフィルタを自動的に適用できるノード単位フィルタリング手法の探求を強く動機付ける。

3 The Proposed Method

3 提案手法

The investigations presented in Section 2 underscore the complex nature of real-world datasets, revealing a mixture of homophilic and heterophilic patterns within them.
セクション2で提示された調査は、実世界データセットの複雑な性質を強調し、その中にホモフィリックパターンとヘテロフィリックパターンの混合が存在することを明らかにしている。

Additionally, these patterns are not uniformly distributed throughout the graph; rather, the level of homophily varies significantly across different communities.
さらに、これらのパターンはグラフ全体に均一に分布しているわけではなく、ホモフィリーのレベルは異なるコミュニティ間で大きく異なる。

Our theoretical analysis further demonstrates that global filtering, as commonly employed in numerous GNNs, may not effectively capture such complex patterns, often leading to suboptimal performance.
我々の理論的分析はさらに、多くのGNNsで一般的に採用されているグローバルフィルタリングが、そのような複雑なパターンを効果的に捉えられず、しばしば次善の性能につながる可能性があることを示している。

In contrast, node-wise filtering, which applies distinct filters to individual nodes based on their specific patterns, shows great promise in handling the intricacies of such complex graphs.
対照的に、特定のパターンに基づいて個々のノードに異なるフィルタを適用するノード単位フィルタリングは、そのような複雑なグラフの複雑さを扱う上で大きな可能性を示している。

However, implementing the node-wise filtering approach presents two significant challenges.
しかし、ノード単位フィルタリングアプローチの実装には二つの重要な課題がある。

First, how can we incorporate various filters into a single unified framework?
第一に、様々なフィルタを単一の統一フレームワークにどのように組み込むことができるか？

It requires a flexible architecture that can seamlessly accommodate multiple filtering mechanisms without compromising the efficiency and scalability of the model.
モデルの効率性とスケーラビリティを損なうことなく、複数のフィルタリング機構をシームレスに収容できる柔軟なアーキテクチャが必要である。

Second, without ground truth on node patterns, how can we select the appropriate filters for different nodes?
第二に、ノードパターンのグラウンドトゥルースがない状態で、異なるノードに対して適切なフィルタをどのように選択できるか？

In the following subsections, we aim to address these challenges.
以下の小節では、これらの課題に対処することを目指す。

figure4(p.5)

3.1 NODE-MOE: Node-wise Filtering via Mixture of Experts

3.1 NODE-MOE：Mixture of Expertsによるノード単位フィルタリング

Mixture of Experts (MoE) [32, 33], which follows the divide-and-conquer principle to divide the complex problem space into several subspaces so that each one can be easily addressed by specialized experts, have been successfully adopted across various domains [34–36].
Mixture of Experts (MoE) [32, 33]は、分割統治の原理に従い、複雑な問題空間を複数の部分空間に分割し、各部分空間を専門化されたエキスパートが容易に対処できるようにするものであり、様々な領域で成功裏に採用されている[34–36]。

For node classification tasks in graphs exhibiting a mixture of structural patterns, the diversity of node interactions necessitates applying distinct filters to different nodes as we discussed in Sections 2.
構造パターンの混合を示すグラフにおけるノード分類タスクでは、セクション2で議論したように、ノード相互作用の多様性が異なるノードに異なるフィルタを適用することを必要とする。

This necessity aligns well with the MoE methodology, which processes different samples with specific experts.
この必要性は、異なるサンプルを特定のエキスパートで処理するMoE手法と良く合致する。

Building on this principle, we introduce a flexible and efficient Node-wise Filtering via Mixture of Experts (NODE-MOE) framework, designed to dynamically apply appropriate filters to nodes based on their structural characteristics.
この原理に基づき、構造的特性に基づいてノードに適切なフィルタを動的に適用するよう設計された、柔軟で効率的なMixture of Expertsによるノード単位フィルタリング（NODE-MOE）フレームワークを導入する。

The overall NODE-MOE framework is illustrated in Figure 4, which consists of two primary components: the gating model and the multiple expert models.
NODE-MOEフレームワーク全体はFigure 4に示され、二つの主要コンポーネント：ゲーティングモデルと複数のエキスパートモデルから構成される。

With the graph data as input, the gating model $g(\cdot)$ computes the weight assigned to each expert for every node, reflecting the relevance of each expert’s contribution to that specific node.
グラフデータを入力として、ゲーティングモデル$g(\cdot)$は各ノードに対する各エキスパートに割り当てられる重みを計算し、その特定のノードに対する各エキスパートの寄与の関連性を反映する。

Each expert model, implemented as any GNN with different filters, generates node representations independently.
各エキスパートモデルは、異なるフィルタを持つ任意のGNNとして実装され、ノード表現を独立に生成する。

The final node classification is determined by a weighted sum of these representations, where the weights are those assigned by the gating model.
最終的なノード分類は、これらの表現の重み付き和によって決定され、重みはゲーティングモデルによって割り当てられたものである。

The prediction for node $i$ can be represented by:
ノード$i$に対する予測は以下で表現できる：

\[\hat{y}_i = h\left(\sum_{o=1}^{m} g(\mathbf{A}, \mathbf{X})_{i,o} E_o(\mathbf{A}, \mathbf{X})_i\right),\]

where $m$ is the number of experts, $E_o$ denotes the $o$-th expert, $g(\mathbf{A}, \mathbf{X}){i,o}$ represents the weight assigned to the $o$-th expert for node $i$ by the gating model, and $h$ is a classifier, which could be a model like a neural network or a simple activation function like Softmax.
ここで$m$はエキスパートの数、$E_o$は$o$番目のエキスパートを表し、$g(\mathbf{A}, \mathbf{X}){i,o}$はゲーティングモデルによるノード$i$に対する$o$番目のエキスパートに割り当てられた重みを表し、$h$はニューラルネットワークのようなモデルやSoftmaxのような単純な活性化関数であり得る分類器である。

In the following, we will delve into the specific designs of the gating model and the expert models within the proposed Node-MoE framework.
以下では、提案されたNode-MoEフレームワーク内のゲーティングモデルとエキスパートモデルの具体的な設計について詳述する。

3.2 Gating Model

3.2 ゲーティングモデル

The gating model is a pivotal component of the Node-MoE framework, aimed at selecting the most appropriate experts for each node.
ゲーティングモデルはNode-MoEフレームワークの中核的なコンポーネントであり、各ノードに対して最も適切なエキスパートを選択することを目的としている。

Its primary function is to dynamically assign higher weights to experts whose filtering characteristics best match the node’s patterns.
その主要な機能は、フィルタリング特性がノードのパターンに最も合致するエキスパートに動的に高い重みを割り当てることである。

For instance, an expert utilizing a high-pass filter may receive a higher weight for a node that exhibits heterophilic patterns.
例えば、ハイパスフィルタを利用するエキスパートは、ヘテロフィリックパターンを示すノードに対してより高い重みを受ける可能性がある。

However, a significant challenge arises as there is no explicit ground truth indicating which pattern each node belongs to.
しかし、各ノードがどのパターンに属するかを示す明示的なグラウンドトゥルースがないため、重大な課題が生じる。

In traditional MoE models, the gating model often utilizes a straightforward feed-forward network that processes the features of the sample as input [35–38].
従来のMoEモデルでは、ゲーティングモデルはしばしばサンプルの特徴を入力として処理する単純なフィードフォワードネットワークを利用する[35–38]。

Nevertheless, the nodes with different patterns may share similar node features, making this method ineffective.
しかしながら、異なるパターンを持つノードが類似のノード特徴を共有する場合があり、この方法を非効果的にしている。

To address this challenge, we estimate node patterns by incorporating the contextual features surrounding each node.
この課題に対処するために、各ノードを取り巻くコンテキスト特徴を組み込むことでノードパターンを推定する。

If a node’s features significantly differ from those of its neighboring nodes, it is likely that this node exhibits a heterophilic pattern.
ノードの特徴がその近傍ノードの特徴と大きく異なる場合、そのノードはヘテロフィリックパターンを示している可能性が高い。

Specifically, the input to our gating model includes a composite vector $[\mathbf{X},	\mathbf{A}\mathbf{X} - \mathbf{X}	,	\mathbf{A}^2\mathbf{X} - \mathbf{X}	]$.
具体的には、我々のゲーティングモデルへの入力は複合ベクトル$[\mathbf{X},	\mathbf{A}\mathbf{X} - \mathbf{X}	,	\mathbf{A}^2\mathbf{X} - \mathbf{X}	]$を含む。

This vector combines the node’s original features with the absolute differences between its features and those of its neighbors over one and two hops, respectively, to indicate the node’s structural patterns.
このベクトルは、ノードの構造パターンを示すために、ノードの元の特徴と、それぞれ1ホップおよび2ホップの近傍との特徴の絶対差を組み合わせている。

Moreover, as discussed in Section 2.1, different structural patterns are not uniformly distributed across the graph, and distinct communities may exhibit varying structural characteristics.
さらに、セクション2.1で議論したように、異なる構造パターンはグラフ全体に均一に分布しておらず、異なるコミュニティは異なる構造的特性を示す可能性がある。

To capitalize on this phenomenon, we employ GNNs with low-pass filters, such as GIN [39], for the gating model.
この現象を活用するために、ゲーティングモデルにはGIN [39]のようなローパスフィルタを持つGNNsを採用する。

These networks are chosen due to their strong community detection capabilities [40, 41], ensuring that neighboring nodes are likely to receive similar expert selections.
これらのネットワークは、その強力なコミュニティ検出能力[40, 41]のために選択され、近傍ノードが類似のエキスパート選択を受ける可能性が高いことを保証する。

Experimental results in Section 4.3 clearly demonstrate the proposed gating can efficiently assign different nodes to their suitable filters.
セクション4.3の実験結果は、提案されたゲーティングが異なるノードをそれらに適したフィルタに効率的に割り当てることができることを明確に示している。

3.3 Expert Models

3.3 エキスパートモデル

The mixed structural patterns observed in real-world graphs necessitate that the expert models in our NODE-MOE framework possess diverse capabilities.
実世界のグラフで観察される混合構造パターンは、我々のNODE-MOEフレームワークのエキスパートモデルが多様な能力を備えることを必要とする。

To achieve this, we consider multiple existing GNNs equipped with different filters.
これを達成するために、異なるフィルタを備えた複数の既存GNNsを考慮する。

Traditional GNNs often utilize fixed filters, which may not adequately capture the complexity of diverse structural patterns.
従来のGNNsはしばしば固定フィルタを利用するが、多様な構造パターンの複雑さを十分に捉えられない可能性がある。

To address this limitation, we opt for GNNs with learnable graph convolutions [9, 15–17], which are capable of adapting their filters to better fit the graph structural patterns.
この限界に対処するために、グラフ構造パターンにより良く適合するようフィルタを適応させることができる学習可能なグラフ畳み込みを持つGNNs [9, 15–17]を選択する。

However, the same experts would make the gating model hard to learn the right features [42] and may result in all experts’ filters being optimized in the same direction.
しかし、同じエキスパートはゲーティングモデルが正しい特徴を学習することを困難にし[42]、全てのエキスパートのフィルタが同じ方向に最適化される結果になる可能性がある。

To encourage diversity and ensure that each expert is adept at handling specific structural patterns, we adopt a differentiated initialization strategy for the filters in the experts.
多様性を促進し、各エキスパートが特定の構造パターンの処理に長けていることを保証するために、エキスパートのフィルタに差別化された初期化戦略を採用する。

Instead of using a fixed filter initialization, we initialize different experts with distinct types of filters, such as low-pass, constant, and high-pass filters.
固定フィルタ初期化を使用する代わりに、ローパス、定数、ハイパスフィルタなど、異なるタイプのフィルタで異なるエキスパートを初期化する。

Filter Smoothing Loss. While integrating multiple experts with diverse filters significantly enhances the expressive capacity of our NODE-MOE framework, this complexity can also make the model more challenging to fit.
フィルタ平滑化損失. 多様なフィルタを持つ複数のエキスパートの統合は、我々のNODE-MOEフレームワークの表現能力を大幅に向上させるが、この複雑さはモデルの適合をより困難にする可能性もある。

For example, training multiple filters simultaneously may lead to oscillations in the spectral domain for each filter as shown in Appendix B.
例えば、複数のフィルタを同時に学習することは、Appendix Bに示されるように各フィルタのスペクトル領域での振動につながる可能性がある。

This not only complicates fitting the model to the data but also impacts its explainability.
これはモデルをデータに適合させることを複雑にするだけでなく、その説明可能性にも影響を与える。

The specific role and function of each oscillating filter become difficult to discern, making it harder to understand and interpret the model’s behavior.
各振動フィルタの具体的な役割と機能を識別することが困難になり、モデルの挙動を理解し解釈することがより難しくなる。

To mitigate these issues, we introduce a filter smoothing loss designed to ensure that the learned filters exhibit smooth behavior in the spectral domain.
これらの問題を軽減するために、学習されたフィルタがスペクトル領域で滑らかな挙動を示すことを保証するよう設計されたフィルタ平滑化損失を導入する。

This loss is defined as follows:
この損失は以下のように定義される：

\[L_{os} = \sum_{i=1}^{K} |f_o(x_i) - f_o(x_{i-1})|^2,\]

where $f_o(\cdot)$ is the learnable filter function of the $o$-th expert, $x_0 \leq x_1 \leq \cdots \leq x_K$ are $K + 1$ values spanning the spectral domain.
ここで$f_o(\cdot)$は$o$番目のエキスパートの学習可能なフィルタ関数、$x_0 \leq x_1 \leq \cdots \leq x_K$はスペクトル領域にわたる$K + 1$個の値である。

The overall training loss is then given by $L = L_{task} + \gamma \sum_{o=1}^{m} L_{os}$, where the $L_{task}$ is the node classification loss and $\gamma$ is a hyperparameter that adjusts the influence of the filter smoothing loss.
全体の学習損失は$L = L_{task} + \gamma \sum_{o=1}^{m} L_{os}$で与えられ、$L_{task}$はノード分類損失、$\gamma$はフィルタ平滑化損失の影響を調整するハイパーパラメータである。

3.4 Top-K gating

3.4 Top-Kゲーティング

The soft gating that integrates all experts in the Node-MoE framework significantly enhances its modeling capabilities, but it also increases computational complexity since each expert must process all samples.
Node-MoEフレームワークにおいて全てのエキスパートを統合するソフトゲーティングは、そのモデリング能力を大幅に向上させるが、各エキスパートが全てのサンプルを処理する必要があるため、計算複雑度も増加する。

To improve computational efficiency while maintaining performance, we introduce a variant of NODE-MOE by leveraging the Top-K gating mechanism [35].
性能を維持しながら計算効率を改善するために、Top-Kゲーティング機構[35]を活用するNODE-MOEの変種を導入する。

In this variant, the NODE-MOE with Top-K gating selectively activates only the top $k$ experts with the highest relevance for each node.
この変種では、Top-KゲーティングによるNODE-MOEは、各ノードに対して最も関連性の高い上位$k$個のエキスパートのみを選択的に活性化する。

Specifically, the gating function for a node $v_i$ is defined as $g(v_i) = \text{Softmax}(\text{TopK}(g(\mathbf{A}, \mathbf{X})_i, k))$.
具体的には、ノード$v_i$に対するゲーティング関数は$g(v_i) = \text{Softmax}(\text{TopK}(g(\mathbf{A}, \mathbf{X})_i, k))$として定義される。

To prevent the gating model from consistently favoring a limited number of experts, we incorporate a load-balancing loss as suggested by [35].
ゲーティングモデルが限られた数のエキスパートを一貫して優先することを防ぐために、[35]で提案されたロードバランシング損失を組み込む。

In the experiments detailed in Section 4.4, we set $k = 1$ to ensure that the computational complexity of our NODE-MOE is comparable to that of an individual expert model.
セクション4.4で詳述される実験では、NODE-MOEの計算複雑度が個々のエキスパートモデルと同等であることを保証するために$k = 1$と設定する。

4 Experiment

4 実験

In this section, we conduct comprehensive experiments to validate the effectiveness of the proposed NODE-MOE.
本節では、提案手法NODE-MOEの有効性を検証するための包括的な実験を行う。

Specifically, we aim to address the following research questions:
具体的には、以下の研究課題に対処することを目指す：

RQ1: How does NODE-MOE perform compared with the state-of-the-art baselines on both homophilic and heterophilic graphs?
RQ1：NODE-MOEはホモフィリックグラフとヘテロフィリックグラフの両方において、最先端のベースラインと比較してどの程度の性能を発揮するか？
RQ2: Do the experts within NODE-MOE learn diverse structural patterns and does the gating model accurately assign each node to its most suitable experts?
RQ2：NODE-MOE内のエキスパートは多様な構造パターンを学習し、ゲーティングモデルは各ノードを最も適したエキスパートに正確に割り当てるか？
RQ3: How do different factors affect the performance of NODE-MOE?
RQ3：異なる要因はNODE-MOEの性能にどのように影響するか？

4.1 Experimental settings.

4.1 実験設定

Datasets. To evaluate the efficacy of our proposed NODE-MOE, we conduct experiments across seven widely used datasets.
データセット. 提案手法NODE-MOEの有効性を評価するために、広く使用されている7つのデータセットにわたる実験を行う。

These include four homophilic datasets: Cora, CiteSeer, Pubmed [24], and ogbn-arxiv [43]; along with three heterophilic datasets: Chameleon, Squirrel, and Actor [22].
これらには4つのホモフィリックデータセット：Cora、CiteSeer、Pubmed [24]、およびogbn-arxiv [43]と、3つのヘテロフィリックデータセット：Chameleon、Squirrel、およびActor [22]が含まれる。

For Cora, CiteSeer, and Pubmed, we generate ten random splits, distributing nodes into 60% training, 20% validation, and 20% testing partitions.
Cora、CiteSeer、およびPubmedについては、10個のランダム分割を生成し、ノードを60%の訓練、20%の検証、20%のテストに分配する。

For the heterophilic datasets, we utilize the ten fixed splits as specified in [22].
ヘテロフィリックデータセットについては、[22]で指定された10個の固定分割を利用する。

The ogbn-arxiv dataset is evaluated using its standard split [43].
ogbn-arxivデータセットは標準的な分割[43]を使用して評価される。

We run the experiments 3 times for each split and report the average performance and standard deviation.
各分割に対して実験を3回実行し、平均性能と標準偏差を報告する。

More details about these datasets are shown in Appendix C.1.
これらのデータセットに関する詳細はAppendix C.1に示されている。

Baselines. We compare our method with a diverse set of baselines, which can be divided into five categories: (1) Non-GNN methods like MLP and Label Propagation (LP) [44]; (2) Homophilic GNNs utilizing fixed low-pass filters such as GCN [1], GAT [2], APPNP [6], and GCNII [45]; (3) Heterophilic GNNs including WRGCN [46], GloGNN [18] and LinkX [47]; (4) GNNs with learnable filters like GPRGNN [9] and ChebNetII [17]; (5) MoE-based GNNs such as GMoE [38].
ベースライン. 我々の手法を5つのカテゴリに分けられる多様なベースラインセットと比較する：(1) MLPやLabel Propagation (LP) [44]のような非GNN手法；(2) GCN [1]、GAT [2]、APPNP [6]、GCNII [45]のような固定ローパスフィルタを利用するホモフィリックGNNs；(3) WRGCN [46]、GloGNN [18]、LinkX [47]を含むヘテロフィリックGNNs；(4) GPRGNN [9]やChebNetII [17]のような学習可能なフィルタを持つGNNs；(5) GMoE [38]のようなMoEベースのGNNs。

NODE-MOE settings. The proposed NODE-MOE framework is highly flexible, allowing for a wide range of choices in both gating and expert models.
NODE-MOEの設定. 提案されたNODE-MOEフレームワークは非常に柔軟であり、ゲーティングモデルとエキスパートモデルの両方において幅広い選択を可能にする。

In this work, we employ the GIN [39] as the gating model due to its exceptional expressive power and ability to leverage community properties as discussed in Section 2.
本研究では、セクション2で議論したように、その卓越した表現力とコミュニティ特性を活用する能力のため、GIN [39]をゲーティングモデルとして採用する。

As for the expert models, we utilize ChebNetII [17], known for its efficiency in learning filters.
エキスパートモデルとしては、フィルタ学習における効率性で知られるChebNetII [17]を利用する。

Specifically, we experiment with configurations of 2, 3, and 5 ChebNetII experts, each initialized with different filters.
具体的には、それぞれ異なるフィルタで初期化された2、3、5個のChebNetIIエキスパートの構成で実験を行う。

More details and parameter settings are in Appendix C.2.
詳細とパラメータ設定はAppendix C.2にある。

4.2 Performance Comparison on Benchmark Datasets

4.2 ベンチマークデータセットにおける性能比較

In this section, we evaluate the efficacy of the proposed NODE-MOE across both homophilic and heterophilic datasets.
本節では、ホモフィリックデータセットとヘテロフィリックデータセットの両方にわたる提案手法NODE-MOEの有効性を評価する。

The results of node classification experiments are detailed in Table 1.
ノード分類実験の結果はTable 1に詳述されている。

From the results, we can have the following observations:
結果から、以下の観察が得られる：

Table1(p.8)

The proposed NODE-MOE demonstrates robust performance across both homophilic and heterophilic datasets, achieving the best average rank among all baselines.
提案手法NODE-MOEはホモフィリックデータセットとヘテロフィリックデータセットの両方にわたって頑健な性能を示し、全てのベースラインの中で最良の平均ランクを達成した。

This indicates its effectiveness in handling diverse graph structures.
これは多様なグラフ構造の処理における有効性を示している。

The GNNs and methods like LP that use fixed low-pass filters generally do well on homophilic datasets but tend to underperform on heterophilic datasets.
LPのような固定ローパスフィルタを使用するGNNsおよび手法は、一般的にホモフィリックデータセットでは良好な性能を示すが、ヘテロフィリックデータセットでは性能が低下する傾向がある。

Conversely, specialized models like GloGNN and LinkX, designed for heterophilic graphs, do not perform as well on homophilic datasets such as the ogbn-arxiv dataset.
逆に、ヘテロフィリックグラフ向けに設計されたGloGNNやLinkXのような専門的なモデルは、ogbn-arxivデータセットのようなホモフィリックデータセットではそれほど良い性能を示さない。

The GNNs equipped with learnable filters generally perform well on both types of datasets, as they can adapt their filters to the dataset’s structural patterns.
学習可能なフィルタを備えたGNNsは、データセットの構造パターンにフィルタを適応させることができるため、両方のタイプのデータセットで一般的に良好な性能を示す。

However, their performance is still not optimal.
しかし、その性能は依然として最適ではない。

The proposed Node-MoE, which utilizes multiple ChebNetII as experts, significantly outperforms a single ChebNetII, especially on heterophilic datasets.
複数のChebNetIIをエキスパートとして利用する提案手法Node-MoEは、特にヘテロフィリックデータセットにおいて、単一のChebNetIIを大幅に上回る。

This result validates the effectiveness of our node-wise filtering approach.
この結果は、我々のノード単位フィルタリングアプローチの有効性を検証している。

We also compare the proposed NODE-MOE with GMoE, which adapts the receptive field for each node but still applies traditional graph convolution with low-pass filters.
また、各ノードの受容野を適応させるが依然として従来のローパスフィルタによるグラフ畳み込みを適用するGMoEと提案手法NODE-MOEを比較する。

We can find NODE-MOE consistently outperforms GMoE across all datasets.
NODE-MOEが全てのデータセットにわたって一貫してGMoEを上回ることがわかる。

4.3 Analysis of NODE-MOE

4.3 NODE-MOEの分析

In this section, we delve into an in-depth analysis of the behaviors of NODE-MOE to demonstrate its rationality and effectiveness.
本節では、NODE-MOEの合理性と有効性を示すために、その挙動の詳細な分析に踏み込む。

We aim to uncover several key aspects of how NODE-MOE operates and performs: What specific types of filters does Node-MoE learn?
NODE-MOEがどのように動作し性能を発揮するかのいくつかの重要な側面を明らかにすることを目指す：Node-MoEはどのような特定のタイプのフィルタを学習するか？

Are nodes appropriately assigned to these diverse filters by the gating model?
ノードはゲーティングモデルによってこれらの多様なフィルタに適切に割り当てられているか？

We conduct experiments on both CiteSeer and Chameleon datasets using configurations with 2 experts.
CiteSeerとChameleonの両データセットで2エキスパートの構成を使用して実験を行う。

The results for the Chameleon dataset are presented below.
Chameleonデータセットの結果を以下に示す。

For more results and analysis, please refer to Appendix D.
さらなる結果と分析については、Appendix Dを参照されたい。

figure5(p.8)

figure6(p.8)

Figure 5 showcases the two filters learned by NODE-MOE on the Chameleon dataset, where filter 0 functions as a low-pass filter and filter 1 as a high-pass filter.
Figure 5は、ChameleonデータセットにおいてNODE-MOEが学習した2つのフィルタを示しており、フィルタ0はローパスフィルタとして、フィルタ1はハイパスフィルタとして機能している。

To analyze the behavior of the gating model in NODE-MOE, we split nodes into different groups based on their homophily levels.
NODE-MOEにおけるゲーティングモデルの挙動を分析するために、ノードをホモフィリーレベルに基づいて異なるグループに分割する。

Figure 6 displays the weights assigned by the gating model to these two experts.
Figure 6は、ゲーティングモデルがこれら2つのエキスパートに割り当てた重みを表示している。

The results reveal that nodes with lower homophily levels predominantly receive higher weights for the high-pass filter (filter 1), and as the homophily level increases, the weight for this filter correspondingly decreases.
結果は、ホモフィリーレベルが低いノードが主にハイパスフィルタ（フィルタ1）に対してより高い重みを受け、ホモフィリーレベルが上昇するにつれてこのフィルタに対する重みが対応して減少することを明らかにしている。

This pattern confirms our design that nodes with varying structural patterns require different filters, demonstrating the effectiveness of the proposed gating model.
このパターンは、異なる構造パターンを持つノードが異なるフィルタを必要とするという我々の設計を確認し、提案されたゲーティングモデルの有効性を示している。

4.4 Ablation studies

4.4 アブレーションスタディ

In this section, we conduct ablation studies to further investigate the effectiveness of two key components within the Node-MoE framework: the gating model and the filter smoothing loss.
本節では、Node-MoEフレームワーク内の2つの主要コンポーネント：ゲーティングモデルとフィルタ平滑化損失の有効性をさらに調査するためにアブレーションスタディを行う。

For the gating model, we explore two variants: a traditional MLP-based gating mechanism that utilizes the input features $\mathbf{X}$, and the Top-k gating approach as detailed in Section 3.4.
ゲーティングモデルについては、入力特徴$\mathbf{X}$を利用する従来のMLPベースのゲーティング機構と、セクション3.4で詳述したTop-kゲーティングアプローチの2つの変種を検討する。

Specifically, we choose $k = 1$ to ensure the proposed NODE-MOE has similar efficiency with the single expert.
具体的には、提案手法NODE-MOEが単一エキスパートと同様の効率性を持つことを保証するために$k = 1$を選択する。

Figure 7 presents the results on CiteSeer, ogbn-arxiv, Chameleon, and Squirrel datasets.
Figure 7はCiteSeer、ogbn-arxiv、Chameleon、およびSquirrelデータセットの結果を示している。

figure7(p.9)

figure8(p.9)

We observe two findings: (1) Traditional gating does not perform as well as the proposed gating in NODE-MOE and only achieves results comparable to an individual ChebNetII expert.
2つの知見が観察される：(1) 従来のゲーティングはNODE-MOEにおける提案ゲーティングほど良い性能を示さず、個々のChebNetIIエキスパートと同等の結果しか達成しない。

(2) The Top-1 gating, which selects only one expert, can achieve similar results to those of the soft gating NODE-MOE that utilizes all experts.
(2) 1つのエキスパートのみを選択するTop-1ゲーティングは、全てのエキスパートを利用するソフトゲーティングNODE-MOEと同様の結果を達成できる。

This indicates that the proposed NODE-MOE can effectively enhance performance while maintaining a complexity level comparable to that of an individual expert model.
これは、提案手法NODE-MOEが個々のエキスパートモデルと同等の複雑度レベルを維持しながら、効果的に性能を向上させることができることを示している。

We also investigate the impact of the weight parameter, $\gamma$, of the filter smoothing loss on the overall performance.
フィルタ平滑化損失の重みパラメータ$\gamma$が全体的な性能に与える影響も調査する。

Specifically, we conduct experiments on the Citeseer and Squirrel datasets and the $\gamma$ is chosen in $[0, 0.1, 0.5, 1, 5]$.
具体的には、CiteseerとSquirrelデータセットで実験を行い、$\gamma$は$[0, 0.1, 0.5, 1, 5]$から選択される。

As shown in Figure 8, incorporating the filter smoothing loss generally enhances performance, especially for the Citeseer dataset.
Figure 8に示されるように、フィルタ平滑化損失の組み込みは一般的に性能を向上させ、特にCiteseerデータセットで顕著である。

The reason is that the filter smoothing loss can mitigate filter oscillation, which may lead to the model being hard to learn.
その理由は、フィルタ平滑化損失がフィルタの振動を軽減でき、この振動はモデルの学習を困難にする可能性があるためである。

For more detailed insights into the effects of the filter smoothing loss, please refer to Appendix B.
フィルタ平滑化損失の効果に関するより詳細な知見については、Appendix Bを参照されたい。

5 関連研究

Graph Neural Networks (GNNs) [1, 2] have achieved remarkable success in graph representation learning across a wide range of tasks [48].
グラフニューラルネットワーク（GNNs）[1, 2]は、幅広いタスクにわたるグラフ表現学習において顕著な成功を収めている[48]。

The design of GNN architectures is majorly motivated in the spatial domain [1, 5, 6, 49] and spectral domain [8, 9, 50, 15].
GNNアーキテクチャの設計は主に空間領域[1, 5, 6, 49]とスペクトル領域[8, 9, 50, 15]で動機付けられている。

The GNNs in the spatial domain usually follow the message-passing mechanism [7], which can be regarded as low-pass graph filters [51, 52].
空間領域のGNNsは通常メッセージパッシング機構[7]に従い、これはローパスグラフフィルタと見なすことができる[51, 52]。

As a result, these GNNs are usually suitable for homophilic graphs.
その結果、これらのGNNsは通常ホモフィリックグラフに適している。

To address heterophilic graphs, specialized models like GloGNN [18], LinkX [47], MixHop [53], and ACM-GNN [54] have been developed.
ヘテロフィリックグラフに対処するために、GloGNN [18]、LinkX [47]、MixHop [53]、ACM-GNN [54]のような専門的なモデルが開発されている。

Additionally, models such as Bernnet [16], GPRGNN [9], and ChebNetII [17] feature learnable filters that adapt to various graph types.
さらに、Bernnet [16]、GPRGNN [9]、ChebNetII [17]のようなモデルは、様々なグラフタイプに適応する学習可能なフィルタを特徴としている。

Recent studies have highlighted that real-world graphs often exhibit a mixture of structural patterns [18, 19], with [19] specifically exploring GNN generalization under mixed patterns.
最近の研究は、実世界のグラフがしばしば構造パターンの混合を示すことを強調しており[18, 19]、[19]は特に混合パターン下でのGNNの汎化を探求している。

Traditional GNNs typically apply the same global filter across all nodes, which can be suboptimal for such mixed scenarios.
従来のGNNsは典型的に全てのノードに同じグローバルフィルタを適用するが、そのような混合シナリオでは次善である可能性がある。

In response, our proposed NODE-MOE introduces a node-wise filtering approach, applying distinct filters to nodes based on their individual patterns, enhancing adaptability and performance.
これに対して、我々が提案するNODE-MOEは、個々のパターンに基づいてノードに異なるフィルタを適用するノード単位フィルタリングアプローチを導入し、適応性と性能を向上させる。

Mixture of Experts (MoE) [32, 33] architecture has been widely used in NLP [37, 55] and Computer Vision [36] to improve efficiency of large models.
Mixture of Experts (MoE) [32, 33]アーキテクチャは、大規模モデルの効率性を改善するためにNLP [37, 55]やComputer Vision [36]で広く使用されている。

In graph domain, GraphMETRO [56] leverage MoE to address the graph distribution shift issue.
グラフ領域では、GraphMETRO [56]がグラフ分布シフト問題に対処するためにMoEを活用している。

GMoE [38] utilizes MoE to adaptive select propagation hops for different nodes.
GMoE [38]は、異なるノードに対する伝播ホップを適応的に選択するためにMoEを利用している。

Despite these advancements, these method still face challenges in handling complex graph patterns with the same efficacy as our proposed NODE-MOE.
これらの進歩にもかかわらず、これらの手法は依然として我々が提案するNODE-MOEと同等の有効性で複雑なグラフパターンを扱う際に課題に直面している。

6 Conclusion

6 結論

In this paper, we explored the complex structural patterns inherent in real-world graph datasets, which typically showcase a mixture of homophilic and heterophilic patterns.
本論文では、典型的にホモフィリックパターンとヘテロフィリックパターンの混合を示す、実世界グラフデータセットに固有の複雑な構造パターンを探求した。

Notably, these patterns exhibit significant variability across different communities within the same dataset, highlighting the intricate and diverse nature of graph structures.
注目すべきことに、これらのパターンは同一データセット内の異なるコミュニティ間で顕著な変動性を示し、グラフ構造の複雑で多様な性質を浮き彫りにしている。

Our theoretical analysis reveals that the conventional single global filter, commonly used in many GNNs, is often inadequate for capturing such complex structural patterns.
我々の理論的分析は、多くのGNNsで一般的に使用されている従来の単一グローバルフィルタが、そのような複雑な構造パターンを捉えるにはしばしば不十分であることを明らかにしている。

To address this limitation, we proposed the Node-wise filtering method, NODE-MOE, a flexible and effective solution that adaptively selects appropriate filters for different nodes.
この限界に対処するために、異なるノードに対して適切なフィルタを適応的に選択する柔軟で効果的な解決策であるノード単位フィルタリング手法NODE-MOEを提案した。

Extensive experiments demonstrate the proposed NODE-MOE demonstrated robust performance on both homophilic and heterophilic datasets.
広範な実験により、提案手法NODE-MOEがホモフィリックデータセットとヘテロフィリックデータセットの両方で頑健な性能を示すことが実証された。

Further, our behavioral analysis and ablation studies validate the design and effectiveness of the proposed NODE-MOE.
さらに、我々の挙動分析とアブレーションスタディは、提案手法NODE-MOEの設計と有効性を検証している。

Appendix

付録

A Proof of Theorem 1

A 定理1の証明

In this section, we present the proof of Theorem 1.
本節では、定理1の証明を提示する。

This theorem analyzes the separability when different filters are applied to graphs generated by a mixed CSBM model in Definition 1 - $CSBM(n, \boldsymbol{\mu}, \boldsymbol{\nu}, (p_0, q_0), (p_1, q_1), P)$ using a linear classifier.
この定理は、定義1の混合CSBMモデル$CSBM(n, \boldsymbol{\mu}, \boldsymbol{\nu}, (p_0, q_0), (p_1, q_1), P)$によって生成されたグラフに線形分類器を用いて異なるフィルタが適用された場合の分離可能性を分析する。

Notably, the following proof is derived based on [11], which analyzes the linear separability of a single graph convolution under a single CSBM model with only one pattern - $CSBM(n, \boldsymbol{\mu}, \boldsymbol{\nu}, (p, q))$.
特筆すべきは、以下の証明は、単一パターンのみを持つ単一CSBMモデル$CSBM(n, \boldsymbol{\mu}, \boldsymbol{\nu}, (p, q))$の下での単一グラフ畳み込みの線形分離可能性を分析した[11]に基づいて導出されていることである。

We extend the analysis to graphs with mixed CSBM models.
我々は分析を混合CSBMモデルを持つグラフに拡張する。

Besides, we analyze the scenarios in which different filters are applied to the same graph.
さらに、同一グラフに異なるフィルタが適用されるシナリオを分析する。

We follow the assumption 1 and 2 in [11]: The graph size $n$ should be relatively large with $\omega(d \log d) \leq n \leq O(\text{poly}(d))$, and the graph is not too sparse with $p_0, q_0, p_1, q_1 = \omega(\log^2(n)/n)$.
[11]の仮定1および2に従う：グラフサイズ$n$は$\omega(d \log d) \leq n \leq O(\text{poly}(d))$で十分大きく、グラフは$p_0, q_0, p_1, q_1 = \omega(\log^2(n)/n)$で疎すぎないものとする。

A.1 Proof of part 1 of Theorem 1

A.1 定理1のパート1の証明

Proof. For the low-pass filter, consider the filtered feature $\tilde{\mathbf{X}} = \mathbf{D}^{-1}\mathbf{A}\mathbf{X}$.
証明. ローパスフィルタに対して、フィルタリングされた特徴$\tilde{\mathbf{X}} = \mathbf{D}^{-1}\mathbf{A}\mathbf{X}$を考える。

Then, the filtered feature of node $i$ still follows the normal distribution, the mean can be represented by:
すると、ノード$i$のフィルタリングされた特徴は依然として正規分布に従い、平均は以下で表現できる：

\[m(i) = E(\tilde{\mathbf{X}}_i) = \begin{cases} \frac{p_0 \boldsymbol{\mu} + q_0 \boldsymbol{\nu}}{p_0 + q_0}(1 + o(1)) & \text{for } i \in C_0 \text{ and } i \in H_0 \\ \frac{q_0 \boldsymbol{\mu} + p_0 \boldsymbol{\nu}}{p_0 + q_0}(1 + o(1)) & \text{for } i \in C_1 \text{ and } i \in H_0 \\ \frac{p_1 \boldsymbol{\mu} + q_1 \boldsymbol{\nu}}{p_1 + q_1}(1 + o(1)) & \text{for } i \in C_0 \text{ and } i \in H_1 \\ \frac{q_1 \boldsymbol{\mu} + p_1 \boldsymbol{\nu}}{p_1 + q_1}(1 + o(1)) & \text{for } i \in C_1 \text{ and } i \in H_1 \end{cases},\]

where $C_0$ and $C_1$ represent the class 0 and class 1, respectively; $H_0$ and $H_1$ are the homophilic and heterophilic node sets, respectively.
ここで$C_0$と$C_1$はそれぞれクラス0とクラス1を表し、$H_0$と$H_1$はそれぞれホモフィリックノード集合とヘテロフィリックノード集合である。

The covariance matrix can be represented by: $\text{Cov}(\tilde{\mathbf{X}}i) = \frac{1}{d D{ii}} \mathbf{I}$.
共分散行列は$\text{Cov}(\tilde{\mathbf{X}}i) = \frac{1}{d D{ii}} \mathbf{I}$と表現できる。

Lemma 2 in [11] demonstrate that for any unit vector $\mathbf{w}$, we have: $(\tilde{\mathbf{X}}_i - m(i)) \cdot \mathbf{w} = O\left(\sqrt{\frac{\log n}{dn(p_0 + q_0)}}\right)$.
[11]のLemma 2は、任意の単位ベクトル$\mathbf{w}$に対して$(\tilde{\mathbf{X}}_i - m(i)) \cdot \mathbf{w} = O\left(\sqrt{\frac{\log n}{dn(p_0 + q_0)}}\right)$が成り立つことを示している。

If we only consider the nodes with homophilic patterns, i.e., $i \in H_0$, we can find the optimal linear classifier with $\mathbf{w}^* = R \frac{\boldsymbol{\nu} - \boldsymbol{\mu}}{|\boldsymbol{\nu} - \boldsymbol{\mu}|}$ and $b^* = -\frac{1}{2} \langle \boldsymbol{\nu} + \boldsymbol{\mu}, \mathbf{w}^* \rangle$.
ホモフィリックパターンを持つノードのみ、すなわち$i \in H_0$を考えると、$\mathbf{w}^* = R \frac{\boldsymbol{\nu} - \boldsymbol{\mu}}{|\boldsymbol{\nu} - \boldsymbol{\mu}|}$および$b^* = -\frac{1}{2} \langle \boldsymbol{\nu} + \boldsymbol{\mu}, \mathbf{w}^* \rangle$で最適な線形分類器を見つけることができる。

We also have the assumption that the distance between $\boldsymbol{\mu}$ and $\boldsymbol{\nu}$ are relatively large, with $|\boldsymbol{\nu} - \boldsymbol{\mu}| = \Omega\left(\frac{\log n}{dn(p_0 + q_0)}\right)$.
$\boldsymbol{\mu}$と$\boldsymbol{\nu}$の距離は$|\boldsymbol{\nu} - \boldsymbol{\mu}| = \Omega\left(\frac{\log n}{dn(p_0 + q_0)}\right)$で比較的大きいという仮定もある。

Then, for $i \in C_0$ and $i \in H_0$, we have:
すると、$i \in C_0$かつ$i \in H_0$に対して、以下が成り立つ：

\[\langle \tilde{\mathbf{X}}_i, \mathbf{w}^* \rangle + b^* = \frac{\langle p_0 \boldsymbol{\mu} + q_0 \boldsymbol{\nu}, \mathbf{w}^* \rangle}{p_0 + q_0}(1 + o(1)) + O\left(\|\mathbf{w}^*\| \sqrt{\frac{\log n}{dn(p+q)}}\right) - \frac{1}{2}\langle \boldsymbol{\nu} + \boldsymbol{\mu}, \mathbf{w}^* \rangle\] \[= \frac{\langle 2p_0 \boldsymbol{\mu} + 2q_0 \boldsymbol{\nu} - (p_0 + q_0)(\boldsymbol{\mu} + \boldsymbol{\nu}), \mathbf{w}^* \rangle}{p_0 + q_0}(1 + o(1)) + o(\|\mathbf{w}^*\|)\] \[= \frac{p_0 - q_0}{2(p_0 + q_0)} \langle \boldsymbol{\mu} - \boldsymbol{\nu}, \mathbf{w}^* \rangle (1 + o(1)) + o(\|\mathbf{w}^*\|)\] \[= -\frac{R(p_0 - q_0)}{2(p_0 + q_0)} \|\boldsymbol{\mu} - \boldsymbol{\nu}\| (1 + o(1)) < 0\]

Similarly, for $i \in C_1$ and $i \in H_0$, we have:
同様に、$i \in C_1$かつ$i \in H_0$に対して、以下が成り立つ：

\[\langle \tilde{\mathbf{X}}_i, \mathbf{w}^* \rangle + b^* = -\frac{R(q_0 - p_0)}{2(p_0 + q_0)} \|\boldsymbol{\mu} - \boldsymbol{\nu}\| (1 + o(1)) > 0\]

Therefore, the linear classifier with $\mathbf{w}^$ and $b^$ can separate class $C_0$.
したがって、$\mathbf{w}^$と$b^$による線形分類器はクラス$C_0$を分離できる。

However, if we apply this linear classifier to the heterophilic node set $H_1$, we have:
しかし、この線形分類器をヘテロフィリックノード集合$H_1$に適用すると、以下が成り立つ：

\[\langle \tilde{\mathbf{X}}_i, \mathbf{w}^* \rangle + b^* = \begin{cases} -\frac{R(p_1 - q_1)}{2(p_1 + q_1)} \|\boldsymbol{\mu} - \boldsymbol{\nu}\| (1 + o(1)) > 0 & \text{for } i \in C_0 \text{ and } i \in H_1 \\ -\frac{R(q_1 - p_1)}{2(p_1 + q_1)} \|\boldsymbol{\mu} - \boldsymbol{\nu}\| (1 + o(1)) < 0 & \text{for } i \in C_1 \text{ and } i \in H_1 \end{cases}\]

Therefore, all nodes in $H_1$ are misclassified.
したがって、$H_1$の全てのノードは誤分類される。

The binary cross-entropy over node set $H_1$ can be represented by:
ノード集合$H_1$上の二値交差エントロピーは以下で表現できる：

\[L(H_1, \mathbf{w}^*, b^*) = \frac{1}{|H_1|} \sum_{i \in H_1} -y_i \log\left(\sigma\left(\langle \tilde{\mathbf{X}}_i, \mathbf{w}^* \rangle + b^*\right)\right) - (1 - y_i) \log\left(1 - \sigma\left(\langle \tilde{\mathbf{X}}_i, \mathbf{w}^* \rangle + b^*\right)\right)\] \[= \frac{1}{|H_1|} \sum_{i \in H_1} \log\left(1 + \exp\left((1 - 2y_i)\langle \tilde{\mathbf{X}}_i, \mathbf{w}^* \rangle + b^*\right)\right)\] \[= \log\left(1 + \exp\left(-\frac{R(p_1 - q_1)}{2(p_1 + q_1)} \|\boldsymbol{\mu} - \boldsymbol{\nu}\| (1 + o(1))\right)\right)\]

As for $x = -\frac{R(p_1 - q_1)}{2(p_1 + q_1)} |\boldsymbol{\mu} - \boldsymbol{\nu}| > 0$, we have $e^x \geq x$.
$x = -\frac{R(p_1 - q_1)}{2(p_1 + q_1)} |\boldsymbol{\mu} - \boldsymbol{\nu}| > 0$に対して、$e^x \geq x$が成り立つ。

As a result, we have
その結果、以下が成り立つ

\[L(H_1, \mathbf{w}^*, b^*) \geq \frac{R(q_1 - p_1)}{2(p_1 + q_1)} \|\boldsymbol{\mu} - \boldsymbol{\nu}\| (1 + o(1))\]

A.2 Proof of part 2 of Theorem 1

A.2 定理1のパート2の証明

Proof. Suppose we apply a high-pass filter to the heterophilic nodes $H_1$ and the filtered features are $\tilde{\mathbf{X}} = -\mathbf{D}^{-1}\mathbf{A}\mathbf{X}$.
証明. ヘテロフィリックノード$H_1$にハイパスフィルタを適用し、フィルタリングされた特徴が$\tilde{\mathbf{X}} = -\mathbf{D}^{-1}\mathbf{A}\mathbf{X}$であると仮定する。

For nodes in $H_1$,
$H_1$のノードに対して、

\[m(i) = E(\tilde{\mathbf{X}}_i) = \begin{cases} -\frac{p_1 \boldsymbol{\mu} + q_1 \boldsymbol{\nu}}{p_1 + q_1}(1 + o(1)) & \text{for } i \in C_0 \text{ and } i \in H_1 \\ -\frac{q_1 \boldsymbol{\mu} + p_1 \boldsymbol{\nu}}{p_1 + q_1}(1 + o(1)) & \text{for } i \in C_1 \text{ and } i \in H_1 \end{cases}\]

Therefore, if we apply the same linear classifier with $\mathbf{w}^$ and $b^$, then we have:
したがって、$\mathbf{w}^$と$b^$による同じ線形分類器を適用すると、以下が成り立つ：

\[\langle \tilde{\mathbf{X}}_i, \mathbf{w}^* \rangle + b^* = \begin{cases} \frac{R(p_1 - q_1)}{2(p_1 + q_1)} \|\boldsymbol{\mu} - \boldsymbol{\nu}\| (1 + o(1)) < 0 & \text{for } i \in C_0 \text{ and } i \in H_1 \\ \frac{R(q_1 - p_1)}{2(p_1 + q_1)} \|\boldsymbol{\mu} - \boldsymbol{\nu}\| (1 + o(1)) > 0 & \text{for } i \in C_1 \text{ and } i \in H_1 \end{cases}\]

As a result, the same linear classifier can separate both the homophilic set $H_0$ and heterophilic set $H_1$.
その結果、同じ線形分類器がホモフィリック集合$H_0$とヘテロフィリック集合$H_1$の両方を分離できる。

B The Impact of Filter Smoothing Loss

B フィルタ平滑化損失の影響

In this section, we explore the impact of the proposed filter smoothing loss on the behavior of the learned filters in our NODE-MOE framework.
本節では、我々のNODE-MOEフレームワークにおける学習されたフィルタの挙動に対する提案されたフィルタ平滑化損失の影響を探求する。

Figures 9 and 10 display the effects of the NODE-MOE framework without and with the application of filter smoothing loss, respectively.
Figure 9と10は、それぞれフィルタ平滑化損失の適用なしおよびありにおけるNODE-MOEフレームワークの効果を示している。

figure9(p.16)

figure10(p.16)

Without the filter smoothing loss, as shown in Figure 9, the learned filters exhibit significant oscillations, making it challenging to discern their specific functions.
Figure 9に示されるように、フィルタ平滑化損失がない場合、学習されたフィルタは顕著な振動を示し、その特定の機能を識別することが困難になる。

In contrast, with the filter smoothing loss applied, as illustrated in Figure 10, the behavior of the filters becomes more distinct: filter 0 clearly functions as a low-pass filter, and filter 1 as a high-pass filter.
対照的に、Figure 10に示されるようにフィルタ平滑化損失が適用された場合、フィルタの挙動はより明確になる：フィルタ0はローパスフィルタとして、フィルタ1はハイパスフィルタとして明確に機能する。

Additionally, we assessed the training dynamics of the proposed Node-MoE framework by comparing performance with and without the filter smoothing loss, while keeping other hyperparameters constant.
さらに、他のハイパーパラメータを一定に保ちながら、フィルタ平滑化損失ありとなしの性能を比較することで、提案手法Node-MoEフレームワークの学習ダイナミクスを評価した。

For the Citeseer dataset, applying the filter smoothing loss resulted in a higher average training accuracy of $99.37 \pm 0.17$, compared to $93.51 \pm 1.27$ when the loss was not applied.
Citeseerデータセットでは、フィルタ平滑化損失の適用により平均訓練精度が$99.37 \pm 0.17$となり、損失が適用されなかった場合の$93.51 \pm 1.27$と比較して高くなった。

A similar pattern was observed on the Squirrel dataset, where the training accuracy was $96.54 \pm 1.42$ with the filter smoothing loss, versus $95.54 \pm 0.94$ without it.
同様のパターンがSquirrelデータセットでも観察され、フィルタ平滑化損失ありの訓練精度は$96.54 \pm 1.42$、なしの場合は$95.54 \pm 0.94$であった。

These results suggest that oscillations in the filters without the smoothing loss can hinder the model’s ability to fit the data effectively, resulting in suboptimal performance as shown in Section 4.4.
これらの結果は、平滑化損失なしのフィルタの振動がモデルのデータへの効果的な適合能力を妨げ、セクション4.4に示されるように次善の性能をもたらす可能性があることを示唆している。

C Datasets and Experimental Settings

C データセットと実験設定

In this section, we detail the datasets used and the experimental settings for both the baseline models and the proposed NODE-MOE framework.
本節では、使用したデータセットと、ベースラインモデルおよび提案手法NODE-MOEフレームワークの両方の実験設定を詳述する。

C.1 Datasets

C.1 データセット

We conduct experiments across seven widely recognized datasets, which encompass both homophilic and heterophilic types.
ホモフィリックタイプとヘテロフィリックタイプの両方を包含する、広く認知されている7つのデータセットにわたる実験を行う。

The homophilic datasets include Cora, CiteSeer, and Pubmed [24], along with ogbn-arxiv [43]; the heterophilic datasets comprise Chameleon, Squirrel, and Actor [22].
ホモフィリックデータセットにはCora、CiteSeer、Pubmed [24]およびogbn-arxiv [43]が含まれ、ヘテロフィリックデータセットにはChameleon、Squirrel、Actor [22]が含まれる。

For Cora, CiteSeer, and Pubmed, we generate ten random splits, allocating nodes into training, validation, and testing sets with proportions of 60%, 20%, and 20%, respectively.
Cora、CiteSeer、Pubmedについては、10個のランダム分割を生成し、ノードをそれぞれ60%、20%、20%の割合で訓練、検証、テストセットに分配する。

For the heterophilic datasets, we adhere to the ten fixed splits as defined in [22].
ヘテロフィリックデータセットについては、[22]で定義された10個の固定分割に従う。

The ogbn-arxiv dataset is assessed using its standard split as established by [43].
ogbn-arxivデータセットは[43]によって確立された標準分割を使用して評価される。

Detailed statistics of these datasets are shown in Table 2.
これらのデータセットの詳細な統計はTable 2に示されている。

Table2(p.16)

C.2 Experimental Settings

C.2 実験設定

For the baseline models, we adopt the same parameter setting in their original paper.
ベースラインモデルについては、元の論文と同じパラメータ設定を採用する。

For the proposed NODE-MOE, we adopt the GIN as gating and GCNII as experts.
提案手法NODE-MOEについては、GINをゲーティングとして、GCNIIをエキスパートとして採用する。

Notably, the GCNII model has different learning rates and weight decay for the filters and other parameters.
注目すべきことに、GCNIIモデルはフィルタと他のパラメータに対して異なる学習率と重み減衰を持つ。

All the hyperparameters are tuned based on the validation accuracy from the following search space:
全てのハイパーパラメータは、以下の探索空間から検証精度に基づいてチューニングされる：

Gating Learning Rate: {0.0001, 0.001, 0.01}
ゲーティング学習率：{0.0001, 0.001, 0.01}
Gating Dropout: {0, 0.5, 0.8}
ゲーティングDropout：{0, 0.5, 0.8}
Gating Weight Decay: {0, 5e-5, 5e-4}
ゲーティング重み減衰：{0, 5e-5, 5e-4}
Expert Learning Rate for Filters: {0.001, 0.01, 0.1}
フィルタ用エキスパート学習率：{0.001, 0.01, 0.1}
Expert Weight Decay for Filters: {0, 5e-5, 5e-3, 5e-2}
フィルタ用エキスパート重み減衰：{0, 5e-5, 5e-3, 5e-2}
Expert Learning Rate: {0.001, 0.01, 0.1, 0.5}
エキスパート学習率：{0.001, 0.01, 0.1, 0.5}
Expert Dropout: {0, 0.5, 0.8}
エキスパートDropout：{0, 0.5, 0.8}
Filter Smoothing loss weight: {0, 0.01, 0.1, 1}
フィルタ平滑化損失重み：{0, 0.01, 0.1, 1}
Load balancing weight for top-k gating: {0, 0.001, 0.01, 0.1, 1}
Top-kゲーティング用ロードバランシング重み：{0, 0.001, 0.01, 0.1, 1}
Number of experts: {2, 3, 5}
エキスパート数：{2, 3, 5}

For the initialization of filters in ChebNetII, which uses a $K$-order approximation, we employ a set of initialization strategies for the polynomial coefficients.
$K$次近似を使用するChebNetIIのフィルタの初期化については、多項式係数の初期化戦略のセットを採用する。

These strategies include: decreasing powers $[\alpha^0, \alpha^1, \cdots, \alpha^K]$, increasing powers $[\alpha^K, \alpha^{K-1}, \cdots, \alpha^0]$, and uniform values $[1, 1, \cdots, 1]$.
これらの戦略には以下が含まれる：減少べき乗$[\alpha^0, \alpha^1, \cdots, \alpha^K]$、増加べき乗$[\alpha^K, \alpha^{K-1}, \cdots, \alpha^0]$、および均一値$[1, 1, \cdots, 1]$。

For configurations with 2 or 3 experts, we set $\alpha = 0.9$.
2または3エキスパートの構成では$\alpha = 0.9$と設定する。

When expanding to 5 experts, we use two values of $\alpha$, setting them at 0.9 and 0.8, respectively, to diversify the response characteristics of the filters.
5エキスパートに拡張する場合、フィルタの応答特性を多様化するために$\alpha$の2つの値を使用し、それぞれ0.9と0.8に設定する。

We use a single GPU of NVIDIA RTX A5000 24Gb, to run the experiments.
実験の実行にはNVIDIA RTX A5000 24GbのGPUを1台使用する。

D Analysis of the proposed NODE-MOE

D 提案手法NODE-MOEの分析

In this section, we provide more analysis of the proposed NODE-MOE by comprehensive experiments.
本節では、包括的な実験により提案手法NODE-MOEのさらなる分析を提供する。

D.1 The behavior of NODE-MOE with 2 experts

D.1 2エキスパートによるNODE-MOEの挙動

The learned filters and the corresponding gating weights for nodes with different homophily levels are illustrated below.
学習されたフィルタと異なるホモフィリーレベルを持つノードに対応するゲーティング重みを以下に示す。

For the Chameleon dataset, these are displayed in Figure 11 for the filters and Figure 12 for the gating weights.
Chameleonデータセットについては、フィルタはFigure 11に、ゲーティング重みはFigure 12に表示されている。

Similarly, for the Citeseer dataset, the filters are shown in Figure 13 and the gating weights in Figure 14.
同様に、Citeseerデータセットについては、フィルタはFigure 13に、ゲーティング重みはFigure 14に示されている。

figure11(p.17)

figure12(p.17)

figure13(p.18)

figure14(p.18)

For both datasets, the learned filters demonstrate distinct characteristics: filter 0s function as low-pass filters, effectively smoothing signals, while filter 1s respond more strongly to high-frequency signals, characteristic of high-pass filters.
両方のデータセットにおいて、学習されたフィルタは明確な特性を示す：フィルタ0はローパスフィルタとして機能し、信号を効果的に平滑化する一方、フィルタ1は高周波信号により強く応答し、ハイパスフィルタの特性を示す。

Specifically, for the heterophilic dataset, such as Chameleon, the gating model generally assigns higher weights to filter 1, indicating a preference for high-pass filtering to accommodate the less homophilic nature of the dataset.
具体的には、Chameleonのようなヘテロフィリックデータセットでは、ゲーティングモデルは一般的にフィルタ1に高い重みを割り当て、データセットのホモフィリーの低い性質に対応するためのハイパスフィルタリングへの選好を示している。

Conversely, for the homophilic dataset, such as Citeseer, higher weights are typically assigned to filter 0, emphasizing low-pass filtering.
逆に、Citeseerのようなホモフィリックデータセットでは、通常フィルタ0に高い重みが割り当てられ、ローパスフィルタリングが強調される。

Moreover, within the Chameleon dataset, the weight assigned to the high-pass filter (filter 1) decreases as the homophily level increases.
さらに、Chameleonデータセット内では、ホモフィリーレベルが上昇するにつれてハイパスフィルタ（フィルタ1）に割り当てられる重みが減少する。

In contrast, in the Citeseer dataset, the weight to the low-pass filter (filter 0) increases with rising homophily levels.
対照的に、Citeseerデータセットでは、ホモフィリーレベルの上昇に伴いローパスフィルタ（フィルタ0）への重みが増加する。

This pattern supports our initial hypothesis: nodes with lower homophily are better served by high-pass filters to capture the dissimilarity among neighbors, while nodes with higher homophily benefit from low-pass filters to reinforce the similarity among neighboring nodes.
このパターンは我々の当初の仮説を支持する：ホモフィリーが低いノードは近傍間の非類似性を捉えるためにハイパスフィルタがより適しており、ホモフィリーが高いノードは近傍ノード間の類似性を強化するためにローパスフィルタの恩恵を受ける。

D.2 Accuracy analysis between the NODE-MOE with the individual expert

D.2 NODE-MOEと個別エキスパート間の精度分析

In this section, we evaluate the performance differences between the proposed Node-MoE framework and a single expert model, specifically ChebNetII, across nodes with varying homophily levels in the Chameleon dataset.
本節では、Chameleonデータセットにおける異なるホモフィリーレベルを持つノードにわたる、提案手法Node-MoEフレームワークと単一エキスパートモデル、具体的にはChebNetIIとの性能差を評価する。

figure15(p.18)

figure16(p.18)

The filter characteristics learned by the individual ChebNetII are depicted in Figure 15, which shows that this filter responds strongly to both low and high-frequency components.
個別のChebNetIIによって学習されたフィルタ特性はFigure 15に描かれており、このフィルタが低周波および高周波の両方の成分に強く応答することを示している。

The comparative performance analysis is presented in Figure 16.
比較性能分析はFigure 16に示されている。

From these results, it is evident that ChebNetII only outperforms Node-MoE when the node homophily level is exceptionally high.
これらの結果から、ChebNetIIがNode-MoEを上回るのはノードのホモフィリーレベルが例外的に高い場合のみであることが明らかである。

These findings indicate that the reliance on a single global filter, such as with ChebNetII, can lead to suboptimal performance.
これらの知見は、ChebNetIIのような単一のグローバルフィルタへの依存が次善の性能につながる可能性があることを示している。

While such a filter might fit exceptionally well for nodes of a particular type, it may not adequately address the needs of nodes in other groups, thereby limiting overall effectiveness across diverse node types.
そのようなフィルタは特定のタイプのノードに対して例外的に良く適合する可能性があるが、他のグループのノードのニーズに十分に対処できない可能性があり、多様なノードタイプにわたる全体的な有効性を制限する。