1. Introduction
Vertex embedding is the task of learning representations of graph vertices in a continuous vector space for use in downstream tasks, such as link prediction and vertex classification
(Yan et al., 2006). The primary classical approach for this task is spectral embedding: vertices are represented by their corresponding entries in the eigenvectors of the graph Laplacian with the smallest eigenvalues. Spectral embedding methods include the Shi-Malik normalized cuts algorithm
(Shi and Malik, 2000), Laplacian Eigenmaps (Belkin and Niyogi, 2003), and spectral partitioning methods for stochastic block models (McSherry, 2001). They also include many spectral clustering methods, which apply spectral embeddings to general datasets by first transforming them into a graph based on data point similarity (Ng et al., 2002).

The spectral embedding approach has recently been exceeded in predictive performance on many tasks by methods inspired by Word2vec (Mikolov et al., 2013), which performs the related word embedding task. Word2vec forms representations for words based on the frequency with which they co-occur with other words, called context words, within a fixed distance in natural text. The DeepWalk (Perozzi et al., 2014), LINE (Tang et al., 2015), and node2vec (Grover and Leskovec, 2016) algorithms, among others, adapt this idea to network data. In particular, DeepWalk takes several random walks on the network, treating the vertices as words and the walks as sequences of words in text. It has been shown in (Levy and Goldberg, 2014) that the representations learned by this approach implicitly form a low-dimensional factorization of a known matrix, which contains the pointwise mutual information (PMI) of co-occurrences between nodes in the random walks. Work by Qiu et al. (Qiu et al., 2018) gives a closed-form expression for this matrix and shows a connection to the normalized graph Laplacian. This motivates their NetMF algorithm, which performs a direct factorization, improving on the performance of DeepWalk on a number of tasks.
In this work, we consider DeepWalk in the limit as the window size $w$ goes to infinity. We derive a simple expression for the PMI matrix in this limit:
(1)   $\mathcal{M}_\infty = \operatorname{vol}(G)\, D^{-1/2}\, \mathcal{L}^{+}\, D^{-1/2} - \operatorname{vol}(G)\, D^{-1} + J$
where $D$ is the degree matrix, $\operatorname{vol}(G)$ is the trace of $D$ (twice the number of edges in $G$), $\mathcal{L}$ is the normalized Laplacian (i.e., $\mathcal{L} = I - D^{-1/2} A D^{-1/2}$, where $A$ is the adjacency matrix), and $J$ is the all-ones matrix. $\mathcal{L}^{+}$ is the pseudoinverse of $\mathcal{L}$. One can show that $D^{-1/2} \mathcal{L}^{+} D^{-1/2}$ is equal to the unnormalized Laplacian pseudoinverse $L^{+}$, the central object in classic spectral embeddings, up to a rank-2 component (see (6) in Section 3.3). Thus, $\mathcal{M}_\infty$ is equal to $\operatorname{vol}(G) \cdot L^{+}$ plus at most a rank-3 component and a diagonal matrix.
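As a concrete numerical check of this limiting expression, the sketch below (our own toy example; the graph, function names, and tolerance are not from the paper) builds $\mathcal{M}_\infty$ for a small graph and compares it to $w$ times the directly computed finite-window PMI matrix of (2) (with one negative sample per positive sample) for a large window:

```python
import numpy as np

def limiting_pmi(A):
    """Limiting PMI matrix: vol(G) * D^{-1/2} Lnorm^+ D^{-1/2} - vol(G) * D^{-1} + J."""
    d = A.sum(axis=1)
    n = len(d)
    vol = d.sum()
    D_inv_sqrt = np.diag(d ** -0.5)
    L_norm = np.eye(n) - D_inv_sqrt @ A @ D_inv_sqrt   # normalized Laplacian
    L_norm_pinv = np.linalg.pinv(L_norm)
    J = np.ones((n, n))
    return vol * D_inv_sqrt @ L_norm_pinv @ D_inv_sqrt - vol * np.diag(1.0 / d) + J

def finite_pmi(A, w):
    """Finite-window PMI matrix computed directly: log( vol/w * sum_{r<=w} P^r D^{-1} )."""
    d = A.sum(axis=1)
    vol = d.sum()
    P = A / d[:, None]                  # random walk transition matrix D^{-1} A
    acc = np.zeros_like(A)
    Pr = np.eye(len(d))
    for _ in range(w):
        Pr = Pr @ P
        acc += Pr
    return np.log(vol / w * acc @ np.diag(1.0 / d))

# Toy graph: 4-cycle plus a chord (undirected, connected, non-bipartite).
A = np.array([[0., 1., 1., 1.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [1., 0., 1., 0.]])
M_inf = limiting_pmi(A)
w = 20_000
err = np.max(np.abs(w * finite_pmi(A, w) - M_inf))   # shrinks roughly like O(1/w)
```

The gap between $w$ times the finite matrix and $\mathcal{M}_\infty$ shrinks as the window grows, consistent with the limit.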
Not surprisingly, embeddings formed directly using a low-dimensional factorization of $\mathcal{M}_\infty$ itself perform poorly on downstream tasks compared to DeepWalk and other skip-gram methods. However, we show that after an elementwise application of a linear function followed by a logarithm, in particular $\log^{\circ}(J + \mathcal{M}_\infty / w)$, $\mathcal{M}_\infty$ approximates the finite-window PMI matrix. Embeddings formed by factorizing this transformed matrix are competitive with DeepWalk and nearly competitive with the NetMF method of Qiu et al. (Qiu et al., 2018), when evaluated on multi-label node classification tasks. We call our method InfiniteWalk.
Note that the window hyperparameter $w$ only appears in the entrywise nonlinearity in InfiniteWalk, and not in the formula for $\mathcal{M}_\infty$. This is perhaps surprising, as $w$ is a key hyperparameter in existing methods. Our results suggest that $w$'s importance lies largely in determining the shape of the nonlinearity applied. Since $\mathcal{M}_\infty$ is closely related to the Laplacian pseudoinverse, the key difference between DeepWalk and classic spectral embedding seems to be the application of this nonlinearity.
In more detail, note that our results show that InfiniteWalk well approximates DeepWalk by forming a low-rank factorization of a nonlinear entrywise transformation of $\mathcal{M}_\infty$. Classic spectral embedding and clustering methods (Shi and Malik, 2000; McSherry, 2001; Belkin and Niyogi, 2003) embed nodes using the eigenvectors corresponding to the smallest eigenvalues of the Laplacian (or a variant of this matrix), which are the largest eigenvalues of the Laplacian pseudoinverse. Thus, these methods can be viewed as embedding nodes using an optimal low-dimensional factorization of the pseudoinverse (lying in the span of its top eigenvectors), without applying an entrywise nonlinearity.

Inspired by this observation, we simplify the idea of a nonlinear transformation of the Laplacian pseudoinverse even further: thresholding it to a binary matrix. In particular, we form the binary matrix whose $(i, j)$ entry indicates whether $L^{+}_{ij}$ exceeds a threshold value, where the threshold is an element of $L^{+}$ itself at some hyperparameter quantile (e.g., the median or another fixed percentile element). Surprisingly, embeddings from factorizing this binary matrix are also competitive with DeepWalk and the method of Qiu et al. on many tasks.

Broadly, our results strengthen the theoretical connection between classical methods based on factorizing the graph Laplacian and more recent "deep" methods for vertex embedding. They suggest that these methods are not as different conceptually as they may seem at first glance.
2. Background and Related Work
We begin by surveying existing work on skip-gram-based network embeddings and their connections to matrix factorization, which our work builds on.
2.1. Skip-Gram
In word embedding models, words are typically treated as both words and contexts. A context is simply a word that appears within a window of fixed length around another word. As formalized in (Goldberg and Levy, 2014), given a dataset of words $w \in V_W$, contexts $c \in V_C$ (typically $V_W = V_C$), and word-context co-occurrence pairs $(w, c) \in D$, the "skip-gram" model for training word and context embeddings $\vec{w}, \vec{c} \in \mathbb{R}^d$ (Mikolov et al., 2013) has the following log-likelihood objective:

$$\ell = \sum_{(w, c) \in D} \log \frac{\exp(\vec{w} \cdot \vec{c})}{\sum_{c' \in V_C} \exp(\vec{w} \cdot \vec{c}\,')}$$
We can see that the objective encourages co-occurring pairs to have similar embeddings with large dot product $\vec{w} \cdot \vec{c}$. This exact objective is not used, as repeatedly evaluating the partition function is too computationally expensive; (Mikolov et al., 2013) proposes a substitute: the skip-gram with negative sampling (SGNS) objective. Here, an auxiliary "negative" dataset consisting of random pairs not appearing in $D$ is used. Pairs in this negative set are encouraged to have dissimilar embeddings with small $\vec{w} \cdot \vec{c}$.
2.2. Implicit PMI Matrix
Levy and Goldberg (Levy and Goldberg, 2014) prove that SGNS implicitly factors a matrix whose entries give the pointwise mutual information (PMI) of occurrence of a word $w$ and occurrence of a context $c$. Given a dataset $D$ of these co-occurrences, an element of this matrix is given by

$$M_{wc} = \log\left(\frac{\#(w, c) \cdot |D|}{\#(w) \cdot \#(c)}\right) - \log k$$
where $\#(w, c)$, $\#(w)$, and $\#(c)$ are occurrence counts in $D$, and $k$ is the ratio of negative samples to positive samples. Their proof assumes that the dimensionality of the embedding is at least the cardinality of the word/context set; the analysis of Li et al. (Li et al., 2015) relaxes the assumptions under which this equivalence holds. Several works, including (Arora et al., 2016) and (Gittens et al., 2017), propose generative models for word-context co-occurrence to explain the effectiveness of PMI-based methods in linearizing semantic properties of words. More recently, the analysis of Allen et al. in (Allen et al., 2019) and (Allen and Hospedales, 2019) has provided explanations of this phenomenon based on geometric properties of the PMI matrix, without the strong assumptions required by the generative models. The extensive research and results on the skip-gram PMI matrix make it an intrinsically interesting object in representation learning.
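To make the definition concrete, here is a minimal sketch (the toy corpus and variable names are ours, not from the literature) that assembles the $k$-shifted PMI matrix from raw word-context counts:

```python
import numpy as np
from collections import Counter

# Toy co-occurrence data: (word, context) pairs extracted from windows.
pairs = [("cat", "sat"), ("cat", "mat"), ("sat", "cat"),
         ("sat", "mat"), ("mat", "cat"), ("mat", "sat"), ("cat", "sat")]
k = 1  # negative sampling ratio

words = sorted({w for w, _ in pairs})
ctxs = sorted({c for _, c in pairs})
n_wc = Counter(pairs)                  # #(w, c): pair counts
n_w = Counter(w for w, _ in pairs)     # #(w): word occurrence counts
n_c = Counter(c for _, c in pairs)     # #(c): context occurrence counts
N = len(pairs)                         # |D|

# M[w][c] = log( #(w,c) * |D| / (#(w) * #(c)) ) - log k
M = np.full((len(words), len(ctxs)), -np.inf)  # -inf where #(w, c) = 0
for i, w in enumerate(words):
    for j, c in enumerate(ctxs):
        if n_wc[(w, c)] > 0:
            M[i, j] = np.log(n_wc[(w, c)] * N / (n_w[w] * n_c[c])) - np.log(k)
```

Entries for pairs that never co-occur are undefined (here left as $-\infty$), which is why practical methods apply a shift or ramp before factorization.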
2.3. Networks
The DeepWalk method (Perozzi et al., 2014) applies the SGNS model to networks, where the word and context sets are the nodes in the network, and the co-occurring pairs are node pairs that appear within a window of $w$ hops in a set of fixed-length random walks run on the graph. Qiu et al. (Qiu et al., 2018) derived the following expression for the DeepWalk PMI matrix in the context of random walks on undirected, connected, and non-bipartite graphs: in the limit as the number of walks originating at each node and the walk length both go to infinity, it approaches
(2)   $M_w = \log^{\circ}\!\left(\dfrac{\operatorname{vol}(G)}{b\,w} \left(\sum_{r=1}^{w} P^{r}\right) D^{-1}\right)$
where $\log^{\circ}$ is an elementwise natural logarithm, $\operatorname{vol}(G)$ (the "volume" of the graph) is the sum of the degrees of all of the vertices, $P = D^{-1} A$ is the random walk transition matrix, and $b$ is the ratio of negative to positive samples.
Rather than sampling from random walks as in DeepWalk, the NetMF algorithm of Qiu et al. explicitly calculates and factors this matrix to produce embeddings, outperforming DeepWalk significantly on multi-label vertex classification tasks. For low $w$, NetMF manually computes the exact sum, whereas for high $w$, it computes a low-rank approximation via SVD of the symmetrized transition matrix $D^{-1/2} A D^{-1/2}$. While Qiu et al. analyze the effect of increasing $w$ on the spectrum of the resulting matrix, they do not pursue the limit, stopping at $w = 10$ as in the original DeepWalk paper. We show that this limiting matrix is both meaningful and simple to express.
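A sketch of this explicit construction (not the NetMF reference implementation; the rank parameter `h` and helper names are our own) that computes the window-$w$ matrix from the top-$h$ eigenpairs of the symmetrized transition matrix:

```python
import numpy as np

def netmf_matrix(A, w, h, b=1.0):
    """DeepWalk matrix of (2) via the top-h eigenpairs of S = D^{-1/2} A D^{-1/2},
    with an elementwise ramp max(x, 1) applied before the logarithm."""
    d = A.sum(axis=1)
    vol = d.sum()
    Dis = np.diag(d ** -0.5)
    S = Dis @ A @ Dis
    lam, U = np.linalg.eigh(S)          # eigenvalues in ascending order
    lam, U = lam[-h:], U[:, -h:]        # keep the h largest
    # Geometric sum: sum_{r=1}^{w} lam^r for each eigenvalue
    coef = np.array([w if abs(1.0 - l) < 1e-9 else l * (1 - l ** w) / (1 - l)
                     for l in lam])
    inner = Dis @ (U * coef) @ U.T @ Dis    # approximates sum_r P^r D^{-1}
    return np.log(np.maximum(vol / (b * w) * inner, 1.0))
```

With $h$ equal to the number of nodes the reconstruction is exact; smaller $h$ corresponds to NetMF's low-rank approximation regime.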
2.4. Other Approaches
Some other node embedding algorithms share significant similarities with DeepWalk. Qiu et al. (Qiu et al., 2018) showed the LINE method to also be implicit matrix factorization, though its algorithm is based on edge sampling rather than sampling from random walks. In particular, its factorized matrix is a special case of the DeepWalk matrix with $w = 1$. We include the performance of LINE in our empirical results. node2vec (Grover and Leskovec, 2016) is a generalization of DeepWalk which uses second-order random walks: the distribution of the next node in node2vec walks depends on the current and preceding nodes, rather than only the current node as in DeepWalk. Hyperparameters allow the walk to approach BFS-like or DFS-like behavior as desired, which the authors assert extracts qualitatively different information about node similarities.
Several architectures for applying convolutional neural networks to network data in an end-to-end fashion have been developed in the past few years, including the graph convolutional networks (GCNs) of (Kipf and Welling, 2016) and (Defferrard et al., 2016), and some methods leverage these architectures to produce node embeddings: for example, Deep Graph Infomax (Veličković et al., 2018) uses GCNs to maximize a mutual information objective involving patches around nodes. Recent work from Wu et al. (Wu et al., 2019) shows that much of the complexity of GCNs comes from components inspired by other forms of deep learning that have limited utility for network data. In the same way, we seek to further the investigation of the core principles of "deep" network embeddings apart from their inspiration in word embedding and neural networks. We note that, like DeepWalk and the related methods, we focus on unsupervised embeddings, derived solely from a graph's structure, without training, e.g., on vertex labels.
3. Methodology
We now present our main contributions, which connect DeepWalk in the infinite window limit to classic spectral embedding with a nonlinearity. We discuss how this viewpoint clarifies the role of the window size parameter in DeepWalk and motivates a very simple embedding technique based on a binary thresholding of the graph Laplacian pseudoinverse.
3.1. Derivation of Limiting PMI Matrix
We start by showing how to simplify the expression in (2) for the DeepWalk PMI matrix given by (Qiu et al., 2018) in the limit $w \to \infty$. We first establish some well-known facts about random walks on graphs. First, $P^{\infty} = \lim_{r \to \infty} P^{r}$ is well-defined under our assumption that the graph is undirected, connected, and non-bipartite. It is rank-1 and equal to $\mathbb{1} \pi^{T}$, where $\mathbb{1}$ is a column vector of ones and $\pi$ is the probability mass of each vertex in the stationary distribution of the random walk as a column vector. Note that $\pi_i = d_i / \operatorname{vol}(G)$, where $d_i$ is the degree of vertex $i$. That is, the probability mass of a vertex in the stationary distribution is proportional to its degree. We let $\Pi$ denote the diagonal matrix with entries $\pi$.

Let $\lambda_i$ and $u_i$ be the $i$-th largest eigenvalue and corresponding eigenvector of the symmetrized transition matrix $S = D^{-1/2} A D^{-1/2}$. We have $\lambda_1 = 1$ and $u_1 = D^{1/2} \mathbb{1} / \sqrt{\operatorname{vol}(G)}$. From (Levinson), for any positive integer $r$,
(3)   $P^{r} = D^{-1/2} \left(\sum_{i} \lambda_i^{r}\, u_i u_i^{T}\right) D^{1/2}$
where $1 = \lambda_1 > \lambda_2 \geq \dots \geq \lambda_n > -1$. We rewrite the expression in (3) for $P^{r}$ and the expression (2) of Qiu et al. for the PMI matrix, setting the negative sampling ratio $b$ to $1$ for the latter (i.e., one negative sample per positive sample):

$$P^{r} D^{-1} = D^{-1/2} \left(\sum_{i} \lambda_i^{r}\, u_i u_i^{T}\right) D^{-1/2}, \qquad M_w = \log^{\circ}\!\left(\frac{\operatorname{vol}(G)}{w} \left(\sum_{r=1}^{w} P^{r}\right) D^{-1}\right).$$

Substituting the former into the latter, then rearranging the order of the summations and applying the geometric series formula, yields

$$M_w = \log^{\circ}\!\left(J + \frac{\operatorname{vol}(G)}{w}\, D^{-1/2} \left(\sum_{i \geq 2} \frac{\lambda_i (1 - \lambda_i^{w})}{1 - \lambda_i}\, u_i u_i^{T}\right) D^{-1/2}\right),$$

where the $J$ term comes from the $i = 1$ summand, since $\lambda_1 = 1$ and $\operatorname{vol}(G) \cdot D^{-1/2} u_1 u_1^{T} D^{-1/2} = J$.
Now we consider the limit as $w \to \infty$. Define

$$\mathcal{M}_\infty = \operatorname{vol}(G)\, D^{-1/2} \left(\sum_{i \geq 2} \frac{\lambda_i}{1 - \lambda_i}\, u_i u_i^{T}\right) D^{-1/2}.$$
Since $|\lambda_i| < 1$ for $i \geq 2$ (Levinson), the $\lambda_i^{w}$ terms go to $0$ as $w \to \infty$ and we have:

$$M_w \approx \log^{\circ}\!\left(J + \frac{1}{w}\, \mathcal{M}_\infty\right).$$
Since $\log(1 + x) \to x$ as $x \to 0$, for any constant real number $c$, $w \log(1 + c/w) \to c$ as $w \to \infty$. We apply this identity elementwise, then simplify:

$$\lim_{w \to \infty} w \cdot M_w = \operatorname{vol}(G)\, D^{-1/2} \left(\sum_{i \geq 2} \frac{\lambda_i}{1 - \lambda_i}\, u_i u_i^{T}\right) D^{-1/2} = \operatorname{vol}(G)\, D^{-1/2} \mathcal{L}^{+} D^{-1/2} - \operatorname{vol}(G)\, D^{-1} + J,$$

where the last step follows from $u_1$ being the elementwise square root of $\pi$ (so that $\operatorname{vol}(G)\, D^{-1/2} u_1 u_1^{T} D^{-1/2} = J$). Note that the above equation gives the expression for $\mathcal{M}_\infty$ in (1), since $\mathcal{L}^{+} = \sum_{i \geq 2} \frac{1}{1 - \lambda_i}\, u_i u_i^{T}$. Similar analysis also leads to the following general expression for the finite PMI matrix:
(4)   $M_w = \log^{\circ}\!\left(J + \dfrac{1}{w}\left(\mathcal{M}_\infty - \operatorname{vol}(G)\, D^{-1/2} \left(\sum_{i \geq 2} \dfrac{\lambda_i^{w+1}}{1 - \lambda_i}\, u_i u_i^{T}\right) D^{-1/2}\right)\right)$
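As a sanity check on the finite-window expression (4), one can verify numerically (on a toy graph of our own choosing) that it reproduces a direct evaluation of (2):

```python
import numpy as np

# Toy graph: two triangles sharing an edge (connected, non-bipartite).
A = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 1.],
              [1., 1., 0., 1.],
              [0., 1., 1., 0.]])
d = A.sum(axis=1); vol = d.sum(); n = len(d); w = 3
Dis = np.diag(d ** -0.5)
J = np.ones((n, n))

# Direct evaluation of (2) with b = 1.
P = A / d[:, None]
acc = sum(np.linalg.matrix_power(P, r) for r in range(1, w + 1))
M_direct = np.log(vol / w * acc @ np.diag(1.0 / d))

# Evaluation via (4), using the eigenpairs of S = D^{-1/2} A D^{-1/2}.
S = Dis @ A @ Dis
lam, U = np.linalg.eigh(S)
order = np.argsort(-lam)                # lambda_1 = 1 first
lam, U = lam[order], U[:, order]
tail, V = lam[1:], U[:, 1:]             # eigenpairs with i >= 2
M_inf = vol * Dis @ (V * (tail / (1 - tail))) @ V.T @ Dis
E_w = vol * Dis @ (V * (tail ** (w + 1) / (1 - tail))) @ V.T @ Dis
M_eq4 = np.log(J + (M_inf - E_w) / w)
```

The two evaluations agree up to floating-point error, since (4) is an exact rewriting rather than an approximation.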
3.2. Approximation of Finite PMI Matrix via Limiting PMI Matrix
Note that the argument of the logarithm in the expression (4) above for the finite matrix, when multiplied by $w$ (after subtracting $J$), differs from the limiting matrix $\mathcal{M}_\infty$ only by the term involving the $\lambda_i^{w+1}$ factors, which vanishes as $w \to \infty$. So, we may approximate the finite matrix by simply dropping this term as follows:
(5)   $M_w \approx \log^{\circ}\!\left(r_\delta\!\left(J + \dfrac{1}{w}\, \mathcal{M}_\infty\right)\right)$
where $r_\delta$ is any ramp function to ensure that the argument of the logarithm is positive. In our implementation, we use the function $r_\delta(x) = \max(x, \delta)$. We use the 64-bit floating-point machine precision limit ($\approx 2.2 \times 10^{-16}$) for $\delta$. Note that the NetMF method of (Qiu et al., 2018) uses $\max(x, 1)$; we find that a small positive value consistently performs better. Both ramping functions can be interpreted as producing the shifted positive PMI (shifted PPMI) matrix introduced by Levy and Goldberg (Levy and Goldberg, 2014). Other methods of avoiding invalid arguments to the logarithm are an interesting avenue for future work.
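A minimal sketch of this approximation and of the subsequent factorization step (the function names are ours, not the paper's):

```python
import numpy as np

EPS = np.finfo(np.float64).eps     # delta: 64-bit machine precision, ~2.2e-16

def approx_pmi(M_inf, w, delta=EPS):
    """Approximate the window-w PMI matrix via (5): elementwise log of the
    ramped matrix r_delta(J + M_inf / w), with r_delta(x) = max(x, delta)."""
    J = np.ones_like(M_inf)
    return np.log(np.maximum(J + M_inf / w, delta))

def svd_embed(M, dim):
    """Embed via truncated SVD, scaling singular vectors by sqrt of singular values."""
    U, s, _ = np.linalg.svd(M)
    return U[:, :dim] * np.sqrt(s[:dim])
```

Setting `delta` near machine precision makes the ramp bite only where the argument would otherwise be non-positive, unlike the $\max(x, 1)$ ramp, which also zeroes out weakly positive entries.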
Note that the accuracy of (5) is limited by the second largest eigenvalue of the symmetrized transition matrix $S$, which corresponds (via $\lambda \mapsto 1 - \lambda$) to the Fiedler eigenvalue of the normalized Laplacian. Smaller magnitudes of this eigenvalue are correlated with a faster mixing rate (Levinson), the rate at which $P^{r} \to P^{\infty}$ as $r$ increases. In Section 4.2 we show that for typical graphs, this eigenvalue is relatively small, and so the approximation is very accurate for large $w$, e.g., $w = 10$, which is a typical setting for DeepWalk. The approximation is less accurate for small $w$, e.g., $w = 1$ (see Table 2).
Effect of the Window Size
Intuitively, the effect of $w$ in (5) is to control the "strength" of the logarithm nonlinearity, since, as noted previously, $w \log(1 + c/w) \to c$ for any real constant $c$ as $w \to \infty$. That is, the logarithm becomes a simple linear scaling in this limit. As we will see, even when the approximation of (5) is inaccurate (when $w$ is very low), the approximated matrix qualitatively has similar properties to the actual window-$w$ PMI matrix, and produces similar quality embeddings, as measured by performance in downstream classification tasks. This finding suggests that the strength of the logarithm nonlinearity can influence the locality of the embedding (as modulated in DeepWalk by the window size $w$) independently of modifying the arguments of the nonlinearity, which contain the actual information from the network as filtered by length-$w$ windows.
3.3. Binarized Laplacian Pseudoinverse
Motivated by the view of DeepWalk as a variant of classic Laplacian factorization with an entrywise nonlinearity, we investigate a highly simplified version of InfiniteWalk. We construct a binary matrix by 1) computing the pseudoinverse of the unnormalized Laplacian $L$, 2) picking a quantile hyperparameter $q$, 3) determining the quantile-$q$ element value, and 4) setting all elements less than this value to $0$ and all others to $1$. In other words, an element of this matrix is given by $B_{ij} = \mathbb{1}[L^{+}_{ij} \geq v_q]$, where $v_q$ is the quantile-$q$ element of $L^{+}$. We then produce embeddings by partially factoring this matrix, as with the PMI matrices. Interestingly, this can be interpreted as factorizing the adjacency matrix of an implicit derived network whose sparsity is determined by $q$. Gaining a better understanding of the structure and interpretation of this network is an interesting direction for future work.
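The four steps above can be sketched as follows (a toy illustration with our own function names, not the paper's code):

```python
import numpy as np

def binarize_lap_pinv(A, q):
    """Steps 1-4: pseudoinvert the unnormalized Laplacian, then threshold at
    the q-quantile element of the pseudoinverse."""
    d = A.sum(axis=1)
    L = np.diag(d) - A                      # unnormalized Laplacian
    Lp = np.linalg.pinv(L)
    return (Lp >= np.quantile(Lp, q)).astype(float)

def svd_embed(M, dim):
    """Partial factorization: top singular vectors scaled by sqrt singular values."""
    U, s, _ = np.linalg.svd(M)
    return U[:, :dim] * np.sqrt(s[:dim])

# Toy usage on a small graph; q is the quantile hyperparameter.
A = np.array([[0., 1., 1., 1.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [1., 0., 1., 0.]])
B = binarize_lap_pinv(A, q=0.5)
emb = svd_embed(B, dim=2)
```

Under this reading, `B` plays the role of the adjacency matrix of the implicit derived network, with larger `q` yielding a sparser network.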
Note that in this method, we use the unnormalized Laplacian $L$ rather than the normalized Laplacian $\mathcal{L}$, which appears in the expression (1) for $\mathcal{M}_\infty$. This is because, as we will show, the limiting PMI matrix is equal to the pseudoinverse of the unnormalized Laplacian up to a rank-3 term and a diagonal adjustment. We can rewrite our expression for the limiting matrix by expanding the normalized Laplacian in terms of the unnormalized Laplacian as follows:

$$\mathcal{M}_\infty = \operatorname{vol}(G) \cdot \underbrace{D^{-1/2} \left(D^{-1/2} L D^{-1/2}\right)^{+} D^{-1/2}}_{(\ast)} - \operatorname{vol}(G)\, D^{-1} + J.$$
Consider the underbraced term above containing $L$. If this term had a true inverse rather than a pseudoinverse, the four factors involving the degree matrix would simply cancel, leaving $L^{-1}$. Instead, application of a variant of the Sherman-Morrison formula for pseudoinverses (Meyer, 1973) results in the following expression for this term:

$$D^{-1/2} \left(D^{-1/2} L D^{-1/2}\right)^{+} D^{-1/2} = L^{+} - \frac{\mathbb{1}\, d^{T} L^{+} + L^{+} d\, \mathbb{1}^{T}}{\operatorname{vol}(G)} + \frac{d^{T} L^{+} d}{\operatorname{vol}(G)^{2}}\, J,$$

where $d$ is the vector of vertex degrees.
This yields the following alternate expression for the limiting PMI matrix:
(6)   $\mathcal{M}_\infty = \operatorname{vol}(G)\, L^{+} - \left(\mathbb{1}\, d^{T} L^{+} + L^{+} d\, \mathbb{1}^{T}\right) + \left(\dfrac{d^{T} L^{+} d}{\operatorname{vol}(G)} + 1\right) J - \operatorname{vol}(G)\, D^{-1}$
In our context of binarizing by a quantile, note that addition of the all-ones matrix $J$ and multiplication by $\operatorname{vol}(G)$ do not affect the relative ranks of the elements within the matrix, and the subtraction of the diagonal matrix affects relatively few elements. Hence we might expect binarizing $L^{+}$ by thresholding on quantiles to have a similar effect as binarizing the limiting PMI matrix itself.

Binarization is arguably one of the simplest possible methods of augmenting the Laplacian with a nonlinearity. As we will see, this method has good downstream performance compared to DeepWalk and related methods. We argue that this suggests that the core advancement of deep vertex embeddings over classical spectral embedding methods based on factorizing the Laplacian is the application of a nonlinearity.
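The relationship between $D^{-1/2} \mathcal{L}^{+} D^{-1/2}$ and $L^{+}$ can be checked numerically; the explicit rank-2 correction used below is our own reconstruction (via the projection identity for the pseudoinverse), offered as a sketch rather than the paper's derivation:

```python
import numpy as np

# Small connected, non-bipartite toy graph.
A = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 1.],
              [1., 1., 0., 1.],
              [0., 1., 1., 0.]])
d = A.sum(axis=1); vol = d.sum(); n = len(d)
one = np.ones(n)
Dis = np.diag(d ** -0.5)
L = np.diag(d) - A                                  # unnormalized Laplacian
Lnorm = np.eye(n) - Dis @ A @ Dis                   # normalized Laplacian
Lp, Lnp = np.linalg.pinv(L), np.linalg.pinv(Lnorm)

lhs = Dis @ Lnp @ Dis
# Rank-2 correction built from the degree vector d.
rank2 = (np.outer(one, d @ Lp) + np.outer(Lp @ d, one)) / vol
rhs = Lp - rank2 + (d @ Lp @ d) / vol ** 2 * np.outer(one, one)
```

Here `rank2` collects the two rank-1 terms involving the degree vector; `lhs` and `rhs` should agree to machine precision.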
4. Experimental Setup
We now discuss how we empirically validate the performance of the limiting PMI matrix method presented in Section 3.2 and the Laplacian pseudoinverse binarization method of Section 3.3.
4.1. Data Sets
We use three of the four datasets used in the evaluation of the NetMF algorithm (Qiu et al., 2018). Table 1 provides network statistics. Figure 1 provides the eigenvalue distribution of the symmetrized random walk matrix for each network.
BlogCatalog (socialcomputing.asu.edu/datasets/BlogCatalog3) (Agarwal et al., 2009) is a social network of bloggers. The edges represent friendships between bloggers, and vertex labels represent group memberships corresponding to interests of bloggers.
Protein-Protein Interaction (PPI) (snap.stanford.edu/biodata/datasets/10000/10000-PP-Pathways.html) (Stark et al., 2010) is a subgraph of the PPI network for Homo sapiens. Vertices represent proteins, edges represent interactions between proteins, and vertex labels represent biological states. We use only the largest connected component, which contains almost all of the nodes.
Wikipedia (mattmahoney.net/dc/text.html) is a co-occurrence network of words from a portion of the Wikipedia dump. Nodes represent words, edges represent co-occurrences within fixed-length windows in the corpus, and labels represent inferred part-of-speech (POS) tags.
Table 1. Network statistics.

Dataset             | BlogCatalog | PPI     | Wikipedia
Vertices            | 10,312      | 3,852   | 4,777
Edges               | 333,983     | 76,546  | 184,812
Fiedler Eigenvalue  | 0.568       | 0.800   | 0.504
# Labels            | 39          | 50      | 40
4.2. Procedure
Implementation
See Algorithm 1 for the pseudocode of our limiting PMI matrix method. We use the NumPy (Oliphant, 2006) and SciPy (Jones et al., 2001) libraries for our implementation. The most expensive component of the algorithm is the pseudoinversion of the graph Laplacian. While there is significant literature on approximating this matrix, or vector products with it (Spielman and Teng, 2004; Koutis et al., 2014; Kelner et al., 2013; Ranjan et al., 2014), we simply use the dense SVD-based function from NumPy. For graphs of larger scale, this method would not be practical. The truncated sparse eigendecomposition is handled via SciPy's wrapper of ARPACK (Lehoucq et al., 1998), which uses the Implicitly Restarted Lanczos Method. As in (Qiu et al., 2018), to generate a $d$-dimensional embedding, we return the singular vectors corresponding to the $d$ largest singular values, scaling the dimensions of the singular vectors by the square roots of the singular values. For classification, we use the scikit-learn (Pedregosa et al., 2011) wrapper of LIBLINEAR (Fan et al., 2008).

Evaluation Setting
To investigate the usefulness and meaningfulness of the limiting PMI matrix, we evaluate the quality of embeddings generated by its truncated SVD after applying the elementwise ramp-logarithm described in Section 3.2. For this task, we closely follow the procedure of (Perozzi et al., 2014) and (Qiu et al., 2018). We use one-vs-rest logistic regression on the embeddings for multi-label prediction on the datasets. The classifiers employ L2 regularization with a fixed inverse regularization strength. Classifiers are trained on a portion of the data, with the training ratio being varied from 10% to 90%; the remainder is used for testing. As in (Perozzi et al., 2014) and (Qiu et al., 2018), we assume that the number of labels for each test example is given. In particular, given that a vertex is assigned $k$ labels, the classifier predicts exactly the $k$ classes to which it assigns the highest probability. We use the mean scores over repeated random splits of the training and test data for each training ratio. We evaluate the micro-F1 and macro-F1 scores of classifiers using our embedding.

Hyperparameter Choices
We compare our results to those of DeepWalk (Perozzi et al., 2014), LINE (Tang et al., 2015), and NetMF (Qiu et al., 2018) as reported in (Qiu et al., 2018). The hyperparameters used for DeepWalk are the preferred default settings of its authors: window size $w = 10$, walk length $40$, and $80$ walks starting from each vertex. Results from the second-order variant of LINE are reported. As the authors of NetMF report results for window sizes $w = 1$ and $w = 10$, we similarly report results for InfiniteWalk with $w = 1$ and $w = 10$. We expect the results of InfiniteWalk, as an approximation of the NetMF method in the high-$w$ limit, to at least be roughly similar for the higher setting. We also include results with our limiting matrix $\mathcal{M}_\infty$, though only for illustrative purposes. As the limiting matrix is essentially a simple linear transformation of the graph Laplacian's pseudoinverse, we expect embeddings derived thereof to perform relatively poorly. The entrywise nonlinearity seems to be critical. The embedding dimensionality is $d = 128$ for all methods, as in both (Perozzi et al., 2014) and (Qiu et al., 2018).

4.3. Binarized Laplacian Pseudoinverse
We implement and evaluate the simplified method of factorizing a binarized version of the unnormalized Laplacian pseudoinverse (described in Section 3.3) in the same way. We present results for the best values of the quantile hyperparameter $q$ found by rough grid search. As with the window size $w$, the best value is expected to vary across networks. We compare to the performance of NetMF, the sampling methods LINE and DeepWalk, and classical methods; since the normalized and unnormalized Laplacians themselves both perform poorly on these tasks, we compare to factorizing the adjacency matrix itself. Again, since inverting and binarizing is one of the simplest possible nonlinearities to apply to the Laplacian, good downstream performance suggests that the addition of a nonlinearity is the key advancement of deep node embeddings over classical embeddings of the Laplacian itself.
5. Results
We now discuss our experimental results on both the limiting PMIbased algorithm and the simple Laplacian binarization algorithm.
Table 2. PMI approximation error.

Dataset          | BlogCatalog | PPI      | Wikipedia
Error            | 0.001273    | 0.04152  | 1.355
Ramped Elements  | 0.0004901   | 0.002521 | 0.08892
5.1. PMI Approximation Error
In Table 2 we show how closely the PMI approximation given by (5) matches the true PMI matrix. We can see from Table 1 that the Fiedler eigenvalues of our graphs are bounded away from $1$. Thus, as expected, the approximation of the finite PMI matrix via the limiting matrix is quite close for the BlogCatalog and PPI networks, though less so for the Wikipedia network. This holds both before and after application of the log-ramp nonlinearity, which is applied in the NetMF method (Qiu et al., 2018). Across the surveyed networks and values of $w$, the elements which are affected by the ramping component of the nonlinearity are similar between our approximation and the true PMI matrix. The accurate approximation at $w = 10$ explains why InfiniteWalk performs similarly on downstream classification tasks. Interestingly, at $w = 1$, InfiniteWalk performs competitively in spite of the less accurate approximation.
5.2. Multi-Label Classification
In Figure 2 we show downstream performance of embeddings based on the limiting approximation, compared to other methods. Across both metrics and all datasets, NetMF and InfiniteWalk are generally on par with or outperform the sampling-based methods, LINE and DeepWalk. As observed in (Qiu et al., 2018), DeepWalk and NetMF with $w = 10$ have better overall performance than LINE and NetMF with $w = 1$ on the BlogCatalog and PPI networks, while the inverse is true for the Wikipedia network. This suggests that shorter-range dependencies better capture Wikipedia's network structure. As expected, InfiniteWalk with the $w = 10$ nonlinearity performs better than the version with the $w = 1$ nonlinearity on the former two datasets, while the inverse is true for Wikipedia. In all cases, the factorization of the limiting PMI matrix itself performs poorly. These findings support our hypothesis that changing the strength of the logarithm nonlinearity can largely emulate the effect of actually changing the window size in sampling and deterministic approaches.
While maximizing downstream performance is not the focus of our work, we observe that, overall, InfiniteWalk has performance competitive with, if slightly inferior to, NetMF (see Figure 3). On BlogCatalog, InfiniteWalk underperforms relative to NetMF. On PPI, InfiniteWalk outperforms NetMF when less than half the data is used for training, but underperforms when given more training data. On Wikipedia, InfiniteWalk underperforms relative to NetMF on macro-F1 score, but outperforms NetMF on micro-F1 score.
Binarized Laplacian Pseudoinverse
In Figure 4 we show downstream performance of our simple method based on factorizing the binarized Laplacian pseudoinverse. This method performs remarkably well on both the BlogCatalog and PPI networks. On PPI, it matches the performance of NetMF, and on BlogCatalog, it is nearly on par again, accounting for most of the improvement over the classical method. On the third network, Wikipedia, it is less successful, especially on macro-F1 score, but still improves on the classical method. These results again support our hypothesis that the key ingredient of improved node embeddings seems to be the application of a nonlinearity to the Laplacian pseudoinverse.
Elements of Limiting PMI Matrices
Since we are introducing the limiting PMI matrix as an object for future investigation, we also give a preliminary qualitative description of its elements. See Figure 5 for a visualization of the element distribution for the three networks we investigate. Across these networks, these matrices tend to contain mostly small negative elements, corresponding to weak negative correlations between nodes, as well as some large positive values, corresponding to strong positive correlations. The distributions overall have similar shapes, and, interestingly, have roughly the same ratios of negative values to positive values, corresponding to roughly the same ratios of negative correlations to positive correlations.
6. Discussion and Conclusions
In this work we have simplified known expressions for the finite-window network PMI matrix and derived an expression for the matrix in the infinite-window limit in terms of the pseudoinverse of the graph Laplacian. This expression strengthens connections between SGNS-based and classic spectral embedding methods.
We show that, with a simple scaled logarithm nonlinearity, this limiting matrix can be used to approximate finite-window matrices, which yield competitive results on downstream vertex classification tasks. This finding suggests that the core mechanism of SGNS methods as applied to networks is a simple nonlinear transformation of classical Laplacian embedding methods. We even find that just binarizing the Laplacian pseudoinverse by thresholding often accounts for most of the performance gain over classical approaches, again suggesting the importance of the nonlinearity.
Future Work
We view our work as a step in understanding the core mechanisms of SGNS-based embedding methods. However, many open questions remain.
For example, one may ask why the scaled logarithm nonlinearity is a good choice. Relatedly, how robust is performance to changes in the nonlinearity? Our results on binarization of the Laplacian pseudoinverse indicate that it may be quite robust, but this is worth further exploration. Finally, as discussed, our binarization method can be viewed as producing the adjacency matrix of a graph based on the Laplacian pseudoinverse, and then directly factoring this matrix. Understanding how this derived graph relates to the input graph would be a very interesting next step in understanding the surprisingly competitive performance of this method.
Additionally, as discussed previously, node2vec (Grover and Leskovec, 2016) is a generalization of DeepWalk that adds additional hyperparameters to create secondorder random walks. Qiu et al. (Qiu et al., 2018) also provide an expression for the matrix that is implicitly factored by node2vec, so pursuing the limit of this matrix may provide insight into node2vec and an interesting generalization of DeepWalk’s limiting PMI matrix.
References
Agarwal et al. (2009). A social identity approach to identify familiar strangers in a social network. In Third International AAAI Conference on Weblogs and Social Media.
Allen et al. (2019). What the vec? Towards probabilistically grounded embeddings. In Advances in Neural Information Processing Systems, pp. 7465–7475.
Allen and Hospedales (2019). Analogies explained: towards understanding word embeddings. In International Conference on Machine Learning, pp. 223–231.
Arora et al. (2016). A latent variable model approach to PMI-based word embeddings. Transactions of the Association for Computational Linguistics 4, pp. 385–399.
Belkin and Niyogi (2003). Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation 15(6), pp. 1373–1396.
Defferrard et al. (2016). Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems, pp. 3844–3852.
Fan et al. (2008). LIBLINEAR: a library for large linear classification. Journal of Machine Learning Research 9, pp. 1871–1874.
Gittens et al. (2017). Skip-gram − Zipf + uniform = vector additivity. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 69–76.
Goldberg and Levy (2014). word2vec explained: deriving Mikolov et al.'s negative-sampling word-embedding method. arXiv preprint arXiv:1402.3722.
Grover and Leskovec (2016). node2vec: scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 855–864.
Jones et al. (2001). SciPy: open source scientific tools for Python.
Kelner et al. (2013). A simple, combinatorial algorithm for solving SDD systems in nearly-linear time. In Proceedings of the Forty-Fifth Annual ACM Symposium on Theory of Computing, pp. 911–920.
Kipf and Welling (2016). Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907.
Koutis et al. (2014). Approaching optimality for solving SDD linear systems. SIAM Journal on Computing 43(1), pp. 337–354.
Lehoucq et al. (1998). ARPACK Users' Guide: solution of large-scale eigenvalue problems with implicitly restarted Arnoldi methods. Vol. 6, SIAM.
Levinson. An eigenvalue representation for random walk hitting times and its application to the rook graph.
Levy and Goldberg (2014). Neural word embedding as implicit matrix factorization. In Advances in Neural Information Processing Systems 27, pp. 2177–2185.
Li et al. (2015). Word embedding revisited: a new representation learning and explicit matrix factorization perspective. In Twenty-Fourth International Joint Conference on Artificial Intelligence.
McSherry (2001). Spectral partitioning of random graphs. In Proceedings 42nd IEEE Symposium on Foundations of Computer Science, pp. 529–537.
Meyer (1973). Generalized inversion of modified matrices. SIAM Journal on Applied Mathematics 24(3), pp. 315–323.
Mikolov et al. (2013). Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pp. 3111–3119.
Ng et al. (2002). On spectral clustering: analysis and an algorithm. In Advances in Neural Information Processing Systems, pp. 849–856.
Oliphant (2006). A Guide to NumPy. Vol. 1, Trelgol Publishing.
Pedregosa et al. (2011). Scikit-learn: machine learning in Python. Journal of Machine Learning Research 12, pp. 2825–2830.
Perozzi et al. (2014). DeepWalk: online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 701–710.
Qiu et al. (2018). Network embedding as matrix factorization: unifying DeepWalk, LINE, PTE, and node2vec. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pp. 459–467.
Ranjan et al. (2014). Incremental computation of pseudo-inverse of Laplacian. In International Conference on Combinatorial Optimization and Applications, pp. 729–749.
Shi and Malik (2000). Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 22(8), pp. 888–905.
Spielman and Teng (2004). Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. In Proceedings of the Thirty-Sixth Annual ACM Symposium on Theory of Computing, pp. 81–90.
Stark et al. (2010). The BioGRID Interaction Database: 2011 update. Nucleic Acids Research 39(suppl_1), pp. D698–D704.
Tang et al. (2015). LINE: large-scale information network embedding. In Proceedings of the 24th International Conference on World Wide Web, pp. 1067–1077.
Veličković et al. (2018). Deep Graph Infomax. arXiv preprint arXiv:1809.10341.
Wu et al. (2019). Simplifying graph convolutional networks. Proceedings of Machine Learning Research.
Yan et al. (2006). Graph embedding and extensions: a general framework for dimensionality reduction. IEEE Transactions on Pattern Analysis and Machine Intelligence 29(1), pp. 40–51.