Hi! This is a PySpark implementation of PageRank that can be run on a Microsoft Azure Spark cluster.
We define convergence as follows: the algorithm has converged when, between two consecutive iterations, neither the set of the 10 nodes with the highest PageRank scores nor their relative order changes.
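The convergence check described above can be sketched in plain Python. This is only an illustration, not the code from this repository: the function names `top_k_nodes` and `has_converged` are hypothetical, scores are assumed to be held in a dict mapping node id to PageRank score, and ties are assumed to be broken by node id so that the ordering is deterministic. In PySpark, the same top-10 list could probably be obtained from an RDD of `(node, score)` pairs with `takeOrdered(10, key=lambda kv: -kv[1])`.

```python
import heapq


def top_k_nodes(ranks, k=10):
    """Return the ids of the k highest-scoring nodes, best first.

    `ranks` maps node id -> PageRank score (an assumed representation).
    Ties on score are broken by node id to keep the order deterministic.
    """
    best = heapq.nsmallest(k, ranks.items(), key=lambda kv: (-kv[1], kv[0]))
    return [node for node, _score in best]


def has_converged(prev_ranks, curr_ranks, k=10):
    """True if the top-k nodes and their relative order are unchanged."""
    return top_k_nodes(prev_ranks, k) == top_k_nodes(curr_ranks, k)


# Toy example with k=2: scores change, but the top-2 order stays the same.
prev = {"a": 0.50, "b": 0.30, "c": 0.20}
same_order = {"a": 0.45, "b": 0.35, "c": 0.20}
swapped = {"a": 0.30, "b": 0.50, "c": 0.20}

print(has_converged(prev, same_order, k=2))
print(has_converged(prev, swapped, k=2))
```

Comparing ordered lists rather than sets means a swap between two top nodes counts as a change, which matches the requirement that the relative order also stays fixed.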
This is for a competition for the data center visit in the Big Data course at ETH Zurich. Thanks to Microsoft for sponsoring Azure for academic use.