PySpark examples that run on Azure Databricks to analyze sample Microsoft Academic Graph (MAG) data on Azure storage.
Before running these examples, complete the following setup steps:
- Setting up provisioning of Microsoft Academic Graph to an Azure blob storage account. See Get Microsoft Academic Graph on Azure storage.
- Setting up Azure Databricks service. See Set up Azure Databricks.
Before you begin, you should have these items of information:
✔️ The name of your Azure Storage (AS) account containing the MAG dataset from Get Microsoft Academic Graph on Azure storage.
✔️ The access key of your Azure Storage (AS) account from Get Microsoft Academic Graph on Azure storage.
✔️ The name of the container in your Azure Storage (AS) account containing the MAG dataset.
✔️ The name of the output container in your Azure Storage (AS) account.
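As a rough illustration of where these four values end up, the sketch below builds the Spark configuration key and the `wasbs://` URLs commonly used to reach Azure Blob Storage from Databricks. The helper names (`blob_account_key_conf`, `wasbs_url`), the placeholder values, and the file path are hypothetical and are not taken from the sample repository; the scripts in the repository may organize this differently.

```python
# Minimal sketch, assuming the standard WASB driver conventions for Azure Blob Storage.
# Helper names and all values below are illustrative placeholders.

def blob_account_key_conf(account_name: str) -> str:
    """Return the Spark/Hadoop config key that holds the storage account access key."""
    return f"fs.azure.account.key.{account_name}.blob.core.windows.net"


def wasbs_url(container: str, account_name: str, path: str = "") -> str:
    """Return a wasbs:// URL for a container, optionally with a path inside it."""
    base = f"wasbs://{container}@{account_name}.blob.core.windows.net"
    return f"{base}/{path.lstrip('/')}" if path else base


# Replace these placeholders with the items gathered above.
account_name = "mystorageaccount"      # name of the AS account
account_key = "<access-key>"           # access key of the AS account
mag_container = "mag-container"        # container holding the MAG dataset
output_container = "output-container"  # container for results

conf_key = blob_account_key_conf(account_name)
input_url = wasbs_url(mag_container, account_name, "path/to/file.txt")  # hypothetical path
output_url = wasbs_url(output_container, account_name)

# In a Databricks notebook, where `spark` is predefined, these would typically be used as:
#   spark.conf.set(conf_key, account_key)
#   df = spark.read.csv(input_url, sep="\t")
print(conf_key)
print(input_url)
print(output_url)
```

Keeping the URL construction in small functions makes it easy to point the same script at a different storage account or container without editing every read and write call.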
- git clone https://github.com/Azure-Samples/microsoft-academic-graph-pyspark-samples.git
- Follow instructions in PySpark analytics samples for Microsoft Academic Graph to run PySpark scripts in this repository.