This tutorial shows how to download datasets from Kaggle using Linux command-line tools on a virtual machine running on Google Cloud Platform.
The task breaks down into the following steps:
In Kaggle: Download the API Token file, then locate a dataset
In Google Cloud:
- Create a Compute Engine VM
- Upload API Token file
- Set up Python environment
- Download Kaggle dataset
- Unzip Kaggle dataset
- Copy files to Google Cloud Storage
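As a rough preview of the Google Cloud steps above, the sketch below builds the commands that would typically run on the VM and prints them without executing anything. It is not the tutorial's own script: the dataset slug "owner/some-dataset", the bucket "my-bucket", the virtual environment name "env", and the "data" directory are placeholders, and it assumes the token file is named kaggle.json and that the Kaggle CLI saves the archive as the dataset slug plus ".zip".

```python
import shlex
import subprocess
from pathlib import Path


def build_commands(dataset, bucket, token="kaggle.json", workdir="data"):
    """Return the VM-side steps as argument lists (nothing is executed here)."""
    # Assumption: the Kaggle CLI names the archive after the dataset slug.
    archive = f"{dataset.split('/')[-1]}.zip"
    kaggle_dir = Path.home() / ".kaggle"
    token_path = kaggle_dir / "kaggle.json"
    return [
        # Place the uploaded API token where the Kaggle CLI looks for it.
        ["mkdir", "-p", str(kaggle_dir)],
        ["cp", token, str(token_path)],
        ["chmod", "600", str(token_path)],
        # Set up a Python virtual environment and install the Kaggle CLI.
        ["python3", "-m", "venv", "env"],
        ["env/bin/pip", "install", "kaggle"],
        # Download the dataset archive, then extract it.
        ["env/bin/kaggle", "datasets", "download", "-d", dataset],
        ["unzip", archive, "-d", workdir],
        # Copy the extracted files to a Cloud Storage bucket.
        ["gsutil", "cp", "-r", workdir, f"gs://{bucket}/"],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder dataset slug and bucket name; replace with your own.
    run(build_commands("owner/some-dataset", "my-bucket"))
```

Keeping the dry run as the default makes it possible to review the full command sequence before anything touches the VM. Each step is covered in detail in the parts that follow.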
If you would prefer a video presentation of this material, one is available on YouTube.
Prerequisites
Before starting this tutorial, create an account on Kaggle.com and log in to it.
Then log in to your Google Cloud Platform account and make sure you have enabled the Compute Engine API.
Tutorial Sections
- Part I Downloading the Kaggle API Token File
- Part II Locating a Dataset on Kaggle
- Part III Creating a Virtual Machine in Google Cloud Platform
- Part IV Connecting to the Linux instance with secure shell
- Part V Uploading and copying the Kaggle API Token file
- Part VI Installing additional software packages and setting up a Python development environment
- Part VII Downloading Datasets from Kaggle
- Part VIII Unzipping the archive file to extract the individual data files
- Part IX Copying Files to Google Cloud Storage
- Part X Stopping and Deleting the Linux Instance
Get started on the tutorial by downloading the Kaggle API Token file as described on the next page.