Lab: Explore Data with Pandas

Objectives

Load data into Jupyter notebooks by using RHODS and Pandas.

Steps

Open the RHODS dashboard.
1. In a web browser, navigate to the web console of your Red Hat OpenShift cluster, and log in.
2. Click the applications menu in the top navigation bar of OpenShift, then click Red Hat OpenShift Data Science.
3. If prompted, log in with your Red Hat OpenShift credentials.
Configure the workbench of your data science project.

A RHODS workbench is a containerized application that includes commonly used data science tools and libraries, such as JupyterLab, TensorFlow, and PyTorch. RHODS provides a collection of workbench container images, each one preconfigured and tailored to a specific data science use case.
1. Click Data Science Projects in the left sidebar.
2. Create a data science project. Click Create data science project. In the modal window that opens, enter a name and click Create.
  
  If you are using Red Hat OpenShift from the developer sandbox, then a project is already created for you.
3. Click the newly created project.
4. In the project page, click Create workbench and complete the form with the following values.
  
  Name: data-load
  
  Notebook image - Image selection: Standard Data Science
  
  Notebook image - Version selection: Select the recommended option
  
  Do not modify the default values of the rest of the fields.
5. Click Create workbench. RHODS creates the workbench and the associated persistent storage.
Configure a data connection.

A data connection provides the workbench with access to a storage layer. In this demo, you use the storage layer to save the trained model.

A data connection also configures RHODS Model Serving with the settings required to download the model to be served.

If you do not have access to an S3 bucket, you can skip this step and continue to the next one.
1. Click Add data connection.
2. In the name field, enter data-load-data-connection.
3. Complete the AWS_* fields with the connection details of an S3-compatible API.
  
  This example uses IBM Cloud Object Storage, but you can use any storage service that provides an S3 API.
4. In the Connected workbench field, select data-load to assign this data connection to the data-load workbench.
5. Click Add data connection. This data connection injects the S3 configuration values as environment variables in the data-load workbench. RHODS restarts the workbench to inject the variables.
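
After the workbench restarts, you can check from a notebook cell that the S3 settings were injected. The following is a minimal sketch, not part of the lab repository. It assumes that the variable names match the AWS_* fields of the data connection form (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_S3_ENDPOINT, AWS_DEFAULT_REGION, and AWS_S3_BUCKET); adjust the list if your RHODS version uses different names. The helper function missing_s3_settings is a hypothetical name introduced for illustration.

```python
import os

# Assumption: these names mirror the AWS_* fields of the data connection form.
S3_VARIABLES = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_S3_ENDPOINT",
    "AWS_DEFAULT_REGION",
    "AWS_S3_BUCKET",
)


def missing_s3_settings(environ=os.environ):
    """Return the names of expected S3 variables that are unset or empty."""
    return [name for name in S3_VARIABLES if not environ.get(name)]


missing = missing_s3_settings()
if missing:
    # Only names are printed, never values, to avoid leaking credentials.
    print("Missing S3 settings:", ", ".join(missing))
else:
    print("All S3 settings are present.")
```

If you skipped the data connection step, the check reports every variable as missing, which is expected.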
Open the workbench and clone the repository.
1. Make sure that the data-load workbench is running and click Open.
2. If prompted, log in with your Red Hat OpenShift credentials.
3. Click Allow selected permissions to grant the workbench access to your data science project.
4. Verify that the JupyterLab interface opens in a new browser tab.
5. Click the Git icon in the left sidebar.
6. Click Clone a repository.
7. Enter https://github.com/RedHatTraining/rhods-quick-course.git as the repository, and click Clone.
8. In the file explorer, navigate to the rhods-quick-course/exercises directory.
Open the data-pandas.ipynb notebook and follow the instructions.
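
The notebook contains the actual exercise. As a preview of the kind of code involved, the following minimal sketch loads CSV data into a Pandas DataFrame and inspects it. It is not taken from the notebook: the sample data and the load_frame helper are hypothetical, and the CSV text is inlined so that the cell runs without any external file.

```python
import io

import pandas as pd

# Hypothetical sample data, inlined so the sketch needs no external file.
CSV_TEXT = """city,temperature_c
Madrid,31.5
Oslo,18.0
Lima,22.3
"""


def load_frame(source):
    """Read CSV data from a path or file-like object into a DataFrame."""
    return pd.read_csv(source)


df = load_frame(io.StringIO(CSV_TEXT))

print(df.head())      # first rows of the table
print(df.dtypes)      # column types inferred by Pandas
print(df.describe())  # summary statistics for numeric columns
```

Because pd.read_csv accepts a file path as well as a file-like object, the same helper works if you pass it the path of a CSV file stored in the workbench.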