Lab: Explore Data with Pandas

Objectives

  • Load data into Jupyter notebooks by using RHODS and Pandas.

Steps

  1. Open the RHODS dashboard.

    1. In a web browser, navigate to the Web Console of your Red Hat OpenShift cluster, and log in.

    2. Click the applications menu in the top navigation bar of OpenShift, then click Red Hat OpenShift Data Science.

    3. If prompted, log in with your Red Hat OpenShift credentials.

  2. Configure the workbench of your data science project.

    A RHODS workbench is a containerized application that includes commonly used data science tools and libraries, such as JupyterLab, TensorFlow, and PyTorch. RHODS provides a collection of workbench container images, each one preconfigured and tailored to a specific data science use case.

    1. Click Data Science Projects in the left sidebar.

    2. Create a data science project. Click Create data science project. In the modal window that opens, enter a name and click Create.

      If you are using the Red Hat OpenShift Developer Sandbox, a project is already created for you.

    3. Click the newly created project.

    4. On the project page, click Create workbench and complete the form with the following values.

      • Name: data-load
      • Notebook image - Image selection: Standard Data Science
      • Notebook image - Version selection: the recommended option

      Leave the remaining fields at their default values.

    5. Click Create workbench. RHODS creates the workbench and the associated persistent storage.
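
After you open the workbench in a later step, you can optionally check from a notebook cell which libraries the image provides. The following is a minimal sketch: the package list is an assumption about the Standard Data Science image, not a list taken from its documentation, and installed_versions is an illustrative helper name.

```python
from importlib import metadata

# Packages assumed to ship with the Standard Data Science image (assumption, not confirmed by this lab).
EXPECTED_PACKAGES = ["pandas", "numpy", "matplotlib", "scikit-learn"]


def installed_versions(packages):
    """Return a mapping of package name to installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(EXPECTED_PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
```

Returning None instead of raising lets a single cell report every missing package at once.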

  3. Configure a data connection.

    A data connection gives the workbench access to a storage layer. In this lab, you use the storage layer to save the trained model.

    A data connection also provides RHODS Model Serving with the settings that it needs to download the model to serve.

    If you do not have access to an S3 bucket, skip this step and continue to the next one.

    1. Click Add data connection.

    2. In the name field, enter data-load-data-connection.

    3. Complete the AWS_* fields with the connection details of an S3-compatible API.

      This example uses IBM Cloud Object Storage, but you can use any storage service that provides an S3 API.

    4. In the Connected workbench field, select data-load to assign this data connection to the data-load workbench.

    5. Click Add data connection. The data connection injects the S3 configuration values as environment variables into the data-load workbench. RHODS restarts the workbench to inject the variables.
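
After RHODS restarts the workbench, you can confirm from a notebook cell that the data connection variables are present. This minimal sketch assumes that the data connection injects variables named AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_S3_ENDPOINT, AWS_DEFAULT_REGION, and AWS_S3_BUCKET; verify the names in your own environment. The read_data_connection helper is hypothetical.

```python
import os

# Variable names assumed to be injected by the RHODS data connection.
# Verify them against your own workbench environment.
DATA_CONNECTION_VARS = [
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_S3_ENDPOINT",
    "AWS_DEFAULT_REGION",
    "AWS_S3_BUCKET",
]


def read_data_connection(environ=os.environ):
    """Collect the data connection settings and list any that are missing or empty."""
    settings = {name: environ.get(name) for name in DATA_CONNECTION_VARS}
    missing = [name for name, value in settings.items() if not value]
    return settings, missing


if __name__ == "__main__":
    settings, missing = read_data_connection()
    if missing:
        print("Missing variables:", ", ".join(missing))
    else:
        # Print only non-secret values; avoid echoing credentials in a notebook.
        print("Endpoint:", settings["AWS_S3_ENDPOINT"])
        print("Bucket:", settings["AWS_S3_BUCKET"])
```

If the check passes, you can pass these values to an S3 client, for example boto3.client("s3", endpoint_url=..., aws_access_key_id=..., aws_secret_access_key=...), assuming boto3 is installed in the workbench.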

  4. Open the workbench and clone the repository.

    1. Make sure that the data-load workbench is running, and then click Open.

    2. If prompted, log in with your Red Hat OpenShift credentials.

    3. Click Allow selected permissions to grant the workbench access to your data science project.

    4. Verify that the JupyterLab interface opens in a new browser tab.

    5. Click the Git icon in the left sidebar.

    6. Click Clone a repository.

    7. Enter https://github.com/RedHatTraining/rhods-quick-course.git as the repository, and click Clone.

    8. In the file explorer, navigate to the rhods-quick-course/exercises directory.
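
As an alternative to the Git panel, you can clone the repository from a notebook cell. This sketch assumes that the git command is available in the workbench image; clone_command and clone_if_missing are hypothetical helper names used only for illustration.

```python
import subprocess
from pathlib import Path

REPO_URL = "https://github.com/RedHatTraining/rhods-quick-course.git"


def clone_command(url, dest):
    """Build the git command that clones url into dest."""
    return ["git", "clone", url, str(dest)]


def clone_if_missing(url=REPO_URL, dest=Path("rhods-quick-course")):
    """Clone the repository unless dest already exists; return the exercises directory path."""
    dest = Path(dest)
    if not dest.exists():
        # check=True raises CalledProcessError if git exits with a non-zero status.
        subprocess.run(clone_command(url, dest), check=True)
    return dest / "exercises"
```

For example, exercises = clone_if_missing() returns the path of the exercises directory, and sorted(p.name for p in exercises.glob("*.ipynb")) lists its notebooks. Checking for an existing directory first keeps the cell safe to re-run, because git clone fails when the destination already exists and is not empty.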

  5. Open the data-pandas.ipynb notebook and follow the instructions.
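
As a preview of what the notebook covers, the following sketch loads CSV data into a Pandas DataFrame and inspects it. The sample data and the load_csv helper are hypothetical; the notebook uses its own dataset and may load it differently.

```python
import io

import pandas as pd

# Hypothetical sample data; the lab notebook uses its own dataset.
CSV_TEXT = """city,temperature_c,humidity_pct
Madrid,31.5,22
Oslo,18.0,61
Lima,19.2,77
"""


def load_csv(source):
    """Load CSV data from a path, URL, or file-like object into a DataFrame."""
    return pd.read_csv(source)


df = load_csv(io.StringIO(CSV_TEXT))
print(df.shape)       # (rows, columns)
print(df.dtypes)      # column types that Pandas inferred
print(df.describe())  # summary statistics for the numeric columns
```

Because pd.read_csv also accepts a file path or URL, the same call works for a CSV file that you upload to the workbench.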