
`DataFrame` Indexing and Loading

Introduction to CSV Files

  • CSV (Comma Separated Values): A lightweight, ubiquitous format for data storage.
    • Commonly used by spreadsheet software like Excel and Google Sheets.
    • Format is flexible but lacks a strict specification, leading to possible variations in structure.
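
As a small illustration of these structural variations, the sketch below parses the same record written with two different delimiters, using Python's standard csv module. The sample text and the parse helper are made up for this example and are not part of the course dataset.

```python
import csv
import io

# Two made-up snippets holding the same record with different delimiters.
# In the comma version the name must be quoted because it contains a comma.
comma_text = 'name,score\n"Smith, Jane",320\n'
semicolon_text = 'name;score\nSmith, Jane;320\n'

def parse(text, delimiter):
    """Return all rows of `text` as lists of strings (hypothetical helper)."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))

comma_rows = parse(comma_text, ",")
semicolon_rows = parse(semicolon_text, ";")

print(comma_rows)
print(semicolon_rows)
```

Both calls produce the same rows, but only because the right delimiter was supplied each time. pd.read_csv accepts a sep argument for the same purpose.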

Viewing CSV File Contents

  • Viewing CSV Files with Shell Commands:
    • Jupyter notebooks run a shell command when the line is prefixed with an exclamation mark (!).
    • Example: Using !cat (available on Unix-like systems) to display the contents of a CSV file:
      !cat datasets/Admission_Predict.csv
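
Where cat is unavailable, or the file is too large to print in full, a few lines can be read with plain Python instead. The following is a minimal sketch. The peek helper and the sample file it writes are assumptions made for illustration, so that the example runs on its own.

```python
import os
import tempfile
from itertools import islice

def peek(path, n=5):
    """Return the first n lines of a text file, without trailing newlines."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in islice(f, n)]

# Write a tiny made-up CSV file to a temporary directory, then peek at it.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.csv")
    with open(sample_path, "w", encoding="utf-8") as f:
        f.write("Serial No.,GRE Score\n1,337\n2,324\n")
    first_lines = peek(sample_path, n=2)

print(first_lines)
```

With the real dataset the call would be peek('datasets/Admission_Predict.csv').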

Loading CSV Data into a DataFrame

  • Import pandas:

    import pandas as pd
  • Loading Data:

    df = pd.read_csv('datasets/Admission_Predict.csv')
  • Viewing Data (head() returns the first five rows by default):

    df.head()
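
The loading step can be tried without the dataset file, since pd.read_csv also accepts a file-like object. The sketch below assumes pandas is installed. The CSV text and its column names are made up for illustration.

```python
import io

import pandas as pd

# Made-up stand-in for the CSV file (illustrative column names and values).
csv_text = (
    "Serial No.,GRE Score,TOEFL Score\n"
    "1,337,118\n"
    "2,324,107\n"
    "3,316,104\n"
)

# Without index_col, pandas adds a default integer index starting at 0.
sample_df = pd.read_csv(io.StringIO(csv_text))

print(sample_df.shape)
print(sample_df.head(2))
```

Note that the serial number is loaded as an ordinary column here, alongside the automatically generated index.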

Setting a Specific Column as the Index

  • Using the index_col parameter (index_col=0 makes the first column of the file the index):
    df = pd.read_csv('datasets/Admission_Predict.csv', index_col=0)
    df.head()
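
index_col accepts a column label as well as a position. The sketch below compares the two on made-up data. The column name "Serial No." is an assumption used only for this example.

```python
import io

import pandas as pd

# Made-up CSV text; "Serial No." is an illustrative column name.
csv_text = "Serial No.,GRE Score\n1,337\n2,324\n"

# Select the index column by position...
by_position = pd.read_csv(io.StringIO(csv_text), index_col=0)
# ...or by its label in the header row.
by_name = pd.read_csv(io.StringIO(csv_text), index_col="Serial No.")

print(by_position.index.name)
print(by_name.equals(by_position))
```

Using the label can be easier to read and does not depend on the column order in the file.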

Renaming Columns

  • Basic Renaming Using a Dictionary:

    new_df = df.rename(columns={
        'GRE Score': 'GRE Score',
        'TOEFL Score': 'TOEFL Score',
        'University Rating': 'University Rating',
        'SOP': 'Statement of Purpose',
        'LOR': 'Letter of Recommendation',
        'CGPA': 'CGPA',
        'Research': 'Research',
        'Chance of Admit': 'Chance of Admit'
    })
    new_df.head()
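  • Identifying Issues with Column Names: SOP is renamed but LOR is not, because the original name contains a trailing space. Inspect the exact names: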

    new_df.columns
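  • Addressing Issues with Spaces: Include the trailing space in the dictionary key (fragile, since it depends on the exact whitespace):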

    new_df = new_df.rename(columns={'LOR ': 'Letter of Recommendation'})
    new_df.head()
  • Using str.strip() for Robustness: Pass a function as the mapper to strip whitespace from every column name:

    new_df = new_df.rename(mapper=str.strip, axis='columns')
    new_df.head()
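
The renaming pitfall above can be reproduced on a small frame. The sketch below uses made-up data. It shows that rename() skips dictionary keys that match no column, that errors="raise" turns the silent miss into a KeyError, and that stripping the names first makes the lookup succeed.

```python
import pandas as pd

# Made-up frame whose second column name carries a trailing space.
frame = pd.DataFrame({"SOP": [4.5], "LOR ": [4.0]})

# Keys that match no column are skipped silently by default.
unchanged = frame.rename(columns={"LOR": "Letter of Recommendation"})

# errors="raise" reports the missing label instead of ignoring it.
try:
    frame.rename(columns={"LOR": "Letter of Recommendation"}, errors="raise")
    raised = False
except KeyError:
    raised = True

# Stripping whitespace first lets the plain key 'LOR' match.
cleaned = frame.rename(mapper=str.strip, axis="columns").rename(
    columns={"LOR": "Letter of Recommendation"}
)

print(list(unchanged.columns))
print(raised)
print(list(cleaned.columns))
```

Passing errors="raise" during exploration is one way to catch mismatched names early. Whether to keep it in finished code is a judgment call.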

Modifying Column Names Directly

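  • Using the df.columns Attribute: Assigning a list to df.columns changes the names in place, whereas rename() returns a new DataFrame by default: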
    cols = list(df.columns)
    cols = [x.lower().strip() for x in cols]
    df.columns = cols
    df.head()
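
As an alternative to the list comprehension, the same cleanup can be written with the vectorized string methods available on df.columns. This is a minimal sketch on made-up data, not the approach used in the notes above.

```python
import pandas as pd

# Made-up frame with mixed-case names and trailing spaces.
frame = pd.DataFrame(
    {"GRE Score": [337], "LOR ": [4.0], "Chance of Admit ": [0.92]}
)

# The .str accessor applies string methods to every column name at once.
frame.columns = frame.columns.str.strip().str.lower()

print(list(frame.columns))
```

Both forms give the same result. The list comprehension is plain Python, while the .str accessor keeps the operation inside pandas.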

Summary

  • Loading CSV into a pandas DataFrame: Used pd.read_csv(), with index_col to choose the index column.
  • Basic Data Cleaning: Renamed columns with rename() and handled names containing extra spaces.
  • Direct Column Modification: Assigned a cleaned list of names, built with a list comprehension, to df.columns.