`DataFrame` Indexing and Loading
Introduction to CSV Files
- CSV (Comma Separated Values): A lightweight, ubiquitous format for data storage.
- Commonly used by spreadsheet software like Excel and Google Sheets.
- Format is flexible but lacks a strict specification, leading to possible variations in structure.
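One common variation is the field separator. The sketch below uses made-up in-memory data (the names `comma_text` and `semicolon_text` are illustrative, not part of the course dataset) and shows that `read_csv` can parse either layout once it is told the separator through its `sep` parameter.

```python
import io

import pandas as pd

# Hypothetical sample data: the same two rows written with two separators.
comma_text = "name,score\nAda,90\nLin,85\n"
semicolon_text = "name;score\nAda;90\nLin;85\n"

# A comma is the default separator, so no extra argument is needed here.
comma_df = pd.read_csv(io.StringIO(comma_text))

# sep tells read_csv which character splits the fields.
semicolon_df = pd.read_csv(io.StringIO(semicolon_text), sep=";")

print(comma_df.equals(semicolon_df))  # True
```

If the separator is left at its default for the semicolon file, each row is read as a single field, which is usually the first visible sign of a format mismatch.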
Viewing CSV File Contents
- Viewing CSV Files with Shell Commands:
  - Jupyter notebooks can run shell commands on any line prefixed with an exclamation mark (`!`).
  - Example: Using `!cat` to display the contents of a CSV file: `!cat datasets/Admission_Predict.csv`
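Where shell commands are unavailable, the raw text can also be inspected from plain Python. This is a minimal sketch, not part of the original notes: it writes a small hypothetical stand-in file to a temporary directory so that it runs anywhere, then reads only the first two lines rather than printing the whole file.

```python
import itertools
import os
import tempfile

# Hypothetical stand-in content; the real course file lives at
# datasets/Admission_Predict.csv and is not reproduced here.
sample_text = "Serial No.,GRE Score,LOR \n1,337,4.5\n2,324,4.5\n3,316,3.5\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.csv")
    with open(path, "w", encoding="utf-8") as f:
        f.write(sample_text)

    # islice stops after two lines, so large files are not read in full.
    with open(path, encoding="utf-8") as f:
        first_lines = [line.rstrip("\n") for line in itertools.islice(f, 2)]

for line in first_lines:
    # repr() makes stray whitespace, such as the space after LOR, visible.
    print(repr(line))
```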
Loading CSV Data into a DataFrame
- Import pandas: `import pandas as pd`
- Loading Data: `df = pd.read_csv('datasets/Admission_Predict.csv')`
- Viewing Data: `df.head()`
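The loading steps above can be sketched without the course file by reading from an in-memory string. The column names and values in `csv_text` are a hypothetical stand-in, chosen only to resemble an admissions table. The point to notice is that, by default, pandas adds its own integer index and treats every column in the file as data.

```python
import io

import pandas as pd

# Hypothetical stand-in for the contents of a CSV file.
csv_text = (
    "Serial No.,GRE Score,TOEFL Score\n"
    "1,337,118\n"
    "2,324,107\n"
    "3,316,104\n"
)

df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)          # (rows, columns)
print(list(df.index))    # default integer labels, starting at 0
print(df.head(2))        # first two rows; head() defaults to five
```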
Setting a Specific Column as the Index
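- Using the `index_col` parameter: pass the position of the column that should become the row index (here the first column, `0`) instead of the default integer index: `df = pd.read_csv('datasets/Admission_Predict.csv', index_col=0)`, then `df.head()`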
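As a rough illustration of what `index_col` does, the sketch below compares three routes to the same result on a hypothetical in-memory table: selecting the index column by position, selecting it by name, and calling `set_index` after loading. The sample data and variable names are assumptions made for the example.

```python
import io

import pandas as pd

# Hypothetical stand-in for the contents of a CSV file.
csv_text = (
    "Serial No.,GRE Score,TOEFL Score\n"
    "1,337,118\n"
    "2,324,107\n"
    "3,316,104\n"
)

# Column chosen by position: 0 means the first column in the file.
by_position = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Column chosen by its header name.
by_name = pd.read_csv(io.StringIO(csv_text), index_col="Serial No.")

# Same idea applied after loading with the default index.
after_load = pd.read_csv(io.StringIO(csv_text)).set_index("Serial No.")

print(by_position.index.name)
print(list(by_position.columns))
```

Selecting by name tends to be more robust than selecting by position when the column order in the file might change.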
Renaming Columns
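- Basic Renaming Using a Dictionary: `rename()` returns a new DataFrame; the dictionary keys are the existing column names and the values are the new names.

      new_df = df.rename(columns={
          'GRE Score': 'GRE Score',
          'TOEFL Score': 'TOEFL Score',
          'University Rating': 'University Rating',
          'SOP': 'Statement of Purpose',
          'LOR': 'Letter of Recommendation',
          'CGPA': 'CGPA',
          'Research': 'Research',
          'Chance of Admit': 'Chance of Admit'})
      new_df.head()

- Identifying Issues with Column Names: if a column such as `LOR` was not renamed, inspect the exact names, including any stray whitespace, with `new_df.columns`
- Addressing Issues with Spaces: the column name is actually `'LOR '` with a trailing space, so the key must match it exactly: `new_df = new_df.rename(columns={'LOR ': 'Letter of Recommendation'})`, then `new_df.head()`
- Using `str.strip()` for Robustness: strip whitespace from every column name rather than hard-coding the space: `new_df = new_df.rename(mapper=str.strip, axis='columns')`, then `new_df.head()`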
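The renaming steps above can be reproduced on a small hypothetical frame whose headers carry trailing spaces. The sketch shows that `rename` ignores keys that do not match any column, that the `errors="raise"` option of `DataFrame.rename` turns such a miss into a `KeyError`, and that stripping whitespace first lets the plain key match. The data values are invented for illustration.

```python
import pandas as pd

# Hypothetical frame; two headers end with a space, as in the notes above.
df = pd.DataFrame({"SOP": [4.5], "LOR ": [4.0], "Chance of Admit ": [0.92]})

# 'LOR' does not match 'LOR ', so rename leaves the column untouched
# and reports nothing.
unchanged = df.rename(columns={"LOR": "Letter of Recommendation"})
print(list(unchanged.columns))

# errors="raise" makes the mismatch visible instead of silent.
try:
    df.rename(columns={"LOR": "Letter of Recommendation"}, errors="raise")
    missing_key_detected = False
except KeyError:
    missing_key_detected = True
print(missing_key_detected)

# Strip whitespace from every header, then rename with the plain key.
cleaned = (
    df.rename(mapper=str.strip, axis="columns")
      .rename(columns={"LOR": "Letter of Recommendation"})
)
print(list(cleaned.columns))
```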
Modifying Column Names Directly
- Using the `df.columns` Attribute:

      cols = list(df.columns)
      cols = [x.lower().strip() for x in cols]
      df.columns = cols
      df.head()
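A minimal sketch of the direct approach on a hypothetical frame follows. It also shows, as an alternative that is not in the original notes, the vectorised `.str` accessor on the column index, which gives the same names without an explicit loop. Unlike `rename`, assigning to `df.columns` changes the existing DataFrame rather than returning a new one.

```python
import pandas as pd

# Hypothetical frame with mixed-case headers and trailing spaces.
df = pd.DataFrame({"GRE Score": [337], "LOR ": [4.5], "Chance of Admit ": [0.92]})

# List comprehension, as in the notes: lower-case, then strip whitespace.
from_list = [name.lower().strip() for name in df.columns]

# Equivalent result using the string accessor on the Index.
from_str_accessor = list(df.columns.str.lower().str.strip())

# Assignment replaces the headers on df itself.
df.columns = from_list
print(list(df.columns))  # ['gre score', 'lor', 'chance of admit']
```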
Summary
- Loading CSV into pandas DataFrame: Utilized `pd.read_csv()`.
- Basic Data Cleaning: Performed column renaming and handled issues with extra spaces.
- Direct Column Modification: Demonstrated using lists and list comprehensions for efficient renaming.