Idioms

In Python, some ways of writing code are considered more idiomatic than others. Pandas, a Python library with conventions of its own, has its own idioms, and code that follows them is often called "pandorable." These idioms usually improve readability and can improve performance. Here are some key features that make your code pandorable.

Method Chaining

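Method chaining in pandas condenses multiple operations on a DataFrame into a single statement. It works because most DataFrame methods return a new DataFrame (or Series), so each call can be followed directly by the next.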

Pandorable Example

This example filters for county-level data (SUMLEV equal to 50), drops the rows that did not match, sets the state and county names as a multi-index, and renames a column:

(df.where(df['SUMLEV'] == 50)
   .dropna()
   .set_index(['STNAME', 'CTYNAME'])
   .rename(columns={'ESTIMATESBASE2010': 'Estimates Base 2010'}))
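
To try the chain without the census file, here is a minimal sketch on a small made-up DataFrame. The rows and values are hypothetical; only the column names come from the example above. It also shows one possible variant that filters with .loc and a callable instead of where followed by dropna. That variant does not drop rows that merely have missing values in other columns, and it keeps integer columns as integers.

```python
import pandas as pd

# Hypothetical stand-in for the census data; the values are made up.
toy = pd.DataFrame({
    'SUMLEV': [40, 50, 50, 40, 50],
    'STNAME': ['Alabama', 'Alabama', 'Alabama', 'Alaska', 'Alaska'],
    'CTYNAME': ['Alabama', 'Autauga County', 'Baldwin County',
                'Alaska', 'Anchorage Municipality'],
    'ESTIMATESBASE2010': [4780000, 54000, 182000, 710000, 291000],
})

# Same chain as above: non-matching rows become NaN and are then dropped.
chained = (toy.where(toy['SUMLEV'] == 50)
              .dropna()
              .set_index(['STNAME', 'CTYNAME'])
              .rename(columns={'ESTIMATESBASE2010': 'Estimates Base 2010'}))

# Variant: select the matching rows directly with a boolean mask.
chained_loc = (toy.loc[lambda d: d['SUMLEV'] == 50]
                  .set_index(['STNAME', 'CTYNAME'])
                  .rename(columns={'ESTIMATESBASE2010': 'Estimates Base 2010'}))

print(chained.dtypes)      # numeric columns were upcast to float by where()
print(chained_loc.dtypes)  # numeric columns keep their integer dtype
```

Both versions select the same three county rows; they differ only in the dtypes of the remaining columns.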

Non-Pandorable Example

A more traditional, non-pandorable approach:

df = df[df['SUMLEV'] == 50]  # Keep only county-level rows
df.set_index(['STNAME', 'CTYNAME'], inplace=True)  # Set the multi-index in place
df = df.rename(columns={'ESTIMATESBASE2010': 'Estimates Base 2010'})  # rename returns a new DataFrame, so assign it
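
One thing to watch in the step-by-step style: most DataFrame methods, including rename, return a new object rather than modifying the original, so the return value has to be assigned (or inplace=True used) for the change to stick. A minimal sketch on a made-up one-column frame:

```python
import pandas as pd

# Hypothetical one-column frame; the values are made up.
toy = pd.DataFrame({'ESTIMATESBASE2010': [54000, 182000]})

# The renamed copy is created and immediately discarded.
toy.rename(columns={'ESTIMATESBASE2010': 'Estimates Base 2010'})
unchanged = list(toy.columns)

# Assigning the result keeps the renamed copy.
toy = toy.rename(columns={'ESTIMATESBASE2010': 'Estimates Base 2010'})
renamed = list(toy.columns)

print(unchanged)
print(renamed)
```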

Performance Comparison

Compare the performance of both approaches using the timeit module:

First Approach

import timeit

import pandas as pd

def first_approach():
    global df  # df is loaded at module level below
    return (df.where(df['SUMLEV'] == 50)
             .dropna()
             .set_index(['STNAME', 'CTYNAME'])
             .rename(columns={'ESTIMATESBASE2010': 'Estimates Base 2010'}))

df = pd.read_csv('datasets/census.csv')
timeit.timeit(first_approach, number=10)

Second Approach

def second_approach():
    global df
    new_df = df[df['SUMLEV'] == 50]
    new_df.set_index(['STNAME', 'CTYNAME'], inplace=True)
    return new_df.rename(columns={'ESTIMATESBASE2010': 'Estimates Base 2010'})

df = pd.read_csv('datasets/census.csv')
timeit.timeit(second_approach, number=10)
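
If datasets/census.csv is not at hand, the comparison can be reproduced on synthetic data. The sketch below is only an illustration: the frame size, the random values, and the function names chained and stepwise are assumptions, not part of the original example.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic stand-in for census.csv; the size and values are arbitrary.
rng = np.random.default_rng(0)
n = 5_000
df = pd.DataFrame({
    'SUMLEV': rng.choice([40, 50], size=n),
    'STNAME': rng.choice(['Alabama', 'Alaska', 'Arizona'], size=n),
    'CTYNAME': [f'County {i}' for i in range(n)],
    'ESTIMATESBASE2010': rng.integers(1_000, 1_000_000, size=n),
})

def chained():
    # Method-chaining version: where + dropna, then index and rename.
    return (df.where(df['SUMLEV'] == 50)
              .dropna()
              .set_index(['STNAME', 'CTYNAME'])
              .rename(columns={'ESTIMATESBASE2010': 'Estimates Base 2010'}))

def stepwise():
    # Step-by-step version: boolean mask, then index and rename.
    new_df = df[df['SUMLEV'] == 50]
    new_df = new_df.set_index(['STNAME', 'CTYNAME'])
    return new_df.rename(columns={'ESTIMATESBASE2010': 'Estimates Base 2010'})

chained_time = timeit.timeit(chained, number=10)
stepwise_time = timeit.timeit(stepwise, number=10)
print(f'chained:  {chained_time:.4f} s for 10 runs')
print(f'stepwise: {stepwise_time:.4f} s for 10 runs')
```

The absolute numbers depend on the machine, the data, and the pandas version, so treat them as relative indications only. A more readable chain is not automatically the faster one; measuring is the only way to know.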

Apply Function

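The apply function runs a function along one axis of a DataFrame. With axis='columns' (equivalently axis=1), the function is called once per row and receives that row as a Series.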

Example: Calculating Min and Max

Define a function to calculate the minimum and maximum values for population estimates:

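import numpy as np

def min_max(row):
    # Select the six yearly population estimates from this row
    data = row[['POPESTIMATE2010', 'POPESTIMATE2011', 'POPESTIMATE2012', 'POPESTIMATE2013', 'POPESTIMATE2014', 'POPESTIMATE2015']]
    # Return a two-element Series; apply assembles these into a DataFrame
    return pd.Series({'min': np.min(data), 'max': np.max(data)})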

df.apply(min_max, axis='columns').head()
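
For simple reductions like this one, a row-wise apply can usually be replaced by the built-in min and max methods, which operate on whole columns at once. The sketch below compares the two on a made-up frame; the data and the shortened column list are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Shortened column list and made-up values, for illustration only.
cols = ['POPESTIMATE2010', 'POPESTIMATE2011', 'POPESTIMATE2012']
toy = pd.DataFrame({
    'CTYNAME': ['A County', 'B County'],
    'POPESTIMATE2010': [100, 250],
    'POPESTIMATE2011': [110, 240],
    'POPESTIMATE2012': [105, 260],
})

def min_max(row):
    # Called once per row when axis='columns'.
    data = row[cols]
    return pd.Series({'min': np.min(data), 'max': np.max(data)})

via_apply = toy.apply(min_max, axis='columns')

# Same result without a Python-level function call per row.
vectorized = pd.DataFrame({
    'min': toy[cols].min(axis='columns'),
    'max': toy[cols].max(axis='columns'),
})

print(via_apply)
print(vectorized)
```

apply remains the more flexible tool when the per-row logic cannot be expressed with built-in methods.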

Adding New Columns

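Modify the function to add the new values to each row and return the whole row. Note that apply returns a new DataFrame and leaves df unchanged, so assign the result if you want to keep it: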

def min_max(row):
    data = row[['POPESTIMATE2010', 'POPESTIMATE2011', 'POPESTIMATE2012', 'POPESTIMATE2013', 'POPESTIMATE2014', 'POPESTIMATE2015']]
    row['max'] = np.max(data)
    row['min'] = np.min(data)
    return row

df.apply(min_max, axis='columns')
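
The following sketch, on a made-up frame with a shortened column list (both assumptions), shows that the original DataFrame has no new columns after the call unless the result is assigned. It also shows assign as one alternative way to add the same columns that fits into a method chain.

```python
import numpy as np
import pandas as pd

# Shortened column list and made-up values, for illustration only.
cols = ['POPESTIMATE2010', 'POPESTIMATE2011', 'POPESTIMATE2012']
toy = pd.DataFrame({
    'CTYNAME': ['A County', 'B County'],
    'POPESTIMATE2010': [100, 250],
    'POPESTIMATE2011': [110, 240],
    'POPESTIMATE2012': [105, 260],
})

def min_max(row):
    data = row[cols]
    row['max'] = np.max(data)
    row['min'] = np.min(data)
    return row

# The result is a new DataFrame; toy itself gains no columns.
result = toy.apply(min_max, axis='columns')
print(list(toy.columns))
print(list(result.columns))

# Alternative: assign returns a copy with the extra columns added.
with_extremes = toy.assign(
    min=toy[cols].min(axis='columns'),
    max=toy[cols].max(axis='columns'),
)
print(with_extremes)
```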

Using Lambdas

The same kind of calculation can be written inline with a lambda. This one computes, for each row, the maximum across the population estimate columns:

cols = ['POPESTIMATE2010', 'POPESTIMATE2011', 'POPESTIMATE2012', 'POPESTIMATE2013', 'POPESTIMATE2014', 'POPESTIMATE2015']
df.apply(lambda x: np.max(x[cols]), axis=1).head()

Custom Functions with Apply

Custom functions can categorize or manipulate data based on specific logic.

Example: Categorizing States into Regions

Define a function to determine the region of a state:

def get_state_region(x):
    northeast = ['Connecticut', 'Maine', 'Massachusetts', 'New Hampshire', 'Rhode Island','Vermont','New York','New Jersey','Pennsylvania']
    midwest = ['Illinois','Indiana','Michigan','Ohio','Wisconsin','Iowa','Kansas','Minnesota','Missouri','Nebraska','North Dakota','South Dakota']
    south = ['Delaware','Florida','Georgia','Maryland','North Carolina','South Carolina','Virginia','District of Columbia','West Virginia','Alabama','Kentucky','Mississippi','Tennessee','Arkansas','Louisiana','Oklahoma','Texas']
    west = ['Arizona','Colorado','Idaho','Montana','Nevada','New Mexico','Utah','Wyoming','Alaska','California','Hawaii','Oregon','Washington']
    
    if x in northeast:
        return "Northeast"
    elif x in midwest:
        return "Midwest"
    elif x in south:
        return "South"
    else:  # Any name not listed above, including unrecognized ones, is labeled West
        return "West"

Applying the Function

Use the apply function to create a new column for state regions:

df['state_region'] = df['STNAME'].apply(get_state_region)
df[['STNAME','state_region']].head()
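
When the logic is a plain lookup, as it is here, a dictionary combined with Series.map is a common alternative to apply with a custom function. The sketch below uses an abbreviated, illustrative mapping and a made-up frame; the name region_by_state is an assumption. Unlike the function above, names missing from the dictionary become NaN instead of silently falling into "West".

```python
import pandas as pd

# Abbreviated, illustrative mapping; a real one would list every state.
region_by_state = {
    'Maine': 'Northeast',
    'Ohio': 'Midwest',
    'Texas': 'South',
    'Oregon': 'West',
}

# Made-up frame; 'Puerto Rico' is deliberately absent from the mapping.
toy = pd.DataFrame({'STNAME': ['Maine', 'Ohio', 'Texas', 'Oregon', 'Puerto Rico']})

# map looks up each value in the dictionary; unknown keys yield NaN.
toy['state_region'] = toy['STNAME'].map(region_by_state)
print(toy)
```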
