`Series` Data Structure
A Series in pandas is a one-dimensional, array-like object that can hold values of various data types. It combines features of Python lists and dictionaries: items are stored in order, and each item can be accessed through a label (the index).
- List-like: Ordered collection of items.
- Dictionary-like: Access items using labels.
Structure
A Series consists of two main components:
- Index: Similar to keys in a dictionary.
- Data: Actual values stored in the series.
The data column has a label that can be retrieved using the .name attribute, which is useful for operations like merging multiple columns of data.
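The components described above can be inspected directly. This is a minimal sketch; the values, the labels, and the name 'score' are made up for illustration.

```python
import pandas as pd

# Hypothetical data: three scores labelled by student name.
scores = pd.Series([90, 85, 77], index=['Alice', 'Jack', 'Molly'], name='score')

labels = list(scores.index)   # the index: dictionary-like keys
values = scores.to_list()     # the data: the stored values
column_label = scores.name    # the label of the data column

print(labels, values, column_label)
```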
Creating a Series
To start, import pandas:
import pandas as pd
From a List
You can create a Series by passing a list of values. Pandas automatically assigns an index starting with zero and sets the name of the series to None.
Example
students = ['Alice', 'Jack', 'Molly']
series_students = pd.Series(students)
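As a quick check of the defaults described above, the sketch below reads back the automatically assigned index and the name of the same Series. The variable names default_index and default_name are only for illustration.

```python
import pandas as pd

students = ['Alice', 'Jack', 'Molly']
series_students = pd.Series(students)

default_index = list(series_students.index)  # positions 0, 1, 2
default_name = series_students.name          # None unless a name is given

print(default_index, default_name)
```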
Data Types
- String List: The Series type is set to object.
  students = ['Alice', 'Jack', 'Molly']
  pd.Series(students)
- Integer List: The Series type is set to int64.
  numbers = [1, 2, 3]
  pd.Series(numbers)
Handling Missing Data
- Strings with None: Pandas uses the type object.
  students = ['Alice', 'Jack', None]
  pd.Series(students)
- Numbers with None: Pandas converts None to NaN and sets the type to float64.
  numbers = [1, 2, None]
  pd.Series(numbers)
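The numeric case is worth seeing side by side: NaN is a floating-point value, so a single missing entry is enough to turn an integer Series into a float one. A small sketch, with variable names chosen only for illustration:

```python
import pandas as pd

whole = pd.Series([1, 2, 3])
with_gap = pd.Series([1, 2, None])

print(whole.dtype)     # int64
print(with_gap.dtype)  # float64

# isna() flags the entry that was None and is now NaN.
missing_mask = with_gap.isna().to_list()
print(missing_mask)
```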
NaN vs. None
- NaN is not equivalent to None. Using equality tests, the result is False.
  import numpy as np
  np.nan == None     # False
  np.nan == np.nan   # False
  np.isnan(np.nan)   # True
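Because equality tests never match NaN, missing values in a Series are usually found with pd.isna or the isna() method instead, both of which treat None and NaN as missing. A brief sketch of the difference:

```python
import numpy as np
import pandas as pd

# pd.isna treats both None and NaN as missing.
print(pd.isna(None))
print(pd.isna(np.nan))

numbers = pd.Series([1, 2, None])

# An equality test never matches NaN, so nothing is found this way.
matches_by_equality = int((numbers == np.nan).sum())
# isna() finds the one missing entry.
matches_by_isna = int(numbers.isna().sum())

print(matches_by_equality, matches_by_isna)
```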
Creating Series from Dictionaries
A Series can also be created from dictionary data, where the keys become the index values.
Example
students_scores = {'Alice': 'Physics', 'Jack': 'Chemistry', 'Molly': 'English'}
s = pd.Series(students_scores)
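Once the keys have become the index, values can be looked up by label, much like in a dictionary, or by position with iloc. A minimal sketch using the same data:

```python
import pandas as pd

students_scores = {'Alice': 'Physics', 'Jack': 'Chemistry', 'Molly': 'English'}
s = pd.Series(students_scores)

by_label = s['Jack']      # dictionary-like lookup by index label
by_position = s.iloc[0]   # list-like lookup by position

print(by_label, by_position)
```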
Index and Data Types
- The index can be accessed using the .index attribute.
- The data type (dtype) of the series and index is inferred automatically.
Example
s.index
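To illustrate the inference with numeric data, the sketch below builds a Series from a hypothetical dictionary of heights (the values are made up) and inspects its index and dtype.

```python
import pandas as pd

# Hypothetical data: heights in metres keyed by student name.
heights = pd.Series({'Alice': 1.62, 'Jack': 1.80, 'Molly': 1.71})

index_labels = list(heights.index)
print(heights.index)               # the Index object holding the labels
print(heights.dtype)               # float64, inferred from the values
print('Jack' in heights.index)     # membership test against the labels
```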
Complex Data Types
You can store complex data types like tuples in a Series.
Example
students = [("Alice", "Brown"), ("Jack", "White"), ("Molly", "Green")]
pd.Series(students)
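Each tuple is stored as a single element, so the Series has one entry per student and falls back to the generic object dtype. A short sketch, with names and first introduced only for illustration:

```python
import pandas as pd

students = [("Alice", "Brown"), ("Jack", "White"), ("Molly", "Green")]
names = pd.Series(students)

first = names[0]   # each element is still a Python tuple

print(len(names))
print(names.dtype)
print(first)
```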
Custom Index
You can explicitly pass an index when creating a Series.
Example
s = pd.Series(['Physics', 'Chemistry', 'English'], index=['Alice', 'Jack', 'Molly'])
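When the data is a list, the index needs exactly one label per value. The sketch below looks up a value by its custom label and then shows that a length mismatch raises a ValueError; the variable length_error is only for illustration.

```python
import pandas as pd

s = pd.Series(['Physics', 'Chemistry', 'English'], index=['Alice', 'Jack', 'Molly'])
print(s.loc['Molly'])

# Three values but only two labels: pandas refuses to build the Series.
try:
    pd.Series(['Physics', 'Chemistry', 'English'], index=['Alice', 'Jack'])
    length_error = None
except ValueError as exc:
    length_error = exc

print(type(length_error).__name__)
```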
Handling Mismatched Index and Dictionary Keys
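If the index provided does not match the dictionary keys, pandas builds the Series from the provided index values only. Keys that are not in the index are dropped (Jack in the example below), and index values with no matching key (Sam) get a missing value, NaN.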
Example
students_scores = {'Alice': 'Physics', 'Jack': 'Chemistry', 'Molly': 'English'}
s = pd.Series(students_scores, index=['Alice', 'Molly', 'Sam'])
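The result of this example can be checked as follows. This is a small sketch; kept_labels, jack_present, and sam_missing are names chosen here for illustration.

```python
import pandas as pd

students_scores = {'Alice': 'Physics', 'Jack': 'Chemistry', 'Molly': 'English'}
s = pd.Series(students_scores, index=['Alice', 'Molly', 'Sam'])

kept_labels = list(s.index)            # only the labels passed as index
jack_present = 'Jack' in s.index       # Jack was in the dict but not the index
sam_missing = bool(pd.isna(s['Sam']))  # Sam has no dict entry, so the value is missing

print(kept_labels, jack_present, sam_missing)
```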