Pandas Study Sheet

Reference: https://pandas.pydata.org/

Examples updated to reflect the popular Titanic dataset (Passengers, Survival, Age, Fare).

Importing and Creating Data

Keyword Description Syntax Example
Import Pandas Imports the Pandas library. import pandas as pd import pandas as pd
DataFrame Creates a tabular data structure from a list or dictionary. pd.DataFrame(data) data = {'Name': ['Braund, Mr. Owen Harris', 'Heikkinen, Miss. Laina'], 'Age': [22, 26], 'Survived': [0, 1]}
df = pd.DataFrame(data)
print(df)
Series Creates a one-dimensional labeled array. pd.Series(data) ages = pd.Series([22, 38, 26, 35])
print(ages)
read_csv() Reads a CSV file into a DataFrame. pd.read_csv(path) df = pd.read_csv('titanic.csv')
print(df.head())
read_excel() Reads an Excel file into a DataFrame. pd.read_excel(path) df = pd.read_excel('titanic.xlsx', sheet_name='passengers')

Viewing and Inspecting Data

Keyword Description Syntax Example
head() Returns the first n rows. df.head(n=5) print(df.head(3))
# Shows first 3 passengers
tail() Returns the last n rows. df.tail(n=5) print(df.tail(3))
# Shows last 3 passengers
info() Summary of DataFrame including index, columns, non-null counts, and memory usage. df.info() df.info()
# RangeIndex: 891 entries...
# Data columns (total 12 columns)...
describe() Generates descriptive statistics (mean, std, min, max) for numerical columns. df.describe() print(df.describe())
# Stats for Age, Fare, Pclass, etc.
shape Returns a tuple (rows, columns). df.shape print(df.shape)
# (891, 12)
columns Returns the column labels. df.columns print(df.columns.tolist())
# ['PassengerId', 'Survived', 'Pclass', 'Name'...]
index Returns the index (row labels) of the DataFrame. df.index print(df.index)
# RangeIndex(start=0, stop=891, step=1)
dtypes Returns data types of each column. df.dtypes print(df.dtypes)
# Age: float64, Name: object

Selecting and Indexing Data

Keyword Description Syntax Example
loc[] Access rows/columns by label or boolean array. df.loc[labels] # Select Name and Age for index 5 to 10
print(df.loc[5:10, ['Name', 'Age']])
iloc[] Access rows/columns by integer position. df.iloc[positions] # Select first 5 rows and first 3 columns
print(df.iloc[:5, :3])
set_index() Sets a column as the index. df.set_index(keys) df.set_index('PassengerId', inplace=True)
# Rows now accessed by PassengerId
reset_index() Resets the index to default integer index. df.reset_index() df.reset_index(inplace=True)

Filtering Data

Keyword Description Syntax Example
Boolean Indexing Filter data using conditions (& for AND, | for OR, ~ for NOT). df[condition] # Women over 30 who survived
survivors = df[(df['Sex'] == 'female') & (df['Age'] > 30) & (df['Survived'] == 1)]
isin() Filter rows where column value is present in a list. df[col.isin(list)] # Passengers in 1st or 2nd class
premium = df[df['Pclass'].isin([1, 2])]
query() Filter using a query string expression. df.query(expr) # Passengers with Fare > 100
rich = df.query('Fare > 100 and Survived == 1')

Handling Missing Data

Keyword Description Syntax Example
isna() Detects missing values (NaN). df.isna().sum() print(df.isna().sum())
# Age: 177, Cabin: 687, Embarked: 2
dropna() Removes rows (or columns) with missing values. df.dropna() # Drop rows where 'Age' is missing
df_clean = df.dropna(subset=['Age'])
fillna() Fills missing values with a specified value or strategy. df.fillna(value) # Fill missing Age with mean Age
df['Age'].fillna(df['Age'].mean(), inplace=True)

Data Cleaning

Keyword Description Syntax Example
unique() / nunique() Get unique values or the count of unique values. df[col].unique() print(df['Embarked'].unique())
# ['S', 'C', 'Q', nan]
print(df['Pclass'].nunique()) # 3
drop_duplicates() Removes duplicate rows. df.drop_duplicates() df.drop_duplicates(subset=['Ticket'], keep='first', inplace=True)

Manipulating Data

Keyword Description Syntax Example
Adding Columns Create a new column by assigning a value or calculation to a new key. df['New'] = val # Create 'FamilySize' from SibSp + Parch
df['FamilySize'] = df['SibSp'] + df['Parch'] + 1
rename() Renames columns or index labels. df.rename(columns={...}) df.rename(columns={'Pclass': 'PassengerClass'}, inplace=True)
drop() Removes specified rows or columns. df.drop(labels, axis) # Drop the 'Cabin' column
df.drop('Cabin', axis=1, inplace=True)
sort_values() Sorts by values along an axis. df.sort_values(by) # Sort by Class (asc) then Fare (desc)
df.sort_values(by=['Pclass', 'Fare'], ascending=[True, False], inplace=True)
astype() Casts a pandas object to a specified dtype. df.astype(dtype) # Convert Survived to boolean
df['Survived'] = df['Survived'].astype(bool)

String Operations

Keyword Description Syntax Example
str.contains() Tests if pattern or regex is contained in string. df[col].str.contains() # Find all 'Mr.' titles
misters = df[df['Name'].str.contains('Mr\.')]
str.lower() / str.upper() Converts strings to lowercase or uppercase. df[col].str.upper() df['Name'] = df['Name'].str.upper()
str.split() Splits string on delimiter. df[col].str.split() # Extract Last Name (before comma)
df['LastName'] = df['Name'].str.split(',').str[0]

Applying Functions

Keyword Description Syntax Example
apply() Applies a function along an axis of the DataFrame. df.apply(func) # Define function to categorize Age
def categorize(age):
return 'Child' if age < 18 else 'Adult'
df['AgeGroup'] = df['Age'].apply(categorize)
map() Maps values of Series using a dictionary. series.map(dict) # Map 'male' to 0, 'female' to 1
df['SexCode'] = df['Sex'].map({'male': 0, 'female': 1})
applymap() Applies a function to every element of the DataFrame (element-wise). df.applymap(func) # Convert all elements in numeric df to float
df_num = df[['Age', 'Fare']].applymap(float)

Statistical Operations

Keyword Description Syntax Example
mean() / median() / mode() Central tendency measures. df.mean() avg_fare = df['Fare'].mean()
median_age = df['Age'].median()
std() / var() Standard deviation and variance. df.std() age_variability = df['Age'].std()
min() / max() Minimum and maximum values. df.min() youngest = df['Age'].min() # 0.42
oldest = df['Age'].max() # 80.0
quantile() Values at given quantile (0 to 1). df.quantile(q) # 75th percentile of Fare
high_fare = df['Fare'].quantile(0.75)
value_counts() Counts of unique values (frequency). series.value_counts() print(df['Pclass'].value_counts())
# 3: 491, 1: 216, 2: 184

Grouping and Aggregating

Keyword Description Syntax Example
groupby() Groups DataFrame using a mapper or by columns. df.groupby(by) # Average survival rate by Pclass
print(df.groupby('Pclass')['Survived'].mean())
agg() Aggregates using one or more operations. grouped.agg(func) # Mean Fare and Max Age per Class
print(df.groupby('Pclass').agg({'Fare': 'mean', 'Age': 'max'}))
pivot_table() Creates a spreadsheet-style pivot table. pd.pivot_table() # Survival rate by Sex and Class
pt = pd.pivot_table(df, values='Survived', index='Sex', columns='Pclass', aggfunc='mean')

Merging and Concatenating

Keyword Description Syntax Example
concat() Concatenates pandas objects along an axis (stacking). pd.concat(objs) # Combine train and test sets
combined = pd.concat([train_df, test_df], axis=0)
merge() Merges DataFrames (SQL-style join). pd.merge(left, right) # Merge passenger data with ticket details
full_data = pd.merge(df_passengers, df_tickets, on='TicketId', how='left')

Reshaping Data

Keyword Description Syntax Example
melt() Unpivots DataFrame from wide to long format. pd.melt(df) # Useful for plotting
long_df = pd.melt(df, id_vars=['PassengerId'], value_vars=['Age', 'Fare'])
pivot() Reshapes data from long to wide format. df.pivot() # Reshape back to wide
wide = long_df.pivot(index='PassengerId', columns='variable', values='value')

Time Series Operations

Keyword Description Syntax Example
to_datetime() Converts argument to datetime. pd.to_datetime(col) # Assuming we have a date column
df['Date'] = pd.to_datetime(df['DateStr'])
resample() Frequency conversion (e.g., daily to monthly). df.resample(freq) # Monthly mean sales
monthly_sales = sales_df.resample('M').mean()
rolling() Rolling window calculations. df.rolling(window) # 7-day moving average
ma7 = stock_df['Price'].rolling(window=7).mean()
shift() Shifts index by desired number of periods. df.shift(periods) # Compare today vs yesterday
df['PrevDay'] = df['Price'].shift(1)

Exporting Data

Keyword Description Syntax Example
to_csv() Saves DataFrame to CSV. df.to_csv(filename) df.to_csv('titanic_cleaned.csv', index=False)