Welcome to my in-depth study of the U.S. presidents throughout history! In this research project, I'll delve into data from numerous sources to uncover insights and patterns related to the American presidency. The datasets I have collected cover diverse aspects of each president's term, ranging from personal information to political achievements and economic indicators.
To achieve our research objectives and derive meaningful insights, we will follow a structured approach:
Data Collection: Gathering data from various reputable sources, including historical archives, government databases, and academic research papers. The datasets encompass information on all U.S. presidents from George Washington to the present day.
Data Cleaning and Integration: Ensuring the data is accurate and consistent by handling missing values, removing duplicates, and standardizing formats, then integrating relevant information from the different datasets into a unified data repository for analysis.
Exploratory Data Analysis: Examining the basic characteristics and distributions of the data, using visualizations such as bar charts, line plots, and scatter plots to gain initial insights.
In-Depth Analysis: Performing comprehensive analyses with statistical techniques to answer specific research questions. We will use regression analysis, correlation studies, and hypothesis testing to reveal underlying patterns and relationships.
Data Visualization: Creating clear, informative charts, graphs, and maps to communicate our findings and convey complex information in an understandable way.
Interpretation: Interpreting the results of our analyses and drawing meaningful conclusions, placing our findings in their historical and political context to provide a comprehensive perspective on the U.S. presidency.
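The following sketch illustrates the correlation and regression steps listed above. It computes Pearson's r and an ordinary least-squares line between height and weight. As an assumption for illustration, it uses only the five rows shown later in the preview of the physical-data dataset (pres_df_3), typed in by hand. With n = 5 it demonstrates the mechanics only and supports no conclusion. On the full dataset you would pass the real columns instead.

```python
import numpy as np
import pandas as pd

# Five rows typed in from the pres_df_3 preview: height and weight of the first five presidents
physical = pd.DataFrame({
    "name": ["George Washington", "John Adams", "Thomas Jefferson",
             "James Madison", "James Monroe"],
    "height_cm": [188, 170, 189, 163, 183],
    "weight_kg": [79.4, 83.9, 82.1, 55.3, 85.7],
})

# Correlation study: Pearson's r between height and weight
r = physical["height_cm"].corr(physical["weight_kg"])

# Regression analysis: ordinary least-squares line weight_kg = slope * height_cm + intercept
slope, intercept = np.polyfit(physical["height_cm"], physical["weight_kg"], deg=1)

print(f"Pearson r = {r:.2f}")
print(f"weight_kg = {slope:.2f} * height_cm {intercept:+.1f}")
```

For the hypothesis-testing step, scipy.stats.pearsonr returns the same r together with a p-value.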
I would like to express my gratitude to all the data providers and researchers who have made this study possible. Their efforts in collecting and sharing valuable data have been instrumental in the success of this research.
Let's embark on this exciting journey together and discover the fascinating insights hidden within the historical data of the U.S. presidents!
import numpy as np # Library for numerical computations
import pandas as pd # Library for data manipulation and analysis
import seaborn as sns # Library for statistical data visualization
import matplotlib.pyplot as plt # Library for creating plots and visualizations
# Import the necessary Plotly libraries
import plotly.graph_objects as go # Low-level interface for creating Plotly plots
import plotly.express as px # Higher-level interface for creating interactive plots
# Import the necessary ipywidgets libraries
import ipywidgets as widgets # Library for creating interactive widgets
from ipywidgets import interact, interact_manual # Functions for creating interactive controls
import warnings
warnings.filterwarnings('ignore')
In this study, I compiled a collection of diverse datasets related to U.S. presidents. To ensure the data's quality and reliability, I drew these datasets from several reputable platforms. Below, I outline the steps I took to gather them:
One of the primary sources of information on U.S. presidents was the online learning platform Coursera. I accessed relevant datasets during my data science course titled "Data Science with Python." These datasets offered valuable information about the presidents' backgrounds and accomplishments.
Kaggle, a well-known data science community and platform, served as another reliable resource. I found multiple datasets on Kaggle, including "First Ladies' Data" and "Historical Presidents Physical Data." These provided information on the first ladies and on the presidents' physical characteristics.
To complement the datasets from specialized platforms, I conducted targeted Google searches. Through this method, I discovered the dataset "U.S. Presidents Dataset (1)," which offered specific information not found in other sources.
Statista, a reputable statistics portal, contributed valuable data to my study. The dataset "Most Common Names of U.S. Presidents (1789-2021)" provided an intriguing analysis of presidential names over the years.
For economic-related datasets, I referred to the World Bank, a renowned international organization. The datasets "USA Economic Growth" and "U.S. GDP During Presidencies" provided essential economic indicators during presidential tenures.
Each dataset underwent meticulous evaluation to ensure its relevance and accuracy. The combination of resources from various platforms allowed me to present a comprehensive study of U.S. Presidents with diverse perspectives and insights. Through this multi-faceted approach, I aimed to enhance the study's credibility and provide readers with a holistic understanding of the dataset collection process.
Data Set Name | Provider | URL to Provider | Variable Name in Code |
---|---|---|---|
U.S. Presidents' Information | Coursera | Coursera | pres_df_1 |
First Ladies' Data | Kaggle | Kaggle | pres_df_2 |
Historical Presidents Physical Data (More) | Kaggle | Kaggle | pres_df_3 |
U.S. Presidents Dataset (1) | Google Search | Google Search | pres_df_4 |
U.S. Presidents Popular Vote Percentage Dataset (1) | Kaggle | Kaggle | pres_df_5 |
Most Common Names of U.S. Presidents (1789-2021) (1) | Statista | Statista | pres_df_6 |
USA Economic Growth Dataset | World Bank | World Bank | pres_df_7 |
U.S. GDP During Presidencies Dataset | Statista | Statista | pres_df_8 |
This code example demonstrates an interactive widget that lets you choose a resource from a list and view its website directly within JupyterLab. The widget is built with ipywidgets and IPython.display.IFrame.
The code defines a dictionary, resources, that maps resource names to their URLs. It includes Coursera, Kaggle, Google Search, Statista, World Bank, and Wikipedia.
It creates a dropdown widget, resource_dropdown, whose options are the names listed in the dictionary.
When you select a resource from the dropdown, the update_iframe function is triggered. It reads the selected resource from the dropdown and displays that resource's URL in an IFrame.
The IFrame is embedded in the output widget, so the site appears inside the notebook's output area.
Run the code in a JupyterLab code cell.
After running the code, a dropdown widget appears, showing the available resources.
Choose a resource from the dropdown to view its website in the output area.
The website is displayed in an embedded IFrame, so you can explore the chosen resource without leaving JupyterLab.
Please note that this example uses hardcoded homepage URLs for demonstration purposes. You can modify the resources dictionary to point at the specific dataset pages, or replace the existing providers with other resources. Also note that many websites, including Google Search, send X-Frame-Options or Content-Security-Policy headers that forbid embedding in another page. For those sites the frame stays blank or shows an error, and the workaround is to open the link in a normal browser tab.
import ipywidgets as widgets
from IPython.display import display, IFrame
# Dictionary of resource names and their corresponding URLs
resources = {
"Coursera": "https://www.coursera.org/",
"Kaggle": "https://www.kaggle.com",
"Google Search": "https://www.google.com",
"Statista": "https://www.statista.com",
"World Bank": "https://data.worldbank.org",
"Wikipedia": "https://www.wikipedia.org/"
}
# Dropdown widget to select the resource
resource_dropdown = widgets.Dropdown(
options=list(resources.keys()),
description='Select Resource:',
layout=widgets.Layout(width='400px')
)
# Output widget to display the IFrame
output = widgets.Output()
# Function to update the IFrame based on the selected resource
def update_iframe(change):
    selected_resource = resource_dropdown.value
    with output:
        output.clear_output()
        display(IFrame(resources[selected_resource], width="100%", height="400px"))
# Attach the update function to the dropdown widget
resource_dropdown.observe(update_iframe, names='value')
# Render the initially selected resource; observe() only fires on later changes
update_iframe(None)
# Display the dropdown widget and the output widget
display(resource_dropdown)
display(output)
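Whether the IFrame shows anything depends on the site itself. A page can forbid framing by other origins with the X-Frame-Options header (DENY or SAMEORIGIN) or with a frame-ancestors directive in its Content-Security-Policy header. The following sketch inspects those two headers. It is a rough heuristic, not a complete CSP parser, and frame_blocked and check_url are hypothetical helper names introduced here.

```python
from urllib.request import Request, urlopen


def frame_blocked(headers):
    """Heuristic: return True if the response headers forbid third-party framing."""
    # Header names are case-insensitive, so normalize them to lowercase
    h = {k.lower(): v for k, v in headers.items()}
    xfo = h.get("x-frame-options", "").strip().upper()
    if xfo in ("DENY", "SAMEORIGIN"):
        return True
    csp = h.get("content-security-policy", "")
    for directive in csp.split(";"):
        parts = directive.strip().split()
        if parts and parts[0].lower() == "frame-ancestors":
            # Treated as open to any embedder only when a bare "*" source is listed
            return "*" not in parts[1:]
    return False


def check_url(url, timeout=10):
    """Send a HEAD request and apply frame_blocked to the response headers."""
    req = Request(url, method="HEAD", headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=timeout) as resp:
        return frame_blocked(resp.headers)
```

For example, check_url(resources["Google Search"]) performs a live request, so the result depends on the site's current headers. Some servers reject HEAD requests, in which case urlopen raises urllib.error.HTTPError.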
Alternatively, the code below opens each resource in a new tab of your default web browser.
import webbrowser
resources = {
"Coursera": "https://www.coursera.org",
"Kaggle": "https://www.kaggle.com",
"Google Search": "https://www.google.com",
"Statista": "https://www.statista.com",
"World Bank": "https://data.worldbank.org"
}
# Function to open the web page in the default browser
def open_website(url):
    webbrowser.open_new_tab(url)
# Loop through the resources and open their websites
for resource, url in resources.items():
    print(f"Opening {resource} website: {url}")
    open_website(url)
Opening Coursera website: https://www.coursera.org
Opening Kaggle website: https://www.kaggle.com
Opening Google Search website: https://www.google.com
Opening Statista website: https://www.statista.com
Opening World Bank website: https://data.worldbank.org
Let's read the files:
The code below reads each CSV file into a pandas DataFrame, using the variable names listed in the table above.
from pathlib import Path
# Reading the datasets into DataFrames
# All eight files live in one folder; change DATA_DIR to match your own checkout
DATA_DIR = Path(r'C:\Users\User\Desktop\GitHub-projects\projects\Data-Dives-Projects-Unleashed\dataSets\course 1\week 3\datasets\presidents')
# Dataset 1: U.S. Presidents' Information
pres_df_1 = pd.read_csv(DATA_DIR / 'presidents.csv')
# Dataset 2: First Ladies' Data
pres_df_2 = pd.read_csv(DATA_DIR / 'first_ladies.csv')
# Dataset 3: Historical Presidents Physical Data (More)
pres_df_3 = pd.read_csv(DATA_DIR / 'Historical Presidents Physical Data (More).csv')
# Dataset 4: U.S. Presidents Dataset (1)
pres_df_4 = pd.read_csv(DATA_DIR / 'presidents(1).csv')
# Dataset 5: U.S. Presidents Popular Vote Percentage Dataset (1)
pres_df_5 = pd.read_csv(DATA_DIR / 'pvp_dataset(1).csv')
# Dataset 6: Most Common Names of U.S. Presidents (1789-2021) (1)
pres_df_6 = pd.read_csv(DATA_DIR / 'us_presidents.csv')
# Dataset 7: USA Economy Growth Dataset
# Note: encoding='iso-8859-1' handles characters that are not valid UTF-8
pres_df_7 = pd.read_csv(DATA_DIR / 'USA Economy Growth.csv', encoding='iso-8859-1')
# Dataset 8: U.S. GDP During Presidencies Dataset
pres_df_8 = pd.read_csv(DATA_DIR / 'USGDPpresidents.csv')
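The comment on dataset 7 notes that the file is not valid UTF-8. Instead of hardcoding an encoding for each file, you could try UTF-8 first and fall back to ISO-8859-1 only when decoding fails. The following sketch does this; read_csv_with_fallback is a hypothetical helper, not part of pandas.

```python
import pandas as pd


def read_csv_with_fallback(path, encodings=("utf-8", "iso-8859-1"), **kwargs):
    """Try each encoding in turn; return (DataFrame, encoding_used)."""
    if not encodings:
        raise ValueError("at least one encoding is required")
    last_error = None
    for enc in encodings:
        try:
            return pd.read_csv(path, encoding=enc, **kwargs), enc
        except UnicodeDecodeError as err:
            # Remember the failure and try the next encoding
            last_error = err
    raise last_error
```

ISO-8859-1 maps every byte to some character, so it never raises and belongs last in the list. Check the decoded text anyway: a file saved in another encoding, such as Windows-1252, can decode without error yet still show a few wrong characters.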
Let's start exploring the datasets:
pres_df_1.head()
| | # | President | Born | Age atstart of presidency | Age atend of presidency | Post-presidencytimespan | Died | Age |
---|---|---|---|---|---|---|---|---|
0 | 1 | George Washington | Feb 22, 1732[a] | 57 years, 67 daysApr 30, 1789 | 65 years, 10 daysMar 4, 1797 | 2 years, 285 days | Dec 14, 1799 | 67 years, 295 days |
1 | 2 | John Adams | Oct 30, 1735[a] | 61 years, 125 daysMar 4, 1797 | 65 years, 125 daysMar 4, 1801 | 25 years, 122 days | Jul 4, 1826 | 90 years, 247 days |
2 | 3 | Thomas Jefferson | Apr 13, 1743[a] | 57 years, 325 daysMar 4, 1801 | 65 years, 325 daysMar 4, 1809 | 17 years, 122 days | Jul 4, 1826 | 83 years, 82 days |
3 | 4 | James Madison | Mar 16, 1751[a] | 57 years, 353 daysMar 4, 1809 | 65 years, 353 daysMar 4, 1817 | 19 years, 116 days | Jun 28, 1836 | 85 years, 104 days |
4 | 5 | James Monroe | Apr 28, 1758 | 58 years, 310 daysMar 4, 1817 | 66 years, 310 daysMar 4, 1825 | 6 years, 122 days | Jul 4, 1831 | 73 years, 67 days |
pres_df_2.head()
| | Unnamed: 0 | relation | name | president | born | death | age_of_death | marriage_date |
---|---|---|---|---|---|---|---|---|
0 | 0 | Husband | Martha Dandridge | George Washington | June 13, 1731 | May 22, 1802 | 70.0 | January 6, 1759 |
1 | 1 | Husband | Abigail Smith | John Adams | November 22, 1744 | October 28, 1818 | 73.0 | October 25, 1764 |
2 | 2 | Father | Martha Jefferson | Thomas Jefferson | September 27, 1772 | October 10, 1836 | 64.0 | NaN |
3 | 3 | Husband | Dolley Payne | James Madison | May 20, 1768 | July 12, 1849 | 81.0 | September 14, 1794 |
4 | 4 | Husband | Elizabeth Kortright | James Monroe | June 30, 1768 | September 23, 1830 | 62.0 | February 16, 1786 |
pres_df_3.head()
| | order | name | height_cm | height_in | weight_kg | weight_lb | body_mass_index | body_mass_index_range | birth_day | birth_month | ... | term_begin_year | term_begin_date | term_end_day | term_end_month | term_end_year | term_end_date | presidency_begin_age | presidency_end_age | political_party | corrected_iq |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | George Washington | 188 | 74.0 | 79.4 | 175 | 22.5 | Normal | 22 | 2 | ... | 1789 | 30-04-1789 | 4.0 | 3.0 | 1797.0 | 04-03-1797 | 57 | 65.0 | Unaffiliated | 140.0 |
1 | 2 | John Adams | 170 | 67.0 | 83.9 | 185 | 29.0 | Overweight | 30 | 10 | ... | 1797 | 04-03-1797 | 4.0 | 3.0 | 1801.0 | 04-03-1801 | 61 | 65.0 | Federalist | 155.0 |
2 | 3 | Thomas Jefferson | 189 | 74.5 | 82.1 | 181 | 23.0 | Normal | 13 | 4 | ... | 1801 | 04-03-1801 | 4.0 | 3.0 | 1809.0 | 04-03-1809 | 57 | 65.0 | Democratic-Republican | 160.0 |
3 | 4 | James Madison | 163 | 64.0 | 55.3 | 122 | 20.8 | Normal | 16 | 3 | ... | 1809 | 04-03-1809 | 4.0 | 3.0 | 1817.0 | 04-03-1817 | 57 | 65.0 | Democratic-Republican | 160.0 |
4 | 5 | James Monroe | 183 | 72.0 | 85.7 | 189 | 25.6 | Overweight | 28 | 4 | ... | 1817 | 04-03-1817 | 4.0 | 3.0 | 1825.0 | 04-03-1825 | 58 | 66.0 | Democratic-Republican | 139.0 |
5 rows × 32 columns
pres_df_4.head()
| | No. | Name | Birthplace | Birthday | Life | Height | Children | Religion | Higher Education | Occupation | Military Service | Term | Party | Vice President | Previous Office | Economy | Foreign Affairs | Military Activity | Other Events | Legacy |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | George Washington | Pope's Creek, VA | 22-Feb | 1732-1799 | 1.88 | 0 | Episcopalian | None | Plantation Owner, Soldier | Commander-in-Chief of the Continental Army in... | 1789-1797 | None, Federalist | John Adams | Commander-in-Chief | [' Hamilton established BUS', '1792 Coinage Ac... | ['1793 Neutrality in the France-Britain confli... | ['1794 Whiskey Rebellion'] | ['1791 Bill of Rights', '1792 Post Office foun... | He is universally regarded as one of the great... |
1 | 2 | John Adams | Braintree, MA | 30-Oct | 1735-1826 | 1.70 | 5 | Unitarian | Harvard | Lawyer, Farmer | none | 1797-1801 | Federalist | Thomas Jefferson | 1st Vice President of USA | ['1798 Progressive land value tax of up to 1% ... | ['1797 the XYZ Affair: a bribe of French agent... | ['1798–1800 The Quasi war. Undeclared naval wa... | ['1798 Alien & Sedition Act to silence critics... | One of the most experienced men ever to become... |
2 | 3 | Thomas Jefferson | Goochland County, VA | 13-Apr | 1743-1826 | 1.89 | 6 | unaffiliated Christian | College of William and Mary | Inventor,Lawyer, Architect | Colonel of Virginia militia (without real mili... | 1801-1809 | Democratic-Republican | Aaron Burr, George Clinton | 2nd Vice President of USA | ['1807 Embargo Act forbidding foreign trade in... | ['1805 Peace Treaty with Tripoli. Piracy stopp... | ['1801-05 Naval operation against Tripoli and ... | ['1803 The Louisiana purchase', '1804 12th Ame... | Probably the most intelligent man ever to occ... |
3 | 4 | James Madison | Port Conway, VA | 16-Mar | 1751-1836 | 1.63 | 0 | Episcopalian | Princeton | Plantation Owner, Lawyer | Colonel of Virginia militia (without real mili... | 1809-1817 | Democratic-Republican | George Clinton, Elbridge Gerry | Secretary of State | [' The first U.S. protective tariff was impose... | ['1814 The Treaty of Ghent ends the War of 1812'] | ['1811 Tippecanoe battle (Harrison vs. Chief T... | ['1811 Cumberland Road construction starts (fi... | His leadership in the War of 1812 was particul... |
4 | 5 | James Monroe | Monroe Hall, VA | 28-Apr | 1758-1831 | 1.83 | 2 | Episcopalian | College of William and Mary | Plantation Owner, Lawyer | Major of the Continental Army | 1817-1825 | Democratic-Republican | Daniel Tompkins | Secretary of War | ['1819 Panic of 1819 (too much land speculatio... | ['1823 Monroe Doctrine', '1818 49th parallel s... | ['1817 1st Seminole war against Seminole India... | ['1819 Florida ceded to US', "1820 Missouri Co... | His presidency contributed to national defense... |
pres_df_5.head()
| | year | name | party | term | salary | position_title |
---|---|---|---|---|---|---|
0 | 1789 | Washington,George | Unaffiliated | First | 25000 | PRESIDENT OF THE UNITED STATES |
1 | 1790 | Washington,George | Unaffiliated | First | 25000 | PRESIDENT OF THE UNITED STATES |
2 | 1791 | Washington,George | Unaffiliated | First | 25000 | PRESIDENT OF THE UNITED STATES |
3 | 1792 | Washington,George | Unaffiliated | First | 25000 | PRESIDENT OF THE UNITED STATES |
4 | 1793 | Washington,George | Unaffiliated | Second | 25000 | PRESIDENT OF THE UNITED STATES |
pres_df_6.head()
| | Unnamed: 0 | S.No. | start | end | president | prior | party | vice |
---|---|---|---|---|---|---|---|---|
0 | 0 | 1 | April 30, 1789 | March 4, 1797 | George Washington | Commander-in-Chief of the Continental Army ... | Nonpartisan [13] | John Adams |
1 | 1 | 2 | March 4, 1797 | March 4, 1801 | John Adams | 1st Vice President of the United States | Federalist | Thomas Jefferson |
2 | 2 | 3 | March 4, 1801 | March 4, 1809 | Thomas Jefferson | 2nd Vice President of the United States | Democratic- Republican | Aaron Burr |
3 | 3 | 4 | March 4, 1809 | March 4, 1817 | James Madison | 5th United States Secretary of State (1801–... | Democratic- Republican | George Clinton |
4 | 4 | 5 | March 4, 1817 | March 4, 1825 | James Monroe | 7th United States Secretary of State (1811–... | Democratic- Republican | Daniel D. Tompkins |
pres_df_7.head()
| | Year | GDP | GDP per capita (in US$ PPP) | GDP (in Bil. US$nominal) | GDP per capita (in US$ nominal) | GDP growth % | Inflation rate % | Unemployment % | Government debt (in % of GDP) | Presidents |
---|---|---|---|---|---|---|---|---|---|---|
0 | 1981 | 3207.0 | 13948.7 | 3207.0 | 13948.7 | 2.50% | 10.40% | 7.60% | 31.00% | Ronald Reagan |
1 | 1982 | 3343.8 | 14405.0 | 3343.8 | 14405.0 | -1.80% | 6.20% | 9.70% | 34.00% | Ronald Reagan |
2 | 1983 | 3634.0 | 15513.7 | 3634.0 | 15513.7 | 4.60% | 3.20% | 9.60% | 37.00% | Ronald Reagan |
3 | 1984 | 4037.7 | 17086.4 | 4037.7 | 17086.4 | 7.20% | 4.40% | 7.50% | 38.00% | Ronald Reagan |
4 | 1985 | 4339.0 | 18199.3 | 4339.0 | 18199.3 | 4.20% | 3.50% | 7.20% | 41.00% | Ronald Reagan |
pres_df_8.head()
| | Unnamed: 0 | Year | CPI | GDPdeflator | population.K | realGDPperCapita | executive | war | battleDeaths | battleDeathsPMP | ... | unemployment | unempSource | fedReceipts | fedOutlays | fedSurplus | fedDebt | fedReceipts_pGDP | fedOutlays_pGDP | fedSurplus_pGDP | fedDebt_pGDP |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1610 | 1610 | NaN | NaN | 0.350 | NaN | JamesI | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1 | 1620 | 1620 | NaN | NaN | 2.302 | NaN | JamesI | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2 | 1630 | 1630 | NaN | NaN | 4.646 | NaN | CharlesI | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3 | 1640 | 1640 | NaN | NaN | 26.634 | NaN | CharlesI | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
4 | 1650 | 1650 | NaN | NaN | 50.368 | NaN | Cromwell | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5 rows × 21 columns
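Calling head() on each DataFrame shows the layout but not how much cleaning is needed. A small helper can give a compact overview of shape, missing cells and duplicate rows for all eight DataFrames. The following sketch uses the hypothetical name summarize. Its demo runs on three rows copied from the First Ladies (pres_df_2) preview.

```python
import numpy as np
import pandas as pd


def summarize(frames):
    """Return one row per DataFrame with its shape, missing cells and duplicate rows."""
    records = []
    for name, df in frames.items():
        records.append({
            "dataset": name,
            "rows": df.shape[0],
            "columns": df.shape[1],
            "missing_cells": int(df.isna().sum().sum()),
            "duplicate_rows": int(df.duplicated().sum()),
        })
    return pd.DataFrame.from_records(records).set_index("dataset")


# Demo: three rows copied from the pres_df_2 (First Ladies) preview
first_ladies = pd.DataFrame({
    "name": ["Martha Dandridge", "Abigail Smith", "Martha Jefferson"],
    "president": ["George Washington", "John Adams", "Thomas Jefferson"],
    "age_of_death": [70.0, 73.0, 64.0],
    "marriage_date": ["January 6, 1759", "October 25, 1764", np.nan],
})
overview = summarize({"pres_df_2 (excerpt)": first_ladies})
print(overview)

# With the loaded data:
# summarize({f"pres_df_{i}": df for i, df in enumerate(
#     [pres_df_1, pres_df_2, pres_df_3, pres_df_4,
#      pres_df_5, pres_df_6, pres_df_7, pres_df_8], start=1)})
```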
Interactive Data Set Exploration
The Python code below provides an interactive tool for exploring the datasets. It lets users choose among the data sets, inspect them, and gain insights without having to write code themselves.
The code offers a dropdown menu listing the eight data sets. Once the user selects a data set, a checkbox (show_head) chooses whether to view its beginning (head) or its end (tail). The number of rows and columns in the chosen data set is then shown as a bar plot, with numeric labels above the bars giving the counts.
Finally, the code generates a detailed HTML report using the pandas-profiling module (since renamed ydata-profiling). This report provides a comprehensive overview of the selected dataset, including data types, summary statistics, and missing values.
Users can quickly switch between datasets and access important information effortlessly, without the need to dive into complex programming concepts. This interactive approach empowers users to make informed decisions and perform data-driven analyses efficiently.
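To use this tool, follow these steps:
Select a dataset from the dropdown.
Tick or clear the show_head checkbox to switch between the first and last five rows.
Review the shape plot and the report generated by the pandas-profiling module.
This interactive data set exploration tool makes it easy to interact with and analyze multiple datasets, making data exploration more intuitive, accessible, and informative.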
import pandas as pd
import matplotlib.pyplot as plt
# pandas-profiling has been renamed ydata-profiling (pip install ydata-profiling)
from ydata_profiling import ProfileReport
from ipywidgets import interact
from IPython.display import display, HTML
# Dictionary to store the dataset names and corresponding DataFrames
datasets = {
    "U.S. Presidents' Information": pres_df_1,
    "First Ladies' Data": pres_df_2,
    "Historical Presidents Physical Data (More)": pres_df_3,
    "U.S. Presidents Dataset (1)": pres_df_4,
    "U.S. Presidents Popular Vote Percentage Dataset (1)": pres_df_5,
    "Most Common Names of U.S. Presidents (1789-2021) (1)": pres_df_6,
    "USA Economy Growth Dataset": pres_df_7,
    "U.S. GDP During Presidencies Dataset": pres_df_8,
}
# Function to display dataset information, head, tail, and profiling report
@interact(dataset=list(datasets.keys()), show_head=True)
def display_dataset_info(dataset, show_head):
    df = datasets[dataset]
    # Display head or tail of the dataset based on user selection
    if show_head:
        display(df.head())
    else:
        display(df.tail())
    # Plot the number of rows and columns (shape) as two bars
    rows, cols = df.shape
    shape_plot = pd.DataFrame({'Rows': [rows], 'Columns': [cols]}, index=[dataset])
    ax = shape_plot.plot(kind='bar', legend=True, title='Number of Rows and Columns', figsize=(8, 6))
    ax.set_xlabel('Dataset')
    ax.set_ylabel('Count')
    plt.xticks(rotation=0)
    # Label both bars with their values (Axes.bar_label requires matplotlib >= 3.4)
    for container in ax.containers:
        ax.bar_label(container, fontsize=10)
    plt.show()
    # Display column types, statistics, and missing values in a profiling report
    display(HTML(f"<h3>{dataset} Information:</h3>"))
    profile = ProfileReport(df, title=dataset)
    profile.to_notebook_iframe()
Important: If the interactive widgets are not visible, refer to the images provided below. ipywidgets need a running Jupyter kernel, so the widgets do not render in a static view of the notebook. Hosting the notebook on a live server, which would make them work online, is planned for the future.
Data Understanding Completed: Time for Data Cleaning and Integration
Having successfully explored the datasets and gained insights into the data, we now have a solid understanding of our data sources and their respective attributes. We have familiarized ourselves with the intricacies of data gathering and the characteristics of each dataset, empowering us to make well-informed decisions moving forward.
With the initial data exploration completed, we are now ready to proceed to the crucial phases of data cleaning, combining, and integration. During these stages, we will focus on refining and preparing the data to be in its optimal form for analysis. This involves addressing various issues that may include:
Data Cleaning: We will identify and handle missing values, inconsistent data formats, and potential outliers. Cleaning the data ensures its accuracy and reliability, mitigating any potential biases that could impact our analyses.
Data Transformation: We may need to transform the data by performing feature engineering, scaling, or normalizing certain variables. These transformations can enhance the data's usability and enable more effective analysis.
Data Integration: We will combine and merge datasets if necessary, ensuring that related information from multiple sources is unified into a cohesive dataset. Integration enables a more comprehensive view of the data and facilitates seamless analysis.
Data Validation: We will validate the data to ensure its quality and verify that it aligns with our research objectives. Data validation is vital for maintaining the integrity of our analyses.
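Here is a concrete example of the cleaning step. Several pres_df_1 columns fuse an age and a date into one string (for example "57 years, 67 daysApr 30, 1789"), and some values carry footnote markers such as [a] or [b]. The following sketch uses three rows copied from the preview. It strips the markers with a regular expression and splits each value into years, days and a datetime. The regular expression and the helper split_age_and_date are assumptions tailored to the rows shown. Values that do not match come back as missing (NA/NaT), so check for gaps before applying the sketch to a whole column.

```python
import pandas as pd

# Three rows copied from the pres_df_1 preview (column names as they appear in the CSV)
raw = pd.DataFrame({
    "President": ["George Washington", "William H. Harrison", "James A. Garfield"],
    "Born": ["Feb 22, 1732[a]", "Feb 9, 1773", "Nov 19, 1831"],
    "Age atstart of presidency": ["57 years, 67 daysApr 30, 1789",
                                  "68 years, 23 daysMar 4, 1841",
                                  "49 years, 105 daysMar 4, 1881"],
    "Age atend of presidency": ["65 years, 10 daysMar 4, 1797",
                                "68 years, 54 days Apr 4, 1841[b]",
                                "49 years, 304 daysSep 19, 1881[b]"],
})

FOOTNOTE = r"\[\w+\]"  # Wikipedia-style markers such as [a] or [b]
AGE_DATE = (r"(?P<years>\d+) years?, (?P<days>\d+) days?\s*"
            r"(?P<date>[A-Z][a-z]{2} \d{1,2}, \d{4})")


def split_age_and_date(series, prefix):
    """Split 'N years, M daysMon D, YYYY' strings into three typed columns."""
    cleaned = series.str.replace(FOOTNOTE, "", regex=True)
    parts = cleaned.str.extract(AGE_DATE)
    return pd.DataFrame({
        f"{prefix}_age_years": pd.to_numeric(parts["years"]).astype("Int64"),
        f"{prefix}_age_days": pd.to_numeric(parts["days"]).astype("Int64"),
        f"{prefix}_date": pd.to_datetime(parts["date"], format="%b %d, %Y"),
    })


born = pd.to_datetime(raw["Born"].str.replace(FOOTNOTE, "", regex=True),
                      format="%b %d, %Y").rename("born")
clean = pd.concat([
    raw[["President"]],
    born,
    split_age_and_date(raw["Age atstart of presidency"], "start"),
    split_age_and_date(raw["Age atend of presidency"], "end"),
], axis=1)
print(clean)
```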
By performing these critical data preparation tasks, we are setting the foundation for robust and accurate data analysis. The subsequent stages of data exploration, modeling, and interpretation will be greatly enhanced, allowing us to extract valuable insights and make informed decisions based on the processed data.
As we begin cleaning, combining, and integrating the data, we must work with precision, attention to detail, and a deep understanding of the datasets. Doing so will let us uncover meaningful patterns, trends, and relationships, turning raw data into knowledge that can guide our analyses and decision-making.
Now, we will examine each dataset independently, conducting data cleaning to address any issues like missing values and inconsistencies. Simultaneously, we will carefully choose the relevant columns from each dataset to facilitate smooth data integration. This targeted approach will prepare the datasets for seamless integration, setting the stage for comprehensive and informed data analysis.
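The column-selection and integration step can be sketched with excerpts copied from the pres_df_3 and pres_df_6 previews. The sketch makes four assumptions about the approach:

- Rename the key column so both tables share a name column.
- Standardize party labels. pres_df_6 writes "Democratic- Republican" and appends footnotes such as "[13]".
- Parse the dates.
- Merge with validate and indicator, so duplicate keys raise an error and unmatched presidents are easy to spot.

The helper names are hypothetical. The name-flipping helper shows how a pres_df_5-style "Washington,George" key could be brought into the same form before joining.

```python
import pandas as pd

# Excerpts copied from the pres_df_3 and pres_df_6 previews
physical = pd.DataFrame({
    "name": ["George Washington", "John Adams", "Thomas Jefferson"],
    "height_cm": [188, 170, 189],
    "political_party": ["Unaffiliated", "Federalist", "Democratic-Republican"],
})
terms = pd.DataFrame({
    "president": ["George Washington", "John Adams", "Thomas Jefferson"],
    "start": ["April 30, 1789", "March 4, 1797", "March 4, 1801"],
    "party": ["Nonpartisan [13]", "Federalist", "Democratic- Republican"],
})


def last_first_to_first_last(name):
    """Convert 'Washington,George' (pres_df_5 style) to 'George Washington'."""
    last, _, first = name.partition(",")
    return f"{first.strip()} {last.strip()}" if first else name.strip()


def clean_party(series):
    """Drop footnote markers like '[13]' and close up 'Democratic- Republican'."""
    return (series.str.replace(r"\s*\[\d+\]", "", regex=True)
                  .str.replace(r"-\s+", "-", regex=True)
                  .str.strip())


# Keep only the columns needed for integration and give the key a shared name
terms_clean = (terms.rename(columns={"president": "name"})
                    .assign(start=lambda d: pd.to_datetime(d["start"], format="%B %d, %Y"),
                            party=lambda d: clean_party(d["party"])))

# validate raises MergeError on duplicate keys; indicator adds a '_merge' column
merged = physical.merge(terms_clean, on="name", how="outer",
                        validate="one_to_one", indicator=True)
unmatched = merged[merged["_merge"] != "both"]
print(merged)
```

Even after cleaning, the two sources label Washington differently ("Unaffiliated" versus "Nonpartisan"). Keep both party columns side by side rather than assuming they agree.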
pres_df_1
| | # | President | Born | Age atstart of presidency | Age atend of presidency | Post-presidencytimespan | Died | Age |
---|---|---|---|---|---|---|---|---|
0 | 1 | George Washington | Feb 22, 1732[a] | 57 years, 67 daysApr 30, 1789 | 65 years, 10 daysMar 4, 1797 | 2 years, 285 days | Dec 14, 1799 | 67 years, 295 days |
1 | 2 | John Adams | Oct 30, 1735[a] | 61 years, 125 daysMar 4, 1797 | 65 years, 125 daysMar 4, 1801 | 25 years, 122 days | Jul 4, 1826 | 90 years, 247 days |
2 | 3 | Thomas Jefferson | Apr 13, 1743[a] | 57 years, 325 daysMar 4, 1801 | 65 years, 325 daysMar 4, 1809 | 17 years, 122 days | Jul 4, 1826 | 83 years, 82 days |
3 | 4 | James Madison | Mar 16, 1751[a] | 57 years, 353 daysMar 4, 1809 | 65 years, 353 daysMar 4, 1817 | 19 years, 116 days | Jun 28, 1836 | 85 years, 104 days |
4 | 5 | James Monroe | Apr 28, 1758 | 58 years, 310 daysMar 4, 1817 | 66 years, 310 daysMar 4, 1825 | 6 years, 122 days | Jul 4, 1831 | 73 years, 67 days |
5 | 6 | John Quincy Adams | Jul 11, 1767 | 57 years, 236 daysMar 4, 1825 | 61 years, 236 daysMar 4, 1829 | 18 years, 356 days | Feb 23, 1848 | 80 years, 227 days |
6 | 7 | Andrew Jackson | Mar 15, 1767 | 61 years, 354 daysMar 4, 1829 | 69 years, 354 daysMar 4, 1837 | 8 years, 96 days | Jun 8, 1845 | 78 years, 85 days |
7 | 8 | Martin Van Buren | Dec 5, 1782 | 54 years, 89 daysMar 4, 1837 | 58 years, 89 daysMar 4, 1841 | 21 years, 142 days | Jul 24, 1862 | 79 years, 231 days |
8 | 9 | William H. Harrison | Feb 9, 1773 | 68 years, 23 daysMar 4, 1841 | 68 years, 54 days Apr 4, 1841[b] | NaN | Apr 4, 1841 | 68 years, 54 days |
9 | 10 | John Tyler | Mar 29, 1790 | 51 years, 6 daysApr 4, 1841 | 54 years, 340 daysMar 4, 1845 | 16 years, 320 days | Jan 18, 1862 | 71 years, 295 days |
10 | 11 | James K. Polk | Nov 2, 1795 | 49 years, 122 daysMar 4, 1845 | 53 years, 122 daysMar 4, 1849 | 103 days | Jun 15, 1849 | 53 years, 225 days |
11 | 12 | Zachary Taylor | Nov 24, 1784 | 64 years, 100 daysMar 4, 1849 | 65 years, 227 daysJul 9, 1850[b] | NaN | Jul 9, 1850 | 65 years, 227 days |
12 | 13 | Millard Fillmore | Jan 7, 1800 | 50 years, 183 daysJul 9, 1850 | 53 years, 56 daysMar 4, 1853 | 21 years, 4 days | Mar 8, 1874 | 74 years, 60 days |
13 | 14 | Franklin Pierce | Nov 23, 1804 | 48 years, 101 daysMar 4, 1853 | 52 years, 101 daysMar 4, 1857 | 12 years, 218 days | Oct 8, 1869 | 64 years, 319 days |
14 | 15 | James Buchanan | Apr 23, 1791 | 65 years, 315 daysMar 4, 1857 | 69 years, 315 daysMar 4, 1861 | 7 years, 89 days | Jun 1, 1868 | 77 years, 39 days |
15 | 16 | Abraham Lincoln | Feb 12, 1809 | 52 years, 20 daysMar 4, 1861 | 56 years, 62 daysApr 15, 1865[b] | NaN | Apr 15, 1865 | 56 years, 62 days |
16 | 17 | Andrew Johnson | Dec 29, 1808 | 56 years, 107 daysApr 15, 1865 | 60 years, 65 daysMar 4, 1869 | 6 years, 149 days | Jul 31, 1875 | 66 years, 214 days |
17 | 18 | Ulysses S. Grant | Apr 27, 1822 | 46 years, 311 daysMar 4, 1869 | 54 years, 311 daysMar 4, 1877 | 8 years, 141 days | Jul 23, 1885 | 63 years, 87 days |
18 | 19 | Rutherford B. Hayes | Oct 4, 1822 | 54 years, 151 daysMar 4, 1877 | 58 years, 151 daysMar 4, 1881 | 11 years, 319 days | Jan 17, 1893 | 70 years, 105 days |
19 | 20 | James A. Garfield | Nov 19, 1831 | 49 years, 105 daysMar 4, 1881 | 49 years, 304 daysSep 19, 1881[b] | NaN | Sep 19, 1881 | 49 years, 304 days |
20 | 21 | Chester A. Arthur | Oct 5, 1829 | 51 years, 349 daysSep 19, 1881 | 55 years, 150 daysMar 4, 1885 | 1 year, 259 days | Nov 18, 1886 | 57 years, 44 days |
21 | 22 | Grover Cleveland | Mar 18, 1837 | 47 years, 351 daysMar 4, 1885 | 51 years, 351 daysMar 4, 1889 | 4 years, 0 days[c] | Jun 24, 1908 | 71 years, 98 days |
22 | 23 | Benjamin Harrison | Aug 20, 1833 | 55 years, 196 daysMar 4, 1889 | 59 years, 196 daysMar 4, 1893 | 8 years, 9 days | Mar 13, 1901 | 67 years, 205 days |
23 | 24 | Grover Cleveland | Mar 18, 1837 | 55 years, 351 daysMar 4, 1893 | 59 years, 351 daysMar 4, 1897 | 11 years, 112 days[d] | Jun 24, 1908 | 71 years, 98 days |
24 | 25 | William McKinley | Jan 29, 1843 | 54 years, 34 daysMar 4, 1897 | 58 years, 228 daysSep 14, 1901[b] | NaN | Sep 14, 1901 | 58 years, 228 days |
25 | 26 | Theodore Roosevelt | Oct 27, 1858 | 42 years, 322 daysSep 14, 1901 | 50 years, 128 daysMar 4, 1909 | 9 years, 308 days | Jan 6, 1919 | 60 years, 71 days |
26 | 27 | William H. Taft | Sep 15, 1857 | 51 years, 170 daysMar 4, 1909 | 55 years, 170 daysMar 4, 1913 | 17 years, 4 days | Mar 8, 1930 | 72 years, 174 days |
27 | 28 | Woodrow Wilson | Dec 28, 1856 | 56 years, 66 daysMar 4, 1913 | 64 years, 66 daysMar 4, 1921 | 2 years, 336 days | Feb 3, 1924 | 67 years, 37 days |
28 | 29 | Warren G. Harding | Nov 2, 1865 | 55 years, 122 daysMar 4, 1921 | 57 years, 273 daysAug 2, 1923[b] | NaN | Aug 2, 1923 | 57 years, 273 days |
29 | 30 | Calvin Coolidge | Jul 4, 1872 | 51 years, 29 daysAug 2, 1923 | 56 years, 243 daysMar 4, 1929 | 3 years, 307 days | Jan 5, 1933 | 60 years, 185 days |
30 | 31 | Herbert Hoover | Aug 10, 1874 | 54 years, 206 daysMar 4, 1929 | 58 years, 206 daysMar 4, 1933 | 31 years, 230 days | Oct 20, 1964 | 90 years, 71 days |
31 | 32 | Franklin D. Roosevelt | Jan 30, 1882 | 51 years, 33 daysMar 4, 1933 | 63 years, 72 daysApr 12, 1945[b] | NaN | Apr 12, 1945 | 63 years, 72 days |
32 | 33 | Harry S. Truman | May 8, 1884 | 60 years, 339 daysApr 12, 1945 | 68 years, 257 daysJan 20, 1953 | 19 years, 341 days | Dec 26, 1972 | 88 years, 232 days |
33 | 34 | Dwight D. Eisenhower | Oct 14, 1890 | 62 years, 98 daysJan 20, 1953 | 70 years, 98 daysJan 20, 1961 | 8 years, 67 days | Mar 28, 1969 | 78 years, 165 days |
34 | 35 | John F. Kennedy | May 29, 1917 | 43 years, 236 daysJan 20, 1961 | 46 years, 177 daysNov 22, 1963[b] | NaN | Nov 22, 1963 | 46 years, 177 days |
35 | 36 | Lyndon B. Johnson | Aug 27, 1908 | 55 years, 87 daysNov 22, 1963 | 60 years, 146 daysJan 20, 1969 | 4 years, 2 days | Jan 22, 1973 | 64 years, 148 days |
36 | 37 | Richard Nixon | Jan 9, 1913 | 56 years, 11 daysJan 20, 1969 | 61 years, 212 daysAug 9, 1974[e] | 19 years, 256 days | Apr 22, 1994 | 81 years, 103 days |
37 | 38 | Gerald Ford | Jul 14, 1913 | 61 years, 26 daysAug 9, 1974 | 63 years, 190 daysJan 20, 1977 | 29 years, 340 days | Dec 26, 2006 | 93 years, 165 days |
38 | 39 | Jimmy Carter | Oct 1, 1924 | 52 years, 111 daysJan 20, 1977 | 56 years, 111 daysJan 20, 1981 | 38 years, 175 days | (living) | 94 years, 286 days |
39 | 40 | Ronald Reagan | Feb 6, 1911 | 69 years, 349 daysJan 20, 1981 | 77 years, 349 daysJan 20, 1989 | 15 years, 137 days | Jun 5, 2004 | 93 years, 120 days |
40 | 41 | George H. W. Bush | Jun 12, 1924 | 64 years, 222 daysJan 20, 1989 | 68 years, 222 daysJan 20, 1993 | 25 years, 314 days | Nov 30, 2018 | 94 years, 171 days |
41 | 42 | Bill Clinton | Aug 19, 1946 | 46 years, 154 daysJan 20, 1993 | 54 years, 154 daysJan 20, 2001 | 18 years, 175 days | (living) | 72 years, 329 days |
42 | 43 | George W. Bush | Jul 6, 1946 | 54 years, 198 daysJan 20, 2001 | 62 years, 198 daysJan 20, 2009 | 10 years, 175 days | (living) | 73 years, 8 days |
43 | 44 | Barack Obama | Aug 4, 1961 | 47 years, 169 daysJan 20, 2009 | 55 years, 169 daysJan 20, 2017 | 2 years, 175 days | (living) | 57 years, 344 days |
pres_df_1.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 44 entries, 0 to 43
Data columns (total 8 columns):

| # | Column | Non-Null Count | Dtype |
|---|---|---|---|
| 0 | # | 44 non-null | int64 |
| 1 | President | 44 non-null | object |
| 2 | Born | 44 non-null | object |
| 3 | Age atstart of presidency | 44 non-null | object |
| 4 | Age atend of presidency | 44 non-null | object |
| 5 | Post-presidencytimespan | 36 non-null | object |
| 6 | Died | 44 non-null | object |
| 7 | Age | 44 non-null | object |

dtypes: int64(1), object(7)
memory usage: 2.9+ KB
Data Set (pres_df_1) Exploration and Required Transformations:
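In our analysis of pres_df_1, we identified several necessary transformations and data handling tasks: renaming the column headers whose words were run together, dropping the redundant '#' index column, splitting the columns that fuse an age with a date, stripping footnote markers such as [a] and [b], filling the missing post-presidency timespans, adding rows for Presidents Trump and Biden, and converting the date and duration columns to datetime and timedelta types.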
By performing these necessary transformations, we ensure the dataset's accuracy, consistency, and completeness, paving the way for robust data integration and insightful analysis.
# Task 0: Rename the columns whose words were run together in the source
pres_df_1 = pres_df_1.rename(columns={'Age atstart of presidency': 'Age at start of presidency',
                                      'Age atend of presidency': 'Age at end of presidency',
                                      'Post-presidencytimespan': 'Post-presidency timespan'})
# Task 1: Remove the redundant '#' index column
pres_df_1 = pres_df_1.drop(columns=['#'])
# Task 3: Split the 'Age at start of presidency' column into separate columns for age and date
import re

# Extract the "N years, M days" age and the "Mon D, YYYY" date from one fused cell
def split_age_and_date(row):
    age_date = row['Age at start of presidency']
    age_match = re.search(r'\d+ years?, \d+ days?', age_date)
    date_match = re.search(r'[A-Za-z]{3} \d{1,2}, \d{4}', age_date)
    age = age_match.group() if age_match else ""
    date = date_match.group() if date_match else ""
    return pd.Series({'Age at Start of Presidency': age.strip(), 'Start Date': date.strip()})

pres_df_1[['Age at Start of Presidency', 'Start Date of presidency']] = pres_df_1.apply(split_age_and_date, axis=1)
# Keep the age text in front of the word "days" (the word is re-appended in a later step)
pres_df_1['Start Age of presidency'] = (pres_df_1['Age at start of presidency'].str.split('days', expand=True))[0]
pres_df_1 = pres_df_1.drop(['Age at Start of Presidency', 'Age at start of presidency'], axis=1)
# Display the updated DataFrame after performing Task 3
pres_df_1.head()
| | President | Born | Age at end of presidency | Post-presidency timespan | Died | Age | Start Date of presidency | Start Age of presidency |
---|---|---|---|---|---|---|---|---|
0 | George Washington | Feb 22, 1732[a] | 65 years, 10 daysMar 4, 1797 | 2 years, 285 days | Dec 14, 1799 | 67 years, 295 days | Apr 30, 1789 | 57 years, 67 |
1 | John Adams | Oct 30, 1735[a] | 65 years, 125 daysMar 4, 1801 | 25 years, 122 days | Jul 4, 1826 | 90 years, 247 days | Mar 4, 1797 | 61 years, 125 |
2 | Thomas Jefferson | Apr 13, 1743[a] | 65 years, 325 daysMar 4, 1809 | 17 years, 122 days | Jul 4, 1826 | 83 years, 82 days | Mar 4, 1801 | 57 years, 325 |
3 | James Madison | Mar 16, 1751[a] | 65 years, 353 daysMar 4, 1817 | 19 years, 116 days | Jun 28, 1836 | 85 years, 104 days | Mar 4, 1809 | 57 years, 353 |
4 | James Monroe | Apr 28, 1758 | 66 years, 310 daysMar 4, 1825 | 6 years, 122 days | Jul 4, 1831 | 73 years, 67 days | Mar 4, 1817 | 58 years, 310 |
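The fused cells handled by Task 3 (and by the split on 'days' in Task 4) can also be split in one vectorized step with Series.str.extract and named capture groups, instead of a row-wise apply. The sketch below runs on a few values copied from the raw table; the names raw, pattern, and parts and the group names age and date are hypothetical, and the regex assumes every fused cell follows the "N year(s), M day(s)" plus "Mon D, YYYY" shape seen here, optionally followed by a footnote marker such as [b].

```python
import pandas as pd

# Sample fused cells copied from the raw table: an age immediately followed by a date,
# sometimes separated by a space and/or followed by a footnote marker such as [b] or [e].
raw = pd.Series([
    "54 years, 89 daysMar 4, 1837",
    "51 years, 349 daysSep 19, 1881",
    "68 years, 54 days Apr 4, 1841[b]",
    "61 years, 212 daysAug 9, 1974[e]",
])

# Named groups become the column names of the extracted DataFrame.
pattern = (
    r"(?P<age>\d+ years?, \d+ days?)"        # e.g. "54 years, 89 days"
    r"\s*"                                    # optional space between age and date
    r"(?P<date>[A-Za-z]{3} \d{1,2}, \d{4})"   # e.g. "Mar 4, 1837"
    r"(?:\[\w\])?"                            # optional footnote marker, not captured
)
parts = raw.str.extract(pattern)

# Parse the date part right away; the age part stays a string here.
parts["date"] = pd.to_datetime(parts["date"], format="%b %d, %Y")
print(parts)
```

Because the pattern matches the word "days" instead of splitting on it, the age comes out as a complete "N years, M days" string and the footnote marker is dropped in the same pass.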
# Task 4: Split the 'Age at end of presidency' column on the word "days" into an age part and a date part
pres_df_1[['End of Presidency Age', 'End of Presidency Date']] = pres_df_1['Age at end of presidency'].str.split('days', expand=True)
# Remove leading and trailing whitespaces from 'End of Presidency Age' and 'End of Presidency Date' columns
pres_df_1['End of Presidency Age'] = pres_df_1['End of Presidency Age'].str.strip()
pres_df_1['End of Presidency Date'] = pres_df_1['End of Presidency Date'].str.strip()
pres_df_1 = pres_df_1.drop('Age at end of presidency',axis=1)
pres_df_1
| | President | Born | Post-presidency timespan | Died | Age | Start Date of presidency | Start Age of presidency | End of Presidency Age | End of Presidency Date |
---|---|---|---|---|---|---|---|---|---|
0 | George Washington | Feb 22, 1732[a] | 2 years, 285 days | Dec 14, 1799 | 67 years, 295 days | Apr 30, 1789 | 57 years, 67 | 65 years, 10 | Mar 4, 1797 |
1 | John Adams | Oct 30, 1735[a] | 25 years, 122 days | Jul 4, 1826 | 90 years, 247 days | Mar 4, 1797 | 61 years, 125 | 65 years, 125 | Mar 4, 1801 |
2 | Thomas Jefferson | Apr 13, 1743[a] | 17 years, 122 days | Jul 4, 1826 | 83 years, 82 days | Mar 4, 1801 | 57 years, 325 | 65 years, 325 | Mar 4, 1809 |
3 | James Madison | Mar 16, 1751[a] | 19 years, 116 days | Jun 28, 1836 | 85 years, 104 days | Mar 4, 1809 | 57 years, 353 | 65 years, 353 | Mar 4, 1817 |
4 | James Monroe | Apr 28, 1758 | 6 years, 122 days | Jul 4, 1831 | 73 years, 67 days | Mar 4, 1817 | 58 years, 310 | 66 years, 310 | Mar 4, 1825 |
5 | John Quincy Adams | Jul 11, 1767 | 18 years, 356 days | Feb 23, 1848 | 80 years, 227 days | Mar 4, 1825 | 57 years, 236 | 61 years, 236 | Mar 4, 1829 |
6 | Andrew Jackson | Mar 15, 1767 | 8 years, 96 days | Jun 8, 1845 | 78 years, 85 days | Mar 4, 1829 | 61 years, 354 | 69 years, 354 | Mar 4, 1837 |
7 | Martin Van Buren | Dec 5, 1782 | 21 years, 142 days | Jul 24, 1862 | 79 years, 231 days | Mar 4, 1837 | 54 years, 89 | 58 years, 89 | Mar 4, 1841 |
8 | William H. Harrison | Feb 9, 1773 | NaN | Apr 4, 1841 | 68 years, 54 days | Mar 4, 1841 | 68 years, 23 | 68 years, 54 | Apr 4, 1841[b] |
9 | John Tyler | Mar 29, 1790 | 16 years, 320 days | Jan 18, 1862 | 71 years, 295 days | Apr 4, 1841 | 51 years, 6 | 54 years, 340 | Mar 4, 1845 |
10 | James K. Polk | Nov 2, 1795 | 103 days | Jun 15, 1849 | 53 years, 225 days | Mar 4, 1845 | 49 years, 122 | 53 years, 122 | Mar 4, 1849 |
11 | Zachary Taylor | Nov 24, 1784 | NaN | Jul 9, 1850 | 65 years, 227 days | Mar 4, 1849 | 64 years, 100 | 65 years, 227 | Jul 9, 1850[b] |
12 | Millard Fillmore | Jan 7, 1800 | 21 years, 4 days | Mar 8, 1874 | 74 years, 60 days | Jul 9, 1850 | 50 years, 183 | 53 years, 56 | Mar 4, 1853 |
13 | Franklin Pierce | Nov 23, 1804 | 12 years, 218 days | Oct 8, 1869 | 64 years, 319 days | Mar 4, 1853 | 48 years, 101 | 52 years, 101 | Mar 4, 1857 |
14 | James Buchanan | Apr 23, 1791 | 7 years, 89 days | Jun 1, 1868 | 77 years, 39 days | Mar 4, 1857 | 65 years, 315 | 69 years, 315 | Mar 4, 1861 |
15 | Abraham Lincoln | Feb 12, 1809 | NaN | Apr 15, 1865 | 56 years, 62 days | Mar 4, 1861 | 52 years, 20 | 56 years, 62 | Apr 15, 1865[b] |
16 | Andrew Johnson | Dec 29, 1808 | 6 years, 149 days | Jul 31, 1875 | 66 years, 214 days | Apr 15, 1865 | 56 years, 107 | 60 years, 65 | Mar 4, 1869 |
17 | Ulysses S. Grant | Apr 27, 1822 | 8 years, 141 days | Jul 23, 1885 | 63 years, 87 days | Mar 4, 1869 | 46 years, 311 | 54 years, 311 | Mar 4, 1877 |
18 | Rutherford B. Hayes | Oct 4, 1822 | 11 years, 319 days | Jan 17, 1893 | 70 years, 105 days | Mar 4, 1877 | 54 years, 151 | 58 years, 151 | Mar 4, 1881 |
19 | James A. Garfield | Nov 19, 1831 | NaN | Sep 19, 1881 | 49 years, 304 days | Mar 4, 1881 | 49 years, 105 | 49 years, 304 | Sep 19, 1881[b] |
20 | Chester A. Arthur | Oct 5, 1829 | 1 year, 259 days | Nov 18, 1886 | 57 years, 44 days | Sep 19, 1881 | 51 years, 349 | 55 years, 150 | Mar 4, 1885 |
21 | Grover Cleveland | Mar 18, 1837 | 4 years, 0 days[c] | Jun 24, 1908 | 71 years, 98 days | Mar 4, 1885 | 47 years, 351 | 51 years, 351 | Mar 4, 1889 |
22 | Benjamin Harrison | Aug 20, 1833 | 8 years, 9 days | Mar 13, 1901 | 67 years, 205 days | Mar 4, 1889 | 55 years, 196 | 59 years, 196 | Mar 4, 1893 |
23 | Grover Cleveland | Mar 18, 1837 | 11 years, 112 days[d] | Jun 24, 1908 | 71 years, 98 days | Mar 4, 1893 | 55 years, 351 | 59 years, 351 | Mar 4, 1897 |
24 | William McKinley | Jan 29, 1843 | NaN | Sep 14, 1901 | 58 years, 228 days | Mar 4, 1897 | 54 years, 34 | 58 years, 228 | Sep 14, 1901[b] |
25 | Theodore Roosevelt | Oct 27, 1858 | 9 years, 308 days | Jan 6, 1919 | 60 years, 71 days | Sep 14, 1901 | 42 years, 322 | 50 years, 128 | Mar 4, 1909 |
26 | William H. Taft | Sep 15, 1857 | 17 years, 4 days | Mar 8, 1930 | 72 years, 174 days | Mar 4, 1909 | 51 years, 170 | 55 years, 170 | Mar 4, 1913 |
27 | Woodrow Wilson | Dec 28, 1856 | 2 years, 336 days | Feb 3, 1924 | 67 years, 37 days | Mar 4, 1913 | 56 years, 66 | 64 years, 66 | Mar 4, 1921 |
28 | Warren G. Harding | Nov 2, 1865 | NaN | Aug 2, 1923 | 57 years, 273 days | Mar 4, 1921 | 55 years, 122 | 57 years, 273 | Aug 2, 1923[b] |
29 | Calvin Coolidge | Jul 4, 1872 | 3 years, 307 days | Jan 5, 1933 | 60 years, 185 days | Aug 2, 1923 | 51 years, 29 | 56 years, 243 | Mar 4, 1929 |
30 | Herbert Hoover | Aug 10, 1874 | 31 years, 230 days | Oct 20, 1964 | 90 years, 71 days | Mar 4, 1929 | 54 years, 206 | 58 years, 206 | Mar 4, 1933 |
31 | Franklin D. Roosevelt | Jan 30, 1882 | NaN | Apr 12, 1945 | 63 years, 72 days | Mar 4, 1933 | 51 years, 33 | 63 years, 72 | Apr 12, 1945[b] |
32 | Harry S. Truman | May 8, 1884 | 19 years, 341 days | Dec 26, 1972 | 88 years, 232 days | Apr 12, 1945 | 60 years, 339 | 68 years, 257 | Jan 20, 1953 |
33 | Dwight D. Eisenhower | Oct 14, 1890 | 8 years, 67 days | Mar 28, 1969 | 78 years, 165 days | Jan 20, 1953 | 62 years, 98 | 70 years, 98 | Jan 20, 1961 |
34 | John F. Kennedy | May 29, 1917 | NaN | Nov 22, 1963 | 46 years, 177 days | Jan 20, 1961 | 43 years, 236 | 46 years, 177 | Nov 22, 1963[b] |
35 | Lyndon B. Johnson | Aug 27, 1908 | 4 years, 2 days | Jan 22, 1973 | 64 years, 148 days | Nov 22, 1963 | 55 years, 87 | 60 years, 146 | Jan 20, 1969 |
36 | Richard Nixon | Jan 9, 1913 | 19 years, 256 days | Apr 22, 1994 | 81 years, 103 days | Jan 20, 1969 | 56 years, 11 | 61 years, 212 | Aug 9, 1974[e] |
37 | Gerald Ford | Jul 14, 1913 | 29 years, 340 days | Dec 26, 2006 | 93 years, 165 days | Aug 9, 1974 | 61 years, 26 | 63 years, 190 | Jan 20, 1977 |
38 | Jimmy Carter | Oct 1, 1924 | 38 years, 175 days | (living) | 94 years, 286 days | Jan 20, 1977 | 52 years, 111 | 56 years, 111 | Jan 20, 1981 |
39 | Ronald Reagan | Feb 6, 1911 | 15 years, 137 days | Jun 5, 2004 | 93 years, 120 days | Jan 20, 1981 | 69 years, 349 | 77 years, 349 | Jan 20, 1989 |
40 | George H. W. Bush | Jun 12, 1924 | 25 years, 314 days | Nov 30, 2018 | 94 years, 171 days | Jan 20, 1989 | 64 years, 222 | 68 years, 222 | Jan 20, 1993 |
41 | Bill Clinton | Aug 19, 1946 | 18 years, 175 days | (living) | 72 years, 329 days | Jan 20, 1993 | 46 years, 154 | 54 years, 154 | Jan 20, 2001 |
42 | George W. Bush | Jul 6, 1946 | 10 years, 175 days | (living) | 73 years, 8 days | Jan 20, 2001 | 54 years, 198 | 62 years, 198 | Jan 20, 2009 |
43 | Barack Obama | Aug 4, 1961 | 2 years, 175 days | (living) | 57 years, 344 days | Jan 20, 2009 | 47 years, 169 | 55 years, 169 | Jan 20, 2017 |
# Task: Remove the [b] and [e] footnote markers from the 'End of Presidency Date' column
pres_df_1['End of Presidency Date'] = pres_df_1['End of Presidency Date'].str.replace(r'\[b\]|\[e\]', '', regex=True)
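# Task 5: Fill missing values in 'Post-presidency timespan' with 'Died in Office'
# Assign the result back: fillna(inplace=True) on a selected column does not update the DataFrame under pandas Copy-on-Write
pres_df_1['Post-presidency timespan'] = pres_df_1['Post-presidency timespan'].fillna('Died in Office')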
# Task: Remove the [b], [c] and [d] from the 'Post-presidency timespan' column
pres_df_1['Post-presidency timespan'] = pres_df_1['Post-presidency timespan'].str.replace(r'\[b\]|\[c\]|\[d\]', '', regex=True)
# Task: Remove the [a] and [b] footnote markers from the 'Born' column
pres_df_1['Born'] = pres_df_1['Born'].str.replace(r'\[b\]|\[a\]', '', regex=True)
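The three replacements above each target specific markers in one column. A more general sketch strips any single-letter bracketed footnote marker ([a] through [z]) from several text columns at once. It runs on a small hypothetical frame with the same kind of values, and it assumes no legitimate value contains a bracketed lowercase letter, which appears to hold for the columns shown in this table.

```python
import pandas as pd

# Hypothetical mini-frame with the footnote markers seen in pres_df_1.
df = pd.DataFrame({
    "Born": ["Feb 22, 1732[a]", "Apr 28, 1758"],
    "Post-presidency timespan": ["4 years, 0 days[c]", None],
    "End of Presidency Date": ["Apr 4, 1841[b]", "Aug 9, 1974[e]"],
})

FOOTNOTE = r"\[[a-z]\]"  # matches [a], [b], ..., [z]

# Apply the same regex replacement to every listed text column; missing values stay missing.
cols = ["Born", "Post-presidency timespan", "End of Presidency Date"]
df[cols] = df[cols].apply(lambda s: s.str.replace(FOOTNOTE, "", regex=True))
print(df)
```

On pres_df_1 the same one-liner could be applied to the 'Born', 'Post-presidency timespan', and 'End of Presidency Date' columns in place of the three separate replacements.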
# Task: Re-append the word "days" (removed by the earlier splits on 'days') to 'Start Age of presidency' and 'End of Presidency Age'
pres_df_1['Start Age of presidency'] = pres_df_1['Start Age of presidency'] + " days"
pres_df_1['End of Presidency Age'] = pres_df_1['End of Presidency Age'] + " days"
# Task 6: Add missing data for Presidents Trump and Biden
# DataFrame.append was removed in pandas 2.0, so the new rows are added with pd.concat.
# Dates use the same "Mon D, YYYY" style as the rest of the table so that they parse consistently.
new_rows = pd.DataFrame([
    {'President': 'Donald J. Trump', 'Born': 'Jun 14, 1946', 'Start Age of presidency': '70 years, 220 days',
     'Post-presidency timespan': 'Living', 'Died': '(living)', 'Start Date of presidency': 'Jan 20, 2017',
     'End of Presidency Age': '74 years, 222 days', 'End of Presidency Date': 'Jan 20, 2021', 'Age': '70 years, 220 days'},
    {'President': 'Joe Biden', 'Born': 'Nov 20, 1942', 'Start Age of presidency': '77 years, 62 days',
     'Post-presidency timespan': 'Living', 'Died': '(living)', 'Start Date of presidency': 'Jan 20, 2021',
     'End of Presidency Age': '81 years, 62 days', 'End of Presidency Date': 'Nov 5, 2024', 'Age': '78 years, 61 days'},
])
pres_df_1 = pd.concat([pres_df_1, new_rows], ignore_index=True)
# Task 7: Convert specified columns to datetime data
date_columns = ['Born', 'Died', 'Start Date of presidency', 'End of Presidency Date']
pres_df_1[date_columns] = pres_df_1[date_columns].apply(pd.to_datetime, errors='coerce')
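Without an explicit format, pandas 2.0 and later infer a single format from the first non-null value of each column, so a value written in a different style can silently become NaT under errors='coerce'. Passing format makes this deterministic and easy to audit. The sketch below uses a few sample values in the shapes found in these columns; the variable names are illustrative, and '%b %d, %Y' is an assumption that matches the abbreviated "Mon D, YYYY" style used in this table.

```python
import pandas as pd

# Sample values in the shapes seen in pres_df_1's date columns.
values = pd.Series(["Feb 22, 1732", "Mar 4, 1797", "(living)", "June 14, 1946"])

# Explicit format: anything that does not match "%b %d, %Y" is coerced to NaT.
# "%b" only accepts abbreviated month names, so "June 14, 1946" does not match.
parsed = pd.to_datetime(values, format="%b %d, %Y", errors="coerce")
print(parsed)

# Counting NaT values is a quick way to spot entries that failed to parse.
print(int(parsed.isna().sum()), "value(s) could not be parsed")
```

Comparing the NaT count against the number of genuine placeholders (such as '(living)') shows whether any real dates were lost during conversion.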
# Task 8: Convert the age and duration columns to timedelta
# Handles 'N years, M days', 'M days', 'Died in Office', and 'Living' entries.
# Note: every year is counted as 365 days, so leap days are ignored and the result is approximate.
def convert_years_days_to_timedelta(value):
    if pd.notna(value):
        if 'Died' in value:
            return value  # Keep "Died in Office" as text (the column therefore stays object dtype)
        elif 'Living' in value:
            return pd.NaT  # No post-presidency timespan recorded for "Living"
        else:
            years_days = value.split(', ')
            years = int(years_days[0].split()[0]) if 'year' in years_days[0] else 0
            days = int(years_days[1].split()[0]) if len(years_days) > 1 and 'day' in years_days[1] else int(years_days[0].split()[0])
            return pd.Timedelta(days=years * 365 + days)
    else:
        return pd.NaT

# Convert Age, Start Age of presidency, End of Presidency Age, and Post-presidency timespan to timedelta
timedelta_columns = ['Age', 'Start Age of presidency', 'End of Presidency Age', 'Post-presidency timespan']
# applymap is deprecated in pandas 2.1+, where DataFrame.map is the equivalent
pres_df_1[timedelta_columns] = pres_df_1[timedelta_columns].applymap(convert_years_days_to_timedelta)
pres_df_1
| | President | Born | Post-presidency timespan | Died | Age | Start Date of presidency | Start Age of presidency | End of Presidency Age | End of Presidency Date |
---|---|---|---|---|---|---|---|---|---|
0 | George Washington | 1732-02-22 | 1015 days 00:00:00 | 1799-12-14 | 24750 days | 1789-04-30 | 20872 days | 23735 days | 1797-03-04 |
1 | John Adams | 1735-10-30 | 9247 days 00:00:00 | 1826-07-04 | 33097 days | 1797-03-04 | 22390 days | 23850 days | 1801-03-04 |
2 | Thomas Jefferson | 1743-04-13 | 6327 days 00:00:00 | 1826-07-04 | 30377 days | 1801-03-04 | 21130 days | 24050 days | 1809-03-04 |
3 | James Madison | 1751-03-16 | 7051 days 00:00:00 | 1836-06-28 | 31129 days | 1809-03-04 | 21158 days | 24078 days | 1817-03-04 |
4 | James Monroe | 1758-04-28 | 2312 days 00:00:00 | 1831-07-04 | 26712 days | 1817-03-04 | 21480 days | 24400 days | 1825-03-04 |
5 | John Quincy Adams | 1767-07-11 | 6926 days 00:00:00 | 1848-02-23 | 29427 days | 1825-03-04 | 21041 days | 22501 days | 1829-03-04 |
6 | Andrew Jackson | 1767-03-15 | 3016 days 00:00:00 | 1845-06-08 | 28555 days | 1829-03-04 | 22619 days | 25539 days | 1837-03-04 |
7 | Martin Van Buren | 1782-12-05 | 7807 days 00:00:00 | 1862-07-24 | 29066 days | 1837-03-04 | 19799 days | 21259 days | 1841-03-04 |
8 | William H. Harrison | 1773-02-09 | Died in Office | 1841-04-04 | 24874 days | 1841-03-04 | 24843 days | 24874 days | 1841-04-04 |
9 | John Tyler | 1790-03-29 | 6160 days 00:00:00 | 1862-01-18 | 26210 days | 1841-04-04 | 18621 days | 20050 days | 1845-03-04 |
10 | James K. Polk | 1795-11-02 | 103 days 00:00:00 | 1849-06-15 | 19570 days | 1845-03-04 | 18007 days | 19467 days | 1849-03-04 |
11 | Zachary Taylor | 1784-11-24 | Died in Office | 1850-07-09 | 23952 days | 1849-03-04 | 23460 days | 23952 days | 1850-07-09 |
12 | Millard Fillmore | 1800-01-07 | 7669 days 00:00:00 | 1874-03-08 | 27070 days | 1850-07-09 | 18433 days | 19401 days | 1853-03-04 |
13 | Franklin Pierce | 1804-11-23 | 4598 days 00:00:00 | 1869-10-08 | 23679 days | 1853-03-04 | 17621 days | 19081 days | 1857-03-04 |
14 | James Buchanan | 1791-04-23 | 2644 days 00:00:00 | 1868-06-01 | 28144 days | 1857-03-04 | 24040 days | 25500 days | 1861-03-04 |
15 | Abraham Lincoln | 1809-02-12 | Died in Office | 1865-04-15 | 20502 days | 1861-03-04 | 19000 days | 20502 days | 1865-04-15 |
16 | Andrew Johnson | 1808-12-29 | 2339 days 00:00:00 | 1875-07-31 | 24304 days | 1865-04-15 | 20547 days | 21965 days | 1869-03-04 |
17 | Ulysses S. Grant | 1822-04-27 | 3061 days 00:00:00 | 1885-07-23 | 23082 days | 1869-03-04 | 17101 days | 20021 days | 1877-03-04 |
18 | Rutherford B. Hayes | 1822-10-04 | 4334 days 00:00:00 | 1893-01-17 | 25655 days | 1877-03-04 | 19861 days | 21321 days | 1881-03-04 |
19 | James A. Garfield | 1831-11-19 | Died in Office | 1881-09-19 | 18189 days | 1881-03-04 | 17990 days | 18189 days | 1881-09-19 |
20 | Chester A. Arthur | 1829-10-05 | 624 days 00:00:00 | 1886-11-18 | 20849 days | 1881-09-19 | 18964 days | 20225 days | 1885-03-04 |
21 | Grover Cleveland | 1837-03-18 | 1460 days 00:00:00 | 1908-06-24 | 26013 days | 1885-03-04 | 17506 days | 18966 days | 1889-03-04 |
22 | Benjamin Harrison | 1833-08-20 | 2929 days 00:00:00 | 1901-03-13 | 24660 days | 1889-03-04 | 20271 days | 21731 days | 1893-03-04 |
23 | Grover Cleveland | 1837-03-18 | 4127 days 00:00:00 | 1908-06-24 | 26013 days | 1893-03-04 | 20426 days | 21886 days | 1897-03-04 |
24 | William McKinley | 1843-01-29 | Died in Office | 1901-09-14 | 21398 days | 1897-03-04 | 19744 days | 21398 days | 1901-09-14 |
25 | Theodore Roosevelt | 1858-10-27 | 3593 days 00:00:00 | 1919-01-06 | 21971 days | 1901-09-14 | 15652 days | 18378 days | 1909-03-04 |
26 | William H. Taft | 1857-09-15 | 6209 days 00:00:00 | 1930-03-08 | 26454 days | 1909-03-04 | 18785 days | 20245 days | 1913-03-04 |
27 | Woodrow Wilson | 1856-12-28 | 1066 days 00:00:00 | 1924-02-03 | 24492 days | 1913-03-04 | 20506 days | 23426 days | 1921-03-04 |
28 | Warren G. Harding | 1865-11-02 | Died in Office | 1923-08-02 | 21078 days | 1921-03-04 | 20197 days | 21078 days | 1923-08-02 |
29 | Calvin Coolidge | 1872-07-04 | 1402 days 00:00:00 | 1933-01-05 | 22085 days | 1923-08-02 | 18644 days | 20683 days | 1929-03-04 |
30 | Herbert Hoover | 1874-08-10 | 11545 days 00:00:00 | 1964-10-20 | 32921 days | 1929-03-04 | 19916 days | 21376 days | 1933-03-04 |
31 | Franklin D. Roosevelt | 1882-01-30 | Died in Office | 1945-04-12 | 23067 days | 1933-03-04 | 18648 days | 23067 days | 1945-04-12 |
32 | Harry S. Truman | 1884-05-08 | 7276 days 00:00:00 | 1972-12-26 | 32352 days | 1945-04-12 | 22239 days | 25077 days | 1953-01-20 |
33 | Dwight D. Eisenhower | 1890-10-14 | 2987 days 00:00:00 | 1969-03-28 | 28635 days | 1953-01-20 | 22728 days | 25648 days | 1961-01-20 |
34 | John F. Kennedy | 1917-05-29 | Died in Office | 1963-11-22 | 16967 days | 1961-01-20 | 15931 days | 16967 days | 1963-11-22 |
35 | Lyndon B. Johnson | 1908-08-27 | 1462 days 00:00:00 | 1973-01-22 | 23508 days | 1963-11-22 | 20162 days | 22046 days | 1969-01-20 |
36 | Richard Nixon | 1913-01-09 | 7191 days 00:00:00 | 1994-04-22 | 29668 days | 1969-01-20 | 20451 days | 22477 days | 1974-08-09 |
37 | Gerald Ford | 1913-07-14 | 10925 days 00:00:00 | 2006-12-26 | 34110 days | 1974-08-09 | 22291 days | 23185 days | 1977-01-20 |
38 | Jimmy Carter | 1924-10-01 | 14045 days 00:00:00 | NaT | 34596 days | 1977-01-20 | 19091 days | 20551 days | 1981-01-20 |
39 | Ronald Reagan | 1911-02-06 | 5612 days 00:00:00 | 2004-06-05 | 34065 days | 1981-01-20 | 25534 days | 28454 days | 1989-01-20 |
40 | George H. W. Bush | 1924-06-12 | 9439 days 00:00:00 | 2018-11-30 | 34481 days | 1989-01-20 | 23582 days | 25042 days | 1993-01-20 |
41 | Bill Clinton | 1946-08-19 | 6745 days 00:00:00 | NaT | 26609 days | 1993-01-20 | 16944 days | 19864 days | 2001-01-20 |
42 | George W. Bush | 1946-07-06 | 3825 days 00:00:00 | NaT | 26653 days | 2001-01-20 | 19908 days | 22828 days | 2009-01-20 |
43 | Barack Obama | 1961-08-04 | 905 days 00:00:00 | NaT | 21149 days | 2009-01-20 | 17324 days | 20244 days | 2017-01-20 |
44 | Donald J. Trump | 1946-06-14 | NaT | NaT | 25770 days | 2017-01-20 | 25770 days | 27232 days | 2021-01-20 |
45 | Joe Biden | 1942-11-20 | NaT | NaT | 28531 days | 2021-01-20 | 28167 days | 29627 days | 2024-11-05 |
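convert_years_days_to_timedelta counts every year as 365 days, so each leap day inside a span is lost. Since 'Born', 'Start Date of presidency', and 'End of Presidency Date' are already datetimes at this point, exact ages can be computed by plain subtraction instead. This sketch does so for George Washington using the dates from the table above; the variable names are illustrative.

```python
import pandas as pd

born = pd.Timestamp("1732-02-22")
start = pd.Timestamp("1789-04-30")

# Exact age at inauguration: subtracting two Timestamps gives a Timedelta.
exact = start - born

# The notebook's approximation of "57 years, 67 days" with 365-day years.
approx = pd.Timedelta(days=57 * 365 + 67)

print("exact: ", exact.days, "days")
print("approx:", approx.days, "days")
print("difference:", (exact - approx).days, "days (leap days dropped)")
```

For Washington the approximation comes out 15 days short, one day for each leap year from 1732 through 1788. On the full table the same subtraction works column-wise, for example pres_df_1['Start Date of presidency'] - pres_df_1['Born'].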
pres_df_1.dtypes
Column | Dtype
---|---
President | object
Born | datetime64[ns]
Post-presidency timespan | object
Died | datetime64[ns]
Age | timedelta64[ns]
Start Date of presidency | datetime64[ns]
Start Age of presidency | timedelta64[ns]
End of Presidency Age | timedelta64[ns]
End of Presidency Date | datetime64[ns]
Here are the steps we took to clean the pres_df_1 DataFrame, and what they achieved:
Convert Date Columns to DateTime: We converted the 'Born', 'Died', 'Start Date of presidency', and 'End of Presidency Date' columns to the datetime data type using pd.to_datetime with errors='coerce'. This puts the date values in a proper datetime format and turns placeholders such as '(living)' into NaT.
Convert Years and Days to Timedelta: We defined a function, convert_years_days_to_timedelta, that parses entries of the form 'N years, M days' or 'M days' into a timedelta, counting each year as 365 days (so leap days are not included and the values are approximate). 'Died in Office' entries are kept as text, and 'Living' entries become NaT. We applied this function to the 'Age', 'Start Age of presidency', 'End of Presidency Age', and 'Post-presidency timespan' columns using applymap.
Handling Missing Data: In the 'Post-presidency timespan' column, we used pd.NaT for presidents marked 'Living' and kept the label 'Died in Office' for presidents who died in office.
Converted date columns to DateTime data type: The date columns 'Born', 'Died', 'Start Date of presidency', and 'End of Presidency Date' are now in a proper DateTime format, making it easier to perform date-related operations.
Converted years and days to Timedelta: The columns 'Age', 'Start Age of presidency', 'End of Presidency Age', and 'Post-presidency timespan' were converted to Timedelta format. This allows for meaningful calculations and comparisons of time durations.
Handled Missing Data: Each 'Post-presidency timespan' entry is now a timedelta, pd.NaT (for 'Living'), or the label 'Died in Office'. Because of that text label, the column keeps the object dtype rather than timedelta64.
Overall, the data in pres_df_1 is now cleaned and well-prepared for further analysis.
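A few ordering invariants offer a quick sanity check on the cleaned frame: a president is born before taking office, a term ends on or after it starts, and a recorded death date does not precede the end of the term. The sketch below uses a hypothetical two-row stand-in frame with pres_df_1's column names; the same mask expressions can be applied to pres_df_1 directly.

```python
import pandas as pd

# Two-row stand-in using the cleaned pres_df_1 column names.
df = pd.DataFrame({
    "President": ["George Washington", "Barack Obama"],
    "Born": pd.to_datetime(["1732-02-22", "1961-08-04"]),
    "Start Date of presidency": pd.to_datetime(["1789-04-30", "2009-01-20"]),
    "End of Presidency Date": pd.to_datetime(["1797-03-04", "2017-01-20"]),
    "Died": pd.to_datetime(["1799-12-14", None]),  # NaT for a living president
})

# Each mask flags rows that break an expected ordering of dates.
checks = {
    "born on or after start": df["Born"] >= df["Start Date of presidency"],
    "end before start": df["End of Presidency Date"] < df["Start Date of presidency"],
    # Comparisons involving NaT evaluate to False, so living presidents are never flagged.
    "died before end": df["Died"] < df["End of Presidency Date"],
}
for name, mask in checks.items():
    print(f"{name}: {df.loc[mask, 'President'].tolist()}")
```

An empty list for every check means no row violates these orderings; any name that appears points to a row worth re-checking against the source.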
pres_df_2
| | Unnamed: 0 | relation | name | president | born | death | age_of_death | marriage_date |
---|---|---|---|---|---|---|---|---|
0 | 0 | Husband | Martha Dandridge | George Washington | June 13, 1731 | May 22, 1802 | 70.0 | January 6, 1759 |
1 | 1 | Husband | Abigail Smith | John Adams | November 22, 1744 | October 28, 1818 | 73.0 | October 25, 1764 |
2 | 2 | Father | Martha Jefferson | Thomas Jefferson | September 27, 1772 | October 10, 1836 | 64.0 | NaN |
3 | 3 | Husband | Dolley Payne | James Madison | May 20, 1768 | July 12, 1849 | 81.0 | September 14, 1794 |
4 | 4 | Husband | Elizabeth Kortright | James Monroe | June 30, 1768 | September 23, 1830 | 62.0 | February 16, 1786 |
5 | 5 | Husband | Louisa Catherine Johnson | John Quincy Adams | February 12, 1775 | May 15, 1852 | 77.0 | July 26, 1797 |
6 | 6 | Uncle | Emily Donelson | Andrew Jackson | June 1, 1807 | December 19, 1836 | 29.0 | NaN |
7 | 7 | Father-in-law | Sarah Yorke | Andrew Jackson | July 16, 1803 | August 23, 1887 | 84.0 | NaN |
8 | 8 | Father-in-law | Sarah Angelica Singleton | Martin Van Buren | February 13, 1818 | December 29, 1877 | 59.0 | NaN |
9 | 9 | Husband | Anna Tuthill Symmes | William Henry Harrison | July 25, 1775 | February 25, 1864 | 88.0 | November 22, 1795 |
10 | 10 | Father-in-law | Jane Irwin | William Henry Harrison | July 23, 1804 | May 11, 1846 | 41.0 | NaN |
11 | 11 | Husband | Letitia Christian | John Tyler | November 12, 1790 | September 10, 1842 | 51.0 | March 29, 1813 |
12 | 12 | Father-in-law | Elizabeth Priscilla Cooper | John Tyler | June 14, 1816 | December 29, 1889 | 73.0 | NaN |
13 | 13 | Husband | Julia Gardiner | John Tyler | May 4, 1820 | July 10, 1889 | 69.0 | June 26, 1844 |
14 | 14 | Husband | Sarah Childress | James K. Polk | September 4, 1803 | August 14, 1891 | 87.0 | January 1, 1824 |
15 | 15 | Husband | Margaret Mackall Smith | Zachary Taylor | September 21, 1788 | August 14, 1852 | 63.0 | June 21, 1810 |
16 | 16 | Husband | Abigail Powers | Millard Fillmore | March 13, 1798 | March 30, 1853 | 55.0 | February 5, 1826 |
17 | 17 | Husband | Jane Means Appleton | Franklin Pierce | March 12, 1806 | December 2, 1863 | 57.0 | November 19, 1834 |
18 | 18 | Uncle | Harriet Rebecca Lane | James Buchanan | May 9, 1830 | July 3, 1903 | 73.0 | NaN |
19 | 19 | Husband | Mary Ann Todd | Abraham Lincoln | December 13, 1818 | July 16, 1882 | 63.0 | November 4, 1842 |
20 | 20 | Husband | Eliza McCardle | Andrew Johnson | October 4, 1810 | January 15, 1876 | 65.0 | May 17, 1827 |
21 | 21 | Husband | Julia Boggs Dent | Ulysses S. Grant | January 26, 1826 | December 14, 1902 | 76.0 | August 22, 1848 |
22 | 22 | Husband | Lucy Ware Webb | Rutherford B. Hayes | August 28, 1831 | June 25, 1889 | 57.0 | December 30, 1852 |
23 | 23 | Husband | Lucretia Rudolph | James A. Garfield | April 19, 1832 | March 14, 1918 | 85.0 | November 11, 1858 |
24 | 24 | Brother | Mary Arthur McElroy | Chester A. Arthur | July 5, 1841 | January 8, 1917 | 75.0 | NaN |
25 | 25 | Brother | Rose Elizabeth Cleveland | Grover Cleveland | June 13, 1846 | November 22, 1918 | 72.0 | NaN |
26 | 26 | Husband | Frances Clara Folsom | Grover Cleveland | July 21, 1864 | October 29, 1947 | 83.0 | June 2, 1886 |
27 | 27 | Husband | Caroline Lavinia Scott | Benjamin Harrison | October 1, 1832 | October 25, 1892 | 60.0 | October 20, 1853 |
28 | 28 | Father | Mary Scott Harrison | Benjamin Harrison | April 3, 1858 | October 28, 1930 | 72.0 | NaN |
29 | 29 | Husband | Frances Clara Folsom | Grover Cleveland | July 21, 1864 | October 29, 1947 | 83.0 | June 2, 1886 |
30 | 30 | Husband | Ida Saxton | William McKinley | June 8, 1847 | May 26, 1907 | 59.0 | January 25, 1871 |
31 | 31 | Husband | Edith Kermit Carow | Theodore Roosevelt | August 6, 1861 | September 30, 1948 | 87.0 | December 2, 1886 |
32 | 32 | Husband | Helen Louise Herron | William H. Taft | June 2, 1861 | May 22, 1943 | 81.0 | June 19, 1886 |
33 | 33 | Husband | Ellen Louise Axson | Woodrow Wilson | May 15, 1860 | August 6, 1914 | 54.0 | June 24, 1885 |
34 | 34 | Father | Margaret Woodrow Wilson | Woodrow Wilson | April 16, 1886 | February 12, 1944 | 57.0 | NaN |
35 | 35 | Husband | Edith Bolling | Woodrow Wilson | October 15, 1872 | December 28, 1961 | 89.0 | December 18, 1915 |
36 | 36 | Husband | Florence Mabel Kling | Warren G. Harding | August 15, 1860 | November 21, 1924 | 64.0 | July 8, 1891 |
37 | 37 | Husband | Grace Anna Goodhue | Calvin Coolidge | January 3, 1879 | July 8, 1957 | 78.0 | October 4, 1905 |
38 | 38 | Husband | Lou Henry | Herbert Hoover | March 29, 1874 | January 7, 1944 | 69.0 | February 10, 1899 |
39 | 39 | Husband | Anna Eleanor Roosevelt | Franklin D. Roosevelt | October 11, 1884 | November 7, 1962 | 78.0 | March 17, 1905 |
40 | 40 | Husband | Elizabeth Virginia "Bess" Wallace | Harry S. Truman | February 13, 1885 | October 18, 1982 | 97.0 | June 28, 1919 |
41 | 41 | Husband | Mamie Geneva Doud | Dwight D. Eisenhower | November 14, 1896 | November 1, 1979 | 82.0 | July 1, 1916 |
42 | 42 | Husband | Jacqueline "Jackie" Lee Bouvier | John F. Kennedy | July 28, 1929 | May 19, 1994 | 64.0 | September 12, 1953 |
43 | 43 | Husband | Claudia Alta "Lady Bird" Taylor | Lyndon B. Johnson | December 22, 1912 | July 11, 2007 | 94.0 | November 17, 1934 |
44 | 44 | Husband | Thelma "Pat" Catherine Ryan | Richard Nixon | March 16, 1912 | June 22, 1993 | 81.0 | June 21, 1940 |
45 | 45 | Husband | Elizabeth "Betty" Ann Bloomer | Gerald Ford | April 8, 1918 | July 8, 2011 | 93.0 | October 15, 1948 |
46 | 46 | Husband | Eleanor Rosalynn Smith | Jimmy Carter | August 18, 1927 | NaN | NaN | August 18, 1927 |
47 | 47 | Husband | Nancy Davis | Ronald Reagan | July 6, 1921 | March 6, 2016 | 94.0 | March 4, 1952 |
48 | 48 | Husband | Barbara Pierce | George H. W. Bush | June 8, 1925 | April 17, 2018 | 92.0 | January 6, 1945 |
49 | 49 | Husband | Hillary Diane Rodham | Bill Clinton | October 26, 1947 | NaN | NaN | October 26, 1947 |
50 | 50 | Husband | Laura Lane Welch | George W. Bush | November 4, 1946 | NaN | NaN | November 4, 1946 |
51 | 51 | Husband | Michelle LaVaughn Robinson | Barack Obama | January 17, 1964 | NaN | NaN | January 17, 1964 |
52 | 52 | Husband | Melanija Knavs | Donald Trump | April 26, 1970 | NaN | NaN | January 22, 2005 |
pres_df_2.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 53 entries, 0 to 52
Data columns (total 8 columns):

| # | Column | Non-Null Count | Dtype |
|---|---|---|---|
| 0 | Unnamed: 0 | 53 non-null | int64 |
| 1 | relation | 53 non-null | object |
| 2 | name | 53 non-null | object |
| 3 | president | 53 non-null | object |
| 4 | born | 53 non-null | object |
| 5 | death | 48 non-null | object |
| 6 | age_of_death | 48 non-null | float64 |
| 7 | marriage_date | 42 non-null | object |

dtypes: float64(1), int64(1), object(6)
memory usage: 3.4+ KB
In the analysis of pres_df_2, the following observations were made:
Redundant Index: The dataset contains an extra column, "Unnamed: 0", that duplicates the row numbers. It carries no information and should be removed.
Column Renaming: The column names in pres_df_2 are terse and do not say whose dates they describe (for example, "born" and "death" refer to the First Lady, not the President). Renaming them makes the dataset more descriptive and consistent.
Date Conversion: The "born", "death" and "marriage_date" columns are stored as strings (object dtype) such as "February 13, 1885". Converting them to datetime makes it possible to sort and filter by date and to compute intervals such as age at marriage.
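A minimal sketch of the date conversion on a few values copied from the raw output above, with `None` standing in for a missing date. Passing an explicit format ("%B %d, %Y", matching strings like "February 13, 1885") is an optional choice made here, not what the cleaning code below does: it avoids format inference, and `errors="coerce"` turns anything unparseable into NaT instead of raising.

```python
import pandas as pd

# Sample values copied from the raw pres_df_2 output; None marks a missing date
raw = pd.DataFrame({
    "born": ["February 13, 1885", "August 18, 1927"],
    "death": ["October 18, 1982", None],
    "marriage_date": ["June 28, 1919", "August 18, 1927"],
})

date_columns = ["born", "death", "marriage_date"]

# DataFrame.apply forwards the extra keyword arguments to pd.to_datetime,
# so every column is parsed with the same "Month day, Year" format
parsed = raw[date_columns].apply(pd.to_datetime, format="%B %d, %Y", errors="coerce")
print(parsed.dtypes)
print(parsed)
```

Missing values come through as NaT, which is how the "Date of Death, First Lady" column appears for living First Ladies in the cleaned table.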
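To summarize, the data cleaning steps for pres_df_2 are:
Task 1: Drop the redundant "Unnamed: 0" index column.
Task 2: Rename the columns.
Task 3: Convert the "born", "death" and "marriage_date" columns to datetime.
Implementing these steps improves the usability and integrity of the pres_df_2 dataset for further analysis.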
Note: The pres_df_2 dataset describes the First Ladies, not the Presidents. The "born", "death", "age_of_death" and "marriage_date" columns all refer to the First Lady. The "relation" column gives the President's relationship to her ("Husband", "Father", "Uncle", "Father-in-law" or "Brother"), so not every row is a wife; after renaming it is labelled "Relation to President", but the values still describe the President's side of the relationship. These cleaning steps aim to make the dataset more intuitive for exploratory analysis and data visualization.
# Task 1: Drop the redundant index column
pres_df_2 = pres_df_2.drop(columns='Unnamed: 0')  # 'columns=' already targets axis 1
# Task 2: Rename the columns
pres_df_2.rename(columns={
'relation': 'Relation to President',
'name': 'First Lady Name',
'president': 'President',
'born': 'Date of Born, First Lady',
'death': 'Date of Death, First Lady',
'age_of_death': 'Age at Death, First Lady',
'marriage_date': 'Date of Marriage'
}, inplace=True)
# Task 3: Convert the 'Date of Born, First Lady', 'Date of Death, First Lady', and 'Date of Marriage' columns to datetime
date_columns = ['Date of Born, First Lady', 'Date of Death, First Lady', 'Date of Marriage']
pres_df_2[date_columns] = pres_df_2[date_columns].apply(pd.to_datetime)
pres_df_2
| | Relation to President | First Lady Name | President | Date of Born, First Lady | Date of Death, First Lady | Age at Death, First Lady | Date of Marriage |
---|---|---|---|---|---|---|---|
0 | Husband | Martha Dandridge | George Washington | 1731-06-13 | 1802-05-22 | 70.0 | 1759-01-06 |
1 | Husband | Abigail Smith | John Adams | 1744-11-22 | 1818-10-28 | 73.0 | 1764-10-25 |
2 | Father | Martha Jefferson | Thomas Jefferson | 1772-09-27 | 1836-10-10 | 64.0 | NaT |
3 | Husband | Dolley Payne | James Madison | 1768-05-20 | 1849-07-12 | 81.0 | 1794-09-14 |
4 | Husband | Elizabeth Kortright | James Monroe | 1768-06-30 | 1830-09-23 | 62.0 | 1786-02-16 |
5 | Husband | Louisa Catherine Johnson | John Quincy Adams | 1775-02-12 | 1852-05-15 | 77.0 | 1797-07-26 |
6 | Uncle | Emily Donelson | Andrew Jackson | 1807-06-01 | 1836-12-19 | 29.0 | NaT |
7 | Father-in-law | Sarah Yorke | Andrew Jackson | 1803-07-16 | 1887-08-23 | 84.0 | NaT |
8 | Father-in-law | Sarah Angelica Singleton | Martin Van Buren | 1818-02-13 | 1877-12-29 | 59.0 | NaT |
9 | Husband | Anna Tuthill Symmes | William Henry Harrison | 1775-07-25 | 1864-02-25 | 88.0 | 1795-11-22 |
10 | Father-in-law | Jane Irwin | William Henry Harrison | 1804-07-23 | 1846-05-11 | 41.0 | NaT |
11 | Husband | Letitia Christian | John Tyler | 1790-11-12 | 1842-09-10 | 51.0 | 1813-03-29 |
12 | Father-in-law | Elizabeth Priscilla Cooper | John Tyler | 1816-06-14 | 1889-12-29 | 73.0 | NaT |
13 | Husband | Julia Gardiner | John Tyler | 1820-05-04 | 1889-07-10 | 69.0 | 1844-06-26 |
14 | Husband | Sarah Childress | James K. Polk | 1803-09-04 | 1891-08-14 | 87.0 | 1824-01-01 |
15 | Husband | Margaret Mackall Smith | Zachary Taylor | 1788-09-21 | 1852-08-14 | 63.0 | 1810-06-21 |
16 | Husband | Abigail Powers | Millard Fillmore | 1798-03-13 | 1853-03-30 | 55.0 | 1826-02-05 |
17 | Husband | Jane Means Appleton | Franklin Pierce | 1806-03-12 | 1863-12-02 | 57.0 | 1834-11-19 |
18 | Uncle | Harriet Rebecca Lane | James Buchanan | 1830-05-09 | 1903-07-03 | 73.0 | NaT |
19 | Husband | Mary Ann Todd | Abraham Lincoln | 1818-12-13 | 1882-07-16 | 63.0 | 1842-11-04 |
20 | Husband | Eliza McCardle | Andrew Johnson | 1810-10-04 | 1876-01-15 | 65.0 | 1827-05-17 |
21 | Husband | Julia Boggs Dent | Ulysses S. Grant | 1826-01-26 | 1902-12-14 | 76.0 | 1848-08-22 |
22 | Husband | Lucy Ware Webb | Rutherford B. Hayes | 1831-08-28 | 1889-06-25 | 57.0 | 1852-12-30 |
23 | Husband | Lucretia Rudolph | James A. Garfield | 1832-04-19 | 1918-03-14 | 85.0 | 1858-11-11 |
24 | Brother | Mary Arthur McElroy | Chester A. Arthur | 1841-07-05 | 1917-01-08 | 75.0 | NaT |
25 | Brother | Rose Elizabeth Cleveland | Grover Cleveland | 1846-06-13 | 1918-11-22 | 72.0 | NaT |
26 | Husband | Frances Clara Folsom | Grover Cleveland | 1864-07-21 | 1947-10-29 | 83.0 | 1886-06-02 |
27 | Husband | Caroline Lavinia Scott | Benjamin Harrison | 1832-10-01 | 1892-10-25 | 60.0 | 1853-10-20 |
28 | Father | Mary Scott Harrison | Benjamin Harrison | 1858-04-03 | 1930-10-28 | 72.0 | NaT |
29 | Husband | Frances Clara Folsom | Grover Cleveland | 1864-07-21 | 1947-10-29 | 83.0 | 1886-06-02 |
30 | Husband | Ida Saxton | William McKinley | 1847-06-08 | 1907-05-26 | 59.0 | 1871-01-25 |
31 | Husband | Edith Kermit Carow | Theodore Roosevelt | 1861-08-06 | 1948-09-30 | 87.0 | 1886-12-02 |
32 | Husband | Helen Louise Herron | William H. Taft | 1861-06-02 | 1943-05-22 | 81.0 | 1886-06-19 |
33 | Husband | Ellen Louise Axson | Woodrow Wilson | 1860-05-15 | 1914-08-06 | 54.0 | 1885-06-24 |
34 | Father | Margaret Woodrow Wilson | Woodrow Wilson | 1886-04-16 | 1944-02-12 | 57.0 | NaT |
35 | Husband | Edith Bolling | Woodrow Wilson | 1872-10-15 | 1961-12-28 | 89.0 | 1915-12-18 |
36 | Husband | Florence Mabel Kling | Warren G. Harding | 1860-08-15 | 1924-11-21 | 64.0 | 1891-07-08 |
37 | Husband | Grace Anna Goodhue | Calvin Coolidge | 1879-01-03 | 1957-07-08 | 78.0 | 1905-10-04 |
38 | Husband | Lou Henry | Herbert Hoover | 1874-03-29 | 1944-01-07 | 69.0 | 1899-02-10 |
39 | Husband | Anna Eleanor Roosevelt | Franklin D. Roosevelt | 1884-10-11 | 1962-11-07 | 78.0 | 1905-03-17 |
40 | Husband | Elizabeth Virginia "Bess" Wallace | Harry S. Truman | 1885-02-13 | 1982-10-18 | 97.0 | 1919-06-28 |
41 | Husband | Mamie Geneva Doud | Dwight D. Eisenhower | 1896-11-14 | 1979-11-01 | 82.0 | 1916-07-01 |
42 | Husband | Jacqueline "Jackie" Lee Bouvier | John F. Kennedy | 1929-07-28 | 1994-05-19 | 64.0 | 1953-09-12 |
43 | Husband | Claudia Alta "Lady Bird" Taylor | Lyndon B. Johnson | 1912-12-22 | 2007-07-11 | 94.0 | 1934-11-17 |
44 | Husband | Thelma "Pat" Catherine Ryan | Richard Nixon | 1912-03-16 | 1993-06-22 | 81.0 | 1940-06-21 |
45 | Husband | Elizabeth "Betty" Ann Bloomer | Gerald Ford | 1918-04-08 | 2011-07-08 | 93.0 | 1948-10-15 |
46 | Husband | Eleanor Rosalynn Smith | Jimmy Carter | 1927-08-18 | NaT | NaN | 1927-08-18 |
47 | Husband | Nancy Davis | Ronald Reagan | 1921-07-06 | 2016-03-06 | 94.0 | 1952-03-04 |
48 | Husband | Barbara Pierce | George H. W. Bush | 1925-06-08 | 2018-04-17 | 92.0 | 1945-01-06 |
49 | Husband | Hillary Diane Rodham | Bill Clinton | 1947-10-26 | NaT | NaN | 1947-10-26 |
50 | Husband | Laura Lane Welch | George W. Bush | 1946-11-04 | NaT | NaN | 1946-11-04 |
51 | Husband | Michelle LaVaughn Robinson | Barack Obama | 1964-01-17 | NaT | NaN | 1964-01-17 |
52 | Husband | Melanija Knavs | Donald Trump | 1970-04-26 | NaT | NaN | 2005-01-22 |
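The cleaned table shows two details worth checking before analysis. Frances Clara Folsom appears twice (rows 26 and 29), once for each of Grover Cleveland's non-consecutive terms, so the repeat may be intentional if each row stands for a term. Rows 46 and 49 to 51 have a Date of Marriage identical to the First Lady's date of birth, which looks like a data-entry error. The sketch below flags both on a few copied rows, using shortened, hypothetical column names ("name", "president", "born", "married"); blanking the suspicious dates is one possible treatment, not a step taken in this notebook.

```python
import pandas as pd

# A few rows copied from the cleaned pres_df_2 output (shortened column names)
df = pd.DataFrame({
    "name": ["Frances Clara Folsom", "Caroline Lavinia Scott",
             "Frances Clara Folsom", "Eleanor Rosalynn Smith"],
    "president": ["Grover Cleveland", "Benjamin Harrison",
                  "Grover Cleveland", "Jimmy Carter"],
    "born": pd.to_datetime(["1864-07-21", "1832-10-01", "1864-07-21", "1927-08-18"]),
    "married": pd.to_datetime(["1886-06-02", "1853-10-20", "1886-06-02", "1927-08-18"]),
})

# keep=False marks every copy of a repeated row, so both Cleveland rows are shown
repeated = df[df.duplicated(keep=False)]
print(repeated)

# A marriage date equal to the birth date is almost certainly an entry error
same_day = df["married"] == df["born"]
print(df.loc[same_day, ["name", "born", "married"]])

# One possible treatment: mark those marriage dates as missing
cleaned = df.copy()
cleaned.loc[same_day, "married"] = pd.NaT
```

Whether to drop the repeated Cleveland row depends on whether later merges join on President per term or per person.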
# Set option to display all columns
pd.set_option('display.max_columns', None)
pres_df_3.head()
| | order | name | height_cm | height_in | weight_kg | weight_lb | body_mass_index | body_mass_index_range | birth_day | birth_month | birth_year | birth_date | birthplace | birth_state | death_day | death_month | death_year | death_date | death_age | astrological_sign | term_begin_day | term_begin_month | term_begin_year | term_begin_date | term_end_day | term_end_month | term_end_year | term_end_date | presidency_begin_age | presidency_end_age | political_party | corrected_iq |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | George Washington | 188 | 74.0 | 79.4 | 175 | 22.5 | Normal | 22 | 2 | 1732 | 22-02-1732 | Westmoreland County | Virginia | 14.0 | 12.0 | 1799.0 | 14-12-1799 | 67.0 | Pisces | 30 | 4 | 1789 | 30-04-1789 | 4.0 | 3.0 | 1797.0 | 04-03-1797 | 57 | 65.0 | Unaffiliated | 140.0 |
1 | 2 | John Adams | 170 | 67.0 | 83.9 | 185 | 29.0 | Overweight | 30 | 10 | 1735 | 30-10-1735 | Braintree | Massachusetts | 4.0 | 7.0 | 1826.0 | 04-07-1826 | 90.0 | Scorpio | 4 | 3 | 1797 | 04-03-1797 | 4.0 | 3.0 | 1801.0 | 04-03-1801 | 61 | 65.0 | Federalist | 155.0 |
2 | 3 | Thomas Jefferson | 189 | 74.5 | 82.1 | 181 | 23.0 | Normal | 13 | 4 | 1743 | 13-04-1743 | Shadwell | Virginia | 4.0 | 7.0 | 1826.0 | 04-07-1826 | 83.0 | Aries | 4 | 3 | 1801 | 04-03-1801 | 4.0 | 3.0 | 1809.0 | 04-03-1809 | 57 | 65.0 | Democratic-Republican | 160.0 |
3 | 4 | James Madison | 163 | 64.0 | 55.3 | 122 | 20.8 | Normal | 16 | 3 | 1751 | 16-03-1751 | Port Conway | Virginia | 28.0 | 6.0 | 1836.0 | 28-06-1836 | 85.0 | Pisces | 4 | 3 | 1809 | 04-03-1809 | 4.0 | 3.0 | 1817.0 | 04-03-1817 | 57 | 65.0 | Democratic-Republican | 160.0 |
4 | 5 | James Monroe | 183 | 72.0 | 85.7 | 189 | 25.6 | Overweight | 28 | 4 | 1758 | 28-04-1758 | Monroe Hall | Virginia | 4.0 | 7.0 | 1831.0 | 04-07-1831 | 73.0 | Taurus | 4 | 3 | 1817 | 04-03-1817 | 4.0 | 3.0 | 1825.0 | 04-03-1825 | 58 | 66.0 | Democratic-Republican | 139.0 |
pres_df_3.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 45 entries, 0 to 44
Data columns (total 32 columns):
 #   Column                 Non-Null Count  Dtype
---  ------                 --------------  -----
 0   order                  45 non-null     object
 1   name                   45 non-null     object
 2   height_cm              45 non-null     int64
 3   height_in              45 non-null     float64
 4   weight_kg              45 non-null     float64
 5   weight_lb              45 non-null     int64
 6   body_mass_index        45 non-null     float64
 7   body_mass_index_range  45 non-null     object
 8   birth_day              45 non-null     int64
 9   birth_month            45 non-null     int64
 10  birth_year             45 non-null     int64
 11  birth_date             45 non-null     object
 12  birthplace             45 non-null     object
 13  birth_state            45 non-null     object
 14  death_day              39 non-null     float64
 15  death_month            39 non-null     float64
 16  death_year             39 non-null     float64
 17  death_date             39 non-null     object
 18  death_age              39 non-null     float64
 19  astrological_sign      45 non-null     object
 20  term_begin_day         45 non-null     int64
 21  term_begin_month       45 non-null     int64
 22  term_begin_year        45 non-null     int64
 23  term_begin_date        45 non-null     object
 24  term_end_day           44 non-null     float64
 25  term_end_month         44 non-null     float64
 26  term_end_year          44 non-null     float64
 27  term_end_date          44 non-null     object
 28  presidency_begin_age   45 non-null     int64
 29  presidency_end_age     44 non-null     float64
 30  political_party        45 non-null     object
 31  corrected_iq           42 non-null     float64
dtypes: float64(12), int64(9), object(11)
memory usage: 11.4+ KB
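The info() output shows that pres_df_3 stores each date twice: as separate day, month and year columns and as a combined day-first string such as "22-02-1732". A quick consistency check, sketched here on the birth columns of the first two rows of the head() output, confirms that the part columns add no information before they are dropped.

```python
import pandas as pd

# Birth columns of the first two rows of the pres_df_3 head() output
df = pd.DataFrame({
    "birth_day": [22, 30],
    "birth_month": [2, 10],
    "birth_year": [1732, 1735],
    "birth_date": ["22-02-1732", "30-10-1735"],
})

# pd.to_datetime can assemble dates from columns named year, month and day
from_parts = pd.to_datetime(
    df[["birth_year", "birth_month", "birth_day"]].rename(
        columns={"birth_year": "year", "birth_month": "month", "birth_day": "day"}
    )
)

# The combined string is day-first: dd-mm-yyyy
from_string = pd.to_datetime(df["birth_date"], format="%d-%m-%Y")

# True when every row agrees, i.e. the part columns are redundant
print((from_parts == from_string).all())
```

The same check applies to the death and term columns, although their float day/month/year values contain NaN for living presidents and would need those rows excluded first.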
pres_df_1
| | President | Born | Post-presidency timespan | Died | Age | Start Date of presidency | Start Age of presidency | End of Presidency Age | End of Presidency Date |
---|---|---|---|---|---|---|---|---|---|
0 | George Washington | 1732-02-22 | 1015 days 00:00:00 | 1799-12-14 | 24750 days | 1789-04-30 | 20872 days | 23735 days | 1797-03-04 |
1 | John Adams | 1735-10-30 | 9247 days 00:00:00 | 1826-07-04 | 33097 days | 1797-03-04 | 22390 days | 23850 days | 1801-03-04 |
2 | Thomas Jefferson | 1743-04-13 | 6327 days 00:00:00 | 1826-07-04 | 30377 days | 1801-03-04 | 21130 days | 24050 days | 1809-03-04 |
3 | James Madison | 1751-03-16 | 7051 days 00:00:00 | 1836-06-28 | 31129 days | 1809-03-04 | 21158 days | 24078 days | 1817-03-04 |
4 | James Monroe | 1758-04-28 | 2312 days 00:00:00 | 1831-07-04 | 26712 days | 1817-03-04 | 21480 days | 24400 days | 1825-03-04 |
5 | John Quincy Adams | 1767-07-11 | 6926 days 00:00:00 | 1848-02-23 | 29427 days | 1825-03-04 | 21041 days | 22501 days | 1829-03-04 |
6 | Andrew Jackson | 1767-03-15 | 3016 days 00:00:00 | 1845-06-08 | 28555 days | 1829-03-04 | 22619 days | 25539 days | 1837-03-04 |
7 | Martin Van Buren | 1782-12-05 | 7807 days 00:00:00 | 1862-07-24 | 29066 days | 1837-03-04 | 19799 days | 21259 days | 1841-03-04 |
8 | William H. Harrison | 1773-02-09 | Died in Office | 1841-04-04 | 24874 days | 1841-03-04 | 24843 days | 24874 days | 1841-04-04 |
9 | John Tyler | 1790-03-29 | 6160 days 00:00:00 | 1862-01-18 | 26210 days | 1841-04-04 | 18621 days | 20050 days | 1845-03-04 |
10 | James K. Polk | 1795-11-02 | 103 days 00:00:00 | 1849-06-15 | 19570 days | 1845-03-04 | 18007 days | 19467 days | 1849-03-04 |
11 | Zachary Taylor | 1784-11-24 | Died in Office | 1850-07-09 | 23952 days | 1849-03-04 | 23460 days | 23952 days | 1850-07-09 |
12 | Millard Fillmore | 1800-01-07 | 7669 days 00:00:00 | 1874-03-08 | 27070 days | 1850-07-09 | 18433 days | 19401 days | 1853-03-04 |
13 | Franklin Pierce | 1804-11-23 | 4598 days 00:00:00 | 1869-10-08 | 23679 days | 1853-03-04 | 17621 days | 19081 days | 1857-03-04 |
14 | James Buchanan | 1791-04-23 | 2644 days 00:00:00 | 1868-06-01 | 28144 days | 1857-03-04 | 24040 days | 25500 days | 1861-03-04 |
15 | Abraham Lincoln | 1809-02-12 | Died in Office | 1865-04-15 | 20502 days | 1861-03-04 | 19000 days | 20502 days | 1865-04-15 |
16 | Andrew Johnson | 1808-12-29 | 2339 days 00:00:00 | 1875-07-31 | 24304 days | 1865-04-15 | 20547 days | 21965 days | 1869-03-04 |
17 | Ulysses S. Grant | 1822-04-27 | 3061 days 00:00:00 | 1885-07-23 | 23082 days | 1869-03-04 | 17101 days | 20021 days | 1877-03-04 |
18 | Rutherford B. Hayes | 1822-10-04 | 4334 days 00:00:00 | 1893-01-17 | 25655 days | 1877-03-04 | 19861 days | 21321 days | 1881-03-04 |
19 | James A. Garfield | 1831-11-19 | Died in Office | 1881-09-19 | 18189 days | 1881-03-04 | 17990 days | 18189 days | 1881-09-19 |
20 | Chester A. Arthur | 1829-10-05 | 624 days 00:00:00 | 1886-11-18 | 20849 days | 1881-09-19 | 18964 days | 20225 days | 1885-03-04 |
21 | Grover Cleveland | 1837-03-18 | 1460 days 00:00:00 | 1908-06-24 | 26013 days | 1885-03-04 | 17506 days | 18966 days | 1889-03-04 |
22 | Benjamin Harrison | 1833-08-20 | 2929 days 00:00:00 | 1901-03-13 | 24660 days | 1889-03-04 | 20271 days | 21731 days | 1893-03-04 |
23 | Grover Cleveland | 1837-03-18 | 4127 days 00:00:00 | 1908-06-24 | 26013 days | 1893-03-04 | 20426 days | 21886 days | 1897-03-04 |
24 | William McKinley | 1843-01-29 | Died in Office | 1901-09-14 | 21398 days | 1897-03-04 | 19744 days | 21398 days | 1901-09-14 |
25 | Theodore Roosevelt | 1858-10-27 | 3593 days 00:00:00 | 1919-01-06 | 21971 days | 1901-09-14 | 15652 days | 18378 days | 1909-03-04 |
26 | William H. Taft | 1857-09-15 | 6209 days 00:00:00 | 1930-03-08 | 26454 days | 1909-03-04 | 18785 days | 20245 days | 1913-03-04 |
27 | Woodrow Wilson | 1856-12-28 | 1066 days 00:00:00 | 1924-02-03 | 24492 days | 1913-03-04 | 20506 days | 23426 days | 1921-03-04 |
28 | Warren G. Harding | 1865-11-02 | Died in Office | 1923-08-02 | 21078 days | 1921-03-04 | 20197 days | 21078 days | 1923-08-02 |
29 | Calvin Coolidge | 1872-07-04 | 1402 days 00:00:00 | 1933-01-05 | 22085 days | 1923-08-02 | 18644 days | 20683 days | 1929-03-04 |
30 | Herbert Hoover | 1874-08-10 | 11545 days 00:00:00 | 1964-10-20 | 32921 days | 1929-03-04 | 19916 days | 21376 days | 1933-03-04 |
31 | Franklin D. Roosevelt | 1882-01-30 | Died in Office | 1945-04-12 | 23067 days | 1933-03-04 | 18648 days | 23067 days | 1945-04-12 |
32 | Harry S. Truman | 1884-05-08 | 7276 days 00:00:00 | 1972-12-26 | 32352 days | 1945-04-12 | 22239 days | 25077 days | 1953-01-20 |
33 | Dwight D. Eisenhower | 1890-10-14 | 2987 days 00:00:00 | 1969-03-28 | 28635 days | 1953-01-20 | 22728 days | 25648 days | 1961-01-20 |
34 | John F. Kennedy | 1917-05-29 | Died in Office | 1963-11-22 | 16967 days | 1961-01-20 | 15931 days | 16967 days | 1963-11-22 |
35 | Lyndon B. Johnson | 1908-08-27 | 1462 days 00:00:00 | 1973-01-22 | 23508 days | 1963-11-22 | 20162 days | 22046 days | 1969-01-20 |
36 | Richard Nixon | 1913-01-09 | 7191 days 00:00:00 | 1994-04-22 | 29668 days | 1969-01-20 | 20451 days | 22477 days | 1974-08-09 |
37 | Gerald Ford | 1913-07-14 | 10925 days 00:00:00 | 2006-12-26 | 34110 days | 1974-08-09 | 22291 days | 23185 days | 1977-01-20 |
38 | Jimmy Carter | 1924-10-01 | 14045 days 00:00:00 | NaT | 34596 days | 1977-01-20 | 19091 days | 20551 days | 1981-01-20 |
39 | Ronald Reagan | 1911-02-06 | 5612 days 00:00:00 | 2004-06-05 | 34065 days | 1981-01-20 | 25534 days | 28454 days | 1989-01-20 |
40 | George H. W. Bush | 1924-06-12 | 9439 days 00:00:00 | 2018-11-30 | 34481 days | 1989-01-20 | 23582 days | 25042 days | 1993-01-20 |
41 | Bill Clinton | 1946-08-19 | 6745 days 00:00:00 | NaT | 26609 days | 1993-01-20 | 16944 days | 19864 days | 2001-01-20 |
42 | George W. Bush | 1946-07-06 | 3825 days 00:00:00 | NaT | 26653 days | 2001-01-20 | 19908 days | 22828 days | 2009-01-20 |
43 | Barack Obama | 1961-08-04 | 905 days 00:00:00 | NaT | 21149 days | 2009-01-20 | 17324 days | 20244 days | 2017-01-20 |
44 | Donald J. Trump | 1946-06-14 | NaT | NaT | 25770 days | 2017-01-20 | 25770 days | 27232 days | 2021-01-20 |
45 | Joe Biden | 1942-11-20 | NaT | NaT | 28531 days | 2021-01-20 | 28167 days | 29627 days | 2024-11-05 |
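In pres_df_1 the Age, Start Age of presidency and End of Presidency Age columns hold day counts (for example, 24750 days for George Washington), and Post-presidency timespan mixes durations with the string "Died in Office". The sketch below converts both kinds of value to years with `pd.to_timedelta`, using values copied from the output as strings. The 365.25-day average year is an assumption made here, and `pd.to_timedelta` also accepts columns that are already timedelta-typed.

```python
import pandas as pd

# Values copied from the pres_df_1 output (stored here as strings)
df = pd.DataFrame({
    "President": ["George Washington", "William H. Harrison"],
    "Age": ["24750 days", "24874 days"],
    "Post-presidency timespan": ["1015 days 00:00:00", "Died in Office"],
})

# errors="coerce" turns the "Died in Office" marker into NaT instead of raising
post = pd.to_timedelta(df["Post-presidency timespan"], errors="coerce")
age = pd.to_timedelta(df["Age"])

# Convert whole days to years with an average year of 365.25 days
df["Age (years)"] = (age.dt.days / 365.25).round(1)
df["Post-presidency (years)"] = (post.dt.days / 365.25).round(1)
print(df[["President", "Age (years)", "Post-presidency (years)"]])
```

The results agree with the death ages in pres_df_3 (67 for Washington), and presidents who died in office end up with NaN rather than a string in a numeric column.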
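After analyzing pres_df_3, we identified the following tasks: drop the separate day, month and year columns together with the combined birth, death and term date columns, which pres_df_1 (shown above) already provides; rename "name" to "President"; and convert the remaining columns to suitable data types.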
# Task 1: Delete the redundant columns
columns_to_drop = ['birth_day', 'birth_month', 'birth_year', 'birth_date',
'death_day', 'death_month', 'death_year', 'death_date',
'term_begin_day', 'term_begin_month', 'term_begin_year', 'term_begin_date',
'term_end_day', 'term_end_month', 'term_end_year', 'term_end_date']
pres_df_3.drop(columns=columns_to_drop, inplace=True)
# Task 2: Rename the columns
column_mapping = {
'name': 'President'
}
pres_df_3.rename(columns=column_mapping, inplace=True)
# Task 3: Convert data types to the correct types
# 'order' is stored as object; coerce it to numeric. If any entry cannot be parsed it
# becomes NaN, and the column stays float (1.0, 2.0, ... in the output below)
pres_df_3['order'] = pd.to_numeric(pres_df_3['order'], errors='coerce', downcast='integer')
# 'death_age', 'presidency_end_age' and 'corrected_iq' are already float64 (see info() above),
# so these casts are no-ops; their missing values remain NaN
pres_df_3['death_age'] = pres_df_3['death_age'].astype(float)
pres_df_3['presidency_end_age'] = pres_df_3['presidency_end_age'].astype(float)
pres_df_3['corrected_iq'] = pres_df_3['corrected_iq'].astype(float)
pres_df_3.head()
| | order | President | height_cm | height_in | weight_kg | weight_lb | body_mass_index | body_mass_index_range | birthplace | birth_state | death_age | astrological_sign | presidency_begin_age | presidency_end_age | political_party | corrected_iq |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1.0 | George Washington | 188 | 74.0 | 79.4 | 175 | 22.5 | Normal | Westmoreland County | Virginia | 67.0 | Pisces | 57 | 65.0 | Unaffiliated | 140.0 |
1 | 2.0 | John Adams | 170 | 67.0 | 83.9 | 185 | 29.0 | Overweight | Braintree | Massachusetts | 90.0 | Scorpio | 61 | 65.0 | Federalist | 155.0 |
2 | 3.0 | Thomas Jefferson | 189 | 74.5 | 82.1 | 181 | 23.0 | Normal | Shadwell | Virginia | 83.0 | Aries | 57 | 65.0 | Democratic-Republican | 160.0 |
3 | 4.0 | James Madison | 163 | 64.0 | 55.3 | 122 | 20.8 | Normal | Port Conway | Virginia | 85.0 | Pisces | 57 | 65.0 | Democratic-Republican | 160.0 |
4 | 5.0 | James Monroe | 183 | 72.0 | 85.7 | 189 | 25.6 | Overweight | Monroe Hall | Virginia | 73.0 | Taurus | 58 | 66.0 | Democratic-Republican | 139.0 |
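The head() output shows order as 1.0, 2.0, ... rather than integers. `downcast="integer"` cannot produce an integer column while NaN is present, so the most likely cause is at least one entry that `errors="coerce"` turned into NaN. If integer labels are wanted, pandas' nullable "Int64" dtype can hold the missing value. The sketch below uses hypothetical string values, with "n/a" standing in for the unparseable entry (the actual offending value is not shown in the output).

```python
import pandas as pd

# Hypothetical 'order' values stored as strings; "n/a" cannot be parsed
order = pd.Series(["1", "2", "3", "n/a"], dtype="object")

# errors="coerce" maps "n/a" to NaN, so the integer downcast is not possible
as_float = pd.to_numeric(order, errors="coerce", downcast="integer")
print(as_float.dtype)

# The nullable Int64 dtype keeps whole numbers and marks the gap as <NA>
as_int = pd.to_numeric(order, errors="coerce").astype("Int64")
print(as_int.tolist())
```

Applied to the notebook, this would be `pd.to_numeric(pres_df_3['order'], errors='coerce').astype('Int64')`, which keeps the missing entry visible instead of silently turning the whole column into floats.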
pres_df_4
| | No. | Name | Birthplace | Birthday | Life | Height | Children | Religion | Higher Education | Occupation | Military Service | Term | Party | Vice President | Previous Office | Economy | Foreign Affairs | Military Activity | Other Events | Legacy |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | George Washington | Pope's Creek, VA | 22-Feb | 1732-1799 | 1.88 | 0 | Episcopalian | None | Plantation Owner, Soldier | Commander-in-Chief of the Continental Army in... | 1789-1797 | None, Federalist | John Adams | Commander-in-Chief | [' Hamilton established BUS', '1792 Coinage Ac... | ['1793 Neutrality in the France-Britain confli... | ['1794 Whiskey Rebellion'] | ['1791 Bill of Rights', '1792 Post Office foun... | He is universally regarded as one of the great... |
1 | 2 | John Adams | Braintree, MA | 30-Oct | 1735-1826 | 1.70 | 5 | Unitarian | Harvard | Lawyer, Farmer | none | 1797-1801 | Federalist | Thomas Jefferson | 1st Vice President of USA | ['1798 Progressive land value tax of up to 1% ... | ['1797 the XYZ Affair: a bribe of French agent... | ['1798–1800 The Quasi war. Undeclared naval wa... | ['1798 Alien & Sedition Act to silence critics... | One of the most experienced men ever to become... |
2 | 3 | Thomas Jefferson | Goochland County, VA | 13-Apr | 1743-1826 | 1.89 | 6 | unaffiliated Christian | College of William and Mary | Inventor,Lawyer, Architect | Colonel of Virginia militia (without real mili... | 1801-1809 | Democratic-Republican | Aaron Burr, George Clinton | 2nd Vice President of USA | ['1807 Embargo Act forbidding foreign trade in... | ['1805 Peace Treaty with Tripoli. Piracy stopp... | ['1801-05 Naval operation against Tripoli and ... | ['1803 The Louisiana purchase', '1804 12th Ame... | Probably the most intelligent man ever to occ... |
3 | 4 | James Madison | Port Conway, VA | 16-Mar | 1751-1836 | 1.63 | 0 | Episcopalian | Princeton | Plantation Owner, Lawyer | Colonel of Virginia militia (without real mili... | 1809-1817 | Democratic-Republican | George Clinton, Elbridge Gerry | Secretary of State | [' The first U.S. protective tariff was impose... | ['1814 The Treaty of Ghent ends the War of 1812'] | ['1811 Tippecanoe battle (Harrison vs. Chief T... | ['1811 Cumberland Road construction starts (fi... | His leadership in the War of 1812 was particul... |
4 | 5 | James Monroe | Monroe Hall, VA | 28-Apr | 1758-1831 | 1.83 | 2 | Episcopalian | College of William and Mary | Plantation Owner, Lawyer | Major of the Continental Army | 1817-1825 | Democratic-Republican | Daniel Tompkins | Secretary of War | ['1819 Panic of 1819 (too much land speculatio... | ['1823 Monroe Doctrine', '1818 49th parallel s... | ['1817 1st Seminole war against Seminole India... | ['1819 Florida ceded to US', "1820 Missouri Co... | His presidency contributed to national defense... |
5 | 6 | John Quincy Adams | Braintree, MA | 11-Jul | 1767-1848 | 1.70 | 4 | Unitarian | Harvard | Lawyer, Diplomat | none | 1825-1829 | Democratic-Republican | John Calhoun | Secretary of State | [' "Internal improvements" program (roads, por... | ['Unsuccessful attempt to purchase Texas from ... | ['None'] | [' Accused for "corrupt bargain" to obtain Cla... | He had been an excellent Secretary of State, m... |
6 | 7 | Andrew Jackson | Waxhaw, NC | 15-Mar | 1767-1845 | 1.85 | 0 | Presbyterian | None | Soldier, Lawyer | Major General of U.S. Army | 1829-1837 | Democratic | John Calhoun, Martin van Buren | Military Governor of Florida | ['1832 The Bank War. Veto for rechartering of ... | [' Texas wins independence'] | ['1836 Alamo. 6000 Mexicans defeat 190 America... | ['1830 Indian Removal Act', "1832 South Caroli... | Historians see in him both the best and the wo... |
7 | 8 | Martin van Buren | Kinderhook, NY | 5-Dec | 1782-1862 | 1.68 | 4 | Dutch Reformed | None | Lawyer | none | 1837-1841 | Democratic | Richard Johnson | 8th Vice President of USA | ['1837 The Panic of 1837. Financial crisis & d... | [' Recognition of Republic of Texas; annex avo... | ['1838 2nd Seminole war against Seminole India... | ['1838 "The Trail of Tears". Indians’ relocati... | An able man, but always regarded more as a shr... |
8 | 9 | William H. Harrison | Charles City County, VA | 9-Feb | 1773-1841 | 1.73 | 1 | Episcopalian | Hampden-Sydney College | Soldier | Major General of U.S. Army | 1841 | Whig | John Tyler | Minister to Colombia | ['None'] | ['None'] | ['None'] | ['1841 Delivered the longest inaugural address... | none |
9 | 10 | John Tyler | Charles City County, VA | 29-Mar | 1790-1862 | 1.83 | 1 | Episcopalian | College of William and Mary | Lawyer | Captain of Virginia militia | 1841-1845 | Whig, No Party | none | 10th Vice President of USA | ['Economic crisis initiated by the Panic of 18... | ['1842 Webster–Ashburton Treaty settles border... | ['1842 End of the 2nd Seminole war'] | ['1841 His cabinet resigned after he vetoed ba... | His presidency is held in low esteem but score... |
10 | 11 | James K. Polk | Pineville, NC | 2-Nov | 1795-1849 | 1.73 | 0 | Presbyterian | University of North Carolina | Lawyer, Plantation Owner | Colonel of Tennessee militia | 1845-1849 | Democratic | George Dallas | Governor of Tennessee | ['1846 Walker Tariff. Taxes reduced and fixed'] | ['1846 Agreement with Britain over Oregon. Bot... | ['1846 American-Mexican war. Mexico city captu... | ['1846 A large crack in the Liberty Bell.', '1... | Polk added more territory than had any other p... |
11 | 12 | Zachary Taylor | Barboursville, VA | 24-Nov | 1784-1850 | 1.73 | 6 | Episcopalian | None | Soldier | Major General U.S. Army | 1849-1850 | Whig | Millard Fillmore | Major General, U.S. Army | ['None'] | ['1850 Clayton–Bulwer Treaty with Britain: no ... | ['None'] | [' The question of extending slavery to the ne... | His blunt manner and unsophisticated style han... |
12 | 13 | Millard Fillmore | Moravia, NY | 7-Jan | 1800-1874 | 1.75 | 2 | Unitarian | None | Lawyer | Major - Union Continentals (Home Guard) , NY m... | 1850-1853 | Whig | none | 12th Vice President of USA | ['Expanding trade while limiting American comm... | [' Commodore Matthew C. Perry was sent to open... | ['None'] | ['1850 Compromise of 1850 and Fugitive Slave A... | Honest and hardworking but a pompous, colorles... |
13 | 14 | Franklin Pierce | Hillsborough, NH | 23-Nov | 1804-1869 | 1.78 | 3 | Episcopalian | Bowdoin College | Lawyer | Brigadier Gen. of Volunteers | 1853-1857 | Democratic | William King | Senator (NH) 1837-42 | ['Reforming the Treasury'] | ['1854 Ostend Manifesto. Crisis over a leaked ... | ['None'] | ['1853 Gadsden Purchase. Land from Mexico.', '... | As president, he made many divisive decisions ... |
14 | 15 | James Buchanan | Cove Gap, PA | 23-Apr | 1791-1868 | 1.83 | 0 | Presbyterian | Dickinson College | Lawyer, Diplomat | Private - U.S. Army | 1857-1861 | Democratic | John Breckinridge | Minister to the UK | ['1857 Tariff of 1857. Reduction. North compla... | ['Strengthening the influence of the United St... | ['1857 Utah War: 2500 soldiers were sent to ou... | ['1857 Dred Scott decision: States can decide ... | His administration was dominated by fighting b... |
15 | 16 | Abraham Lincoln | Hardin County, KY | 12-Feb | 1809-1865 | 1.93 | 4 | unaffiliated Christian | None | Land Surveyor, Lawyer | Captain of State militia | 1861-1865 | Republican | Hannibal Hamlin, Andrew Johnson | Congressman (Illinois) | ['None'] | ['Lincoln left the diplomatic issues in the ha... | ['1863-1865 Civil War'] | ['1863 Emancipation Proclamation, freeing slav... | The greatest U.S. President. He won the Civil ... |
16 | 17 | Andrew Johnson | Raleigh, NC | 29-Dec | 1808-1875 | 1.78 | 5 | unaffiliated Christian | None | Tailor | Brigadier General of Volunteers - military gov... | 1865-1869 | National Union | none | 16th Vice President of USA | ['Reconstruction plan in the south'] | ['1867 Treaty with Russia.'] | ['None'] | ['1865 Amnesty', '1867 Reconstruction Act & Of... | His conflict with Congress and his impeachment... |
17 | 18 | Ulysses S. Grant | Point Pleasant, OH | 27-Apr | 1822-1885 | 1.73 | 4 | Methodist | U.S. Military Academy | Soldier (General of the Army) | General of the Army | 1869-1877 | Republican | Schuyler Colfax, Henry Wilson | Commanding General of Army | ['1873 Depression & financial crisis', ' Resum... | ['1871 Treaty of Washington', '1875 Free trade... | ['1876 Battle of the Little Bighorn. Gen. Cust... | ['1871 Civil Service', '1870-71 Enforcement Ac... | An excellent general but a mediocre politician... |
18 | 19 | Rutherford Hayes | Delaware, OH | 4-Oct | 1822-1893 | 1.73 | 8 | Methodist | Kenyon College, Harvard | Lawyer | Major General of Volunteers | 1877-1881 | Republican | William Wheeler | Governor of Ohio | ['1878 Bland-Allison Act - Treasury buys silve... | ['1877 Granted the Army the power to pursue ba... | ['1877 Bear Paw Battle against Nez Perce India... | ['1877 Reconstruction end. Army withdrew from ... | An effective president, ending military occup... |
19 | 20 | James Garfield | Moreland Hills, OH | 19-Nov | 1831-1881 | 1.83 | 7 | Disciples of Christ | Williams College | School Teacher, Minister, Soldier | Major General of Volunteers | 1881 | Republican | Chester Arthur | Congressman (Ohio) | ['1881 Refinance of national debt'] | ['Call for a Pan-American conference to mediat... | ['None'] | ['1881 On July 2, he was shot by Charles Juliu... | In the 4 months before he was shot, he did not... |
20 | 21 | Chester Arthur | Fairfield, VT | 5-Oct | 1829-1886 | 1.88 | 3 | Episcopalian | Union College | Customs Collector of NY port | Quartermaster General of New York State militia | 1881-1885 | Republican | none | 20th Vice President of USA | ['1885 Tariff of 1875 continued protectionist ... | [' Treaty with Nicaragua to build a canal viol... | ['Start of the "Steel Navy"'] | ['1883 Pendleton Act: Civil hiring on merit'] | Despite his reputation as a leading spoilsmen ... |
21 | 22 | Grover Cleveland | Caldwell, NJ | 18-Mar | 1837-1908 | 1.80 | 5 | Disciples of Christ | None | Sheriff, Lawyer, Teacher | none | 1885-1889 | Democratic | Thomas Hendricks | Governor of New York | [' Ended coinage based on silver', '1888 Mills... | [' Refused to promote the previous administrat... | ['1886 Apache leader Geronimo was chased & su... | ['1886 Statue of Liberty', ' Curtailed largess... | He won praise for his honesty, independence, i... |
22 | 23 | Benjamin Harrison | North Bend, OH | 20-Aug | 1833-1901 | 1.68 | 3 | Presbyterian | Miami University | Lawyer, Journalist | Brigadier General of Vol. | 1889-1893 | Republican | Levi Morton | Senator (Indiana) | ['1890 Pension Act - money to the veterans', '... | ['1889 Formation of the Pan-American union'] | ['1890 "Wounded knee" massacre. 150 Sioux Indi... | ['1889 Opening of Oklahoma to 20,000 settlers'... | He was an effective leader but the economy de... |
23 | 24 | Grover Cleveland | Caldwell, NJ | 18-Mar | 1837-1908 | 1.80 | 5 | Presbyterian | None | Sheriff, Lawyer, Teacher | none | 1893-1897 | Democratic | Adlai Stevenson | 22nd President of USA | ['1893 Panic of 1893 and depression.', '1893 S... | ['1895 Controversy with Britain over Venezuela... | ['First ships of a navy capable of offensive a... | ['1893 Pullman strike.'] | His reforms made him an icon for conservatives... |
24 | 25 | William McKinley | Niles, OH | 29-Jan | 1843-1901 | 1.70 | 2 | Methodist | Allegheny College,Albany Law | Lawyer | Brevet Major of Volunteers in Civil War | 1897-1901 | Republican | Garret Hobart , Th. Roosevelt | Governor of Ohio | ['1897 Dingley Tariff. Highest ever.', '1900 G... | ['1899 Treaty of Paris. U.S. becomes a colonia... | ['1898 Sinking of USS Maine', '1898 Spanish-US... | ['1898 Yellow Journalism (Hyped Maine)', '1898... | His leadership and his actions affected profou... |
25 | 26 | Theodore Roosevelt | New York City, NY | 27-Oct | 1858-1919 | 1.78 | 6 | Dutch Reformed | Harvard, Columbia | Public Official, Rancher, Author | Colonel | 1901-1909 | Republican | Charles Fairbanks | 25th Vice President of USA | ['1907 Panic of 1907 ("Roosevelt Panic")', '19... | ['1903 Orchestrated Panama independence. Panam... | [' In spite of his militaristic attitudes, pea... | [' Conservation becomes an issue. Creation of ... | The first modern American president and one of... |
26 | 27 | William Taft | Cincinnati, OH | 15-Sep | 1857-1930 | 1.83 | 3 | Unitarian | University of Cincinnati, Yale | Judge, Dean of Law School | none | 1909-1913 | Republican | James Sherman | 10th Chief Justice of USA | ['1909 Payne-Aldrich Tariff. Unpopular. Duties... | [" 'Dollar Diplomacy' ; State dept. coordinate... | ['1912 2500 troops were sent to Nicaragua to p... | [' Record antitrust suits', '1912 New states: ... | A good administrator but without exceptional p... |
27 | 28 | Woodrow Wilson | Staunton, VA | 28-Dec | 1856-1924 | 1.80 | 3 | Presbyterian | Princeton, J. Hopkins | Professor, Political scientist | none | 1913-1921 | Democratic | Thomas Marshall | Governor of New Jersey | ['1913 Federal Reserve Act', '1913 Underwood T... | ['1919 Treaty of Versailles after WW I, "14 po... | ['1915 Occupation of Dominican Rep.', '1916 US... | ['1916 Child labor curtailed', '1916 Federal F... | Effective leadership in instituting a progress... |
28 | 29 | Warren Harding | Blooming Grove, OH | 2-Nov | 1865-1923 | 1.83 | 0 | Baptist | Ohio Central College | Newspaper Publisher/Editor | none | 1921-1923 | Republican | Calvin Coolidge | Senator (Ohio) | [' Tax cuts for the rich and the end of antitr... | ['1921 Knox–Porter Resolution: official end o... | ['1923 Posey War (a small conflict with Americ... | ['1921 Federal Highway Act - the age of the "... | Presided over one of the most corrupt administ... |
29 | 30 | Calvin Coolidge | Plymouth, VT | 4-Jul | 1872-1933 | 1.78 | 2 | Congregationalist | Amherst College | Lawyer, Banker | none | 1923-1929 | Republican | Charles Dawes | 29th Vice President of USA | [' "Roaring Twenties"-Rapid economic growth', ... | ['1928 Kellogg-Briand Pact (renouncement of wa... | ['1928 Clark Memorandum - concerned the United... | ['1924 Immigration Act limits immigrants from ... | A competent administrator and a shrewd politic... |
30 | 31 | Herbert C. Hoover | West Branch, IA | 10-Aug | 1874-1964 | 1.80 | 2 | Society of Friends (Quaker) | Stanford University | Engineer | none | 1929-1933 | Republican | Charles Curtis | Secretary of Commerce | ['1929 Stock Market Crash', ' The Great Depres... | ['1932 Stimson Doctrine: US would not recogniz... | ['He thrice threatened intervention in the Dom... | ['1932 Reconstruction Finance Corporation to p... | A qualified executive who failed to provide ef... |
31 | 32 | Franklin Roosevelt | Hyde Park, NY | 30-Jan | 1882-1945 | 1.88 | 6 | Episcopalian | Harvard, Columbia | Lawyer | none | 1933-1945 | Democratic | John Garner , Henry Wallace, Truman | Governor of New York | ['1933 Glass-Steagall Act to protect bank acco... | ['1935 Lend-Lease Act, allowing US to aid All... | ['1941 Pearl Harbor', '1941-45 World War II'] | ['1933 First 100 days legislation frenzy', '19... | The longest and one of the most acclaimed pres... |
32 | 33 | Harry S Truman | Lamar, MO | 8-May | 1884-1972 | 1.75 | 1 | Baptist | None | Farmer, Men'S Clothing Retailer | Colonel - U.S. Army | 1945-1953 | Democratic | Alben Barkley | 34th Vice President of USA | ['1946 Veto on Taft-Hartley Act regulating st... | ['1945 Potsdam Conference', '1947 Truman Doctr... | ['1945 Atomic bombs', '1945 End of WW II', '19... | ['1945 Fair Deal: health care, civil rights et... | Unexpectedly a very efficient replacement. Sha... |
33 | 34 | Dwight Eisenhower | Denison, TX | 14-Oct | 1890-1969 | 1.78 | 2 | Presbyterian | U.S. Military Academy | Soldier, General | General of the Army | 1953-1961 | Republican | Richard Milhous Nixon | Sup. Allied Commander Europe | ['1956 Federal-Aid Highway Act - National high... | ['1954 Geneva Conference (SEATO)', '1956 Suez ... | ['1953 End of Korean War', '1958 USA troops in... | [' Alaska and Hawaii admitted as states', '195... | After a glorious military career, as president... |
34 | 35 | John F. Kennedy | Brookline, MA | 29-May | 1917-1963 | 1.83 | 3 | Roman Catholic | Harvard, Stanford | U.S. Navy Officer, Author | Lieutenant - U.S. Navy | 1961-1963 | Democratic | Lyndon Johnson | Senator ( MA) | [' “New Frontier”: Tax reduction and other ref... | ['1961 Vienna Summit', '1961 Alliance for Pro... | ['1963 “Advisers” attached to the South Vietn... | ['1961 Peace Corps program', '1961 "Moon race"... | His youth, vigor, and style brought a fresh ai... |
35 | 36 | Lyndon Johnson | Stonewall, TX | 27-Aug | 1908-1973 | 1.92 | 2 | Disciples of Christ | Texas State, Georgetown | Teacher, Public Official | Commander - U.S. Navy | 1963-1969 | Democratic | Hubert Humphrey | 37th Vice President of USA | ['1964 Revenue Act & Economic Opportunity Act... | ['1968 Paris Peace Talks'] | ['1965 Gulf of Tonkin Resolution - president g... | ['1964 The Civil Rights Act', '1964 Great Soci... | Passed his Great Society domestic programs and... |
36 | 37 | Richard Nixon | Yorba Linda, CA | 9-Jan | 1913-1994 | 1.80 | 2 | Society of Friends (Quaker) | Whittier College, Duke Law | Lawyer, Public Official | Commander - U.S. Navy | 1969-1974 | Republican | Spiro Agnew , Gerald R. Ford | 36th Vice President of USA | ['1973 OPEC embargo & Oil crisis'] | ['1971 Nixon visits China; "Ping Pong diplomac... | ['1970 Expansion of war to Cambodia and Laos'... | ['1969 Moon landing', '1970 Environment Act', ... | Although he ended U.S. involvement in the Viet... |
37 | 38 | Gerald R. Ford | Omaha, NE | 14-Jul | 1913- 2006 | 1.83 | 4 | Episcopalian | University of Michigan, Yale | Lawyer, Public Official | Lt. Commander -U.S. Navy | 1974-1977 | Republican | Nelson Rockefeller | 40th Vice President of USA | [' Recession & Inflation. The worst economy si... | ['1975 Evacuation of US embassy in Saigon', '1... | ['1974 Official end of the Vietnam War', '1975... | ['1974 Granted a pardon to Nixon.', '1975 Air... | A congressional president whose historic role ... |
38 | 39 | Jimmy Carter | Plains, GA | 1-Oct | 1924- | 1.77 | 4 | Baptist | US Naval Academy | Navy Officer, Peanut Farmer | Lieutenant - U.S. Navy | 1977-1981 | Democratic | Walter Mondale | Governor of Georgia | ['1979 Beer market deregulation', '1978 Airlin... | ['1979 Camp-David Accords between Israel and E... | ['1980 Revoked the Sino-American Mutual Defens... | [' Pardoned Vietnam War draft evaders', ' Ene... | Intelligent and hardworking but a DC outsider.... |
39 | 40 | Ronald Reagan | Tampico, IL | 6-Feb | 1911- 2004 | 1.85 | 4 | Christian Church | Eureka College | Actor, Union leaser | Captain- U.S. Army | 1981-1989 | Republican | George H. W. Bush | Governor of California | [' "Reaganomics": tax cuts, gov’t downsizing',... | ['1983 Strategic Defense Initiative ("Star War... | ['1983 241 Marines, of a multinational force, ... | ['1981 Assassination attempt by John W. Hinkle... | While his aptitude for the job was often quest... |
40 | 41 | George H. W. Bush | Milton, MA | 12-Jun | 1924- 2018 | 1.88 | 6 | Episcopalian | Yale University | Businessman (Oil) | Lieutenant-U.S. Navy | 1989-1993 | Republican | James Danforth Quayle | 43rd Vice President of USA | [' Increased taxes despite his campaign promis... | ['1989 Berlin Wall falls.', '1991 Dissolution ... | ['1989-90 Panama invasion. Noriega arrested', ... | ['1990 Americans with Disabilities Act', '1990... | He took few domestic initiatives and the econo... |
41 | 42 | Bill Clinton | Hope, AR | 19-Aug | 1946- | 1.88 | 1 | Baptist | Georgetown, Yale, Oxford | Lawyer, Law Lecturer | none | 1993-2001 | Democratic | Al Gore | Governor of Arkansas | ['1993-2001 Sustained economic growth and succ... | ['1993 Oslo accords; Isr./PLO', '1995 Dayton B... | ['1993 Mogadishu Battle : 2 Black Hawks down, ... | ["1993 “Don't ask, don't tell”- gays in the mi... | The longest period of peacetime economic expan... |
42 | 43 | George W. Bush | New Haven, CT | 6-Jul | 1946- | 1.82 | 2 | Methodist | Yale, Harvard | Businessman (Oil, Baseball) | Lieutenant - Air Force | 2001-2009 | Republican | Richard Cheney | Governor of Texas | ['2001, 2003 Bush Tax cuts', '2008 Financial c... | [' Iraq’ s "Weapons of mass destruction" hoax... | ['2001 War against the Afghanistan Talibans', ... | ['2001 9/11', '2001 Patriot Act', '2002 “no ch... | He left as one of the least popular and most d... |
43 | 44 | Barack Obama | Honolulu, HI | 4-Aug | 1961- | 1.87 | 2 | unaffiliated Christian | Columbia, Harvard | Law Professor | none | 2009-2017 | Democratic | Joseph Biden | Senator (Illinois) | ['2009 Economic Stimulus: Signed $787 bn for R... | [" 'Leading from behind' stance in Mid East g... | ['2011 Death of Osama bin Laden.', '2011 Iraq ... | ['2010 Healthcare reform: Affordable Care Act ... | Important changes on healthcare, education, c... |
44 | 45 | Donald Trump | Queens, New York, NY | 14-Jun | 1946- | 1.88 | 5 | Presbyterian | Fordham, Pennsylvania | Real estate | none | 2017-2021 | Republican | Michael R. Pence | none | [' Permanent cuts to the corporate tax rate, f... | [' US abandon Paris Climate Accord and WHO', '... | ['2019 US Space Force is founded.'] | ['2020 Covid-19 pandemic', ' Impeached twice',... | Polarizing leadership style, controversial pol... |
45 | 46 | Joe Biden | Scranton, PA | 20-Nov | 1942- | 1.82 | 3 | Roman Catholic | Syracuse University College | Lawyer | none | 2021- | Democratic | Kamala Harris | 47th Vice President of USA | ['Postal Service Reform Act of 2022', 'Signed ... | ['2022-2023 Significant U.S. Military Assistan... | ['Withdrawal from Afghanistan', '2022 Countert... | ['Biden pledged to double climate funding to d... | none |
pres_df_4.dtypes
No.                   int64
Name                 object
Birthplace           object
Birthday             object
Life                 object
Height              float64
Children              int64
Religion             object
Higher Education     object
Occupation           object
Military Service     object
Term                 object
Party                object
Vice President       object
Previous Office      object
Economy              object
Foreign Affairs      object
Military Activity    object
Other Events         object
Legacy               object
dtype: object
# Print each president's Legacy entry to understand the free-text columns
# (Economy, Foreign Affairs, Military Activity, Other Events, Legacy)
for legacy in pres_df_4['Legacy']:
    print(legacy)
    print()
He is universally regarded as one of the greatest figures in U.S. history. “First in war, first in peace, and first in the hearts of his country” One of the most experienced men ever to become President. Played a major role in the movement for independence. By the end of his term, he was unpopular, respected but not beloved. Probably the most intelligent man ever to occupy the White House. Of broad interests and activity, he exerted an immense influence on the future of the new nation. His leadership in the War of 1812 was particularly inept. But the young nation emerged united and strong, and Madison enjoyed tremendous popularity and respect during his last years. His presidency contributed to national defense and security. The Monroe Doctrine became a landmark in American foreign policy. His time in office was called the "Era of Good Feeling". He had been an excellent Secretary of State, maybe the best in the history of the U.S. But as a President he was not allowed by a hostile Congress to be successful. Historians see in him both the best and the worst of the new Republic. Associated with the movement toward increased popular participation in government, the "Jacksonian democracy". An able man, but always regarded more as a shrewd politician and a manipulator. His Presidency was a failure. It was marked by the financial crisis, the Panic of 1837. He was called "Martin Van Ruin". none His presidency is held in low esteem but scored a victory, the Texas annexation. Expelled from his party while in office and without followers was powerless and yet effective and rather underrated Polk added more territory than had any other president except Thomas Jefferson and made U.S. a coast-to-coast nation. He was one of the greatest presidents. His blunt manner and unsophisticated style handicapped him as president. Because of his short tenure, Taylor is not considered to have strongly influenced the U.S. 
Honest and hardworking but a pompous, colorless individual who rose far beyond his ability. The Compromise of 1850 preserved the Union for a while but destroyed his career. As president, he made many divisive decisions which were widely criticized and earned him a reputation as one of the worst presidents in U.S. history. His administration was dominated by fighting between pro-and antislavery forces. Few presidents have entered office with more experience, and few have so decisively failed. Probably the worst president. The greatest U.S. President. He won the Civil War and preserved the Union. His twin policies, emancipation of slaves and reconciliation of North and South, were his greatest legacies. His conflict with Congress and his impeachment weakened the Presidency for decades. He is considered one the worst American presidents. But he got Alaska. And Reconstruction was a major policy… An excellent general but a mediocre politician. He won the Civil War, but his Presidency was rather a failure with scandals and economic depression. History has been rather unfair to him. An effective president, ending military occupation of southern states; reforming the civil service, putting the country back on the gold standard, and starting the Gilded Age: enormous growth with serious social unrest. In the 4 months before he was shot, he did not accomplish much. He served the second-shortest term of any President. He endured Congress pressure on executive appointments. Despite his reputation as a leading spoilsmen in American politics, he proved to be a dignified and able administrator. A little-known presidency but no duty was neglected in his tenure and no problem alarmed the nation. He won praise for his honesty, independence, integrity, and commitment to the principles of classical liberalism. He relentlessly fought political corruption, patronage, and bossism. He was an effective leader but the economy deteriorated. 
Inflation, joblessness and labor unrest marked his presidency. His lackluster personality made his administration seem colorless. His reforms made him an icon for conservatives but his efforts to stem economic depression were not successful, and the conservative means he used to settle internal industrial conflicts were unpopular. His leadership and his actions affected profoundly the future of the USA. His victory on the Spanish-American war transformed the Presidency into an office of world leadership. The first modern American president and one of the most dynamic and popular. He radically reformed the government and changed the political system. One of the top 5 presidents. A good administrator but without exceptional political and leadership skills. He failed to rise adequately to the challenges of the times, despite his many strong qualities. Effective leadership in instituting a progressive domestic program. His foreign policies were marked by victory in World War I and passionate promotion of the League of Nations. Presided over one of the most corrupt administrations. Very popular as president, he was later regarded as one of the worst presidents. Hardworking though. He pushed a pro-business agenda. A competent administrator and a shrewd politician.Very popular in a period of rapid growth and prosperity but perhaps too complacent and inactive despite signs of a Depression. A qualified executive who failed to provide effective leadership in the most severe crisis. He could not halt and manage the Great Depression. His beliefs did not allow him to take drastic steps. The longest and one of the most acclaimed presidencies in American history. He led the United States, with absolute success, out of the Great Depression and later in victory in the World War II. Unexpectedly a very efficient replacement. Shaped the world after WW II, more than any other man. He led the successful transition from wartime to peacetime economy. 
Unpopular in the end, today he is ranked amongst the top presidents. After a glorious military career, as president, he negotiated the end of the Korean War and pursued moderate policies. He presided over a period of growth and prosperity, at the peak of the Cold War. His youth, vigor, and style brought a fresh air in the presidency. He revived the New Deal and Fair Deal programs and continued containment to prevent spread of communism (Vietnam, Cuba). His assassination shocked the world. Passed his Great Society domestic programs and pushed "War on Poverty". He escalated U.S. involvement in Vietnam, sending more than 500,000 troops to fight. He left US deeply divided. Although he ended U.S. involvement in the Vietnam War and won diplomatic agreements with the Soviet Union and China, he is remembered for Watergate and as the only president who resigned from office. A congressional president whose historic role was to mop up -effectively- the dregs of the two most damaging episodes in the history of the modern White House: Watergate and Vietnam. Intelligent and hardworking but a DC outsider.He pursued foreign policy with emphasis on human rights and peace. He lost a 2nd term because of the Panama Canal Treaty, the prolonged Iran hostage crisis and the stagnant economy. While his aptitude for the job was often questioned, he was always very popular. Reaganomics stimulated growth but USA became the largest debtor. His confrontational policies with the Soviets ended the Cold War shortly after he left office. He took few domestic initiatives and the economy had problems, but was successful in foreign affairs, deposing Panama’s dictator Noriega and fighting the Gulf War in Iraq. The Cold War was ended in his watch. The longest period of peacetime economic expansion in USA history. A not so distant period before the WTC fell, before U.S. troops bogged down in Iraq, before recession. And also before the scandals and the bitter partisan battles of the 1990s. 
He left as one of the least popular and most divisive presidents in American history. The Iraq war, the bungled response to Hurricane Katrina, the 2008 economic crisis, has brought the worst collapse in America's reputation since WW II. Important changes on healthcare, education, climate. finance. Obamacare was a defining issue. Economy bounced back but growth was anemic. Foreign policy left the world more insecure. Polarizing leadership style, controversial policies, and challenges to democratic institutions, resulting in a divided nation and impeachment. none
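The Economy, Foreign Affairs, Military Activity and Other Events cells appear to hold string representations of Python lists, with most items starting with a year (e.g. "['1941 Pearl Harbor', '1941-45 World War II']"). Assuming that format holds for every row, a sketch like the one below turns one of these columns into one row per event with the leading year extracted, so events can be counted by year or by president. The helper name `explode_events` is hypothetical, and the sample rows are copied from the table above.

```python
import ast

import pandas as pd


def explode_events(df: pd.DataFrame, column: str, name_col: str = "Name") -> pd.DataFrame:
    """Return one row per event, with the 4-digit year (if any) pulled from its start."""
    # Parse "['1941 ...', '1945 ...']" strings into real lists; anything else becomes []
    parsed = df[column].apply(
        lambda s: ast.literal_eval(s) if isinstance(s, str) and s.strip().startswith("[") else []
    )
    events = (
        df[[name_col]]
        .assign(event=parsed)
        .explode("event")
        .dropna(subset=["event"])
        .reset_index(drop=True)
    )
    events["event"] = events["event"].str.strip()
    # Leading year such as "1941" (also from "1941-45"); NaN when the item has no year
    events["year"] = pd.to_numeric(
        events["event"].str.extract(r"^(\d{4})", expand=False), errors="coerce"
    )
    return events


# Sample rows copied from the Military Activity column above
sample = pd.DataFrame({
    "Name": ["Franklin Roosevelt", "Harry S Truman", "Joe Biden"],
    "Military Activity": [
        "['1941 Pearl Harbor', '1941-45 World War II']",
        "['1945 Atomic bombs', '1945 End of WW II']",
        "['Withdrawal from Afghanistan']",
    ],
})
events = explode_events(sample, "Military Activity")
print(events)
```

Items without a leading year (such as "Withdrawal from Afghanistan") are kept with a missing year rather than dropped, so they can still be read or dated by hand.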
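We noticed the following in pres_df_4: the "No." column is an extra index column, and the Birthday, Life and Height columns are already available in other dataframes. Based on these observations, we can proceed with the following tasks: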
# Task 1: Drop the existing index column
pres_df_4.drop(columns=['No.'], inplace=True)
# Task 2: Drop columns that we already have in other dataframes
pres_df_4.drop(columns=['Birthday', 'Life', 'Height'], inplace=True)
# Task 3: Understand the content of the remaining columns
# Task 4: Rename the column "Name" to "President"
pres_df_4.rename(columns={'Name': 'President'}, inplace=True)
# Task 5: Harmonize the party labels of George Washington and John Tyler
# ("Unaffiliated" is the label pres_df_5 uses for Washington)
pres_df_4.loc[pres_df_4['President'] == "George Washington", 'Party'] = "Unaffiliated"
pres_df_4.loc[pres_df_4['President'] == "John Tyler", 'Party'] = "Whig"
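In this code, we dropped the redundant "No." index column and the Birthday, Life and Height columns (already available in other dataframes), reviewed the remaining columns, renamed "Name" to "President", and set the Party of George Washington to "Unaffiliated" and of John Tyler to "Whig". The data in pres_df_4 should now be ready for further exploration and analysis.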
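Before pres_df_4 is combined with the other datasets, its Term column (text such as "1889-1893", left open for the sitting president as "2021-") will likely need to become numeric. A minimal sketch, assuming every Term value follows the "YYYY-YYYY" or "YYYY-" pattern shown in the table above; the helper `split_term` and the columns term_start, term_end and term_years are hypothetical names.

```python
import pandas as pd


def split_term(df: pd.DataFrame, column: str = "Term") -> pd.DataFrame:
    """Add numeric term_start, term_end and term_years columns parsed from 'YYYY-YYYY' text."""
    years = df[column].str.extract(r"^\s*(\d{4})\s*-\s*(\d{4})?\s*$")
    out = df.copy()
    out["term_start"] = pd.to_numeric(years[0])
    out["term_end"] = pd.to_numeric(years[1])  # NaN for an open-ended term such as "2021-"
    out["term_years"] = out["term_end"] - out["term_start"]
    return out


sample = pd.DataFrame({
    "President": ["Benjamin Harrison", "Grover Cleveland", "Joe Biden"],
    "Term": ["1889-1893", "1893-1897", "2021-"],
})
terms = split_term(sample)
print(terms)
```

Because Grover Cleveland's two non-consecutive terms are separate rows, per-president totals would need a groupby on President rather than a simple row count.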
pres_df_5.head()
| | year | name | party | term | salary | position_title |
---|---|---|---|---|---|---|
0 | 1789 | Washington,George | Unaffiliated | First | 25000 | PRESIDENT OF THE UNITED STATES |
1 | 1790 | Washington,George | Unaffiliated | First | 25000 | PRESIDENT OF THE UNITED STATES |
2 | 1791 | Washington,George | Unaffiliated | First | 25000 | PRESIDENT OF THE UNITED STATES |
3 | 1792 | Washington,George | Unaffiliated | First | 25000 | PRESIDENT OF THE UNITED STATES |
4 | 1793 | Washington,George | Unaffiliated | Second | 25000 | PRESIDENT OF THE UNITED STATES |
pres_df_5.dtypes
year               int64
name              object
party             object
term              object
salary             int64
position_title    object
dtype: object
pres_df_5 is a stand-alone dataset and will not be merged with other datasets for analysis. Its name column stores names as "Last,First" (e.g., "Washington,George"), so we convert them into a new "President" column.
Note: The new format for the "President" column is "[First Name] [Middle Initial]. [Last Name]", where the middle initial appears only when a second given name is present (e.g., "Barack H. Obama"). Any suffix (Jr., Sr., I, II, III, IV) is appended at the end, separated by a comma.
# Function to convert "Last,First Middle[,Suffix]" into "First M. Last[, Suffix]"
def modify_name(name):
    name_parts = name.split(',')
    last_name = name_parts[0].strip()
    first_name_parts = name_parts[1].split()
    # Get the first name
    first_name = first_name_parts[0]
    # If there is a second given name, keep only its initial
    if len(first_name_parts) > 1:
        second_name = first_name_parts[1][0]  # Take the first letter of the second name
        new_name = f"{first_name} {second_name}. {last_name}"
    else:
        new_name = f"{first_name} {last_name}"
    # Append the suffix (the third comma-separated part), if there is one
    suffix = name_parts[2].strip() if len(name_parts) > 2 else ""
    if suffix:
        new_name = f"{new_name}, {suffix}"
    return new_name.strip()
# Apply the modify_name function to the 'name' column
pres_df_5['President'] = pres_df_5['name'].apply(modify_name)
# Drop the original 'name' column
pres_df_5.drop(columns=['name'], inplace=True)
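The row-wise `apply` of modify_name above is perfectly adequate for a dataset of this size. For comparison, the same conversion can be sketched with vectorized pandas string methods, assuming every entry contains at least "Last,First" and at most one suffix after a second comma. The function name `modify_names_vectorized` is hypothetical, and the sample inputs are an assumed rendering of the pres_df_5 format; "Carter,James Earl,Jr." in particular is a made-up suffixed entry.

```python
import pandas as pd


def modify_names_vectorized(names: pd.Series) -> pd.Series:
    """Vectorized 'Last,First Middle[,Suffix]' -> 'First M. Last[, Suffix]'."""
    # Split on commas into last name, given names and optional suffix (missing -> "")
    parts = names.str.split(",", expand=True).reindex(columns=[0, 1, 2]).fillna("")
    last = parts[0].str.strip()
    given = parts[1].str.split()                  # e.g. ["Barack", "H."]
    first = given.str[0].fillna("")
    initial = given.str[1].fillna("").str[:1]     # "" when there is no second given name
    middle = initial.where(initial == "", initial + ". ")
    suffix = parts[2].str.strip()
    full = first + " " + middle + last
    # Append ", Suffix" only where a suffix exists
    return full.where(suffix == "", full + ", " + suffix)


# Assumed input format; "Carter,James Earl,Jr." is a hypothetical suffixed entry
names = pd.Series(["Washington,George", "Obama,Barack H.", "Biden,Joseph R.", "Carter,James Earl,Jr."])
result = modify_names_vectorized(names)
print(result.tolist())
```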
pres_df_5[pres_df_5['year']==2016]
| | year | party | term | salary | position_title | President |
---|---|---|---|---|---|---|
227 | 2016 | Democratic | Second | 400000 | PRESIDENT OF THE UNITED STATES | Barack H. Obama |
424 | 2016 | Democratic | Second | 230700 | VICE PRESIDENT OF THE UNITED STATES | Joseph R. Biden |
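Because pres_df_5 has one row per year and position, the years in which a salary changed can be found by comparing each year's salary with the previous year's for a single position_title. A sketch, assuming the President column created above and at most one row per year for each position; `salary_changes` is a hypothetical helper, and the sample rows are copied from the outputs above.

```python
import pandas as pd


def salary_changes(df: pd.DataFrame, title: str = "PRESIDENT OF THE UNITED STATES") -> pd.DataFrame:
    """Return the first year of each new salary level for one position_title."""
    rows = df[df["position_title"] == title].sort_values("year")
    # True for the first row and wherever the salary differs from the prior row
    changed = rows["salary"].ne(rows["salary"].shift())
    return rows.loc[changed, ["year", "salary", "President"]].reset_index(drop=True)


# Sample rows copied from the outputs above (most intermediate years omitted)
sample = pd.DataFrame({
    "year": [1789, 1790, 1791, 2016, 2016],
    "salary": [25000, 25000, 25000, 400000, 230700],
    "position_title": ["PRESIDENT OF THE UNITED STATES"] * 4 + ["VICE PRESIDENT OF THE UNITED STATES"],
    "President": ["George Washington"] * 3 + ["Barack H. Obama", "Joseph R. Biden"],
})
changes = salary_changes(sample)
print(changes)
```

On this abbreviated sample the jump from 25000 to 400000 appears in a single step only because the years in between are missing; on the full dataset each intermediate change would show up as its own row.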
pres_df_6
| | Unnamed: 0 | S.No. | start | end | president | prior | party | vice |
---|---|---|---|---|---|---|---|---|
0 | 0 | 1 | April 30, 1789 | March 4, 1797 | George Washington | Commander-in-Chief of the Continental Army ... | Nonpartisan [13] | John Adams |
1 | 1 | 2 | March 4, 1797 | March 4, 1801 | John Adams | 1st Vice President of the United States | Federalist | Thomas Jefferson |
2 | 2 | 3 | March 4, 1801 | March 4, 1809 | Thomas Jefferson | 2nd Vice President of the United States | Democratic- Republican | Aaron Burr |
3 | 3 | 4 | March 4, 1809 | March 4, 1817 | James Madison | 5th United States Secretary of State (1801–... | Democratic- Republican | George Clinton |
4 | 4 | 5 | March 4, 1817 | March 4, 1825 | James Monroe | 7th United States Secretary of State (1811–... | Democratic- Republican | Daniel D. Tompkins |
5 | 5 | 6 | March 4, 1825 | March 4, 1829 | John Quincy Adams | 8th United States Secretary of State (1817–... | Democratic- Republican | John C. Calhoun |
6 | 6 | 7 | March 4, 1829 | March 4, 1837 | Andrew Jackson | U.S. Senator ( Class 2 ) from Tennessee ... | Democratic | John C. Calhoun |
7 | 7 | 8 | March 4, 1837 | March 4, 1841 | Martin Van Buren | 8th Vice President of the United States | Democratic | Richard Mentor Johnson |
8 | 8 | 9 | March 4, 1841 | April 4, 1841 | William Henry Harrison | United States Minister to Colombia (1828–1829) | Whig | John Tyler |
9 | 9 | 10 | April 4, 1841 | March 4, 1845 | John Tyler | 10th Vice President of the United States | Whig April 4, 1841 – September 13, 1841 | Office vacant |
10 | 10 | 11 | March 4, 1845 | March 4, 1849 | James K. Polk | 9th Governor of Tennessee (1839–1841) | Democratic | George M. Dallas |
11 | 11 | 12 | March 4, 1849 | July 9, 1850 | Zachary Taylor | Major General of the 1st Infantry Regiment ... | Whig | Millard Fillmore |
12 | 12 | 13 | July 9, 1850 | March 4, 1853 | Millard Fillmore | 12th Vice President of the United States | Whig | Office vacant |
13 | 13 | 14 | March 4, 1853 | March 4, 1857 | Franklin Pierce | Brigadier General of the 9th Infantry Unit... | Democratic | William R. King |
14 | 14 | 15 | March 4, 1857 | March 4, 1861 | James Buchanan | United States Minister to the Court of St J... | Democratic | John C. Breckinridge |
15 | 15 | 16 | March 4, 1861 | April 15, 1865 | Abraham Lincoln | U.S. Representative for Illinois' 7th Distri... | Republican ( National Union ) [i] | Hannibal Hamlin |
16 | 16 | 17 | April 15, 1865 | March 4, 1869 | Andrew Johnson | 16th Vice President of the United States | National Union [i] ( Democratic ) [j] | Office vacant |
17 | 17 | 18 | March 4, 1869 | March 4, 1877 | Ulysses S. Grant | Commanding General of the U.S. Army ( 1864–... | Republican | Schuyler Colfax |
18 | 18 | 19 | March 4, 1877 | March 4, 1881 | Rutherford B. Hayes | 29th & 32nd Governor of Ohio (1868–1872 & 1... | Republican | William A. Wheeler |
19 | 19 | 20 | March 4, 1881 | September 19, 1881 | James A. Garfield | U.S. Representative for Ohio's 19th District... | Republican | Chester A. Arthur |
20 | 20 | 21 | September 19, 1881 | March 4, 1885 | Chester A. Arthur | 20th Vice President of the United States | Republican | Office vacant |
21 | 21 | 22 | March 4, 1885 | March 4, 1889 | Grover Cleveland | 28th Governor of New York (1883–1885) | Democratic | Thomas A. Hendricks |
22 | 22 | 23 | March 4, 1889 | March 4, 1893 | Benjamin Harrison | U.S. Senator ( Class 1 ) from Indiana (... | Republican | Levi P. Morton |
23 | 23 | 24 | March 4, 1893 | March 4, 1897 | Grover Cleveland | 22nd President of the United States (1885–1... | Democratic | Adlai Stevenson |
24 | 24 | 25 | March 4, 1897 | September 14, 1901 | William McKinley | 39th Governor of Ohio (1892–1896) | Republican | Garret Hobart |
25 | 25 | 26 | September 14, 1901 | March 4, 1909 | Theodore Roosevelt | 25th Vice President of the United States | Republican | Office vacant |
26 | 26 | 27 | March 4, 1909 | March 4, 1913 | William Howard Taft | 42nd United States Secretary of War (1904–1... | Republican | James S. Sherman |
27 | 27 | 28 | March 4, 1913 | March 4, 1921 | Woodrow Wilson | 34th Governor of New Jersey (1911–1913) | Democratic | Thomas R. Marshall |
28 | 28 | 29 | March 4, 1921 | August 2, 1923 | Warren G. Harding | U.S. Senator ( Class 3 ) from Ohio (191... | Republican | Calvin Coolidge |
29 | 29 | 30 | August 2, 1923 | March 4, 1929 | Calvin Coolidge | 29th Vice President of the United States | Republican | Office vacant |
30 | 30 | 31 | March 4, 1929 | March 4, 1933 | Herbert Hoover | 3rd United States Secretary of Commerce (19... | Republican | Charles Curtis |
31 | 31 | 32 | March 4, 1933 | January 20, 1941 | Franklin D. Roosevelt | 44th Governor of New York ( 1929–1932 ) | Democratic | John Nance Garner |
32 | 32 | 33 | April 12, 1945 | January 20, 1953 | Harry S. Truman | 34th Vice President of the United States | Democratic | Office vacant |
33 | 33 | 34 | January 20, 1953 | January 20, 1961 | Dwight D. Eisenhower | Supreme Allied Commander Europe ( 1949–1952 ) | Republican | Richard Nixon |
34 | 34 | 35 | January 20, 1961 | November 22, 1963 | John F. Kennedy | U.S. Senator ( Class 1 ) from Massachuset... | Democratic | Lyndon B. Johnson |
35 | 35 | 36 | November 22, 1963 | January 20, 1969 | Lyndon B. Johnson | 37th Vice President of the United States | Democratic | Office vacant |
36 | 36 | 37 | January 20, 1969 | August 9, 1974 | Richard Nixon | 36th Vice President of the United States (1... | Republican | Spiro Agnew |
37 | 37 | 38 | August 9, 1974 | January 20, 1977 | Gerald Ford | 40th Vice President of the United States | Republican | Office vacant |
38 | 38 | 39 | January 20, 1977 | January 20, 1981 | Jimmy Carter | 76th Governor of Georgia (1971–1975) | Democratic | Walter Mondale |
39 | 39 | 40 | January 20, 1981 | January 20, 1989 | Ronald Reagan | 33rd Governor of California ( 1967–1975 ) | Republican | George H. W. Bush |
40 | 40 | 41 | January 20, 1989 | January 20, 1993 | George H. W. Bush | 43rd Vice President of the United States | Republican | Dan Quayle |
41 | 41 | 42 | January 20, 1993 | January 20, 2001 | Bill Clinton | 40th & 42nd Governor of Arkansas (1979–1981... | Democratic | Al Gore |
42 | 42 | 43 | January 20, 2001 | January 20, 2009 | George W. Bush | 46th Governor of Texas ( 1995–2000 ) | Republican | Dick Cheney |
43 | 43 | 44 | January 20, 2009 | NaN | Barack Obama | U.S. Senator ( Class 3 ) from Illinois ... | Democratic | Joe Biden |
44 | 44 | 45 | January 20, 2017 | -- | Donald Trump | Chairman of The Trump Organization ( 1971–... | Republican | Mike Pence |
pres_df_6.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 45 entries, 0 to 44
Data columns (total 8 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   Unnamed: 0  45 non-null     int64
 1   S.No.       45 non-null     int64
 2   start       45 non-null     object
 3   end         44 non-null     object
 4   president   45 non-null     object
 5   prior       45 non-null     object
 6   party       45 non-null     object
 7   vice        45 non-null     object
dtypes: int64(2), object(6)
memory usage: 2.9+ KB
Note: On closer examination, much of pres_df_6 (party, vice president, prior office) duplicates information we already have in other datasets such as pres_df_4. Retaining all of its columns would only add redundancy, so we will focus on integrating the data points that are new and add value to the analysis.
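One part of pres_df_6 that may not be redundant is the exact start and end date of each presidency, since pres_df_4 only records years. A sketch of extracting those dates, assuming the "Month D, YYYY" format shown in the table above: placeholders such as "--" and NaN become NaT, and bracketed footnote markers like "[13]" or "[i]" are stripped from party. The function name `clean_term_dates` and the days_in_office column are hypothetical; the sample rows are copied from the table.

```python
import pandas as pd


def clean_term_dates(df: pd.DataFrame) -> pd.DataFrame:
    """Parse start/end dates, compute days in office and strip footnote markers from party."""
    out = df[["president", "start", "end", "party"]].copy()
    for col in ("start", "end"):
        # "--" and NaN cannot be parsed and become NaT
        out[col] = pd.to_datetime(out[col], format="%B %d, %Y", errors="coerce")
    out["days_in_office"] = (out["end"] - out["start"]).dt.days
    # Remove wiki-style footnotes such as "[13]" or "[i]"
    out["party"] = out["party"].str.replace(r"\s*\[\w+\]", "", regex=True).str.strip()
    return out


sample = pd.DataFrame({
    "president": ["George Washington", "William Henry Harrison", "Donald Trump"],
    "start": ["April 30, 1789", "March 4, 1841", "January 20, 2017"],
    "end": ["March 4, 1797", "April 4, 1841", "--"],
    "party": ["Nonpartisan [13]", "Whig", "Republican"],
})
dates = clean_term_dates(sample)
print(dates)
```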
pres_df_7.head()
| | Year | GDP | GDP per capita (in US$ PPP) | GDP (in Bil. US$nominal) | GDP per capita (in US$ nominal) | GDP growth % | Inflation rate % | Unemployment % | Government debt (in % of GDP) | Presidents |
---|---|---|---|---|---|---|---|---|---|---|
0 | 1981 | 3207.0 | 13948.7 | 3207.0 | 13948.7 | 2.50% | 10.40% | 7.60% | 31.00% | Ronald Reagan |
1 | 1982 | 3343.8 | 14405.0 | 3343.8 | 14405.0 | -1.80% | 6.20% | 9.70% | 34.00% | Ronald Reagan |
2 | 1983 | 3634.0 | 15513.7 | 3634.0 | 15513.7 | 4.60% | 3.20% | 9.60% | 37.00% | Ronald Reagan |
3 | 1984 | 4037.7 | 17086.4 | 4037.7 | 17086.4 | 7.20% | 4.40% | 7.50% | 38.00% | Ronald Reagan |
4 | 1985 | 4339.0 | 18199.3 | 4339.0 | 18199.3 | 4.20% | 3.50% | 7.20% | 41.00% | Ronald Reagan |
pres_df_7['Presidents'].value_counts()
Ronald Reagan     8
Bill Clinton      8
George W. Bush    8
Barack Obama      8
George Bush       4
Donald Trump      4
Joe Biden         3
Name: Presidents, dtype: int64
We will handle this dataset separately because it covers only seven presidents, with the following number of yearly records each: Ronald Reagan, Bill Clinton, George W. Bush and Barack Obama (8 each), George Bush, i.e. George H. W. Bush (4), Donald Trump (4) and Joe Biden (3).
Given the limited data for each president, we will perform individualized analysis on their respective records to gain insights and draw conclusions relevant to each specific administration.
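The percentage columns of pres_df_7 are stored as strings such as "2.50%", so they need converting before administrations can be compared. A minimal sketch, assuming the column names shown in the head() output above; `summarize_by_president` is a hypothetical helper, and the sample is the five Reagan rows displayed there.

```python
import pandas as pd

PCT_COLUMNS = ["GDP growth %", "Inflation rate %", "Unemployment %", "Government debt (in % of GDP)"]


def summarize_by_president(df: pd.DataFrame) -> pd.DataFrame:
    """Convert 'x.xx%' strings to floats and average them per president."""
    out = df.copy()
    cols = [c for c in PCT_COLUMNS if c in out.columns]
    for col in cols:
        out[col] = pd.to_numeric(out[col].astype(str).str.rstrip("%"), errors="coerce")
    # sort=False keeps presidents in the order they first appear
    return out.groupby("Presidents", sort=False)[cols].mean()


# The five Reagan rows from pres_df_7.head() above
sample = pd.DataFrame({
    "Year": [1981, 1982, 1983, 1984, 1985],
    "GDP growth %": ["2.50%", "-1.80%", "4.60%", "7.20%", "4.20%"],
    "Inflation rate %": ["10.40%", "6.20%", "3.20%", "4.40%", "3.50%"],
    "Unemployment %": ["7.60%", "9.70%", "9.60%", "7.50%", "7.20%"],
    "Presidents": ["Ronald Reagan"] * 5,
})
summary = summarize_by_president(sample)
print(summary.round(2))
```

This is a plain arithmetic mean of annual rates; for GDP growth, a compounded (geometric) average over the term would give a slightly different figure.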
# Set the option to display all columns
pd.set_option('display.max_columns', None)
pres_df_8.head()
| | Unnamed: 0 | Year | CPI | GDPdeflator | population.K | realGDPperCapita | executive | war | battleDeaths | battleDeathsPMP | Keynes | unemployment | unempSource | fedReceipts | fedOutlays | fedSurplus | fedDebt | fedReceipts_pGDP | fedOutlays_pGDP | fedSurplus_pGDP | fedDebt_pGDP |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1610 | 1610 | NaN | NaN | 0.350 | NaN | JamesI | NaN | NaN | NaN | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1 | 1620 | 1620 | NaN | NaN | 2.302 | NaN | JamesI | NaN | NaN | NaN | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2 | 1630 | 1630 | NaN | NaN | 4.646 | NaN | CharlesI | NaN | NaN | NaN | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3 | 1640 | 1640 | NaN | NaN | 26.634 | NaN | CharlesI | NaN | NaN | NaN | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
4 | 1650 | 1650 | NaN | NaN | 50.368 | NaN | Cromwell | NaN | NaN | NaN | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
pres_df_8.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 265 entries, 0 to 264
Data columns (total 21 columns):
 #   Column            Non-Null Count  Dtype
---  ------            --------------  -----
 0   Unnamed: 0        265 non-null    int64
 1   Year              265 non-null    int64
 2   CPI               248 non-null    float64
 3   GDPdeflator       231 non-null    float64
 4   population.K      249 non-null    float64
 5   realGDPperCapita  231 non-null    float64
 6   executive         265 non-null    object
 7   war               56 non-null     object
 8   battleDeaths      248 non-null    float64
 9   battleDeathsPMP   232 non-null    float64
 10  Keynes            265 non-null    int64
 11  unemployment      222 non-null    float64
 12  unempSource       222 non-null    object
 13  fedReceipts       231 non-null    float64
 14  fedOutlays        231 non-null    float64
 15  fedSurplus        231 non-null    float64
 16  fedDebt           232 non-null    float64
 17  fedReceipts_pGDP  230 non-null    float64
 18  fedOutlays_pGDP   230 non-null    float64
 19  fedSurplus_pGDP   230 non-null    float64
 20  fedDebt_pGDP      231 non-null    float64
dtypes: float64(15), int64(3), object(3)
memory usage: 43.6+ KB
We noticed the following points in pres_df_8:
# Task 2: Delete the column "Unnamed: 0", which appears to be a leftover index column.
# drop() returns a new DataFrame, so assign the result back to keep the change
pres_df_8 = pres_df_8.drop(columns='Unnamed: 0')
pres_df_8
| | Year | CPI | GDPdeflator | population.K | realGDPperCapita | executive | war | battleDeaths | battleDeathsPMP | Keynes | unemployment | unempSource | fedReceipts | fedOutlays | fedSurplus | fedDebt | fedReceipts_pGDP | fedOutlays_pGDP | fedSurplus_pGDP | fedDebt_pGDP |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1610 | NaN | NaN | 0.350 | NaN | JamesI | NaN | NaN | NaN | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1 | 1620 | NaN | NaN | 2.302 | NaN | JamesI | NaN | NaN | NaN | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2 | 1630 | NaN | NaN | 4.646 | NaN | CharlesI | NaN | NaN | NaN | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3 | 1640 | NaN | NaN | 26.634 | NaN | CharlesI | NaN | NaN | NaN | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
4 | 1650 | NaN | NaN | 50.368 | NaN | Cromwell | NaN | NaN | NaN | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
260 | 2017 | 245.12 | 107.75 | 325220.000 | 55590.0 | Trump | NaN | 0.0 | 0.0 | 0 | 4.358333 | BLS | 3316184.0 | 3981630.0 | -665446.0 | 2.024490e+13 | 0.170239 | 0.204400 | -0.034161 | 1.039287 |
261 | 2018 | 251.11 | 110.32 | 326949.000 | 56910.0 | Trump | NaN | 0.0 | 0.0 | 0 | 3.891667 | BLS | 3329907.0 | 4109045.0 | -779138.0 | 2.151606e+13 | 0.162219 | 0.200176 | -0.037956 | 1.048173 |
262 | 2019 | 255.66 | 112.29 | 328527.000 | 57933.0 | Trump | NaN | 0.0 | 0.0 | 0 | 3.675000 | BLS | 3463364.0 | 4446956.0 | -983592.0 | 2.271940e+13 | 0.162047 | 0.208068 | -0.046021 | 1.063015 |
263 | 2020 | 258.81 | 113.65 | 330152.000 | 55685.0 | Trump | NaN | 0.0 | 0.0 | 0 | 8.091667 | BLS | 3421164.0 | 6553603.0 | -3132439.0 | 2.694539e+13 | 0.163741 | 0.313664 | -0.149923 | 1.289642 |
264 | 2021 | 270.97 | NaN | NaN | NaN | Biden | NaN | 0.0 | NaN | 0 | 5.358333 | BLS | 4047112.0 | 6822449.0 | -2775337.0 | 2.842892e+13 | NaN | NaN | NaN | NaN |
265 rows × 20 columns
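Before dropping a column, it can be worth confirming that it really is redundant. In the first rows shown above, "Unnamed: 0" holds the same values as Year. The helper below (`drop_if_duplicate`, a hypothetical name, not part of the original notebook) is a minimal sketch that drops the column only when it is identical to a reference column and keeps it otherwise.

```python
import pandas as pd


def drop_if_duplicate(frame, col, reference):
    """Drop `col` only when it holds exactly the same values as `reference`."""
    if col in frame.columns and frame[col].equals(frame[reference]):
        return frame.drop(columns=col)
    return frame  # keep the column: it carries information the reference does not


# Toy frame shaped like the first rows of pres_df_8 shown above
toy = pd.DataFrame({
    'Unnamed: 0': [1610, 1620, 1630],
    'Year': [1610, 1620, 1630],
    'executive': ['JamesI', 'JamesI', 'CharlesI'],
})
cleaned = drop_if_duplicate(toy, 'Unnamed: 0', 'Year')
print(cleaned.columns.tolist())  # ['Year', 'executive']
```

On the real data, the call would be `pres_df_8 = drop_if_duplicate(pres_df_8, 'Unnamed: 0', 'Year')`.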
We will now begin the process of integrating the data sets to create a comprehensive and enriched presidential dataset. As part of this integration, we will merge certain data sets that share common attributes, such as the "President" column, to consolidate the information. On the other hand, some data sets will remain as individual and separate tables, as they contain unique or specific information related to certain aspects of the presidents' history.
To facilitate the understanding of which data sets will be integrated and which will remain standalone, we will create a new table or list to outline the data set integration plan. This table will list the names of the data sets and indicate whether they will be merged into the master presidential dataset or kept separate.
Data Set Integration Plan:
By following this integration plan, we aim to create a comprehensive and rich dataset that encompasses various aspects of each president's life, achievements, events, and ratings, while also maintaining the individuality of certain data sets that offer unique insights. This integrated dataset will serve as a valuable resource for further analysis and exploration of the history and legacy of the United States presidents.
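The merges in this section align rows by index, which only works if every data set lists the presidents in the same order. A sketch of a key-based alternative is shown below, assuming each data set has a president-name column (the merged output suggests they do, as `President_x`, `President_y`, and `President`). The helpers `normalize_name` and `merge_on_president` are hypothetical. `validate='one_to_one'` makes pandas raise a `MergeError` if a name repeats on either side, and `indicator=True` adds a `_merge` column that flags presidents present in only one data set.

```python
import pandas as pd


def normalize_name(name):
    """Hypothetical key normalizer: trim, collapse inner whitespace, lowercase."""
    return ' '.join(str(name).split()).lower()


def merge_on_president(left, right, left_col='President', right_col='President'):
    """Outer-merge two frames on a normalized president-name key.

    validate='one_to_one' raises pandas.errors.MergeError if a name repeats on
    either side; indicator=True adds a '_merge' column flagging unmatched rows.
    """
    left = left.assign(_key=left[left_col].map(normalize_name))
    right = right.assign(_key=right[right_col].map(normalize_name))
    merged = left.merge(right, on='_key', how='outer',
                        validate='one_to_one', indicator=True)
    return merged.drop(columns='_key')


bio = pd.DataFrame({'President': ['George Washington', 'John Adams'],
                    'Born': ['1732-02-22', '1735-10-30']})
terms = pd.DataFrame({'President': ['john  adams', 'George Washington '],
                      'Party': ['Federalist', 'Unaffiliated']})
merged = merge_on_president(bio, terms)
print(merged[['President_x', 'Party', '_merge']])
```

If a data set lists a president with non-consecutive terms (such as Grover Cleveland) on two rows, the one-to-one check would fail, and a term-level key would be needed instead of the name alone.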
# Merge pres_df_1, pres_df_3, and pres_df_4 based on their indices
pres_merged_df = pres_df_1.merge(pres_df_3, left_index=True, right_index=True)
pres_merged_df = pres_merged_df.merge(pres_df_4, left_index=True, right_index=True)
# Check the result
pres_merged_df.head()
| | President_x | Born | Post-presidency timespan | Died | Age | Start Date of presidency | Start Age of presidency | End of Presidency Age | End of Presidency Date | order | President_y | height_cm | height_in | weight_kg | weight_lb | body_mass_index | body_mass_index_range | birthplace | birth_state | death_age | astrological_sign | presidency_begin_age | presidency_end_age | political_party | corrected_iq | President | Birthplace | Children | Religion | Higher Education | Occupation | Military Service | Term | Party | Vice President | Previous Office | Economy | Foreign Affairs | Military Activity | Other Events | Legacy |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | George Washington | 1732-02-22 | 1015 days 00:00:00 | 1799-12-14 | 24750 days | 1789-04-30 | 20872 days | 23735 days | 1797-03-04 | 1.0 | George Washington | 188 | 74.0 | 79.4 | 175 | 22.5 | Normal | Westmoreland County | Virginia | 67.0 | Pisces | 57 | 65.0 | Unaffiliated | 140.0 | George Washington | Pope's Creek, VA | 0 | Episcopalian | None | Plantation Owner, Soldier | Commander-in-Chief of the Continental Army in... | 1789-1797 | Unaffiliated | John Adams | Commander-in-Chief | [' Hamilton established BUS', '1792 Coinage Ac... | ['1793 Neutrality in the France-Britain confli... | ['1794 Whiskey Rebellion'] | ['1791 Bill of Rights', '1792 Post Office foun... | He is universally regarded as one of the great... |
1 | John Adams | 1735-10-30 | 9247 days 00:00:00 | 1826-07-04 | 33097 days | 1797-03-04 | 22390 days | 23850 days | 1801-03-04 | 2.0 | John Adams | 170 | 67.0 | 83.9 | 185 | 29.0 | Overweight | Braintree | Massachusetts | 90.0 | Scorpio | 61 | 65.0 | Federalist | 155.0 | John Adams | Braintree, MA | 5 | Unitarian | Harvard | Lawyer, Farmer | none | 1797-1801 | Federalist | Thomas Jefferson | 1st Vice President of USA | ['1798 Progressive land value tax of up to 1% ... | ['1797 the XYZ Affair: a bribe of French agent... | ['1798–1800 The Quasi war. Undeclared naval wa... | ['1798 Alien & Sedition Act to silence critics... | One of the most experienced men ever to become... |
2 | Thomas Jefferson | 1743-04-13 | 6327 days 00:00:00 | 1826-07-04 | 30377 days | 1801-03-04 | 21130 days | 24050 days | 1809-03-04 | 3.0 | Thomas Jefferson | 189 | 74.5 | 82.1 | 181 | 23.0 | Normal | Shadwell | Virginia | 83.0 | Aries | 57 | 65.0 | Democratic-Republican | 160.0 | Thomas Jefferson | Goochland County, VA | 6 | unaffiliated Christian | College of William and Mary | Inventor,Lawyer, Architect | Colonel of Virginia militia (without real mili... | 1801-1809 | Democratic-Republican | Aaron Burr, George Clinton | 2nd Vice President of USA | ['1807 Embargo Act forbidding foreign trade in... | ['1805 Peace Treaty with Tripoli. Piracy stopp... | ['1801-05 Naval operation against Tripoli and ... | ['1803 The Louisiana purchase', '1804 12th Ame... | Probably the most intelligent man ever to occ... |
3 | James Madison | 1751-03-16 | 7051 days 00:00:00 | 1836-06-28 | 31129 days | 1809-03-04 | 21158 days | 24078 days | 1817-03-04 | 4.0 | James Madison | 163 | 64.0 | 55.3 | 122 | 20.8 | Normal | Port Conway | Virginia | 85.0 | Pisces | 57 | 65.0 | Democratic-Republican | 160.0 | James Madison | Port Conway, VA | 0 | Episcopalian | Princeton | Plantation Owner, Lawyer | Colonel of Virginia militia (without real mili... | 1809-1817 | Democratic-Republican | George Clinton, Elbridge Gerry | Secretary of State | [' The first U.S. protective tariff was impose... | ['1814 The Treaty of Ghent ends the War of 1812'] | ['1811 Tippecanoe battle (Harrison vs. Chief T... | ['1811 Cumberland Road construction starts (fi... | His leadership in the War of 1812 was particul... |
4 | James Monroe | 1758-04-28 | 2312 days 00:00:00 | 1831-07-04 | 26712 days | 1817-03-04 | 21480 days | 24400 days | 1825-03-04 | 5.0 | James Monroe | 183 | 72.0 | 85.7 | 189 | 25.6 | Overweight | Monroe Hall | Virginia | 73.0 | Taurus | 58 | 66.0 | Democratic-Republican | 139.0 | James Monroe | Monroe Hall, VA | 2 | Episcopalian | College of William and Mary | Plantation Owner, Lawyer | Major of the Continental Army | 1817-1825 | Democratic-Republican | Daniel Tompkins | Secretary of War | ['1819 Panic of 1819 (too much land speculatio... | ['1823 Monroe Doctrine', '1818 49th parallel s... | ['1817 1st Seminole war against Seminole India... | ['1819 Florida ceded to US', "1820 Missouri Co... | His presidency contributed to national defense... |
# Re-merge without pres_df_3: combine only pres_df_1 and pres_df_4, again aligning rows by index
pres_merged_df = pres_df_1.merge(pres_df_4, left_index=True, right_index=True)
pres_merged_df.head()
| | President_x | Born | Post-presidency timespan | Died | Age | Start Date of presidency | Start Age of presidency | End of Presidency Age | End of Presidency Date | President_y | Birthplace | Children | Religion | Higher Education | Occupation | Military Service | Term | Party | Vice President | Previous Office | Economy | Foreign Affairs | Military Activity | Other Events | Legacy |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | George Washington | 1732-02-22 | 1015 days 00:00:00 | 1799-12-14 | 24750 days | 1789-04-30 | 20872 days | 23735 days | 1797-03-04 | George Washington | Pope's Creek, VA | 0 | Episcopalian | None | Plantation Owner, Soldier | Commander-in-Chief of the Continental Army in... | 1789-1797 | Unaffiliated | John Adams | Commander-in-Chief | [' Hamilton established BUS', '1792 Coinage Ac... | ['1793 Neutrality in the France-Britain confli... | ['1794 Whiskey Rebellion'] | ['1791 Bill of Rights', '1792 Post Office foun... | He is universally regarded as one of the great... |
1 | John Adams | 1735-10-30 | 9247 days 00:00:00 | 1826-07-04 | 33097 days | 1797-03-04 | 22390 days | 23850 days | 1801-03-04 | John Adams | Braintree, MA | 5 | Unitarian | Harvard | Lawyer, Farmer | none | 1797-1801 | Federalist | Thomas Jefferson | 1st Vice President of USA | ['1798 Progressive land value tax of up to 1% ... | ['1797 the XYZ Affair: a bribe of French agent... | ['1798–1800 The Quasi war. Undeclared naval wa... | ['1798 Alien & Sedition Act to silence critics... | One of the most experienced men ever to become... |
2 | Thomas Jefferson | 1743-04-13 | 6327 days 00:00:00 | 1826-07-04 | 30377 days | 1801-03-04 | 21130 days | 24050 days | 1809-03-04 | Thomas Jefferson | Goochland County, VA | 6 | unaffiliated Christian | College of William and Mary | Inventor,Lawyer, Architect | Colonel of Virginia militia (without real mili... | 1801-1809 | Democratic-Republican | Aaron Burr, George Clinton | 2nd Vice President of USA | ['1807 Embargo Act forbidding foreign trade in... | ['1805 Peace Treaty with Tripoli. Piracy stopp... | ['1801-05 Naval operation against Tripoli and ... | ['1803 The Louisiana purchase', '1804 12th Ame... | Probably the most intelligent man ever to occ... |
3 | James Madison | 1751-03-16 | 7051 days 00:00:00 | 1836-06-28 | 31129 days | 1809-03-04 | 21158 days | 24078 days | 1817-03-04 | James Madison | Port Conway, VA | 0 | Episcopalian | Princeton | Plantation Owner, Lawyer | Colonel of Virginia militia (without real mili... | 1809-1817 | Democratic-Republican | George Clinton, Elbridge Gerry | Secretary of State | [' The first U.S. protective tariff was impose... | ['1814 The Treaty of Ghent ends the War of 1812'] | ['1811 Tippecanoe battle (Harrison vs. Chief T... | ['1811 Cumberland Road construction starts (fi... | His leadership in the War of 1812 was particul... |
4 | James Monroe | 1758-04-28 | 2312 days 00:00:00 | 1831-07-04 | 26712 days | 1817-03-04 | 21480 days | 24400 days | 1825-03-04 | James Monroe | Monroe Hall, VA | 2 | Episcopalian | College of William and Mary | Plantation Owner, Lawyer | Major of the Continental Army | 1817-1825 | Democratic-Republican | Daniel Tompkins | Secretary of War | ['1819 Panic of 1819 (too much land speculatio... | ['1823 Monroe Doctrine', '1818 49th parallel s... | ['1817 1st Seminole war against Seminole India... | ['1819 Florida ceded to US', "1820 Missouri Co... | His presidency contributed to national defense... |
Data Collection: We successfully collected data from multiple data sets containing information about presidents, first ladies, and historical events.
Data Cleaning: We performed extensive data cleaning to handle missing values, correct data types, and standardize the format of names, dates, and time spans across the data sets.
Data Integration: We integrated multiple data sets by their row indices, which assumes that every data set lists the presidents in the same order.
Standardization: We standardized the column names across all data sets, making it easier to analyze and compare the data.
Time Span Conversion: We converted time spans expressed in years and days to the pandas timedelta data type, enabling better analysis of presidency durations and ages.
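The day counts in the merged table suggest that the time spans were converted using 365 days per year. For example, Washington was 57 years and 67 days old at his inauguration (22 February 1732 to 30 April 1789), and 57 × 365 + 67 = 20872, the value in the "Start Age of presidency" column; Adams's 61 years and 125 days likewise give 22390. The sketch below assumes a hypothetical "N years M days" string format (the original strings are not shown) and converts it to a `pd.Timedelta`. Note that a fixed 365-day year ignores leap days, so the result drifts by a few days from the true calendar difference.

```python
import re

import pandas as pd

# Hypothetical input format such as "57 years 67 days"; the real source strings may differ.
SPAN_PATTERN = re.compile(r'(?:(\d+)\s*years?)?[\s,]*(?:(\d+)\s*days?)?')


def span_to_timedelta(text, days_per_year=365):
    """Convert an 'N years M days' string to a Timedelta; return NaT if it does not parse."""
    if not isinstance(text, str):
        return pd.NaT
    match = SPAN_PATTERN.fullmatch(text.strip())
    if match is None or (match.group(1) is None and match.group(2) is None):
        return pd.NaT
    years = int(match.group(1) or 0)
    days = int(match.group(2) or 0)
    return pd.Timedelta(days=years * days_per_year + days)


print(span_to_timedelta('57 years 67 days'))  # 20872 days 00:00:00
```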
Problems Solved:
Handling Missing Data: We dealt with missing data in various columns using appropriate methods like dropping or filling missing values.
Data Type Conversion: We converted columns to their correct data types, such as converting date-related columns to datetime data type and time spans to timedelta data type.
Name Formatting: We standardized the format of names in the data sets, making them consistent and easy to read.
Data Alignment: We merged data sets on their indices so that each row combines information about the same president, provided the source data sets share the same row order.
Data Exclusion: We excluded irrelevant data sets and columns to focus on relevant information for analysis.
Overall, the data cleaning and integration process has resulted in a unified and organized data set, ready for further analysis and exploration. The standardized data will enable us to gain valuable insights into historical presidencies, their events, and the context surrounding them.
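Because the index-based merges rely on every source listing the presidents in the same order, a quick sanity check is to compare the president-name columns that the merge brings together. The function below (`misaligned_rows`, a hypothetical helper) is a sketch that returns the rows where those names disagree after trimming whitespace and ignoring case. The column names are taken from the merged output above.

```python
import pandas as pd


def misaligned_rows(merged, name_cols):
    """Return the rows whose president-name columns disagree after an index-based merge."""
    # Normalize case and surrounding whitespace before comparing
    names = merged[name_cols].apply(lambda col: col.astype(str).str.strip().str.lower())
    disagree = names.nunique(axis=1) > 1
    return merged.loc[disagree, name_cols]


# Toy example: the third row is shifted, as would happen if one source skipped a president
toy = pd.DataFrame({
    'President_x': ['George Washington', 'John Adams', 'Thomas Jefferson'],
    'President_y': ['George Washington', 'John Adams', 'James Madison'],
    'President': ['George Washington', 'john adams ', 'Thomas Jefferson'],
})
print(misaligned_rows(toy, ['President_x', 'President_y', 'President']))
```

On the real data, `misaligned_rows(pres_merged_df, ['President_x', 'President_y'])` returning an empty frame would indicate that the index-based merge lined up correctly.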
This interactive data visualization shows when each U.S. president was born, categorized by political party. Each president's date of birth is plotted against their name, and the marker color indicates their party affiliation.
Scatter Points: Each president is represented by a scatter point on the graph, placed according to their birth date (x-axis) and their name (y-axis).
Colors: Presidents belonging to the same political party are assigned the same color. The color key on the right side of the graph helps identify each political party.
Hover Information: When hovering over a scatter point, the reader is presented with a tooltip that provides detailed information about the president, including their name, political party, and date of birth.
The graph allows readers to easily observe clusters of presidents born in similar time periods. These clusters might represent specific political eras or historical events.
The distribution of presidents across different political parties can be easily compared. For example, the dominance of certain parties during specific time periods may become apparent.
Readers can also spot outliers, such as presidents born much earlier or later than others from the same party.
The dark theme background provides an aesthetically pleasing appearance and ensures that the graph is visually engaging.
The reader can use the hover feature to obtain detailed information about each president by simply moving the cursor over the scatter points.
The legend on the right side of the graph can be used to highlight or hide specific political parties, allowing for a clearer view of each party's representation in different time periods.
Overall, this data visualization offers an intuitive way to explore the birth dates of U.S. presidents and their distribution across political parties. Its interactive nature encourages readers to discover patterns and insights through their own exploration.
import plotly.graph_objects as go
import pandas as pd
# Define colors for each political party with a dark theme
colors = {
    'Democratic': '#00BFFF',
    'Republican': '#FF6347',
    'Whig': '#00FF00',
    'Federalist': '#9932CC',
    'Democratic-Republican': '#FFD700',
    'National Union': '#00CED1',
    'Unaffiliated': '#A9A9A9'
}
# Create one marker trace per president, grouped by party so that the legend
# shows each party once and clicking it toggles all of that party's presidents
scatter_traces = []
seen_parties = set()
for _, row in pres_merged_df.iterrows():
    party = row['Party'] if pd.notna(row['Party']) else 'Unknown'
    scatter_trace = go.Scatter(
        x=[row['Born']],
        y=[row['President_x']],
        mode='markers',
        marker=dict(color=colors.get(party, '#808080'), size=10, opacity=0.8),  # grey for unlisted parties
        name=party,
        legendgroup=party,
        showlegend=party not in seen_parties,
        hovertext=f'President: {row["President_x"]}<br>Party: {party}<br>Born: {row["Born"]}',
        hoverinfo='text'
    )
    scatter_traces.append(scatter_trace)
    seen_parties.add(party)
# Create the figure layout with a dark background
layout = go.Layout(
    title='Presidents by Date of Birth and Political Party',
    xaxis=dict(title='Date of Birth'),
    yaxis=dict(title='President'),
    hovermode='closest',
    showlegend=True,
    template='plotly_dark',  # Use the dark theme template
)
# Create the figure and add the traces and layout
fig = go.Figure(data=scatter_traces, layout=layout)
# Increase the plot size
fig.update_layout(width=1400, height=1400)
# Show the plot
fig.show()
The goal of this interactive map is to visualize the birthplaces of U.S. Presidents and their political affiliations. Each birthplace is geocoded to a latitude and longitude and marked with a point whose color represents the president's political party.
By examining this map, we can gain insights into the distribution of U.S. Presidents' birthplaces and how they correlate with political party affiliations. The map provides an engaging way to explore historical data and understand the geographical patterns associated with past Presidents' backgrounds.
Hovering over each point reveals the President's name, birthplace, birth date, and political party. The marker colors help identify which states produced Presidents from which parties, from Unaffiliated and Federalist through Democratic-Republican, Democratic, and Republican.
This visualization aims to offer a comprehensive view of U.S. Presidential history, allowing users to explore and analyze the geographic and political aspects of the nation's leadership over time.
import plotly.graph_objects as go
import pandas as pd
from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter
# Geocode the birthplaces to get latitude and longitude.
# Nominatim's usage policy allows at most one request per second, so throttle the calls.
geolocator = Nominatim(user_agent='presidents_map')
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)
pres_merged_df['location'] = pres_merged_df['Birthplace'].apply(geocode)
pres_merged_df['latitude'] = pres_merged_df['location'].apply(lambda loc: loc.latitude if loc else None)
pres_merged_df['longitude'] = pres_merged_df['location'].apply(lambda loc: loc.longitude if loc else None)
# Define colors for each political party (same palette as the previous chart)
colors = {
    'Unaffiliated': '#A9A9A9',
    'Federalist': '#9932CC',
    'Democratic-Republican': '#FFD700',
    'Whig': '#00FF00',
    'Democratic': '#00BFFF',
    'Republican': '#FF6347',
    'National Union': '#00CED1',
}
# Create one marker per president; legendgroup/showlegend give a single legend entry per party
scatter_points = []
seen_parties = set()
for _, row in pres_merged_df.iterrows():
    party = row['Party'] if pd.notna(row['Party']) else 'Unknown'
    scatter_point = go.Scattergeo(
        lon=[row['longitude']],
        lat=[row['latitude']],
        text=f'President: {row["President_x"]}<br>Born: {row["Born"]}<br>Birthplace: {row["Birthplace"]}<br>Party: {party}',
        marker=dict(
            size=10,
            color=colors.get(party, 'gray'),
            opacity=0.8,
            line=dict(width=1, color='white')
        ),
        name=party,
        legendgroup=party,
        showlegend=party not in seen_parties,
        hoverinfo='text'
    )
    scatter_points.append(scatter_point)
    seen_parties.add(party)
# Create the layout for the map with a larger size
layout = go.Layout(
    title='Birthplaces of U.S. Presidents by Political Party',
    geo=dict(
        scope='usa',
        projection=dict(type='albers usa'),
        showland=True,
        landcolor='rgb(250, 250, 250)',
        subunitcolor='rgb(217, 217, 217)',
        countrycolor='rgb(217, 217, 217)',
        showlakes=True,
        lakecolor='rgb(255, 255, 255)',
        showsubunits=True,
        showcountries=True,
        resolution=50,
        lonaxis=dict(range=[-130, -60]),
        lataxis=dict(range=[20, 50])
    ),
    width=1400,  # Set the width of the map to 1400 pixels
    height=800,  # Set the height of the map to 800 pixels
)
# Create the figure and add the scatter points and layout
fig = go.Figure(data=scatter_points, layout=layout)
# Show the interactive map
fig.show()
The goal of this interactive bar chart is to show the post-presidency timespan of U.S. Presidents in years. Each bar represents a President, and its height corresponds to the number of years they lived after the end of their presidency. The bars are color-coded by category.
Light-coral bars show the post-presidency timespan of Presidents who outlived their terms. Presidents who died in office (red) and living Presidents (light gray) have no completed post-presidency timespan to plot, so they do not get a visible bar.
This visualization allows us to compare the post-presidency longevity of different U.S. Presidents and gain insights into their life spans after serving as the nation's leaders. By exploring the bar chart, we can identify Presidents who had relatively long post-presidency lives and those who passed away shortly after their time in office.
Hovering over a bar shows the President's name and the length of their post-presidency timespan in years. Together, the bars give a compact view of how long U.S. Presidents lived after leaving office.
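The chart distinguishes three kinds of raw "Post-presidency timespan" values. Based on the comments in the plotting code, the sketch below assumes they are encoded as a timedelta-like value (such as Washington's "1015 days 00:00:00"), the text "Died in Office", or a missing value for living presidents. The hypothetical helper `classify_post_presidency` lets the conversion to years be checked in isolation.

```python
import pandas as pd


def classify_post_presidency(raw):
    """Return (years, category) for one raw 'Post-presidency timespan' value."""
    if pd.isna(raw):
        return float('nan'), 'still living'
    if 'days' in str(raw):
        # e.g. '1015 days 00:00:00' -> 1015 days -> years, using 365.25 days per year
        return pd.to_timedelta(raw).days / 365.25, 'lived after office'
    return float('nan'), 'died in office'


for value in ['1015 days 00:00:00', 'Died in Office', None]:
    print(value, classify_post_presidency(value))
```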
import plotly.graph_objects as go
import pandas as pd
# Convert a value to a Timedelta, or NaT for special cases ("Died in Office", missing values)
def convert_to_timedelta(value):
    if pd.notna(value) and 'days' in str(value):
        return pd.to_timedelta(value)
    return pd.NaT
# Choose a bar color from the raw value, before the conversion turns "Died in Office" into NaT
def get_bar_color(value):
    if pd.isna(value):
        return 'lightgray'  # Gray for NaN (still living)
    if 'days' in str(value):
        return 'lightcoral'  # Light red for an actual timespan
    return 'red'  # Red for "Died in Office"
# Compute the colors from the raw column, then convert the timespans to days and years
pres_merged_df['Post-presidency color'] = pres_merged_df['Post-presidency timespan'].apply(get_bar_color)
post_timespan = pd.to_timedelta(pres_merged_df['Post-presidency timespan'].apply(convert_to_timedelta))
pres_merged_df['Post-presidency days'] = post_timespan.dt.days
pres_merged_df['Post-presidency years'] = pres_merged_df['Post-presidency days'] / 365.25  # Account for leap years
# Sort a copy in descending order so later cells keep pres_merged_df's original row order
plot_df = pres_merged_df.sort_values(by='Post-presidency days', ascending=False)
# Create the bar chart
fig = go.Figure()
fig.add_trace(go.Bar(
    x=plot_df['President_x'],
    y=plot_df['Post-presidency years'],
    marker=dict(color=plot_df['Post-presidency color'].tolist()),
    hovertemplate='%{x}<br>%{y:.1f} years<extra></extra>',  # Name and timespan on hover
))
# Customize the layout
fig.update_layout(
    title='Post-presidency Timespan of U.S. Presidents (in Years)',
    xaxis_title='President',
    yaxis_title='Years',
    hovermode='x',
    width=1200,  # Set the width of the chart to 1200 pixels
    height=600,  # Set the height of the chart to 600 pixels
)
# Rotate the x-axis labels for better readability
fig.update_xaxes(tickangle=45)
# Show the interactive bar chart
fig.show()
The goal of this interactive bar chart is to visualize the ages at death of U.S. Presidents, grouped by political party. The chart allows users to explore and compare the ages at which Presidents from different parties passed away. The x-axis lists the Presidents, and the y-axis shows their age at the time of death.
To use this interactive chart, simply select a political party from the dropdown menu. The chart will then display the ages of the Presidents belonging to the chosen party, sorted from oldest to youngest.
Each bar in the chart corresponds to a U.S. President. Because only one party is shown at a time, all bars share that party's color: Unaffiliated Presidents are shown in light grey, Federalist in purple, Democratic-Republican in gold, Democratic in blue, and Republican in red.
By hovering the mouse cursor over each bar, users can view additional details, including the President's name and exact age at the time of death.
This visualization provides a dynamic and user-friendly way to explore historical data and understand the distribution of ages among U.S. Presidents across different political parties. It enables users to gain insights into the life spans of Presidents within their respective parties and discover any potential patterns or trends related to their ages.
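The chart computes fractional ages by dividing a day difference by a fixed year length. Ages are usually quoted in completed years instead, and the `death_age` column in the merged table follows that convention (67 for Washington, 90 for Adams, 83 for Jefferson). The sketch below, built around a hypothetical `completed_years` helper, computes completed years from two date columns. Dates that cannot be parsed, such as a placeholder for a living president, become `NaT` and yield `NaN`.

```python
import pandas as pd


def completed_years(start, end):
    """Whole years between two date sequences; NaN where either date is missing or unparseable."""
    start = pd.to_datetime(pd.Series(start), errors='coerce')
    end = pd.to_datetime(pd.Series(end), errors='coerce')
    years = end.dt.year - start.dt.year
    # Subtract one year when the birthday has not yet come around in the end year
    before_birthday = (end.dt.month < start.dt.month) | (
        (end.dt.month == start.dt.month) & (end.dt.day < start.dt.day)
    )
    return years - before_birthday.astype(int)


born = ['1732-02-22', '1735-10-30', '1743-04-13', '1751-03-16', '1758-04-28']
died = ['1799-12-14', '1826-07-04', '1826-07-04', '1836-06-28', '1831-07-04']
print(completed_years(born, died).tolist())  # [67, 90, 83, 85, 73]
```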
import pandas as pd
import plotly.graph_objects as go
from ipywidgets import interact, widgets
# Assumes a DataFrame called 'pres_merged_df' with the columns
# 'President_x', 'Born', 'Died', and 'Party'
# Parse the dates; entries that are not dates (e.g. for living presidents) become NaT
pres_merged_df['Born'] = pd.to_datetime(pres_merged_df['Born'], errors='coerce')
pres_merged_df['Died'] = pd.to_datetime(pres_merged_df['Died'], errors='coerce')
# Age at death in years (NaN for presidents who are still living)
pres_merged_df['Age'] = (pres_merged_df['Died'] - pres_merged_df['Born']).dt.days / 365.25
colors = {
    'Unaffiliated': '#A9A9A9',
    'Federalist': '#9932CC',
    'Democratic-Republican': '#FFD700',
    'Whig': '#00FF00',
    'Democratic': '#00BFFF',
    'Republican': '#FF6347',
    'National Union': '#00CED1',
}
# Function to plot the bar chart for one party
def plot_age_by_party(selected_party):
    filtered_df = pres_merged_df[pres_merged_df['Party'] == selected_party]
    # Oldest first; living presidents have no age at death and are left out
    sorted_df = filtered_df.dropna(subset=['Age']).sort_values(by='Age', ascending=False)
    fig = go.Figure()
    fig.add_trace(go.Bar(
        x=sorted_df['President_x'],
        y=sorted_df['Age'],
        marker=dict(color=colors.get(selected_party, '#808080'))  # Use grey if color not found
    ))
    fig.update_layout(
        xaxis_tickangle=-45,
        title=f'Age of Presidents in {selected_party} Party (Sorted)',
        xaxis_title='President',
        yaxis_title='Age at Death',
        hovermode='x'
    )
    fig.show()
# Get the unique political parties from the DataFrame (ignoring missing values)
parties = pres_merged_df['Party'].dropna().unique()
# Create an interactive dropdown menu to select the party
interact(plot_age_by_party, selected_party=widgets.Dropdown(options=parties));
Important: In case the interactive visualization isn't visible, you can refer to the images provided below. Please note that the current setup requires the code to be hosted on a server, a step that will be implemented in the future.
The interactive horizontal bar chart visualizes the ages of U.S. Presidents at the beginning of their presidential terms. Each bar represents a President, and they are arranged in ascending order from the youngest to the oldest at the time they started their presidency. The chart is color-coded based on the political party to which each President belonged, making it easy to distinguish between parties.
To create this visualization, we utilized data from a DataFrame containing information about U.S. Presidents, including their birth dates ('Born') and the start dates of their presidencies ('Start Date of presidency'). By calculating the age at the beginning of each President's term, we were able to sort the data and construct the horizontal bar chart.
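The age calculation and the youngest-to-oldest ordering can be checked outside the plot. The sketch below applies the same day-difference approach to the first five presidents, with dates copied from the merged table above. The helper name `ages_at_start` is hypothetical, and dividing by 365.25 is an assumption that averages in leap years.

```python
import pandas as pd


def ages_at_start(frame):
    """Return a copy of `frame` with a fractional 'Age_at_Start' column, youngest first."""
    out = frame.copy()
    born = pd.to_datetime(out['Born'])
    start = pd.to_datetime(out['Start Date of presidency'])
    out['Age_at_Start'] = (start - born).dt.days / 365.25  # 365.25 averages in leap years
    return out.sort_values('Age_at_Start')


# Birth and inauguration dates of the first five presidents, from the merged table above
first_five = pd.DataFrame({
    'President_x': ['George Washington', 'John Adams', 'Thomas Jefferson',
                    'James Madison', 'James Monroe'],
    'Born': ['1732-02-22', '1735-10-30', '1743-04-13', '1751-03-16', '1758-04-28'],
    'Start Date of presidency': ['1789-04-30', '1797-03-04', '1801-03-04',
                                 '1809-03-04', '1817-03-04'],
})
print(ages_at_start(first_five)[['President_x', 'Age_at_Start']].round(1))
```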
import pandas as pd
import plotly.express as px
# Assuming you have a DataFrame called 'pres_merged_df' containing the data
# with columns: 'President_x', 'Born', 'Start Date of presidency', and 'Party'
# Calculate each president's age (in years) at the start of the presidency
born = pd.to_datetime(pres_merged_df['Born'])
start = pd.to_datetime(pres_merged_df['Start Date of presidency'])
pres_merged_df['Age_at_Start'] = (start - born).dt.days / 365.25
# Sort oldest first: Plotly draws the first y-axis category at the bottom, so the youngest ends up at the top
sorted_df = pres_merged_df.sort_values(by='Age_at_Start', ascending=False)
# Define colors for each political party
colors = {
'Unaffiliated': '#A9A9A9',
'Federalist': '#9932CC',
'Democratic-Republican': '#FFD700',
'Democratic': '#00BFFF',
'Republican': '#FF6347',
}
# Create the horizontal bar chart using Plotly Express
fig = px.bar(
sorted_df,
x='Age_at_Start',
y='President_x',
color='Party',
title='Age of U.S. Presidents at the Start of Presidency (Sorted)',
labels={'Age_at_Start': 'Age at Start of Presidency', 'President_x': 'President'},
color_discrete_map=colors,
)
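# Customize the appearance of the chart
# With color='Party', Plotly Express builds one trace per party, so pin the y-axis order to the sorted rows
fig.update_yaxes(categoryorder='array', categoryarray=sorted_df['President_x'].tolist())
fig.update_layout(yaxis_title=None, hovermode='y', plot_bgcolor='rgba(0, 0, 0, 0)', height=1400)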
# Show the chart
fig.show()
The pie chart displays the distribution of the number of children among U.S. Presidents. Each slice of the pie represents a specific number of children, ranging from 0 to 8. The chart is designed with a dark background using the 'plotly_dark' template, providing a visually appealing contrast to the colorful segments.
Each slice uses a distinct color from the template's default palette, so each number of children is easy to tell apart. The label and percentage of each slice are displayed outside the pie, making clear how many children each slice represents.
The pie chart is an intuitive and concise way to visualize the distribution of children among U.S. Presidents, making it easier to identify the most and least common number of children among the leaders of the nation.
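The most and least common numbers of children can also be read off directly from the counts behind the pie. A minimal sketch, using a hypothetical `children_summary` helper and the children counts of the first five presidents from the merged table (0, 5, 6, 0, 2):

```python
import pandas as pd


def children_summary(children):
    """Return (counts ordered by number of children, most common count, least common count)."""
    counts = pd.Series(children).value_counts().sort_index()
    # idxmax/idxmin return the first match on ties, i.e. the smallest number of children
    return counts, counts.idxmax(), counts.idxmin()


counts, most_common, least_common = children_summary([0, 5, 6, 0, 2])
print(counts)
print('most common:', most_common, '| least common:', least_common)
```

On the full data, `children_summary(pres_merged_df['Children'])` would give the same summary that the pie chart shows visually.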
import pandas as pd
import plotly.graph_objects as go
# Assuming you have a DataFrame called 'pres_merged_df' containing the data
# with columns: 'President_x' and 'Children'
# Create the pie chart data
children_counts = pres_merged_df['Children'].value_counts()
# Function to generate custom text labels for the pie chart
def custom_label(val):
    if val == 1:
        return f'{val} child'
    return f'{val} children'
# Apply the custom text labels to the index of children_counts
labels_with_children = children_counts.index.map(custom_label)
# Create the pie chart trace
pie_chart_trace = go.Pie(
    labels=labels_with_children,
    values=children_counts.values,
    textinfo='percent+label',
    textposition='outside',  # Place the labels outside the slices
    textfont=dict(size=12, color='white'),  # Set text color to white for better visibility
)
# Create the layout with a dark background using the plotly_dark template
layout = go.Layout(
title='Number of Children of U.S. Presidents',
template='plotly_dark', # Set the plot template to a dark theme
)
# Create the figure and display the pie chart
fig = go.Figure(data=[pie_chart_trace], layout=layout)
# Customize the appearance of the chart
fig.update_layout(height = 800)
fig.show()
The bar chart shows the number of children of each U.S. President. Each vertical bar represents one president, and the number of children is printed above it, allowing a straightforward comparison between presidents. The first version below keeps the DataFrame's current row order; the second sorts the presidents from fewest to most children.
To enhance visibility and readability, the names of the presidents are rotated at an angle of -45 degrees along the x-axis. This arrangement ensures that the names are legible even when there are numerous data points.
The chart is set against a dark background using the 'plotly_dark' template, and all bars share a single color (dodger blue) so that they and the numbers above them stand out clearly.
By examining the bar chart, readers can quickly see which presidents had large families and which had none, providing insight into the family lives of the country's leaders.
import pandas as pd
import plotly.graph_objects as go
# Assuming you have a DataFrame called 'pres_merged_df' containing the data
# with columns: 'President_x' and 'Children'
# Keep the DataFrame in its original row order (no sorting) for this first chart
unsorted_df = pres_merged_df
# Create the bar chart trace
bar_chart_trace = go.Bar(
    x=unsorted_df['President_x'],
    y=unsorted_df['Children'],
    text=unsorted_df['Children'],
    textposition='outside',  # Display the number of children above the bars
    textfont=dict(size=12, color='white'),  # Set text color to white for better visibility
    marker=dict(color='dodgerblue'),  # Set the color of the bars
)
# Create the layout with a dark background using the plotly_dark template
layout = go.Layout(
title='Number of Children of U.S. Presidents',
xaxis=dict(title='President', tickangle=-45), # Rotate tick labels by -45 degrees
yaxis=dict(title='Number of Children'),
template='plotly_dark', # Set the plot template to a dark theme
)
# Create the figure and display the bar chart
fig = go.Figure(data=[bar_chart_trace], layout=layout)
# Customize the appearance of the chart
fig.update_layout(height = 600)
fig.show()
import pandas as pd
import plotly.graph_objects as go
# Assuming you have a DataFrame called 'pres_merged_df' containing the data
# with columns: 'President_x' and 'Children'
# Sort the DataFrame by the number of children in ascending order
sorted_df = pres_merged_df.sort_values(by='Children', ascending=True)
# Create the bar chart trace
bar_chart_trace = go.Bar(
x=sorted_df['President_x'],
y=sorted_df['Children'],
text=sorted_df['Children'],
textposition='outside', # Display the number of children above the bars
textfont=dict(size=12, color='white'), # Set text color to white for better visibility
marker=dict(color='dodgerblue'), # Set the color of the bars
)
# Create the layout with a dark background using the plotly_dark template
layout = go.Layout(
title='Number of Children of U.S. Presidents',
xaxis=dict(title='President', tickangle=-45), # Rotate tick labels by -45 degrees
yaxis=dict(title='Number of Children'),
template='plotly_dark', # Set the plot template to a dark theme
)
# Create the figure and display the bar chart
fig = go.Figure(data=[bar_chart_trace], layout=layout)
# Customize the appearance of the chart
fig.update_layout(height = 600)
fig.show()
The charts below display U.S. Presidents grouped by religious affiliation and colored by political party. Each bar represents one President, and its color indicates that President's party.
The charts are interactive: selecting a religion from the dropdown menu shows the Presidents who followed it and their parties (a plain-text version of the same lookup comes first). They use a dark background theme and a custom color for each party.
In the horizontal version, the x-axis shows the political parties and the y-axis lists the Presidents' names, so the names stay readable. Hovering over a bar shows the President's name and party.
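Behind the dropdown, each selection is just a filter on the 'Religion' column. To see every religion/party combination at once, a groupby is one compact option; the sketch below uses hypothetical rows, not the study's data:

```python
import pandas as pd

# Hypothetical rows; only the column names match pres_merged_df
toy_df = pd.DataFrame({
    'President_x': ['President A', 'President B', 'President C', 'President D'],
    'Religion': ['Religion X', 'Religion X', 'Religion Y', 'Religion X'],
    'Party': ['Party 1', 'Party 2', 'Party 1', 'Party 1'],
})

# List the presidents for every (religion, party) pair, keeping row order within each group
grouped = toy_df.groupby(['Religion', 'Party'])['President_x'].apply(list)
print(grouped)

# Filtering on one religion, as the dropdown does, corresponds to selecting one outer key
religion_x = grouped.loc['Religion X']
```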
import pandas as pd
from ipywidgets import interact, widgets
# Assuming you have a DataFrame called 'pres_merged_df' containing the data
# with columns: 'President_x' and 'Religion'
# Function to display the list of presidents following a specific religion
def presidents_by_religion(selected_religion):
    presidents_list = pres_merged_df[pres_merged_df['Religion'] == selected_religion]['President_x'].tolist()
    if len(presidents_list) > 0:
        presidents = ', '.join(presidents_list)
        print(f"The U.S. Presidents following {selected_religion} are: {presidents}.")
    else:
        print(f"No U.S. Presidents were found following {selected_religion}.")
# Get unique religions from the DataFrame
religions = pres_merged_df['Religion'].unique()
# Create an interactive dropdown menu to select the religion
interact(presidents_by_religion, selected_religion=widgets.Dropdown(options=religions));
Important: In case the interactive visualization isn't visible, you can refer to the images provided below. Please note that the current setup requires the code to be hosted on a server, a step that will be implemented in the future.
import pandas as pd
import plotly.express as px
from ipywidgets import interact, widgets
# Assuming you have a DataFrame called 'pres_merged_df' containing the data
# with columns: 'President_x', 'Religion', and 'Party'
# Function to plot the bar chart
def plot_presidents_by_religion(selected_religion):
    filtered_df = pres_merged_df[pres_merged_df['Religion'] == selected_religion]
    fig = px.bar(
        filtered_df,
        x='President_x',
        y='Party',
        title=f'U.S. Presidents Following {selected_religion}',
        labels={'President_x': 'President', 'Party': 'Political Party'},
        orientation='h',  # Horizontal bar chart
        template='plotly_dark',  # Set the plot template to a dark theme
        color='Party',  # Color based on the party
        color_discrete_map={
            'Unaffiliated': '#A9A9A9',
            'Federalist': '#9932CC',
            'Democratic-Republican': '#FFD700',
            'Democratic': '#00BFFF',
            'Republican': '#FF6347',
            'National Union': 'gray',
            'Whig': 'orange',
        },  # Custom colors for each party
    )
    fig.update_layout(xaxis_tickangle=-45, hovermode='x')
    fig.show()
# Get unique religions from the DataFrame
religions = pres_merged_df['Religion'].unique()
# Create an interactive dropdown menu to select the religion
interact(plot_presidents_by_religion, selected_religion=widgets.Dropdown(options=religions));
Important: In case the interactive visualization isn't visible, you can refer to the images provided below. Please note that the current setup requires the code to be hosted on a server, a step that will be implemented in the future.
import pandas as pd
import plotly.express as px
from ipywidgets import interact, widgets
# Assuming you have a DataFrame called 'pres_merged_df' containing the data
# with columns: 'President_x', 'Religion', and 'Party'
# Function to plot the bar chart
def plot_presidents_by_religion(selected_religion):
    filtered_df = pres_merged_df[pres_merged_df['Religion'] == selected_religion]
    fig = px.bar(
        filtered_df,
        x='Party',
        y='President_x',
        title=f'U.S. Presidents Following {selected_religion}',
        labels={'Party': 'Political Party', 'President_x': 'President'},
        orientation='h',  # Horizontal bar chart
        template='plotly_dark',  # Set the plot template to a dark theme
        color='Party',  # Color based on the party
        color_discrete_map={
            'Unaffiliated': '#A9A9A9',
            'Federalist': '#9932CC',
            'Democratic-Republican': '#FFD700',
            'Democratic': '#00BFFF',
            'Republican': '#FF6347',
            'National Union': 'gray',
            'Whig': 'orange',
        },  # Custom colors for each party
    )
    fig.update_layout(xaxis_tickangle=-45, hovermode='x')
    fig.show()
# Get unique religions from the DataFrame
religions = pres_merged_df['Religion'].unique()
# Create an interactive dropdown menu to select the religion
interact(plot_presidents_by_religion, selected_religion=widgets.Dropdown(options=religions));
Important: In case the interactive visualization isn't visible, you can refer to the images provided below. Please note that the current setup requires the code to be hosted on a server, a step that will be implemented in the future.
The pie chart displays the distribution of U.S. Presidents by religious affiliation. Each segment represents one religion, and its size corresponds to the number of Presidents who followed that religion. As with the other Plotly charts, hovering over a segment shows its details, and clicking a legend entry hides or restores that religion.
The chart uses a dark background theme, and each segment is labeled with the religion and the percentage of Presidents belonging to it, so the religious makeup of the presidency can be read at a glance.
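With textinfo='percent+label', Plotly computes each segment's percentage from the values it receives, so the labels should match count / total up to rounding. The sketch below, on hypothetical religion labels, computes the same shares in pandas, which can be handy for double-checking the chart:

```python
import pandas as pd

# Hypothetical religion values; not the study's data
religions = pd.Series(['Religion X', 'Religion Y', 'Religion X', 'Religion Z', 'Religion X', 'Religion Y'])

counts = religions.value_counts()                # segment sizes, as passed to go.Pie
shares = religions.value_counts(normalize=True)  # fraction of the whole
percent_labels = (shares * 100).round(1).astype(str) + '%'

print(counts.to_dict())          # {'Religion X': 3, 'Religion Y': 2, 'Religion Z': 1}
print(percent_labels.to_dict())  # {'Religion X': '50.0%', 'Religion Y': '33.3%', 'Religion Z': '16.7%'}
```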
import pandas as pd
import plotly.graph_objects as go
# Assuming you have a DataFrame called 'pres_merged_df' containing the data
# with columns: 'President_x' and 'Religion'
# Create the pie chart data
religion_counts = pres_merged_df['Religion'].value_counts()
# Use the religion names themselves as the pie chart labels
labels_with_religions = religion_counts.index
# Create the pie chart trace
pie_chart_trace = go.Pie(
labels=labels_with_religions,
values=religion_counts.values,
textinfo='percent+label',
textfont=dict(size=12, color='white'), # Set text color to white for better visibility
)
# Create the layout with a dark background using the plotly_dark template
layout = go.Layout(
title='Religions of U.S. Presidents',
template='plotly_dark', # Set the plot template to a dark theme
)
# Create the figure and display the pie chart
fig = go.Figure(data=[pie_chart_trace], layout=layout)
# Customize the appearance of the chart
fig.update_layout(height=800)
fig.show()
This section explores the higher education background of U.S. Presidents through three visualizations:
The word cloud represents the distribution of higher education institutions attended by U.S. Presidents. Because the cell joins all values into one text and lets WordCloud split it into words, multi-word institution names appear as their component words (and frequent word pairs) rather than as whole names. A larger font size means the word occurs more often.
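If whole institution names are preferred in the word cloud, one option is to count the names first and pass the counts to WordCloud's generate_from_frequencies. The sketch below assumes, hypothetically, that several institutions in one cell are separated by commas (the same convention this study uses for 'Occupation'); the sample values are illustrative only:

```python
from collections import Counter

def count_institutions(values):
    """Count whole institution names, assuming comma-separated entries (hypothetical format)."""
    counts = Counter()
    for value in values:
        if not isinstance(value, str):
            continue  # skip missing values (NaN or None)
        for name in value.split(','):
            name = name.strip()
            if name:
                counts[name] += 1
    return counts

# Illustrative values only
sample = ['University of B, College A', 'College A', None, 'College A']
freqs = count_institutions(sample)
print(freqs.most_common())  # [('College A', 3), ('University of B', 1)]
```

Since Counter is a dict subclass, the result could be handed straight to WordCloud(...).generate_from_frequencies(freqs).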
import pandas as pd
from wordcloud import WordCloud
import matplotlib.pyplot as plt
# Assuming you have a DataFrame called 'pres_merged_df' containing the data
# with columns: 'President_x' and 'Higher Education'
# Concatenate all the 'Higher Education' values into a single string
education_text = ' '.join(pres_merged_df['Higher Education'].dropna())
# Create the Word Cloud with custom settings
wordcloud = WordCloud(
width=1200,
height=600,
background_color='black',
colormap='viridis', # Choose a color map for the Word Cloud
contour_color='steelblue', # Set contour color for better visibility
contour_width=2, # Set contour width
max_words=150, # Set the maximum number of words in the Word Cloud
prefer_horizontal=0.8, # Set the ratio of horizontal to vertical words
).generate(education_text)
# Display the Word Cloud using matplotlib
plt.figure(figsize=(24, 12))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.title("Higher Education of U.S. Presidents - Word Cloud", fontsize=20, color='white')
plt.show()
The Venn diagram presents a comparison between Presidents who attended Ivy League institutions and those who attended Non-Ivy League institutions. The overlapping area shows Presidents who attended both types of institutions, and the non-overlapping areas show exclusive groups.
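A Venn diagram of presidents needs one set of presidents per circle, and venn2's three regions are plain set differences and intersections. The sketch below assumes, for illustration, that institutions are comma-separated and that an institution counts as Ivy League when its name contains one of the eight school names; the sample rows are hypothetical:

```python
IVY_LEAGUE = ['Harvard', 'Yale', 'Princeton', 'Columbia', 'Brown',
              'Dartmouth', 'Cornell', 'University of Pennsylvania']

def is_ivy(institution):
    # Substring match, so 'Harvard Law School' counts as Harvard
    return any(school in institution for school in IVY_LEAGUE)

def venn_regions(education_by_president):
    """Return (only_ivy, only_non_ivy, both) region sizes, the order venn2 expects."""
    ivy, non_ivy = set(), set()
    for president, education in education_by_president.items():
        for institution in (part.strip() for part in education.split(',')):
            if not institution:
                continue
            (ivy if is_ivy(institution) else non_ivy).add(president)
    return len(ivy - non_ivy), len(non_ivy - ivy), len(ivy & non_ivy)

# Hypothetical presidents and institutions
sample = {
    'President A': 'Harvard',
    'President B': 'College X, Yale Law School',
    'President C': 'College Y',
}
print(venn_regions(sample))  # (1, 1, 1)
```

The substring rule is a simplification; an unrelated school whose name happens to contain 'Columbia' or 'Brown' would be miscounted, so a curated mapping is safer for the real data.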
import pandas as pd
from matplotlib import pyplot as plt
from matplotlib_venn import venn2
# Assuming you have a DataFrame called 'pres_merged_df' containing the data
# with columns: 'President_x' and 'Higher Education'
# Filter out missing data into a separate DataFrame so pres_merged_df keeps all rows for later cells
edu_df = pres_merged_df.dropna(subset=['Higher Education'])
# Ivy League schools, matched as substrings (e.g. 'Harvard Law School' counts as Harvard)
ivy_league = ['Harvard', 'Yale', 'Princeton', 'Columbia', 'Brown', 'Dartmouth', 'Cornell', 'University of Pennsylvania']
# Build one set of presidents per circle; assumes institutions within a cell are separated by commas
ivy_presidents, non_ivy_presidents = set(), set()
for president, education in zip(edu_df['President_x'], edu_df['Higher Education']):
    for institution in (part.strip() for part in str(education).split(',')):
        if not institution:
            continue
        if any(school in institution for school in ivy_league):
            ivy_presidents.add(president)
        else:
            non_ivy_presidents.add(president)
# Create the Venn Diagram; the overlap holds presidents who attended both kinds of institution
plt.figure(figsize=(10, 8))  # Set the figure size to make the Venn Diagram bigger
venn2([ivy_presidents, non_ivy_presidents], set_labels=('Ivy League', 'Non-Ivy League'))
# Add a title (set_labels already names the circles, so no separate legend is needed)
plt.title('Educational Background of U.S. Presidents - Ivy League vs. Non-Ivy League', fontsize=16)
# Display the Venn Diagram
plt.show()
Using the interactive tool, you can select a U.S. President from the dropdown menu to see their specific higher education background. The President's name appears in the chart title and their higher education is shown as highlighted text on a dark panel; if no data is recorded, a "not available" message is shown instead.
import pandas as pd
import plotly.graph_objects as go
from ipywidgets import interact, widgets
# Assuming you have a DataFrame called 'pres_merged_df' containing the data
# with columns: 'President_x' and 'Higher Education'
# Filter out missing data into a separate DataFrame so pres_merged_df keeps all rows for later cells
edu_df = pres_merged_df.dropna(subset=['Higher Education'])
# Create a dictionary to map presidents to their higher education values
president_higher_edu_dict = dict(zip(edu_df['President_x'], edu_df['Higher Education']))
# Function to plot the higher education value for a specific president
def plot_higher_education(president_name):
    higher_education = president_higher_edu_dict.get(president_name, 'Higher education data not available')
    # Create the plot
    fig = go.Figure()
    # Add a text annotation to display the higher education value
    fig.add_annotation(
        text=f"Higher Education: <span style='color: #2b9434;'>{higher_education}</span>",
        xref="paper",
        yref="paper",
        x=0.5,
        y=0.5,
        showarrow=False,
        font=dict(size=16),
        align='center',
    )
    # Update layout for better appearance
    fig.update_layout(
        title=f"Higher Education of U.S. President - {president_name}",
        xaxis=dict(visible=False),
        yaxis=dict(visible=False),
        width=1400,
        height=400,
        template='plotly_dark',  # Use dark theme
        margin=dict(t=100),
    )
    # Show the plot
    fig.show()
# Get unique president names from the DataFrame
president_names = pres_merged_df['President_x'].unique()
# Create an interactive dropdown menu to select the president
interact(plot_higher_education, president_name=widgets.Dropdown(options=president_names));
Important: In case the interactive visualization isn't visible, you can refer to the images provided below. Please note that the current setup requires the code to be hosted on a server, a step that will be implemented in the future.
The Word Cloud visually presents the four most frequent occupations held by U.S. Presidents throughout history. Each occupation's size within the cloud is scaled by its frequency, offering an immediate glimpse of the dominant professions among Presidents.
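The counting step can also be written with str.split plus explode, which avoids the wide intermediate DataFrame that expand=True creates. This is a minimal sketch on hypothetical values; unlike this section's code, it splits on ',' and then strips spaces, so it tolerates entries with or without a space after the comma:

```python
import pandas as pd

# Hypothetical 'Occupation' values
occupation = pd.Series(['Lawyer, Farmer', 'Lawyer', None, 'Soldier, Lawyer, Farmer', 'Teacher'])

top_n = 2
counts = (
    occupation.dropna()
    .str.split(',')   # one list of occupations per president
    .explode()        # one row per occupation
    .str.strip()
    .value_counts()   # sorted by frequency, most common first
)
top_occupations = counts.head(top_n)
print(top_occupations.to_dict())  # {'Lawyer': 3, 'Farmer': 2}
```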
import pandas as pd
from wordcloud import WordCloud
import matplotlib.pyplot as plt
# Assuming you have a DataFrame called 'pres_merged_df' containing the data
# with column 'Occupation' and other relevant columns
# Split the values in the 'Occupation' column and create a new DataFrame with all the occupations
occupations_df = pres_merged_df['Occupation'].str.split(', ', expand=True)
# Flatten the DataFrame into a single Series and get the value counts
occupation_counts = occupations_df.melt(value_name='Occupation').groupby('Occupation').size()
# Sort the occupations by frequency in descending order
top_occupations = occupation_counts.sort_values(ascending=False).head(4)
# Create the Word Cloud with custom settings
wordcloud = WordCloud(
width=1200,
height=600,
background_color='black',
colormap='viridis', # Choose a color map for the Word Cloud
contour_color='steelblue', # Set contour color for better visibility
contour_width=2, # Set contour width
max_words=150, # Set the maximum number of words in the Word Cloud
prefer_horizontal=0.8, # Set the ratio of horizontal to vertical words
).generate_from_frequencies(top_occupations)
# Display the Word Cloud using matplotlib
plt.figure(figsize=(24, 12))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.title("Top 4 Occupations of U.S. Presidents - Word Cloud", fontsize=20, color='white')
plt.show()
The Bar Chart displays the distribution of occupations among U.S. Presidents, extracted by splitting the values in the 'Occupation' column. Each occupation is represented as a bar, and its height corresponds to its frequency. The chart allows us to observe the range of professions and understand which ones have been more prevalent throughout history.
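For the hover text, every occupation needs the list of presidents who held it, and each name must come from the same row as the occupation. Exploding the split values keeps every occupation attached to its original row, so the mapping stays aligned. Here is a sketch on hypothetical rows:

```python
import pandas as pd

# Hypothetical rows; column names match pres_merged_df
toy_df = pd.DataFrame({
    'President_x': ['President A', 'President B', 'President C'],
    'Occupation': ['Lawyer, Farmer', None, 'Farmer'],
})

long_df = (
    toy_df.assign(Occupation=toy_df['Occupation'].str.split(','))
    .explode('Occupation')               # one row per (president, occupation)
    .dropna(subset=['Occupation'])       # drop presidents with no occupation data
    .assign(Occupation=lambda d: d['Occupation'].str.strip())
)

# occupation -> list of presidents, and occupation -> frequency
presidents_by_occupation = long_df.groupby('Occupation')['President_x'].apply(list).to_dict()
occupation_counts = long_df['Occupation'].value_counts()
print(presidents_by_occupation)  # {'Farmer': ['President A', 'President C'], 'Lawyer': ['President A']}
```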
import pandas as pd
import plotly.express as px
# Assuming you have a DataFrame called 'pres_merged_df' containing the data
# with the column 'Occupation' and other relevant columns
# Split the values in the 'Occupation' column and create a new DataFrame with all the occupations
occupations_df = pres_merged_df['Occupation'].str.split(', ', expand=True)
# Map each occupation to the presidents who held it; .items() keeps the original row labels,
# so each occupation is paired with the president from the same row (enumerate after dropna would not)
presidents_by_occupation = {}
for col in occupations_df.columns:
    for index, occupation in occupations_df[col].dropna().items():
        presidents_by_occupation.setdefault(occupation, []).append(pres_merged_df.loc[index, 'President_x'])
# Get the frequency of each occupation
occupation_counts = occupations_df.melt(value_name='Occupation').groupby('Occupation').size()
# Sort the occupations by frequency in descending order
occupation_counts = occupation_counts.sort_values(ascending=False)
# Create the bar chart
fig = px.bar(
x=occupation_counts.index,
y=occupation_counts.values,
labels={'x': 'Occupation', 'y': 'Frequency'},
title='Frequency of Jobs of U.S. Presidents',
color=occupation_counts.index, # Use different colors for each occupation
hover_name=occupation_counts.index, # Show occupation names in the tooltip
hover_data={"Presidents": [", ".join(presidents_by_occupation[occupation]) for occupation in occupation_counts.index]},
)
# Customize the appearance of the chart
fig.update_layout(
xaxis_title="Occupation",
yaxis_title="Frequency",
xaxis_tickangle=-45,
hoverlabel=dict(bgcolor="white", font_size=12),
template='plotly_dark', # Set the plot template to a dark theme
height = 600,
)
# Display the interactive plot
fig.show()
With the Interactive Tool, you can select a U.S. President from the dropdown menu and discover their respective occupation. The tool offers an engaging way to explore individual Presidents' backgrounds and learn more about their professions before entering politics.
import pandas as pd
import plotly.graph_objects as go
from ipywidgets import interact, widgets
# Assuming you have a DataFrame called 'pres_merged_df' containing the data
# with columns: 'President_x' and 'Occupation'
# Filter out missing data into a separate DataFrame so pres_merged_df keeps all rows for later cells
occ_df = pres_merged_df.dropna(subset=['Occupation'])
# Create a dictionary to map presidents to their occupations
president_occupation_dict = dict(zip(occ_df['President_x'], occ_df['Occupation']))
# Function to plot the occupation value for a specific president
def plot_occupation(president_name):
    occupation = president_occupation_dict.get(president_name, 'Occupation data not available')
    # Create the plot
    fig = go.Figure()
    # Add a text annotation to display the occupation value
    fig.add_annotation(
        text=f"Occupation: <span style='color: #2b9434;'>{occupation}</span>",
        xref="paper",
        yref="paper",
        x=0.5,
        y=0.5,
        showarrow=False,
        font=dict(size=16),
        align='center',
    )
    # Update layout for better appearance
    fig.update_layout(
        title=f"Occupation of U.S. President - {president_name}",
        xaxis=dict(visible=False),
        yaxis=dict(visible=False),
        width=1400,
        height=400,
        template='plotly_dark',  # Use dark theme
        margin=dict(t=100),
    )
    # Show the plot
    fig.show()
# Get unique president names from the DataFrame
president_names = pres_merged_df['President_x'].unique()
# Create an interactive dropdown menu to select the president
interact(plot_occupation, president_name=widgets.Dropdown(options=president_names));
Important: In case the interactive visualization isn't visible, you can refer to the images provided below. Please note that the current setup requires the code to be hosted on a server, a step that will be implemented in the future.
This interactive tool lets you choose a political party from the dropdown menu and explore the occupations of that party's Presidents, ranked by frequency. Comparing the dominant professions across parties shows which professional backgrounds each party has drawn its Presidents from.
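The same split-and-count idea can summarize all parties at once: explode the occupations, count each (party, occupation) pair, and keep the top occupation per party. This is a sketch with hypothetical parties and occupations. Ties are broken alphabetically here, which is a choice the interactive code does not make:

```python
import pandas as pd

# Hypothetical rows; column names match pres_merged_df
toy_df = pd.DataFrame({
    'Party': ['Party 1', 'Party 1', 'Party 2', 'Party 2'],
    'Occupation': ['Lawyer, Farmer', 'Lawyer', 'Soldier', 'Teacher, Soldier'],
})

long_df = (
    toy_df.assign(Occupation=toy_df['Occupation'].str.split(','))
    .explode('Occupation')
    .assign(Occupation=lambda d: d['Occupation'].str.strip())
)

# Count each (party, occupation) pair, then keep the top occupation per party
pair_counts = long_df.groupby(['Party', 'Occupation']).size().reset_index(name='Count')
pair_counts = pair_counts.sort_values(['Party', 'Count', 'Occupation'], ascending=[True, False, True])
most_frequent = pair_counts.drop_duplicates('Party').set_index('Party')['Occupation']
print(most_frequent.to_dict())  # {'Party 1': 'Lawyer', 'Party 2': 'Soldier'}
```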
import pandas as pd
import plotly.express as px
from ipywidgets import interact, widgets
# Assuming you have a DataFrame called 'pres_merged_df' containing the data
# with columns: 'President_x', 'Party', and 'Occupation'
# Filter out missing data into a separate DataFrame so pres_merged_df keeps all rows for later cells
occ_party_df = pres_merged_df.dropna(subset=['Occupation', 'Party'])
# Function to split occupations and plot their frequencies for the selected party
def get_most_frequent_occupation(selected_party):
    # Filter DataFrame based on the selected party
    party_df = occ_party_df[occ_party_df['Party'] == selected_party]
    # Split occupations and create a list of all occupation tokens
    all_occupations = [occupation.strip() for occupations in party_df['Occupation'] for occupation in occupations.split(',')]
    # Count the occurrences of each occupation (most frequent first)
    occupation_counts = pd.Series(all_occupations).value_counts()
    # Get the most frequent occupation (on a tie, whichever value_counts lists first)
    most_frequent_occupation = occupation_counts.index[0]
    # Create the bar plot
    fig = px.bar(
        x=occupation_counts.index,
        y=occupation_counts.values,
        labels={'x': 'Occupation', 'y': 'Frequency'},
        title=f"Occupations of {selected_party} Presidents (most frequent: {most_frequent_occupation})",
        color=occupation_counts.index,
    )
    # Rotate x-axis labels for better readability
    fig.update_layout(xaxis_tickangle=-45)
    # Show the plot
    fig.show()
# Get unique political parties from the filtered DataFrame (each has at least one occupation)
parties = occ_party_df['Party'].unique()
# Create an interactive dropdown menu to select the party
interact(get_most_frequent_occupation, selected_party=widgets.Dropdown(options=parties));
Important: In case the interactive visualization isn't visible, you can refer to the images provided below. Please note that the current setup requires the code to be hosted on a server, a step that will be implemented in the future.
Have you ever wondered about the diverse military experiences of U.S. Presidents? In this TreeMap visualization, we delve into the fascinating world of military service among our nation's leaders. The TreeMap offers an innovative and visually captivating way to understand the various military roles that Presidents have undertaken throughout history.
A TreeMap is a unique chart that presents hierarchical data in the form of nested rectangles. Each rectangle's size is proportional to the value of the data it represents, offering an intuitive visualization of the data's distribution.
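px.treemap with path=['Service', 'President'] and values='Value' expects one row per leaf, and each parent rectangle's size is the sum of its children's values. The sketch below builds that long table without a row-by-row loop, on hypothetical service entries:

```python
import pandas as pd

# Hypothetical rows; column names match pres_merged_df
toy_df = pd.DataFrame({
    'President_x': ['President A', 'President B', 'President C'],
    'Military Service': ['Army, Navy', 'Army', None],
})

data = (
    toy_df.dropna(subset=['Military Service'])
    .assign(Service=lambda d: d['Military Service'].str.split(','))
    .explode('Service')                          # one row per (president, service)
    .assign(Service=lambda d: d['Service'].str.strip())
    .query("Service != ''")                      # drop empty pieces such as trailing commas
    .rename(columns={'President_x': 'President'})
    .assign(Value=1)[['Service', 'President', 'Value']]
)

# Parent sizes in the treemap are these per-service sums
service_totals = data.groupby('Service')['Value'].sum()
print(service_totals.to_dict())  # {'Army': 2, 'Navy': 1}
```

The resulting table has the same shape as the one the cell below builds, so px.treemap(data, path=['Service', 'President'], values='Value') could consume it directly.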
import pandas as pd
import plotly.express as px
# Assuming you have a DataFrame called 'pres_merged_df' containing the data
# with columns: 'President_x' and 'Military Service'
# Filter out missing data into a separate DataFrame so pres_merged_df keeps all rows for later cells
military_df = pres_merged_df.dropna(subset=['Military Service'])
# Split the military service data in each row and strip leading/trailing spaces
military_service_split = military_df['Military Service'].str.split(',').apply(lambda x: [service.strip() for service in x])
# Collect one row per (service, president) pair for the TreeMap
# (DataFrame.append was removed in pandas 2.0, so rows are gathered in a list first)
rows = []
for president, service_list in zip(military_df['President_x'], military_service_split):
    for service in service_list:
        if service:
            rows.append({'Service': service, 'President': president, 'Value': 1})
data = pd.DataFrame(rows, columns=['Service', 'President', 'Value'])
# Create the TreeMap
fig = px.treemap(data, path=['Service', 'President'], values='Value')
# Update layout for better appearance
fig.update_layout(
title="Military Service of U.S. Presidents - TreeMap",
margin=dict(t=100),
template='plotly_dark', # Use dark theme
height = 600,
)
# Show the TreeMap
fig.show()
This Word Cloud depicts the most frequent words in the offices U.S. Presidents held before assuming the presidency. The cell splits the 'Previous Office' text into words, keeps the ten most frequent, and sizes each word by its frequency. A black background provides a striking contrast to the vibrant colors of the Word Cloud, which offers a quick and intuitive view of the career paths of U.S. Presidents before their presidency.
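Tokenizing the joined text counts words rather than whole offices. Without filtering, punctuation tokens or filler words such as 'of' can crowd the top ten. Below is a dependency-free sketch of a filtering step; the regular expression and the small stop-word set are illustrative assumptions, not NLTK's behavior:

```python
import re
from collections import Counter

# Small illustrative stop-word set (an assumption, not a standard list)
STOP_WORDS = {'of', 'the', 'and', 'to', 'from', 'for'}

def top_words(texts, n=10):
    """Count alphabetic words across texts, ignoring stop words (case-insensitive check)."""
    counter = Counter()
    for text in texts:
        for word in re.findall(r"[A-Za-z]+", text):
            if word.lower() not in STOP_WORDS:
                counter[word] += 1
    return counter.most_common(n)

# Hypothetical 'Previous Office' values
offices = ['Governor of State A', 'Secretary of State, Senator', 'Senator, Governor of State B']
print(top_words(offices, n=3))  # [('State', 3), ('Governor', 2), ('Senator', 2)]
```

dict(top_words(...)) has the same form as the wordcloud_data dictionary passed to generate_from_frequencies in the cell below.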
import pandas as pd
import nltk
from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt
# Assuming you have a DataFrame called 'pres_merged_df' containing the data
# with columns: 'President_x' and 'Previous Office'
# Download the tokenizer models ('punkt' for older NLTK releases, 'punkt_tab' for newer ones)
nltk.download('punkt')
nltk.download('punkt_tab')
# Concatenate the non-missing 'Previous Office' values into a single string
# (dropna here leaves pres_merged_df itself unchanged for later cells)
previous_office_text = ' '.join(pres_merged_df['Previous Office'].dropna())
# Split the text into words, keeping alphabetic tokens that are not common stop words
words = [w for w in nltk.word_tokenize(previous_office_text) if w.isalpha() and w.lower() not in STOPWORDS]
# Calculate the frequency of each word
word_freq = nltk.FreqDist(words)
# Get the most frequent words and their frequencies
most_common_words = word_freq.most_common(10)
# Create a dictionary to hold the most frequent words and their frequencies
wordcloud_data = dict(most_common_words)
# Create the Word Cloud with custom settings
wordcloud = WordCloud(
width=1200,
height=600,
background_color='black', # Set the background color to black
colormap='viridis', # Choose a color map for the Word Cloud
contour_color='white', # Set contour color for better visibility
contour_width=2, # Set contour width
max_words=150, # Set the maximum number of words in the Word Cloud
prefer_horizontal=0.8, # Set the ratio of horizontal to vertical words
).generate_from_frequencies(wordcloud_data)
# Display the Word Cloud using matplotlib
plt.figure(figsize=(12, 6))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.title("Most Frequent Previous Offices of U.S. Presidents - Word Cloud", fontsize=20, color='white')
plt.show()
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
from wordcloud import WordCloud
# Assuming you have a DataFrame called 'pres_merged_df' containing the data
# with columns: 'President_x' and 'Previous Office'
# Concatenate all the 'Previous Office' values into a single string
previous_office_text = ' '.join(pres_merged_df['Previous Office'].dropna())
# Read the mask image
mask_img = mpimg.imread(r'C:\Users\User\Desktop\GitHub-projects\projects\Data-Dives-Projects-Unleashed\images\course1_1\png\fantasy-2506830_1280.jpg')
# Create a WordCloud object with the custom shape mask
wordcloud = WordCloud(
width=1200,
height=600,
background_color='black', # Set the background color to black
colormap='tab20c', # Choose a custom color map for the Word Cloud
contour_color='white', # Set contour color for better visibility
contour_width=2, # Set contour width
max_words=150, # Set the maximum number of words in the Word Cloud
prefer_horizontal=0.8, # Set the ratio of horizontal to vertical words
mask=mask_img, # Use the custom mask for the Word Cloud
).generate(previous_office_text)
# Display the Word Cloud using matplotlib
plt.figure(figsize=(24, 12))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.title("Previous Offices of U.S. Presidents - Word Cloud", fontsize=20, color='white')
plt.show()
The Sunburst Chart shows how often each previous office appears among U.S. Presidents. Every distinct 'Previous Office' value is placed directly under the center of the chart, so the chart has a single ring in which each wedge's size reflects how many Presidents share that value. Hovering over a wedge shows its count, and labels too small to fit at 12 pt are hidden to keep the chart readable.
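go.Sunburst describes its hierarchy with three parallel lists: labels, parents (the label of each node's parent, or '' for a top-level node), and values. The sketch below builds those lists from office counts and, as an optional design choice not made in the cell below, adds a single root node so the center of the chart shows the total. With such a root, branchvalues='total' tells Plotly that a parent's value already includes its children. The function name, the root label, and the counts are hypothetical:

```python
def sunburst_arrays(office_counts, root_label='All offices'):
    """Build labels/parents/values for go.Sunburst with one root node.

    office_counts: mapping of office -> count (hypothetical input format).
    root_label: hypothetical name for the center node.
    """
    labels = [root_label]
    parents = ['']                          # the root has no parent
    values = [sum(office_counts.values())]  # root value equals the sum of its children
    for office, count in office_counts.items():
        labels.append(office)
        parents.append(root_label)
        values.append(count)
    return labels, parents, values

labels, parents, values = sunburst_arrays({'Governor': 4, 'Senator': 3, 'Vice President': 2})
print(values)  # [9, 4, 3, 2]
```

These lists would then go into go.Sunburst(labels=labels, parents=parents, values=values, branchvalues='total').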
import pandas as pd
import plotly.graph_objects as go
# Assuming you have a DataFrame called 'pres_merged_df' containing the data
# with columns: 'President_x' and 'Previous Office'
# Count each distinct 'Previous Office' value (value_counts skips missing values,
# so pres_merged_df does not need to be overwritten)
office_counts = pres_merged_df['Previous Office'].value_counts().reset_index()
office_counts.columns = ['Previous Office', 'Count']
# Create a Sunburst Chart
fig = go.Figure(go.Sunburst(
labels=office_counts['Previous Office'],
parents=[''] * len(office_counts), # Empty string as parent for all categories
values=office_counts['Count'],
))
# Update the layout for better appearance
fig.update_layout(
title='Previous Offices of U.S. Presidents - Sunburst Chart',
margin=dict(t=50),
height=800,
uniformtext=dict(minsize=12, mode='hide'),
)
# Show the chart
fig.show()
This interactive tool provides a fascinating way to explore the economy-related data for different U.S. presidents. The tool is designed to analyze and present the 'Economy' column data from the DataFrame called 'pres_merged_df.'
Select a President: Start by using the dropdown menu to choose a president from the list of available names. The dropdown includes all the unique president names present in the dataset.
Explore Economy Data: After selecting a president, the tool will display their respective economy data. It gathers the information from the 'Economy' column in the DataFrame.
Data Availability: In case economy data for a specific president is unavailable, the tool gracefully informs users with a message indicating that the data is not available.
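The three steps above boil down to a lookup, a split on commas, and a fallback message. The plain-Python sketch below pulls that logic out of the widget so it can be checked on its own. The function name economy_items and the sample records are hypothetical:

```python
import math

FALLBACK = 'Economy data not available'

def economy_items(records, president_name):
    """Return the comma-separated economy notes for one president, or a fallback message."""
    items = []
    for name, economy in records:
        if name != president_name:
            continue
        # Skip missing values (None or float NaN) instead of calling .split on them
        if economy is None or (isinstance(economy, float) and math.isnan(economy)):
            continue
        items.extend(part.strip() for part in str(economy).split(',') if part.strip())
    return items or [FALLBACK]

# Hypothetical (president, economy) records
records = [
    ('President A', 'Note one, Note two'),
    ('President B', float('nan')),
]
print(economy_items(records, 'President A'))  # ['Note one', 'Note two']
print(economy_items(records, 'President B'))  # ['Economy data not available']
```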
import pandas as pd
from IPython.display import display
from ipywidgets import interact, widgets
# Assuming you have a DataFrame called 'pres_merged_df' containing the data
# with columns named 'President_x' and 'Economy'
# Function to display the economy data for a specific president
def display_economy_data(president_name):
    # Look up the president's non-missing economy values
    economy_values = pres_merged_df.loc[pres_merged_df['President_x'] == president_name, 'Economy'].dropna()
    # Split the economy data based on commas and create a list of non-empty items
    economy_data = [item.strip() for data in economy_values for item in str(data).split(',') if item.strip()]
    if not economy_data:
        economy_data = ['Economy data not available']
    # Create a widget to display the economy data
    economy_widget = widgets.HTML(value='<br>'.join(economy_data))
    # Set the layout and style of the widget
    economy_widget.layout.overflow_x = 'hidden'
    economy_widget.layout.max_height = '500px'
    economy_widget.layout.overflow_y = 'auto'
    economy_widget.layout.border = '2px solid #ccc'
    economy_widget.layout.border_radius = '5px'
    economy_widget.layout.padding = '10px'
    economy_widget.layout.margin = '10px'
    economy_widget.layout.background = '#f9f9f9'
    # Display the widget
    display(economy_widget)
# Get unique president names from the DataFrame
president_names = pres_merged_df['President_x'].unique()
# Create an interactive dropdown menu to select the president
interact(display_economy_data, president_name=widgets.Dropdown(options=president_names));
Important: In case the interactive visualization isn't visible, you can refer to the images provided below. Please note that the current setup requires the code to be hosted on a server, a step that will be implemented in the future.
In this visualization, we present a captivating Word Cloud that showcases the prominent words related to "Foreign Affairs" during the tenure of U.S. Presidents. The Word Cloud beautifully illustrates the most frequently mentioned terms, with larger and bolder fonts signifying higher occurrence.
import pandas as pd
from wordcloud import WordCloud
import matplotlib.pyplot as plt
# Assuming you have a DataFrame called 'pres_merged_df' containing the data
# with a column named 'Foreign Affairs'
# Concatenate all the 'Foreign Affairs' values into a single string
foreign_affairs_text = ' '.join(pres_merged_df['Foreign Affairs'].dropna().astype(str))
# Create the Word Cloud with custom settings
wordcloud = WordCloud(
    width=1200,
    height=600,
    background_color='white',  # White background
    colormap='tab20',          # Matplotlib colormap used to color the words
    contour_color='black',     # Outline color; only drawn when a mask image is supplied
    contour_width=2,           # Outline width; only drawn when a mask image is supplied
    max_words=150,             # Maximum number of words shown
    prefer_horizontal=0.8,     # Ratio of attempts to place words horizontally rather than vertically
).generate(foreign_affairs_text)
# Display the Word Cloud using matplotlib
plt.figure(figsize=(24, 12))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.title("Foreign Affairs of U.S. Presidents - Word Cloud", fontsize=16, color='black')
plt.show()
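A word cloud sizes each term by how often it occurs relative to the other terms. The standard-library sketch below illustrates that idea under a simplifying assumption: it keeps the `max_words` most common words and scales each count by the count of the most frequent word. It is not WordCloud's actual algorithm, which also applies its own `relative_scaling` and layout rules. The helper name `top_relative_frequencies` is hypothetical.

```python
from collections import Counter


def top_relative_frequencies(words, max_words=150):
    """Keep the `max_words` most common words and scale counts so the top word is 1.0.

    Hypothetical helper: illustrates relative frequency, not WordCloud's internals.
    """
    counts = Counter(words)
    top = counts.most_common(max_words)
    if not top:
        return {}
    highest = top[0][1]  # count of the most frequent word
    return {word: count / highest for word, count in top}


sample_words = "treaty war treaty trade war treaty alliance".split()
print(top_relative_frequencies(sample_words, max_words=3))
```

Under this scaling, a word that occurs half as often as the top word gets a weight of 0.5, which a renderer could then map to a font size.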
The word scatter plot offers another view of the "Foreign Affairs" text. Each word is a point placed by its frequency rank (horizontal axis) and its number of occurrences (vertical axis). Marker size and color also scale with frequency, so the most common terms stand out.
import string
from collections import Counter
import nltk
import pandas as pd
import plotly.express as px
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
# Download the NLTK resources used below: the stopword list and the Punkt tokenizer
# ('punkt_tab' is required by newer NLTK releases, 'punkt' by older ones)
nltk.download('stopwords')
nltk.download('punkt')
nltk.download('punkt_tab')
# Get the 'Foreign Affairs' text
foreign_affairs_text = ' '.join(pres_merged_df['Foreign Affairs'].dropna().astype(str))
# Preprocess the text (You may need to customize this based on your data)
def preprocess_text(text):
    # Remove punctuation and convert to lowercase
    text = text.translate(str.maketrans('', '', string.punctuation)).lower()
    # Tokenize the text
    words = word_tokenize(text)
    # Remove stopwords
    stop_words = set(stopwords.words('english'))
    return [word for word in words if word not in stop_words]
# Tokenize and preprocess the text
words = preprocess_text(foreign_affairs_text)
# Calculate word frequencies
word_freq = Counter(words)
# Build a DataFrame of words sorted by frequency, with a rank column (1 = most frequent)
word_freq_df = pd.DataFrame(word_freq.most_common(), columns=['Word', 'Frequency'])
word_freq_df['Rank'] = range(1, len(word_freq_df) + 1)
# Create the Word Scatter Plot: rank on x, frequency on y; size and color also encode frequency
fig = px.scatter(
    word_freq_df,
    x='Rank',
    y='Frequency',
    size='Frequency',
    color='Frequency',
    text='Word',
    labels={'Frequency': 'Word Frequency', 'Rank': 'Frequency Rank'},
    title='Word Scatter Plot - Foreign Affairs',
    hover_name='Word',
    hover_data={'Frequency': True},
)
# Customize the appearance of the plot
fig.update_traces(textposition='top center', textfont_size=12)
fig.update_layout(height=600)
# Show the plot
fig.show()
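The preprocessing step in the scatter-plot code can be illustrated without NLTK. This sketch substitutes a short, hypothetical stopword set and plain `str.split()` for NLTK's full English stopword list and `word_tokenize`, so its tokens may differ slightly from the notebook's. In both versions, stripping punctuation turns "9/11" into "911". `ranked_frequencies` is also a hypothetical helper: it returns the rank, word, and frequency rows that a word scatter plot is built from.

```python
import string
from collections import Counter

# Hypothetical, abbreviated stopword list; the notebook uses NLTK's full English list instead
STOP_WORDS = {"the", "and", "of", "to", "a", "in", "with", "for", "on"}


def preprocess_text(text):
    """Strip punctuation, lowercase, split on whitespace, and drop stopwords."""
    cleaned = text.translate(str.maketrans("", "", string.punctuation)).lower()
    return [word for word in cleaned.split() if word not in STOP_WORDS]


def ranked_frequencies(words):
    """Return (rank, word, frequency) tuples, most frequent first (ties keep first-seen order)."""
    return [
        (rank, word, freq)
        for rank, (word, freq) in enumerate(Counter(words).most_common(), start=1)
    ]


text = "Treaty of Paris; treaty with Spain. War with Mexico, and the war of 1812."
for row in ranked_frequencies(preprocess_text(text)):
    print(row)
```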
This interactive tool lets readers select a U.S. President and view the "Foreign Affairs" entries recorded for that President's tenure. Choosing a name from the dropdown menu displays the entries as a numbered (ordered) list, in the order they appear in the dataset.
import re
import html
import pandas as pd
from IPython.display import display
from ipywidgets import interact, widgets
# Assuming you have a DataFrame called 'pres_merged_df' containing the data
# with a column named 'Foreign Affairs'
# Function to display the 'Foreign Affairs' data for a specific president
def display_foreign_affairs_data(president_name):
    foreign_affairs_data = pres_merged_df.loc[pres_merged_df['President_x'] == president_name, 'Foreign Affairs'].dropna().values
    if len(foreign_affairs_data) == 0:
        foreign_affairs_data = ['Foreign Affairs data not available']
    else:
        # Extract the text between the square brackets (fall back to the whole string) and split it on commas
        items = []
        for data in foreign_affairs_data:
            match = re.search(r'\[([^\]]+)\]', str(data))
            inner = match.group(1) if match else str(data)
            items.extend(inner.split(','))
        # Remove quotes and surrounding spaces, then drop empty strings
        foreign_affairs_data = [item.replace("'", "").replace('"', '').strip() for item in items]
        foreign_affairs_data = [item for item in foreign_affairs_data if item]
    # Create an HTML ordered list (items are escaped so characters like '&' display correctly)
    list_html = '<ol style="list-style-position: inside;">'
    list_html += ''.join(f'<li>{html.escape(item)}</li>' for item in foreign_affairs_data)
    list_html += '</ol>'
    # Create a widget to display the 'Foreign Affairs' data as an ordered list
    foreign_affairs_widget = widgets.HTML(value=list_html)
    # Set the layout and style of the widget
    foreign_affairs_widget.layout.overflow_x = 'hidden'
    foreign_affairs_widget.layout.max_height = '500px'
    foreign_affairs_widget.layout.overflow_y = 'auto'
    foreign_affairs_widget.layout.border = '2px solid #ccc'
    foreign_affairs_widget.layout.border_radius = '5px'
    foreign_affairs_widget.layout.padding = '10px'
    foreign_affairs_widget.layout.margin = '10px'
    foreign_affairs_widget.layout.background = '#f9f9f9'
    # Display the widget
    display(foreign_affairs_widget)
# Get unique president names from the DataFrame
president_names = pres_merged_df['President_x'].unique()
# Create an interactive dropdown menu to select the president
interact(display_foreign_affairs_data, president_name=widgets.Dropdown(options=president_names));
Important: In case the interactive visualization isn't visible, you can refer to the images provided below. Please note that the current setup requires the code to be hosted on a server, a step that will be implemented in the future.
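The parsing above extracts the text between square brackets and splits it on commas. This breaks entries that contain commas themselves, such as '1792, 1796 Kentucky & Tennessee joined the Union' in the 'Other Events' output below. It also strips apostrophes, so "Clay's" becomes "Clays". Assuming each cell holds a string that looks like a Python list, `ast.literal_eval` can parse it safely instead. `parse_event_list` is a hypothetical helper name.

```python
import ast


def parse_event_list(raw):
    """Parse a stringified list such as "['a', 'b']" into a list of trimmed strings.

    Returns an empty list for missing values or strings that are not a list literal.
    """
    if not isinstance(raw, str):  # e.g. NaN for a missing cell
        return []
    try:
        value = ast.literal_eval(raw.strip())
    except (ValueError, SyntaxError):
        return []
    if not isinstance(value, list):
        return []
    return [str(item).strip() for item in value if str(item).strip()]


raw = "['1792 Post Office founded.', '1792, 1796 Kentucky & Tennessee joined the Union']"
print(parse_event_list(raw))
```

Unlike `eval`, `ast.literal_eval` only accepts Python literals, so it can be applied to data cells without executing arbitrary code.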
import re
import html
import pandas as pd
from IPython.display import display
from ipywidgets import interact, widgets
from ipywidgets.embed import embed_minimal_html
# Assuming you have a DataFrame called 'pres_merged_df' containing the data
# with a column named 'Foreign Affairs'
# Same widget as above, but each selection is also saved as a static HTML file
def display_foreign_affairs_data(president_name):
    foreign_affairs_data = pres_merged_df.loc[pres_merged_df['President_x'] == president_name, 'Foreign Affairs'].dropna().values
    if len(foreign_affairs_data) == 0:
        foreign_affairs_data = ['Foreign Affairs data not available']
    else:
        # Extract the text between the square brackets (fall back to the whole string) and split it on commas
        items = []
        for data in foreign_affairs_data:
            match = re.search(r'\[([^\]]+)\]', str(data))
            inner = match.group(1) if match else str(data)
            items.extend(inner.split(','))
        # Remove quotes and surrounding spaces, then drop empty strings
        foreign_affairs_data = [item.replace("'", "").replace('"', '').strip() for item in items]
        foreign_affairs_data = [item for item in foreign_affairs_data if item]
    # Create an HTML ordered list (items are escaped so characters like '&' display correctly)
    list_html = '<ol style="list-style-position: inside;">'
    list_html += ''.join(f'<li>{html.escape(item)}</li>' for item in foreign_affairs_data)
    list_html += '</ol>'
    # Create a widget to display the 'Foreign Affairs' data as an ordered list
    foreign_affairs_widget = widgets.HTML(value=list_html)
    # Set the layout and style of the widget
    foreign_affairs_widget.layout.overflow_x = 'hidden'
    foreign_affairs_widget.layout.max_height = '500px'
    foreign_affairs_widget.layout.overflow_y = 'auto'
    foreign_affairs_widget.layout.border = '2px solid #ccc'
    foreign_affairs_widget.layout.border_radius = '5px'
    foreign_affairs_widget.layout.padding = '10px'
    foreign_affairs_widget.layout.margin = '10px'
    foreign_affairs_widget.layout.background = '#f9f9f9'
    # Display the widget
    display(foreign_affairs_widget)
    # Save a static snapshot of the current selection (the file is overwritten on each selection)
    embed_minimal_html('interactive_foreign_affairs_widget.html', views=[foreign_affairs_widget], title='Foreign Affairs Widget')
# Get unique president names from the DataFrame
president_names = pres_merged_df['President_x'].unique()
# Create an interactive dropdown menu to select the president
interact(display_foreign_affairs_data, president_name=widgets.Dropdown(options=president_names));
Important: In case the interactive visualization isn't visible, you can refer to the images provided below. Please note that the current setup requires the code to be hosted on a server, a step that will be implemented in the future.
Question: What are the key aspects of U.S. presidents' historical data?
This interactive tool allows you to explore various aspects of U.S. presidents' historical data, including their Economy, Foreign Affairs, Military Activity, Other Events, and Legacy. Select a president from the dropdown menu, and you'll get an ordered list of events and activities associated with that president in each category.
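Instructions: select a president from the dropdown menu, then click a tab (Economy, Foreign Affairs, Military Activity, Other Events, or Legacy) to view that category's numbered entries.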
import re
import pandas as pd
import ipywidgets as widgets
from ipywidgets import interact
from IPython.display import display
# Assuming you have a DataFrame called 'pres_merged_df' containing the data
# with columns: 'President_x', 'Economy', 'Foreign Affairs', 'Military Activity', 'Other Events', and 'Legacy'
# Function to display every category of data for a specific president in tabs
def display_president_data(president_name):
    president_data = pres_merged_df.loc[pres_merged_df['President_x'] == president_name]
    # Helper: strip brackets and quotes, split on commas, and number the entries
    def create_ordered_list(data_column):
        data_list = president_data[data_column].dropna().apply(lambda x: re.sub(r"[\[\]\"]", "", str(x))).tolist()
        data_list = [item.strip() for data in data_list for item in data.split(',')]
        data_list = [re.sub(r"[\"']+", "", item) for item in data_list]
        data_list = [item for item in data_list if item]
        return [f"{i + 1}. {item}" for i, item in enumerate(data_list)]
    # Create ordered lists for each column's data
    economy_list = create_ordered_list('Economy')
    foreign_affairs_list = create_ordered_list('Foreign Affairs')
    military_activity_list = create_ordered_list('Military Activity')
    other_events_list = create_ordered_list('Other Events')
    legacy_list = create_ordered_list('Legacy')
    # Display the data in an interactive widget
    economy_widget = widgets.HTML(value=f"<b>Economy:</b><br>{'<br>'.join(economy_list)}")
    foreign_affairs_widget = widgets.HTML(value=f"<b>Foreign Affairs:</b><br>{'<br>'.join(foreign_affairs_list)}")
    military_activity_widget = widgets.HTML(value=f"<b>Military Activity:</b><br>{'<br>'.join(military_activity_list)}")
    other_events_widget = widgets.HTML(value=f"<b>Other Events:</b><br>{'<br>'.join(other_events_list)}")
    legacy_widget = widgets.HTML(value=f"<b>Legacy:</b><br>{'<br>'.join(legacy_list)}")
    # Create a tab widget to organize the data
    tab_contents = [economy_widget, foreign_affairs_widget, military_activity_widget, other_events_widget, legacy_widget]
    tab_titles = ['Economy', 'Foreign Affairs', 'Military Activity', 'Other Events', 'Legacy']
    tab = widgets.Tab()
    tab.children = tab_contents
    for i, title in enumerate(tab_titles):
        tab.set_title(i, title)
    # Display the tab widget
    display(tab)
# Get unique president names from the DataFrame
president_names = pres_merged_df['President_x'].unique()
# Create an interactive dropdown menu to select the president
interact(display_president_data, president_name=widgets.Dropdown(options=president_names));
Important: In case the interactive visualization isn't visible, you can refer to the images provided below. Please note that the current setup requires the code to be hosted on a server, a step that will be implemented in the future.
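The tab widget above numbers entries by prefixing "1.", "2.", and so on, then joins them with `<br>`. The raw text is passed straight into HTML. A hypothetical helper, `to_ordered_list_html`, could render each category as a real `<ol>` and escape characters such as `&` and `<`, which HTML would otherwise interpret. This sketch only builds the HTML string that an `ipywidgets.HTML` widget would receive.

```python
import html


def to_ordered_list_html(title, items):
    """Render a bold title plus an HTML <ol>, escaping the title and every item."""
    if not items:
        return f"<b>{html.escape(title)}:</b><br><i>No data available</i>"
    rows = "".join(f"<li>{html.escape(item)}</li>" for item in items)
    return f"<b>{html.escape(title)}:</b><ol>{rows}</ol>"


print(to_ordered_list_html(
    "Other Events",
    ["1792, 1796 Kentucky & Tennessee joined the Union", "1867 Purchase of Alaska"],
))
```

Because the browser numbers an `<ol>` itself, the manual `f"{i + 1}. "` prefix is no longer needed.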
# Inspect the raw 'Other Events' entries for every president
# (point the loop at 'Economy', 'Foreign Affairs', 'Military Activity' or 'Legacy' to inspect those columns)
for events in pres_merged_df['Other Events']:
    print(events)
    print()
['1791 Bill of Rights', '1792 Post Office founded.', '1792, 1796 Kentucky & Tennessee joined the Union']
['1798 Alien & Sedition Act to silence critics; unpopular', '1800 Capital relocated to Washington DC', '1801 Nominated John Marshall chief justice of U.S.']
['1803 The Louisiana purchase', '1804 12th Amendment changed Presidential election', '1804-06 Authorized Louis & Clark expedition']
['1811 Cumberland Road construction starts (first National Road)', '1817 Veto on Bonus Bill for funding States improvements']
['1819 Florida ceded to US', "1820 Missouri Compromise Slavery forbidden abv 36° 30'", '1820 In the election he received every electoral vote except one.']
[' Accused for "corrupt bargain" to obtain Clay\'s support in election', '1828 Baltimore/Ohio railroad']
['1830 Indian Removal Act', "1832 South Carolina's nullification crisis over taxes", '1835 "The Trail of Tears". Cherokees forced to move.']
['1838 "The Trail of Tears". Indians’ relocation, 4000 die', '1839 US vs. The Amistad: symbolic against slavery']
['1841 Delivered the longest inaugural address (105 min)', '1841 Contracted pneumonia and died in the White House one month later.']
['1841 His cabinet resigned after he vetoed banking bills', '1844 USS Princeton disaster. 8 died in Potomac,', '1845 Texas annexed followed by war with Mexico']
['1846 A large crack in the Liberty Bell.', '1848 California Gold rush']
[' The question of extending slavery to the new territories dominated', '1846 Did not approve the "Compromise of 1850"']
['1850 Compromise of 1850 and Fugitive Slave Act.']
['1853 Gadsden Purchase. Land from Mexico.', '1854 Kansas-Nebraska Act. Slavery Debate reheated.', ' "border ruffians" and "jayhawkers" clash in Kansas']
['1857 Dred Scott decision: States can decide on slavery', '1857 Mormons challenged federal authority in Utah.', '1860 Sth Carolina seceded. 7 states followed.']
['1863 Emancipation Proclamation, freeing slaves', '1863 Gettysburg Address', '1865 Assassinated by John Wilkes Booth']
['1865 Amnesty', '1867 Reconstruction Act & Office Tenure Act by Congress', ' Nebraska in the union', '1867 Purchase of Alaska', '1868 Impeachment']
['1871 Civil Service', '1870-71 Enforcement Acts broke Ku Klux Klan', '1875 Civil Rights Act', ' Scandals: Credit Mobilier, Tweed Ring, Whiskey Ring']
['1877 Reconstruction end. Army withdrew from the South', '1877 Railroad strikes and use of troops', '1877 Desert Land Act']
['1881 On July 2, he was shot by Charles Julius Guiteau.', '1881 Garfield died of blood poisoning on September 19.']
['1883 Pendleton Act: Civil hiring on merit']
['1886 Statue of Liberty', ' Curtailed largess of war veterans pensions', '1887 Anti-Polygamy Act', '1887 Dawes Severalty Act - destroys Indian governments']
['1889 Opening of Oklahoma to 20,000 settlers', '1889-90 6 states admitted to the Union', '1891 Forest Reserve Act; Forest reserves are public.']
['1893 Pullman strike.']
['1898 Yellow Journalism (Hyped Maine)', '1898 Hawaii annexed', '1901 On Sep 6, he was shot by an anarchist in Buffalo and died 8 days later.']
[' Conservation becomes an issue. Creation of National parks & forests', '1906 Pure Food & Drug Act - Meat Inspection Act: New Safety standards']
[' Record antitrust suits', '1912 New states: Arizona & New Mexico.', '1912 US dept. of Commerce created']
['1916 Child labor curtailed', '1916 Federal Farm Loan Act; cheap loans to farmers', '1920 Prohibition', '1920 19th Amendment, Women win the right to vote']
['1921 Federal Highway Act - the age of the "motor car"', '1922 Great Railway strike', ' Bureau of Veterans Affairs', ' Teapot Dome scandal and many others']
['1924 Immigration Act limits immigrants from South & East Europe', '1924 Snyder Act-Indians get citizenship', '1927 Mississippi flood']
['1932 Reconstruction Finance Corporation to provide business loans.', '1932 The "Bonus Army" incident. Veterans were killed.']
['1933 First 100 days legislation frenzy', '1933 1st New Deal: acts on relief, recovery, reform', '1935 2nd New Deal: WPA, Social Security,Labor support']
['1945 Fair Deal: health care, civil rights etc.', '1947 Pres. Succession Act', '1947 CIA established.', '1951 Dismissal of Gen. Douglas MacArthur']
[' Alaska and Hawaii admitted as states', '1957 Sent Troops to Little Rock to enforce integration', '1958 NASA established.', '1960 Civil Rights']
['1961 Peace Corps program', '1961 "Moon race" starts', '1963 "Washington March"', '1963 Assassinated In Dallas by Lee Harvey Oswald']
['1964 The Civil Rights Act', '1964 Great Society & War on Poverty programs', '1963-65 Miranda case', ' Urban riots / antiwar riots', '1968 M. Luther King killed']
['1969 Moon landing', '1970 Environment Act', '1973 Spiro Agnew resigned', '1973 Watergate scandal', '1974 Resigned']
['1974 Granted a pardon to Nixon.', '1975 Airlift of 237,000 Vietnamese refugees']
[' Pardoned Vietnam War draft evaders', ' Energy Department', ' Boycott of 1980 Olympics']
['1981 Assassination attempt by John W. Hinkley,', '1981 Fired 11,345 striking air traffic controllers.', '1986 War on Drugs']
['1990 Americans with Disabilities Act', '1990 Immigration Act; result: increase of legal immigration 40%']
["1993 “Don't ask, don't tell”- gays in the military.", ' Monica Lewinsky scandal & Impeachment']
['2001 9/11', '2001 Patriot Act', '2002 “no child left behind” law to improve education', '2005 Hurricane Katrina']
['2010 Healthcare reform: Affordable Care Act (Obamacare)', "2010 End of the “Don't ask, don't tell policy” for LBGT in the military", '2012 Same sex couples now have the right to be married.']
['2020 Covid-19 pandemic', ' Impeached twice', '2021 US Capitol was stormed by Trump supporters.']
['Biden pledged to double climate funding to developing countries by 2024']
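Most 'Other Events' entries printed above begin with a year, but some do not, for example ' Impeached twice' or ' Boycott of 1980 Olympics'. List position is not guaranteed to be chronological. A small helper can extract the leading year and sort on it. Both helper names below are hypothetical. Entries without a leading year are placed last rather than guessed, even when a year appears later in the text.

```python
import re

# A four-digit year at the very start of the entry (leading whitespace allowed)
LEADING_YEAR = re.compile(r"\s*(\d{4})")


def event_year(event):
    """Return the leading year of an event string as an int, or None if it does not start with one."""
    match = LEADING_YEAR.match(event)
    return int(match.group(1)) if match else None


def sort_events_by_year(events):
    """Sort events by leading year; undated entries go last, keeping their original order."""
    return sorted(events, key=lambda e: (event_year(e) is None, event_year(e) or 0))


events = ['2020 Covid-19 pandemic', ' Impeached twice', '2021 US Capitol was stormed by Trump supporters.']
for event in sort_events_by_year(events):
    print(event_year(event), event.strip())
```

Ranges such as '1804-06' or '1792, 1796' are reduced to their first year, which is enough for ordering but loses the end of the range.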
To see which name words (first, middle, or last names) recur most often among U.S. Presidents, and which parties those Presidents belonged to, we created an interactive horizontal bar chart. Each bar shows how often a name occurs; only names that appear at least three times are kept. Hovering over a bar lists the parties of the Presidents who share that name, with their counts and the percentage each represents.
import re
import pandas as pd
import plotly.graph_objects as go
# Assuming you have a DataFrame called 'pres_merged_df' containing the data
# with columns: 'President_x' and 'Party'
# Work on a copy so 'pres_merged_df' itself is not modified for later cells
names_df = pres_merged_df.dropna(subset=['President_x']).copy()
# Split each president's name into individual words using regex
names_df['Name Tokens'] = names_df['President_x'].astype(str).apply(lambda x: re.findall(r'\w+', x))
# Flatten the list of names and count the frequency of each name
all_names = [name for sublist in names_df['Name Tokens'] for name in sublist]
name_freq = pd.Series(all_names).value_counts()
# Create a DataFrame of names and frequencies, keeping only names that occur at least 3 times
president_data = pd.DataFrame({'Name': name_freq.index, 'Frequency': name_freq.values})
president_data = president_data[president_data['Frequency'] >= 3]
# Check if the president_data DataFrame is not empty
if not president_data.empty:
    # Hover text: party counts and percentages among the presidents sharing each name
    party_percentages = []
    for name, freq in zip(president_data['Name'], president_data['Frequency']):
        has_name = names_df['Name Tokens'].apply(lambda tokens: name in tokens)
        party_counts = names_df.loc[has_name, 'Party'].value_counts()
        party_percentage = [f"{party}: {count} ({count / freq * 100:.1f}%)" for party, count in party_counts.items()]
        party_percentages.append(', '.join(party_percentage))
    # Create the interactive horizontal bar plot using Plotly
    fig = go.Figure(
        go.Bar(
            x=president_data['Frequency'],
            y=president_data['Name'],
            name='Presidents',
            orientation='h',
            hovertext=party_percentages,
        )
    )
    # Update layout for better appearance
    fig.update_layout(
        title="Frequency of U.S. President Names with Party Information",
        xaxis_title="Frequency",
        yaxis_title="President Names",
        hovermode='closest',
        showlegend=False,  # Only one trace, so no legend is needed
    )
    # Show the plot
    fig.show()
else:
    print("No name occurs at least three times.")
# Rebuild pres_merged_df by joining pres_df_1 and pres_df_4 on their shared row index
pres_merged_df = pres_df_1.merge(pres_df_4, left_index=True, right_index=True)
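The name-frequency chart counts name words and, for each word, tallies the parties of the Presidents whose names contain it. The standard-library sketch below applies the same counting idea to made-up records; the helper name and the toy data are hypothetical. Here each word occurrence adds one to both the frequency and the party tally, so a word's party shares always sum to 100%.

```python
import re
from collections import Counter, defaultdict


def name_token_party_shares(records, min_freq=3):
    """records: (president_name, party) pairs.

    Returns {token: (frequency, {party: share})} for tokens seen at least `min_freq` times.
    """
    token_counts = Counter()
    party_counts = defaultdict(Counter)
    for name, party in records:
        for token in re.findall(r"\w+", name):
            token_counts[token] += 1
            party_counts[token][party] += 1
    return {
        token: (freq, {party: count / freq for party, count in party_counts[token].items()})
        for token, freq in token_counts.items()
        if freq >= min_freq
    }


# Toy records for illustration only (not real presidents)
toy_records = [("John Alpha", "Party A"), ("John Beta", "Party B"), ("Mary Alpha", "Party A")]
for token, (freq, shares) in name_token_party_shares(toy_records, min_freq=2).items():
    print(token, freq, {party: f"{share:.0%}" for party, share in shares.items()})
```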
The code below first plots a histogram of the First Ladies' ages at marriage, marking the median, mean, and mode. It then draws a horizontal bar chart with one bar per First Lady, where the length of each bar is her age at marriage. The hover text shows the President she married, her name, and her age at marriage.
Together, these charts show the typical age at which First Ladies married and which President each of them married.
import pandas as pd
import matplotlib.pyplot as plt
# Make sure both date columns are datetimes (unparseable values become NaT)
pres_df_2['Date of Marriage'] = pd.to_datetime(pres_df_2['Date of Marriage'], errors='coerce')
pres_df_2['Date of Born, First Lady'] = pd.to_datetime(pres_df_2['Date of Born, First Lady'], errors='coerce')
# Approximate age at marriage in whole years (365.25 days per year averages out leap years)
pres_df_2['Age at Marriage, First Lady'] = (pres_df_2['Date of Marriage'] - pres_df_2['Date of Born, First Lady']).dt.days // 365.25
ages = pres_df_2['Age at Marriage, First Lady'].dropna()
# Plot the distribution of age at marriage
plt.figure(figsize=(20, 12))
plt.hist(ages, bins=20, edgecolor='black', alpha=0.7)
# Calculate and plot the median, mean, and mode
median_age = ages.median()
mean_age = ages.mean()
mode_age = ages.mode().values[0]
plt.axvline(median_age, color='red', linestyle='dashed', linewidth=2, label=f'Median Age: {median_age:.1f}')
plt.axvline(mean_age, color='green', linestyle='dashed', linewidth=2, label=f'Mean Age: {mean_age:.1f}')
plt.axvline(mode_age, color='blue', linestyle='dashed', linewidth=2, label=f'Mode Age: {mode_age:.1f}')
plt.xlabel('Age at Marriage of First Lady')
plt.ylabel('Frequency')
plt.title('Distribution of Age at Marriage of First Ladies')
plt.legend()
plt.grid(True)
plt.show()
import plotly.graph_objects as go
# Assuming you have a DataFrame called 'pres_df_2' containing the data
# with columns: 'First Lady Name', 'Age at Marriage, First Lady', and 'President'
# Keep only rows with a known age at marriage (a filtered copy; pres_df_2 itself is unchanged)
first_ladies = pres_df_2.dropna(subset=['Age at Marriage, First Lady'])
# Create the horizontal bar chart using Plotly
fig = go.Figure()
# Add the bar data to the figure
fig.add_trace(go.Bar(
    y=first_ladies['First Lady Name'],
    x=first_ladies['Age at Marriage, First Lady'],
    orientation='h',
    hovertext=first_ladies.apply(
        lambda row: f"President: {row['President']}<br>First Lady: {row['First Lady Name']}<br>Age at Marriage: {row['Age at Marriage, First Lady']:.0f}",
        axis=1,
    ),
    hoverinfo='text',  # Show only the custom hover text
    marker=dict(color='skyblue'),
    opacity=0.8,
))
# Update layout for better appearance
fig.update_layout(
    title="Age at Marriage, First Lady",
    xaxis_title="Age at Marriage (years)",
    yaxis_title="First Lady Name",
    showlegend=False,
    bargap=0.1,
    height=1400,
)
# Show the plot
fig.show()
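Dividing a day count by a fixed year length is only an approximation. With 365 days, for example, an age can come out one year too high shortly before a birthday, because leap days accumulate. An exact alternative compares the month and day of the two dates. `age_on` is a hypothetical helper name. In this sketch, someone born on 29 February only counts a new year from 1 March in non-leap years.

```python
from datetime import date


def age_on(birth, on):
    """Completed years between `birth` and `on`, counting a year only once the birthday has passed."""
    had_birthday = (on.month, on.day) >= (birth.month, birth.day)
    return on.year - birth.year - (0 if had_birthday else 1)


birth, wedding = date(2000, 6, 15), date(2020, 6, 14)
print("exact:", age_on(birth, wedding))
print("days // 365:", (wedding - birth).days // 365)
```

pandas `Timestamp` objects also expose `.year`, `.month`, and `.day`. After dropping missing dates, the helper could therefore be applied row-wise, e.g. `pres_df_2.apply(lambda r: age_on(r['Date of Born, First Lady'], r['Date of Marriage']), axis=1)`.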
Question: What is the relationship between the height and weight of U.S. Presidents, and how does their Body Mass Index (BMI) and political party affiliation play a role?
To explore this relationship, we have created a scatter plot that showcases the height and weight of U.S. Presidents. The size of each circle in the plot represents the Body Mass Index (BMI) of the respective President, while the color of the circle corresponds to their political party affiliation. The hover text provides additional details, including the President's name, height, weight, political party, Body Mass Index (BMI), and the Body Mass Index Range they fall into.
The plot allows us to observe any potential patterns or trends regarding the height, weight, BMI, and political party affiliations of U.S. Presidents.
import pandas as pd
import plotly.express as px
# Assuming you have a DataFrame called 'pres_df_3' containing the data
# with columns: 'President', 'height_cm', 'weight_kg', 'body_mass_index', and 'political_party'
# Drop rows with missing values in the plotted columns (a copy; pres_df_3 is unchanged for later cells)
bmi_df = pres_df_3.dropna(subset=['height_cm', 'weight_kg', 'body_mass_index', 'political_party']).copy()
# Map political parties to colors ('Democrat' is also listed because other charts in this notebook use that label)
party_colors = {
    'Democratic': 'blue',
    'Democrat': 'blue',
    'Republican': 'red',
    # Add more parties and their corresponding colors here
}
# Classify BMI with the standard adult cut-offs: <18.5, 18.5 to <25, 25 to <30, 30 and above
def calculate_bmi_range(bmi):
    if bmi < 18.5:
        return 'Underweight'
    elif bmi < 25:
        return 'Normal Weight'
    elif bmi < 30:
        return 'Overweight'
    else:
        return 'Obese'
# The hover template below reads this column, so it must be created before plotting
bmi_df['body_mass_index_range'] = bmi_df['body_mass_index'].apply(calculate_bmi_range)
# Create the scatter plot
fig = px.scatter(
    bmi_df,
    x='height_cm',
    y='weight_kg',
    size='body_mass_index',
    color='political_party',
    color_discrete_map=party_colors,  # Assign colors to parties
    hover_name='President',  # Display the President name on hover
    custom_data=['height_cm', 'weight_kg', 'body_mass_index', 'political_party', 'body_mass_index_range'],
    title="Height vs. Weight with Body Mass Index and Political Party",
    labels={'height_cm': 'Height (cm)', 'weight_kg': 'Weight (kg)'},
)
# Update hover information
fig.update_traces(
    hovertemplate="<br>".join([
        "President: %{hovertext}",
        "Height: %{customdata[0]} cm",
        "Weight: %{customdata[1]} kg",
        "Party: %{customdata[3]}",
        "Body Mass Index: %{customdata[2]}",
        "Body Mass Index Range: %{customdata[4]}",
    ])
)
# Update the layout for better appearance
fig.update_layout(
    autosize=False,  # Turn off automatic sizing
    width=1500,      # Set the width of the plot
    height=700,      # Set the height of the plot
)
# Show the plot
fig.show()
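BMI is weight in kilograms divided by the square of height in metres. The dataset stores height in centimetres and already includes a 'body_mass_index' column. A quick consistency check can therefore recompute BMI from 'height_cm' and 'weight_kg' and compare the two values. Both helper names and the 0.1 tolerance are assumptions for illustration.

```python
import math


def body_mass_index(weight_kg, height_cm):
    """BMI = weight (kg) / height (m) squared; height is converted from centimetres."""
    height_m = height_cm / 100
    return weight_kg / height_m ** 2


def bmi_matches(weight_kg, height_cm, reported_bmi, tolerance=0.1):
    """True if the reported BMI agrees with the recomputed value within `tolerance`."""
    return math.isclose(body_mass_index(weight_kg, height_cm), reported_bmi, abs_tol=tolerance)


print(round(body_mass_index(80, 180), 1))
print(bmi_matches(80, 180, 24.7))
```

Applied row-wise to the DataFrame, rows where `bmi_matches` is False would point to a unit mix-up or a data-entry error worth checking before reading anything into the chart.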
This bar chart shows how many U.S. Presidents were born in each state. Each bar is colored by the most common party among the Presidents born there: blue for Democrat, red for Republican, and green for any other party. Hover over a bar to see the Presidents born in that state and each party's share among them. The chart highlights which states have produced the most Presidents and the party context of their leadership.
import pandas as pd
import plotly.graph_objects as go
# Assuming you have a DataFrame called 'pres_df_3' containing the data
# with columns: 'President', 'birth_state' and 'political_party'
# Calculate the count of presidents born in each state
state_counts = pres_df_3['birth_state'].value_counts()
# Define party colors
party_colors = {'Democrat': 'blue', 'Republican': 'red'}
# Build hover text and bar color per state, collected as a list of dicts
# (DataFrame.append was removed in pandas 2.0)
rows = []
for state in state_counts.index:
    in_state = pres_df_3[pres_df_3['birth_state'] == state]
    party_share = in_state['political_party'].value_counts(normalize=True)
    party_share_text = '<br>'.join(f"{party}: {share:.0%}" for party, share in party_share.items())
    # Color the bar by the most common party (green for parties not in party_colors, gray if unknown)
    color = party_colors.get(party_share.idxmax(), 'green') if not party_share.empty else 'gray'
    rows.append({
        'State': state,
        'Presidents': ', '.join(in_state['President'].astype(str)),
        'Party Percentage': party_share_text,
        'Color': color,
    })
hover_text_data = pd.DataFrame(rows)
# Create the bar chart using Plotly
fig = go.Figure()
fig.add_trace(
    go.Bar(
        x=state_counts.index,
        y=state_counts.values,
        hovertext=[f"Presidents: {presidents}<br>Party share:<br>{share}" for presidents, share in zip(hover_text_data['Presidents'], hover_text_data['Party Percentage'])],
        hoverinfo='text',
        marker=dict(color=hover_text_data['Color'].tolist()),
    )
)
# Update layout for better appearance
fig.update_layout(
    title="Number of Presidents Born in Each State",
    xaxis_title="State",
    yaxis_title="Number of Presidents",
    hovermode='closest',
    showlegend=False,  # A single trace would only produce one generic legend entry
)
# Show the plot
fig.show()
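Coloring each state's bar by its most common party requires counting parties per state and taking the largest count. The standard-library sketch below does this on toy records. `dominant_party_by_state` is a hypothetical name. Ties go to the party encountered first here, whereas the pandas version picks whichever tied party `value_counts` lists first.

```python
from collections import Counter, defaultdict


def dominant_party_by_state(records):
    """records: (birth_state, party) pairs. Returns {state: (dominant_party, {party: share})}."""
    by_state = defaultdict(Counter)
    for state, party in records:
        by_state[state][party] += 1
    result = {}
    for state, counts in by_state.items():
        total = sum(counts.values())
        dominant, _ = counts.most_common(1)[0]
        result[state] = (dominant, {party: n / total for party, n in counts.items()})
    return result


# Toy records for illustration only
toy_records = [("State A", "Republican"), ("State A", "Republican"), ("State A", "Whig"), ("State B", "Democrat")]
for state, (party, shares) in dominant_party_by_state(toy_records).items():
    print(state, party, {p: f"{s:.0%}" for p, s in shares.items()})
```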
This interactive map shows where U.S. Presidents were born. Each state is shaded by the number of Presidents born there, as explained in the color legend. Clicking a state's marker shows that count. The map gives a geographic view of which states have contributed the most to America's leadership.
import pandas as pd
import folium
import geopandas as gpd
# Assuming you have a DataFrame called 'pres_df_3' containing the data
# with a column named 'birth_state' containing the names of states where presidents were born
# Count presidents per birth state as a two-column DataFrame ('state', 'count');
# naming the columns explicitly avoids differences between pandas versions
state_counts = (
    pres_df_3['birth_state'].dropna().str.strip().value_counts()
    .rename_axis('state')
    .reset_index(name='count')
)
# Shapefile location (local copy of the Natural Earth admin-1 states/provinces file)
states_shapefile = r'C:\Users\User\Desktop\GitHub-projects\projects\Data-Dives-Projects-Unleashed\Notebooks\course1\maps\ne_10m_admin_1_states_provinces.shp'
# Read the shapefile using geopandas with the correct encoding
gdf = gpd.read_file(states_shapefile, encoding='latin1')
# The file covers every country, so keep only U.S. states before matching names
gdf = gdf[gdf['admin'] == 'United States of America']
# Attach the counts to the state geometries (states with no presidential births are dropped)
merged_gdf = gdf.merge(state_counts, left_on='name', right_on='state')
# Create a map centered on the U.S. using Folium
map_us = folium.Map(location=[37.0902, -95.7129], zoom_start=4)
# Add the choropleth layer, shading each state by its number of presidential births
folium.Choropleth(
    geo_data=merged_gdf,
    name='choropleth',
    data=state_counts,
    columns=['state', 'count'],  # Key column, value column
    key_on='feature.properties.name',  # Property in the GeoJSON that matches 'state'
    fill_color='YlGnBu',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Number of Presidents Born in the State',
).add_to(map_us)
# Add a marker at each state's centroid with a popup showing the birth count
for _, row in merged_gdf.iterrows():
    centroid = row.geometry.centroid
    folium.Marker(
        location=[centroid.y, centroid.x],
        popup=f"{row['name']}: {row['count']} President(s) born",
        icon=folium.Icon(color='red', icon='info-sign'),
    ).add_to(map_us)
# Display the map
map_us
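Merging the shapefile with the birth-state counts is an inner join on the state name. Any state whose name is spelled differently in 'birth_state' (extra spaces, abbreviations, typos) therefore disappears from the map without any warning. A pre-merge check can catch this; `unmatched_states` below is a hypothetical helper.

```python
def unmatched_states(birth_states, shapefile_names):
    """Return birth-state names (whitespace-trimmed) with no exact match among the shapefile names."""
    known = {name.strip() for name in shapefile_names}
    return sorted({state.strip() for state in birth_states} - known)


print(unmatched_states(["Virginia", "New York ", "Virgina"], ["Virginia", "New York", "Ohio"]))
```

In the notebook this would be called as `unmatched_states(pres_df_3['birth_state'].dropna(), gdf['name'])` before the merge. An empty result means every birth state will appear on the map.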
Title: Which U.S. Presidents Had the Highest Corrected IQ?
Description: This horizontal bar chart shows the corrected IQ estimates of U.S. Presidents, sorted from highest to lowest. Each bar represents a president, and its length corresponds to that president's corrected IQ. The hover text shows the president's name, corrected IQ, and political party, and the bars are colored by political party.
import pandas as pd
import plotly.graph_objects as go
# Assuming you have a DataFrame called 'pres_df_3' containing the data
# with columns 'President', 'corrected_iq', and 'political_party'
# Sort the DataFrame by 'corrected_iq' in descending order
pres_df_3_sorted = pres_df_3.sort_values(by='corrected_iq', ascending=False)
# Create a dictionary to map political parties to colors
party_colors = {
    'Unaffiliated': 'gray',
    'Federalist': 'darkblue',
    'Democratic-Republican': 'green',
    'Democrat': 'blue',
    'Whig': 'purple',
    'Republican': 'red',
    'National Union': 'orange',
}
# Create the horizontal bar chart using Plotly
fig = go.Figure()
fig.add_trace(go.Bar(
    x=pres_df_3_sorted['corrected_iq'],
    y=pres_df_3_sorted['President'],
    hovertext=pres_df_3_sorted.apply(
        lambda row: f"{row['President']}<br>IQ: {row['corrected_iq']}<br>Party: {row['political_party']}",
        axis=1,
    ),
    hoverinfo='text',  # Show only the custom hover text
    marker=dict(color=pres_df_3_sorted['political_party'].map(party_colors).fillna('#AAAAAA')),  # Gray for unmapped parties
    orientation='h',
))
# Update layout for better appearance
fig.update_layout(
    title="Corrected IQ Scores of U.S. Presidents, Highest to Lowest",
    title_font=dict(size=24),
    xaxis=dict(title="Corrected IQ"),
    # Plotly draws the first category at the bottom; reverse the axis so the highest IQ is at the top
    yaxis=dict(title="President", autorange='reversed'),
    showlegend=False,
    width=1400,
    height=1700,
)
# Show the plot
fig.show()
Title: Analyzing the IQ Distribution of U.S. Presidents
Description: This box plot summarizes the distribution of corrected IQ scores among U.S. Presidents. The box spans the interquartile range (IQR), from the 25th percentile (Q1) to the 75th percentile (Q3), and the line inside the box marks the median. The whiskers extend to the most extreme scores that lie no more than 1.5 times the IQR below Q1 or above Q3; any score beyond those limits is drawn as an individual outlier point. The plot shows both the central tendency and the spread of the presidents' IQ scores at a glance.
import pandas as pd
import plotly.express as px
# Assumes a DataFrame named 'pres_df_3' with a 'corrected_iq' column
# and a 'President' column used for hover labels
# Create the interactive box plot
fig = px.box(
    pres_df_3,
    y='corrected_iq',
    title="IQ Scores of U.S. Presidents",
    labels={'corrected_iq': 'IQ Score'},
    hover_name='President',  # Show the president's name on outlier points
    hover_data={'corrected_iq': True},  # Display the IQ score on hover
)
# Update layout for better appearance
fig.update_layout(
    yaxis_title="IQ Score",
    showlegend=False,  # Hide the legend
)
# Show the plot
fig.show()
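To make the whisker rule in the description concrete, here is a small worked example in plain Python on made-up scores (not data from pres_df_3). It computes the quartiles, the IQR, the 1.5 times IQR fences, the whisker ends, and the outliers. The helper `box_plot_stats` is hypothetical, and `method="inclusive"` reproduces NumPy's default linear interpolation; Plotly's default quartile computation may differ slightly for other samples.

```python
import statistics


def box_plot_stats(values):
    """Quartiles, IQR, whisker ends and outliers using Tukey's 1.5 x IQR rule."""
    data = sorted(values)
    # method="inclusive" matches NumPy's default linear interpolation
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        # Whiskers stop at the most extreme observations inside the fences
        "lower_whisker": min(inside),
        "upper_whisker": max(inside),
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }


# Made-up scores for illustration; not taken from pres_df_3.
sample = [120, 125, 130, 132, 135, 138, 140, 145, 175]
stats = box_plot_stats(sample)
print(stats)
```

For this sample Q1 = 130 and Q3 = 140, so the IQR is 10 and the fences sit at 115 and 155. The upper whisker therefore stops at 145, and 175 is drawn as a separate outlier point.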
Title: Comparing U.S. Presidents' IQ Scores on a Radar Chart
Description: This radar (polar) chart gives each U.S. President his own spoke, and the distance from the center to the plotted point is that president's corrected IQ. The points are joined and filled into a single shape, so unusually high or low scores show up as peaks and dips. Because the radial axis runs from the lowest to the highest score in the data rather than from zero, the president with the lowest score sits at the center, which makes differences look larger than they are. The dataset provides one overall corrected IQ per president, so the chart compares presidents with one another rather than across cognitive categories such as verbal or mathematical reasoning.
import pandas as pd
import plotly.graph_objects as go
# Assumes a DataFrame named 'pres_df_3' with 'President' and 'corrected_iq' columns
# Create the radar chart: one spoke per president, radius = corrected IQ
fig = go.Figure()
fig.add_trace(go.Scatterpolar(
    r=pres_df_3['corrected_iq'],
    theta=pres_df_3['President'],
    fill='toself',
    # Show the president's name together with the score on hover
    hovertext=pres_df_3['President'] + ': ' + pres_df_3['corrected_iq'].astype(str),
    hoverinfo='text',
    line=dict(color='blue'),
))
# Update layout; Series.min() and Series.max() skip missing values
fig.update_layout(
    polar=dict(
        radialaxis=dict(
            visible=True,
            range=[pres_df_3['corrected_iq'].min(), pres_df_3['corrected_iq'].max()],
        ),
    ),
    showlegend=False,  # Hide the legend
    title="Corrected IQ Scores of U.S. Presidents",
    width=1400,
    height=1400,
)
# Show the plot
fig.show()
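Setting the radial range to the minimum and maximum scores changes how far apart the points appear. Assuming a linear radial axis with no `hole`, a value is drawn at the fraction (value - r_min) / (r_max - r_min) of the full radius. The hypothetical helper below compares that clipped range with a range starting at zero, using made-up scores.

```python
def radius_fraction(value, r_min, r_max):
    """Fraction of the full radius at which `value` is drawn on a linear radial axis."""
    if r_max == r_min:
        raise ValueError("radial range must span a non-zero interval")
    return (value - r_min) / (r_max - r_min)


# Made-up scores: lowest 120, highest 160.
scores = [120, 140, 160]

# Range clipped to the data, like range=[min, max] in the chart code
clipped = [radius_fraction(s, min(scores), max(scores)) for s in scores]
# Range starting at zero
from_zero = [radius_fraction(s, 0, max(scores)) for s in scores]

print(clipped)    # [0.0, 0.5, 1.0]
print(from_zero)  # [0.75, 0.875, 1.0]
```

With the clipped range, the lowest score collapses to the center and a 40-point spread fills the whole radius; starting at zero keeps every point in the outer quarter of the chart. Which is preferable depends on whether the chart should emphasize rank or absolute differences.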
Title: U.S. Presidents' Corrected IQ by Political Party
Description: This radar chart places each U.S. President on a separate spoke and draws a single point whose distance from the center is that president's corrected IQ, on a fixed scale from 0 to 200. Each point is colored by political party, and hovering over it shows the president's name, corrected IQ, and party. Because the dataset holds one overall corrected IQ per president, the chart compares presidents and parties with one another; it does not break intelligence down into separate dimensions such as verbal ability, memory, or creativity.
import pandas as pd
import plotly.graph_objects as go
# Assumes a DataFrame named 'pres_df_3' with 'President', 'corrected_iq', and
# 'political_party' columns, plus the party_colors dictionary from the bar chart above
# Create the radar chart: one single-point trace per president, colored by party
fig = go.Figure()
for _, row in pres_df_3.iterrows():
    party_color = party_colors.get(row['political_party'], '#AAAAAA')
    fig.add_trace(go.Scatterpolar(
        r=[row['corrected_iq']],  # A single point per trace
        theta=[row['President']],
        mode='markers',  # Fill and lines have no visible effect on a one-point trace
        marker=dict(color=party_color, size=8),
        hovertext=f"{row['President']}<br>IQ: {row['corrected_iq']}<br>Party: {row['political_party']}",
        hoverinfo='text',
        name=row['President'],
    ))
# Update layout; the legend stays hidden because it would list every president
fig.update_layout(
    polar=dict(
        radialaxis=dict(
            visible=True,
            range=[0, 200],
        ),
    ),
    showlegend=False,
    title="IQ Scores of U.S. Presidents by Political Party",
    title_font=dict(size=24),
    width=1400,
    height=1400,
)
# Show the plot
fig.show()
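The loop above adds one trace per president, which is why the legend is switched off: showing it would list every president. One possible alternative, sketched here with hypothetical rows and a hypothetical helper `group_by_party`, is to group presidents by party first so that each party becomes a single trace and a single legend entry. Only the grouping step is shown; passing each group to `go.Scatterpolar(r=..., theta=..., name=party, mode='markers')` is an assumption about how the chart could be adapted, and setting `categoryorder='array'` with a `categoryarray` on `polar.angularaxis` may be needed to keep the spokes in a fixed order across traces.

```python
from collections import defaultdict


def group_by_party(records, colors, fallback="#AAAAAA"):
    """Group (President, corrected_iq) pairs by party, keeping first-seen order."""
    groups = defaultdict(lambda: {"theta": [], "r": []})
    for rec in records:
        group = groups[rec["political_party"]]
        group["theta"].append(rec["President"])
        group["r"].append(rec["corrected_iq"])
    # Attach a color to every party, falling back to gray for unmapped parties
    return {
        party: {**data, "color": colors.get(party, fallback)}
        for party, data in groups.items()
    }


# Hypothetical rows; not values from pres_df_3.
rows = [
    {"President": "President A", "corrected_iq": 140.0, "political_party": "Whig"},
    {"President": "President B", "corrected_iq": 155.0, "political_party": "Republican"},
    {"President": "President C", "corrected_iq": 132.0, "political_party": "Whig"},
]
party_colors = {"Whig": "purple", "Republican": "red"}

for party, trace in group_by_party(rows, party_colors).items():
    print(party, trace["theta"], trace["r"], trace["color"])
```

Grouping first trades per-president trace names for a legend that stays readable, while the hover text can still carry each president's name.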
In conclusion, this study of U.S. presidents has explored many facets of presidential leadership. Through data collection, cleaning, and integration, followed by analysis and visualization, we examined the intelligence estimates, traits, and characteristics of the presidents who have led the nation over more than two centuries. These findings add to a broader understanding of presidential effectiveness and of the interplay between intelligence, leadership, and historical significance. By analyzing these patterns and placing them in historical context, we hope to shed light on the factors that influence presidential success and to support informed discussion of leadership in the nation's highest office.