Exploring U.S. Presidents: A Multidimensional Study

Introduction

Welcome to my in-depth study of the U.S. presidents throughout history! In this research project, I'll be drawing on data from numerous sources to uncover insights and patterns related to the American presidency. The datasets I have collected cover diverse aspects of each presidency, ranging from personal information to political achievements and economic indicators.

Research Objectives

  1. Understand demographic characteristics: Analyzing age at inauguration, birthplaces, and educational backgrounds.
  2. Assess political careers: Investigating the number of terms served, political party affiliations, and approval ratings.
  3. Examine electoral performance: Studying popular vote percentages and electoral college results.
  4. Analyze economic impacts: Exploring economic indicators during each presidency, such as GDP growth and unemployment rates.
  5. Visualize geographic patterns: Mapping voting outcomes by state and the birthplaces of each president.

Methodology

To achieve our research objectives and derive meaningful insights, we will follow a structured approach:

1. Data Collection

Gathering data from various reputable sources, including historical archives, government databases, and academic research papers. The datasets encompass information on all U.S. presidents from George Washington to the present day.

2. Data Cleaning and Integration

Ensuring the data is accurate and consistent by handling missing values, removing duplicates, and standardizing formats. Integrating relevant information from different datasets into a unified data repository for analysis.

3. Exploratory Data Analysis (EDA)

Conducting exploratory data analysis to understand the basic characteristics and distributions of the data. We will use visualizations like bar charts, line plots, and scatter plots to gain initial insights.

4. In-Depth Data Analysis

Performing comprehensive analyses using advanced statistical techniques to answer specific research questions. We will leverage regression analysis, correlation studies, and hypothesis testing to reveal underlying patterns and relationships.

5. Data Visualization

Creating visually appealing and informative charts, graphs, and maps to communicate our findings effectively. The visualizations will assist in conveying complex information in a clear and understandable manner.

6. Interpretation and Conclusion

Interpreting the results of our analyses and drawing meaningful conclusions. We will contextualize our findings within historical and political contexts to provide a comprehensive perspective on the U.S. presidency.

Acknowledgements

I would like to express my gratitude to all the data providers and researchers who have made this study possible. Their efforts in collecting and sharing valuable data have been instrumental in the success of this research.

Let's embark on this exciting journey together and discover the fascinating insights hidden within the historical data of the U.S. presidents!

1. Data Collection

Import the required libraries

In [3]:
import numpy as np  # Library for numerical computations
import pandas as pd  # Library for data manipulation and analysis
import seaborn as sns  # Library for statistical data visualization
import matplotlib.pyplot as plt  # Library for creating plots and visualizations
In [4]:
# Import the necessary Plotly libraries
import plotly.graph_objects as go  # Low-level interface for creating Plotly plots
import plotly.express as px  # Higher-level interface for creating interactive plots
In [5]:
# Import the necessary ipywidgets libraries
import ipywidgets as widgets  # Library for creating interactive widgets
from ipywidgets import interact, interact_manual  # Functions for creating interactive controls
In [6]:
import warnings
# Silence all warnings to keep cell output tidy (this also hides deprecation notices)
warnings.filterwarnings('ignore')

Sourcing the Datasets: My Exploration and Methods

In this study, I compiled a collection of diverse datasets related to U.S. Presidents. To ensure the data's quality and reliability, I employed various methods to obtain these resources from different platforms. Below, I outline the steps I took to gather the datasets:

1. Coursera:

One of the primary sources for the U.S. Presidents' information was the online learning platform, Coursera. I accessed relevant datasets during my Data Science course titled "Data Science with Python." The datasets offered valuable information about the U.S. Presidents' backgrounds and accomplishments.

2. Kaggle:

Kaggle, a well-known data science community and platform, served as another reliable resource. I found multiple datasets on Kaggle, including "First Ladies' Data" and "Historical Presidents Physical Data," which provided unique insights into related aspects.

3. Google Search:

To complement the datasets from specialized platforms, I conducted targeted Google searches. Through this method, I discovered the dataset "U.S. Presidents Dataset (1)," which offered specific information not found in other sources.

4. Statista:

Statista, a reputable statistics portal, contributed valuable data to my study. The dataset "Most Common Names of U.S. Presidents (1789-2021)" provided an intriguing analysis of presidential names over the years.

5. World Bank:

For economic-related datasets, I referred to the World Bank, a renowned international organization. The datasets "USA Economic Growth" and "U.S. GDP During Presidencies" provided essential economic indicators during presidential tenures.

I evaluated each dataset for relevance and accuracy before using it. Drawing on several platforms gives the study a broader set of perspectives on the U.S. Presidents and makes the dataset collection process transparent to readers.

Data Sets Information

| Data Set Name | Provider | URL to Provider | Variable Name in Code |
| --- | --- | --- | --- |
| U.S. Presidents' Information | Coursera | Coursera | pres_df_1 |
| First Ladies' Data | Kaggle | Kaggle | pres_df_2 |
| Historical Presidents Physical Data (More) | Kaggle | Kaggle | pres_df_3 |
| U.S. Presidents Dataset (1) | Google Search | Google Search | pres_df_4 |
| U.S. Presidents Popular Vote Percentage Dataset (1) | Kaggle | Kaggle | pres_df_5 |
| Most Common Names of U.S. Presidents (1789-2021) (1) | Statista | Statista | pres_df_6 |
| USA Economic Growth Dataset | World Bank | World Bank | pres_df_7 |
| U.S. GDP During Presidencies Dataset | Statista | Statista | pres_df_8 |

Interactive Resource Viewer in JupyterLab

This code example demonstrates an interactive widget that allows you to choose a resource from a list and view its website directly within JupyterLab. The widget is created using ipywidgets and IPython.display.IFrame.

How it works:

  1. The code defines a dictionary resources that maps resource names to their corresponding URLs. For example, it includes providers like Coursera, Kaggle, Google Search, Statista, and World Bank.

  2. It creates a dropdown widget called resource_dropdown, where the available options are the names of the resources listed in the dictionary.

  3. When you select a resource from the dropdown, the update_iframe function is triggered. This function looks up the URL of the selected resource and displays that page in an IPython.display.IFrame inside the output widget.

  4. The IFrame is embedded in the output area, allowing you to interact with the chosen website directly within JupyterLab.

Instructions:

  1. Run the code in a JupyterLab code cell.

  2. After running the code, a dropdown widget will appear, showing the available resources.

  3. Choose a resource from the dropdown to view its website in the output area.

  4. The website will be displayed using an embedded IFrame, allowing you to explore the chosen resource without leaving JupyterLab.

Please note that this example uses hardcoded URLs that point to the providers' home pages rather than to the individual datasets. You can modify the resources dictionary to include the exact URLs of the resources you want to showcase in the widget, or replace the existing providers with your own.

In [7]:
import ipywidgets as widgets
from IPython.display import display, IFrame

# Dictionary of resource names and their corresponding URLs
resources = {
    "Coursera": "https://www.coursera.org/",
    "Kaggle": "https://www.kaggle.com",
    "Google Search": "https://www.google.com",
    "Statista": "https://www.statista.com",
    "World Bank": "https://data.worldbank.org",
    "Wikipedia": "https://www.wikipedia.org/"
}

# Dropdown widget to select the resource
resource_dropdown = widgets.Dropdown(
    options=list(resources.keys()),
    description='Select Resource:',
    layout=widgets.Layout(width='400px')
)

# Output widget to display the IFrame
output = widgets.Output()

# Function to update the IFrame based on the selected resource.
# Note: some sites send headers that forbid embedding, so their frame may stay blank.
def update_iframe(change):
    selected_resource = change['new']  # New dropdown value passed in by observe()
    with output:
        output.clear_output()
        display(IFrame(resources[selected_resource], width="100%", height="400px"))

# Attach the update function to the dropdown widget
resource_dropdown.observe(update_iframe, names='value')

# Display the dropdown widget and the output widget
display(resource_dropdown)
display(output)

Alternatively, the cell below opens each resource in a new tab of your default web browser.

In [8]:
import webbrowser

resources = {
    "Coursera": "https://www.coursera.org",
    "Kaggle": "https://www.kaggle.com",
    "Google Search": "https://www.google.com",
    "Statista": "https://www.statista.com",
    "World Bank": "https://data.worldbank.org"
}

# Function to open the web page in the default browser
def open_website(url):
    webbrowser.open_new_tab(url)

# Loop through the resources and open their websites
for resource, url in resources.items():
    print(f"Opening {resource} website: {url}")
    open_website(url)
Opening Coursera website: https://www.coursera.org
Opening Kaggle website: https://www.kaggle.com
Opening Google Search website: https://www.google.com
Opening Statista website: https://www.statista.com
Opening World Bank website: https://data.worldbank.org

Load the Datasets and Start Exploring

Let's read the files:

The cell below reads each CSV file into a pandas DataFrame. The cells after it display the first rows of each DataFrame.

In [10]:
# Reading the datasets into DataFrames
from pathlib import Path

# Folder that holds all of the CSV files used in this study
DATA_DIR = Path(r'C:\Users\User\Desktop\GitHub-projects\projects\Data-Dives-Projects-Unleashed\dataSets\course 1\week 3\datasets\presidents')

# Dataset 1: U.S. Presidents' Information
pres_df_1 = pd.read_csv(DATA_DIR / 'presidents.csv')

# Dataset 2: First Ladies' Data
pres_df_2 = pd.read_csv(DATA_DIR / 'first_ladies.csv')

# Dataset 3: Historical Presidents Physical Data (More)
pres_df_3 = pd.read_csv(DATA_DIR / 'Historical Presidents Physical Data (More).csv')

# Dataset 4: U.S. Presidents Dataset (1)
pres_df_4 = pd.read_csv(DATA_DIR / 'presidents(1).csv')

# Dataset 5: U.S. Presidents Popular Vote Percentage Dataset (1)
pres_df_5 = pd.read_csv(DATA_DIR / 'pvp_dataset(1).csv')

# Dataset 6: Most Common Names of U.S. Presidents (1789-2021) (1)
pres_df_6 = pd.read_csv(DATA_DIR / 'us_presidents.csv')

# Dataset 7: USA Economy Growth Dataset
# Note: Using encoding='iso-8859-1' to handle non-utf-8 encoded characters
pres_df_7 = pd.read_csv(DATA_DIR / 'USA Economy Growth.csv', encoding='iso-8859-1')

# Dataset 8: U.S. GDP During Presidencies Dataset
pres_df_8 = pd.read_csv(DATA_DIR / 'USGDPpresidents.csv')

Let's start exploring the datasets:

In [11]:
pres_df_1.head()
Out[11]:
# President Born Age atstart of presidency Age atend of presidency Post-presidencytimespan Died Age
0 1 George Washington Feb 22, 1732[a] 57 years, 67 daysApr 30, 1789 65 years, 10 daysMar 4, 1797 2 years, 285 days Dec 14, 1799 67 years, 295 days
1 2 John Adams Oct 30, 1735[a] 61 years, 125 daysMar 4, 1797 65 years, 125 daysMar 4, 1801 25 years, 122 days Jul 4, 1826 90 years, 247 days
2 3 Thomas Jefferson Apr 13, 1743[a] 57 years, 325 daysMar 4, 1801 65 years, 325 daysMar 4, 1809 17 years, 122 days Jul 4, 1826 83 years, 82 days
3 4 James Madison Mar 16, 1751[a] 57 years, 353 daysMar 4, 1809 65 years, 353 daysMar 4, 1817 19 years, 116 days Jun 28, 1836 85 years, 104 days
4 5 James Monroe Apr 28, 1758 58 years, 310 daysMar 4, 1817 66 years, 310 daysMar 4, 1825 6 years, 122 days Jul 4, 1831 73 years, 67 days
In [13]:
pres_df_2.head()
Out[13]:
Unnamed: 0 relation name president born death age_of_death marriage_date
0 0 Husband Martha Dandridge George Washington June 13, 1731 May 22, 1802 70.0 January 6, 1759
1 1 Husband Abigail Smith John Adams November 22, 1744 October 28, 1818 73.0 October 25, 1764
2 2 Father Martha Jefferson Thomas Jefferson September 27, 1772 October 10, 1836 64.0 NaN
3 3 Husband Dolley Payne James Madison May 20, 1768 July 12, 1849 81.0 September 14, 1794
4 4 Husband Elizabeth Kortright James Monroe June 30, 1768 September 23, 1830 62.0 February 16, 1786
In [14]:
pres_df_3.head()
Out[14]:
order name height_cm height_in weight_kg weight_lb body_mass_index body_mass_index_range birth_day birth_month ... term_begin_year term_begin_date term_end_day term_end_month term_end_year term_end_date presidency_begin_age presidency_end_age political_party corrected_iq
0 1 George Washington 188 74.0 79.4 175 22.5 Normal 22 2 ... 1789 30-04-1789 4.0 3.0 1797.0 04-03-1797 57 65.0 Unaffiliated 140.0
1 2 John Adams 170 67.0 83.9 185 29.0 Overweight 30 10 ... 1797 04-03-1797 4.0 3.0 1801.0 04-03-1801 61 65.0 Federalist 155.0
2 3 Thomas Jefferson 189 74.5 82.1 181 23.0 Normal 13 4 ... 1801 04-03-1801 4.0 3.0 1809.0 04-03-1809 57 65.0 Democratic-Republican 160.0
3 4 James Madison 163 64.0 55.3 122 20.8 Normal 16 3 ... 1809 04-03-1809 4.0 3.0 1817.0 04-03-1817 57 65.0 Democratic-Republican 160.0
4 5 James Monroe 183 72.0 85.7 189 25.6 Overweight 28 4 ... 1817 04-03-1817 4.0 3.0 1825.0 04-03-1825 58 66.0 Democratic-Republican 139.0

5 rows × 32 columns

In [15]:
pres_df_4.head()
Out[15]:
No. Name Birthplace Birthday Life Height Children Religion Higher Education Occupation Military Service Term Party Vice President Previous Office Economy Foreign Affairs Military Activity Other Events Legacy
0 1 George Washington Pope's Creek, VA 22-Feb 1732-1799 1.88 0 Episcopalian None Plantation Owner, Soldier Commander-in-Chief of the Continental Army in... 1789-1797 None, Federalist John Adams Commander-in-Chief [' Hamilton established BUS', '1792 Coinage Ac... ['1793 Neutrality in the France-Britain confli... ['1794 Whiskey Rebellion'] ['1791 Bill of Rights', '1792 Post Office foun... He is universally regarded as one of the great...
1 2 John Adams Braintree, MA 30-Oct 1735-1826 1.70 5 Unitarian Harvard Lawyer, Farmer none 1797-1801 Federalist Thomas Jefferson 1st Vice President of USA ['1798 Progressive land value tax of up to 1% ... ['1797 the XYZ Affair: a bribe of French agent... ['1798–1800 The Quasi war. Undeclared naval wa... ['1798 Alien & Sedition Act to silence critics... One of the most experienced men ever to become...
2 3 Thomas Jefferson Goochland County, VA 13-Apr 1743-1826 1.89 6 unaffiliated Christian College of William and Mary Inventor,Lawyer, Architect Colonel of Virginia militia (without real mili... 1801-1809 Democratic-Republican Aaron Burr, George Clinton 2nd Vice President of USA ['1807 Embargo Act forbidding foreign trade in... ['1805 Peace Treaty with Tripoli. Piracy stopp... ['1801-05 Naval operation against Tripoli and ... ['1803 The Louisiana purchase', '1804 12th Ame... Probably the most intelligent man ever to occ...
3 4 James Madison Port Conway, VA 16-Mar 1751-1836 1.63 0 Episcopalian Princeton Plantation Owner, Lawyer Colonel of Virginia militia (without real mili... 1809-1817 Democratic-Republican George Clinton, Elbridge Gerry Secretary of State [' The first U.S. protective tariff was impose... ['1814 The Treaty of Ghent ends the War of 1812'] ['1811 Tippecanoe battle (Harrison vs. Chief T... ['1811 Cumberland Road construction starts (fi... His leadership in the War of 1812 was particul...
4 5 James Monroe Monroe Hall, VA 28-Apr 1758-1831 1.83 2 Episcopalian College of William and Mary Plantation Owner, Lawyer Major of the Continental Army 1817-1825 Democratic-Republican Daniel Tompkins Secretary of War ['1819 Panic of 1819 (too much land speculatio... ['1823 Monroe Doctrine', '1818 49th parallel s... ['1817 1st Seminole war against Seminole India... ['1819 Florida ceded to US', "1820 Missouri Co... His presidency contributed to national defense...
In [16]:
pres_df_5.head()
Out[16]:
year name party term salary position_title
0 1789 Washington,George Unaffiliated First 25000 PRESIDENT OF THE UNITED STATES
1 1790 Washington,George Unaffiliated First 25000 PRESIDENT OF THE UNITED STATES
2 1791 Washington,George Unaffiliated First 25000 PRESIDENT OF THE UNITED STATES
3 1792 Washington,George Unaffiliated First 25000 PRESIDENT OF THE UNITED STATES
4 1793 Washington,George Unaffiliated Second 25000 PRESIDENT OF THE UNITED STATES
In [17]:
pres_df_6.head()
Out[17]:
Unnamed: 0 S.No. start end president prior party vice
0 0 1 April 30, 1789 March 4, 1797 George Washington Commander-in-Chief of the Continental Army ... Nonpartisan [13] John Adams
1 1 2 March 4, 1797 March 4, 1801 John Adams 1st Vice President of the United States Federalist Thomas Jefferson
2 2 3 March 4, 1801 March 4, 1809 Thomas Jefferson 2nd Vice President of the United States Democratic- Republican Aaron Burr
3 3 4 March 4, 1809 March 4, 1817 James Madison 5th United States Secretary of State (1801–... Democratic- Republican George Clinton
4 4 5 March 4, 1817 March 4, 1825 James Monroe 7th United States Secretary of State (1811–... Democratic- Republican Daniel D. Tompkins
In [18]:
pres_df_7.head()
Out[18]:
Year GDP GDP per capita (in US$ PPP) GDP (in Bil. US$nominal) GDP per capita (in US$ nominal) GDP growth % Inflation rate % Unemployment % Government debt (in % of GDP) Presidents
0 1981 3207.0 13948.7 3207.0 13948.7 2.50% 10.40% 7.60% 31.00% Ronald Reagan
1 1982 3343.8 14405.0 3343.8 14405.0 -1.80% 6.20% 9.70% 34.00% Ronald Reagan
2 1983 3634.0 15513.7 3634.0 15513.7 4.60% 3.20% 9.60% 37.00% Ronald Reagan
3 1984 4037.7 17086.4 4037.7 17086.4 7.20% 4.40% 7.50% 38.00% Ronald Reagan
4 1985 4339.0 18199.3 4339.0 18199.3 4.20% 3.50% 7.20% 41.00% Ronald Reagan
In [19]:
pres_df_8.head()
Out[19]:
Unnamed: 0 Year CPI GDPdeflator population.K realGDPperCapita executive war battleDeaths battleDeathsPMP ... unemployment unempSource fedReceipts fedOutlays fedSurplus fedDebt fedReceipts_pGDP fedOutlays_pGDP fedSurplus_pGDP fedDebt_pGDP
0 1610 1610 NaN NaN 0.350 NaN JamesI NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 1620 1620 NaN NaN 2.302 NaN JamesI NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 1630 1630 NaN NaN 4.646 NaN CharlesI NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 1640 1640 NaN NaN 26.634 NaN CharlesI NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 1650 1650 NaN NaN 50.368 NaN Cromwell NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

5 rows × 21 columns

Interactive Data Set Exploration

The code below provides an interactive tool for exploring the datasets. Users can pick a dataset, inspect it, and get an overview without writing any code themselves.

The tool offers a dropdown menu listing the eight datasets. After selecting one, the user can choose whether to view the beginning (head) or the end (tail) of the dataset. A bar plot then shows the number of rows and columns in the selected dataset, with numeric labels on the bars.

Furthermore, the code generates a detailed HTML report using the pandas-profiling module. This report provides a comprehensive overview of the selected dataset, including data types, statistics, and any missing values.

Users can quickly switch between datasets and access important information effortlessly, without the need to dive into complex programming concepts. This interactive approach empowers users to make informed decisions and perform data-driven analyses efficiently.

To use this tool, the reader should follow these steps:

  1. Review the list of available data sets in the dropdown menu.
  2. Choose a dataset of interest from the menu.
  3. Decide whether to view the beginning (head) or the end (tail) of the selected dataset: tick the show_head checkbox for the head, or clear it for the tail.
  4. Explore the data by observing the displayed information and the interactive bar plot, which shows the number of rows and columns for the chosen dataset.
  5. Further examine the dataset by referring to the detailed HTML report generated using the pandas-profiling module.
  6. To explore another dataset, simply choose a different dataset from the dropdown menu and repeat the steps above.

This interactive data set exploration tool makes it easy for users to interact with and analyze multiple datasets, making data exploration more intuitive, accessible, and informative.

In [20]:
import pandas as pd
import matplotlib.pyplot as plt
import pandas_profiling  # Note: newer releases of this package are published as ydata-profiling
from ipywidgets import interact, widgets
from IPython.display import display, HTML


# Dictionary to store the dataset names and corresponding DataFrames
datasets = {
    "U.S. Presidents' Information": pres_df_1,
    "First Ladies' Data": pres_df_2,
    "Historical Presidents Physical Data (More)": pres_df_3,
    "U.S. Presidents Dataset (1)": pres_df_4,
    "U.S. Presidents Popular Vote Percentage Dataset (1)": pres_df_5,
    "Most Common Names of U.S. Presidents (1789-2021) (1)": pres_df_6,
    "USA Economy Growth Dataset": pres_df_7,
    "U.S. GDP During Presidencies Dataset": pres_df_8,
}

# Function to display dataset information, head, tail, and profiling report
@interact(dataset=list(datasets.keys()), show_head=True)
def display_dataset_info(dataset, show_head):
    df = datasets[dataset]

    # Display head or tail of the dataset based on user selection
    if show_head:
        display(df.head())
    else:
        display(df.tail())

    # Plotting number of columns and rows (shape) with numbers on the bars
    rows, cols = df.shape
    shape_plot = pd.DataFrame({'Rows': [rows], 'Columns': [cols]})
    ax = shape_plot.plot(kind='bar', legend=True, title='Number of Rows and Columns', figsize=(8, 6))
    plt.xlabel('Dataset')
    plt.ylabel('Count')
    plt.xticks(rotation=0)

    # Adding numbers on top of every bar (both the rows bar and the columns bar)
    for bar in ax.patches:
        height = int(bar.get_height())
        ax.text(bar.get_x() + bar.get_width() / 2, height, str(height), ha='center', va='bottom', fontsize=10)

    plt.show()

    # Displaying information about column types and missing values using pandas-profiling
    display(HTML(f"<h3>{dataset} Information:</h3>"))
    profile = pandas_profiling.ProfileReport(df, title=dataset)
    display(profile)

Important: In case the interactive visualization isn't visible, you can refer to the images provided below. Please note that the current setup requires the code to be hosted on a server, a step that will be implemented in the future.

[Images: four screenshots of the interactive dataset explorer]
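
When the widget does not render, a plain pandas summary gives much of the same overview without any interactive machinery. The sketch below is one possible fallback, not part of the original tool. The function name `summarize` and the toy frame are assumptions made for illustration; in the notebook one would pass a loaded frame such as pres_df_1 instead.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column with its dtype, missing count, and unique count."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "unique": df.nunique(),  # nunique() ignores missing values by default
    })


# Toy frame standing in for one of the loaded datasets (hypothetical values)
toy = pd.DataFrame({
    "name": ["A", "B", "C"],
    "value": [1.0, 2.0, None],
})

print(f"{toy.shape[0]} rows x {toy.shape[1]} columns")
print(summarize(toy))
```

Because the result is an ordinary DataFrame, it also displays in static exports of the notebook where widgets are unavailable.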

2. Data Cleaning and Integration

This stage makes the data accurate and consistent by handling missing values, removing duplicates, and standardizing formats. It then integrates the relevant information from the different datasets into a unified data repository for analysis.

Data Understanding Completed: Time for Data Cleaning and Integration

Having successfully explored the datasets and gained insights into the data, we now have a solid understanding of our data sources and their respective attributes. We have familiarized ourselves with the intricacies of data gathering and the characteristics of each dataset, empowering us to make well-informed decisions moving forward.

With the initial data exploration completed, we are now ready to proceed to the crucial phases of data cleaning, combining, and integration. During these stages, we will focus on refining and preparing the data to be in its optimal form for analysis. This involves addressing various issues that may include:

  1. Data Cleaning: We will identify and handle missing values, inconsistent data formats, and potential outliers. Cleaning the data ensures its accuracy and reliability, mitigating any potential biases that could impact our analyses.

  2. Data Transformation: We may need to transform the data by performing feature engineering, scaling, or normalizing certain variables. These transformations can enhance the data's usability and enable more effective analysis.

  3. Data Integration: We will combine and merge datasets if necessary, ensuring that related information from multiple sources is unified into a cohesive dataset. Integration enables a more comprehensive view of the data and facilitates seamless analysis.

  4. Data Validation: We will validate the data to ensure its quality and verify that it aligns with our research objectives. Data validation is vital for maintaining the integrity of our analyses.
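
As a minimal sketch of steps 1 to 3 above, the cell below works on two small made-up frames. Their columns mimic pres_df_7 (percentages stored as strings such as "2.50%") and pres_df_5 (names stored as "Last,First"). The helper names `percent_to_float` and `normalize_name`, the toy values, and the choice of join keys are illustrative assumptions, not the notebook's actual cleaning code.

```python
import pandas as pd

# Toy stand-in for pres_df_7: percentages arrive as strings, one row is duplicated
economy = pd.DataFrame({
    "Year": [1981, 1982, 1982],
    "GDP growth %": ["2.50%", "-1.80%", "-1.80%"],
    "Presidents": ["Ronald Reagan", "Ronald Reagan", "Ronald Reagan"],
})

# Toy stand-in for pres_df_5: names arrive as "Last,First" (salary values are made up)
salaries = pd.DataFrame({
    "year": [1981, 1982],
    "name": ["Reagan,Ronald", "Reagan,Ronald"],
    "salary": [200000, 200000],
})


def percent_to_float(series: pd.Series) -> pd.Series:
    """Convert strings like '2.50%' to floats like 2.5."""
    return series.str.rstrip("%").astype(float)


def normalize_name(name: str) -> str:
    """Turn 'Last,First' into 'First Last'; leave other names unchanged."""
    if "," in name:
        last, first = name.split(",", 1)
        return f"{first.strip()} {last.strip()}"
    return name.strip()


# 1. Cleaning: drop exact duplicate rows, then fix the percentage column
economy_clean = economy.drop_duplicates().reset_index(drop=True)
economy_clean["GDP growth %"] = percent_to_float(economy_clean["GDP growth %"])

# 2. Transformation: put names into one shared format
salaries_clean = salaries.assign(name=salaries["name"].map(normalize_name))

# 3. Integration: join the two frames on president name and year
merged = economy_clean.merge(
    salaries_clean,
    left_on=["Presidents", "Year"],
    right_on=["name", "year"],
    how="left",
)
print(merged[["Year", "Presidents", "GDP growth %", "salary"]])
```

A left join keeps every row of the economic data even when no matching salary row exists, which makes unmatched names easy to spot as missing values afterwards.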

By performing these critical data preparation tasks, we are setting the foundation for robust and accurate data analysis. The subsequent stages of data exploration, modeling, and interpretation will be greatly enhanced, allowing us to extract valuable insights and make informed decisions based on the processed data.

As we clean, combine, and integrate the data, we need to work precisely, pay attention to detail, and keep the structure of each dataset in mind. Done well, this turns the raw data into a reliable basis for the patterns, trends, and relationships we examine in the later analyses.

Now, we will examine each dataset independently, conducting data cleaning to address any issues like missing values and inconsistencies. Simultaneously, we will carefully choose the relevant columns from each dataset to facilitate smooth data integration. This targeted approach will prepare the datasets for seamless integration, setting the stage for comprehensive and informed data analysis.
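
For instance, the head of pres_df_1 shown earlier contains cells such as "57 years, 67 daysApr 30, 1789", where an age and a date run together, and birth dates that carry footnote markers such as "[a]". The sketch below shows one way such cells could be split with a regular expression. The function names `strip_footnote` and `split_age_and_date` are hypothetical, and the pattern assumes every cell follows the "N years, M days" layout seen in the sample rows; cells that do not (for example "103 days") simply return None.

```python
import re
from datetime import datetime

# Matches e.g. "57 years, 67 daysApr 30, 1789"; the trailing date is optional
AGE_DATE = re.compile(
    r"(?P<years>\d+)\s+years?,\s+(?P<days>\d+)\s+days?\s*"
    r"(?P<date>[A-Z][a-z]{2} \d{1,2}, \d{4})?"
)


def strip_footnote(text: str) -> str:
    """Remove bracketed footnote markers such as '[a]' or '[b]'."""
    return re.sub(r"\[[a-z]\]", "", text).strip()


def split_age_and_date(cell: str):
    """Return (years, days, date) parsed from a fused cell, or None if no match."""
    match = AGE_DATE.search(strip_footnote(cell))
    if match is None:
        return None
    date = None
    if match["date"]:
        date = datetime.strptime(match["date"], "%b %d, %Y").date()
    return int(match["years"]), int(match["days"]), date


print(strip_footnote("Feb 22, 1732[a]"))
print(split_age_and_date("57 years, 67 daysApr 30, 1789"))
```

Applied to a whole column (for example with `Series.map`), the returned tuples could then be expanded into separate age and date columns.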

1.1 U.S. Presidents' Information: pres_df_1

In [21]:
pres_df_1
Out[21]:
# President Born Age atstart of presidency Age atend of presidency Post-presidencytimespan Died Age
0 1 George Washington Feb 22, 1732[a] 57 years, 67 daysApr 30, 1789 65 years, 10 daysMar 4, 1797 2 years, 285 days Dec 14, 1799 67 years, 295 days
1 2 John Adams Oct 30, 1735[a] 61 years, 125 daysMar 4, 1797 65 years, 125 daysMar 4, 1801 25 years, 122 days Jul 4, 1826 90 years, 247 days
2 3 Thomas Jefferson Apr 13, 1743[a] 57 years, 325 daysMar 4, 1801 65 years, 325 daysMar 4, 1809 17 years, 122 days Jul 4, 1826 83 years, 82 days
3 4 James Madison Mar 16, 1751[a] 57 years, 353 daysMar 4, 1809 65 years, 353 daysMar 4, 1817 19 years, 116 days Jun 28, 1836 85 years, 104 days
4 5 James Monroe Apr 28, 1758 58 years, 310 daysMar 4, 1817 66 years, 310 daysMar 4, 1825 6 years, 122 days Jul 4, 1831 73 years, 67 days
5 6 John Quincy Adams Jul 11, 1767 57 years, 236 daysMar 4, 1825 61 years, 236 daysMar 4, 1829 18 years, 356 days Feb 23, 1848 80 years, 227 days
6 7 Andrew Jackson Mar 15, 1767 61 years, 354 daysMar 4, 1829 69 years, 354 daysMar 4, 1837 8 years, 96 days Jun 8, 1845 78 years, 85 days
7 8 Martin Van Buren Dec 5, 1782 54 years, 89 daysMar 4, 1837 58 years, 89 daysMar 4, 1841 21 years, 142 days Jul 24, 1862 79 years, 231 days
8 9 William H. Harrison Feb 9, 1773 68 years, 23 daysMar 4, 1841 68 years, 54 days Apr 4, 1841[b] NaN Apr 4, 1841 68 years, 54 days
9 10 John Tyler Mar 29, 1790 51 years, 6 daysApr 4, 1841 54 years, 340 daysMar 4, 1845 16 years, 320 days Jan 18, 1862 71 years, 295 days
10 11 James K. Polk Nov 2, 1795 49 years, 122 daysMar 4, 1845 53 years, 122 daysMar 4, 1849 103 days Jun 15, 1849 53 years, 225 days
11 12 Zachary Taylor Nov 24, 1784 64 years, 100 daysMar 4, 1849 65 years, 227 daysJul 9, 1850[b] NaN Jul 9, 1850 65 years, 227 days
12 13 Millard Fillmore Jan 7, 1800 50 years, 183 daysJul 9, 1850 53 years, 56 daysMar 4, 1853 21 years, 4 days Mar 8, 1874 74 years, 60 days
13 14 Franklin Pierce Nov 23, 1804 48 years, 101 daysMar 4, 1853 52 years, 101 daysMar 4, 1857 12 years, 218 days Oct 8, 1869 64 years, 319 days
14 15 James Buchanan Apr 23, 1791 65 years, 315 daysMar 4, 1857 69 years, 315 daysMar 4, 1861 7 years, 89 days Jun 1, 1868 77 years, 39 days
15 16 Abraham Lincoln Feb 12, 1809 52 years, 20 daysMar 4, 1861 56 years, 62 daysApr 15, 1865[b] NaN Apr 15, 1865 56 years, 62 days
16 17 Andrew Johnson Dec 29, 1808 56 years, 107 daysApr 15, 1865 60 years, 65 daysMar 4, 1869 6 years, 149 days Jul 31, 1875 66 years, 214 days
17 18 Ulysses S. Grant Apr 27, 1822 46 years, 311 daysMar 4, 1869 54 years, 311 daysMar 4, 1877 8 years, 141 days Jul 23, 1885 63 years, 87 days
18 19 Rutherford B. Hayes Oct 4, 1822 54 years, 151 daysMar 4, 1877 58 years, 151 daysMar 4, 1881 11 years, 319 days Jan 17, 1893 70 years, 105 days
19 20 James A. Garfield Nov 19, 1831 49 years, 105 daysMar 4, 1881 49 years, 304 daysSep 19, 1881[b] NaN Sep 19, 1881 49 years, 304 days
20 21 Chester A. Arthur Oct 5, 1829 51 years, 349 daysSep 19, 1881 55 years, 150 daysMar 4, 1885 1 year, 259 days Nov 18, 1886 57 years, 44 days
21 22 Grover Cleveland Mar 18, 1837 47 years, 351 daysMar 4, 1885 51 years, 351 daysMar 4, 1889 4 years, 0 days[c] Jun 24, 1908 71 years, 98 days
22 23 Benjamin Harrison Aug 20, 1833 55 years, 196 daysMar 4, 1889 59 years, 196 daysMar 4, 1893 8 years, 9 days Mar 13, 1901 67 years, 205 days
23 24 Grover Cleveland Mar 18, 1837 55 years, 351 daysMar 4, 1893 59 years, 351 daysMar 4, 1897 11 years, 112 days[d] Jun 24, 1908 71 years, 98 days
24 25 William McKinley Jan 29, 1843 54 years, 34 daysMar 4, 1897 58 years, 228 daysSep 14, 1901[b] NaN Sep 14, 1901 58 years, 228 days
25 26 Theodore Roosevelt Oct 27, 1858 42 years, 322 daysSep 14, 1901 50 years, 128 daysMar 4, 1909 9 years, 308 days Jan 6, 1919 60 years, 71 days
26 27 William H. Taft Sep 15, 1857 51 years, 170 daysMar 4, 1909 55 years, 170 daysMar 4, 1913 17 years, 4 days Mar 8, 1930 72 years, 174 days
27 28 Woodrow Wilson Dec 28, 1856 56 years, 66 daysMar 4, 1913 64 years, 66 daysMar 4, 1921 2 years, 336 days Feb 3, 1924 67 years, 37 days
28 29 Warren G. Harding Nov 2, 1865 55 years, 122 daysMar 4, 1921 57 years, 273 daysAug 2, 1923[b] NaN Aug 2, 1923 57 years, 273 days
29 30 Calvin Coolidge Jul 4, 1872 51 years, 29 daysAug 2, 1923 56 years, 243 daysMar 4, 1929 3 years, 307 days Jan 5, 1933 60 years, 185 days
30 31 Herbert Hoover Aug 10, 1874 54 years, 206 daysMar 4, 1929 58 years, 206 daysMar 4, 1933 31 years, 230 days Oct 20, 1964 90 years, 71 days
31 32 Franklin D. Roosevelt Jan 30, 1882 51 years, 33 daysMar 4, 1933 63 years, 72 daysApr 12, 1945[b] NaN Apr 12, 1945 63 years, 72 days
32 33 Harry S. Truman May 8, 1884 60 years, 339 daysApr 12, 1945 68 years, 257 daysJan 20, 1953 19 years, 341 days Dec 26, 1972 88 years, 232 days
33 34 Dwight D. Eisenhower Oct 14, 1890 62 years, 98 daysJan 20, 1953 70 years, 98 daysJan 20, 1961 8 years, 67 days Mar 28, 1969 78 years, 165 days
34 35 John F. Kennedy May 29, 1917 43 years, 236 daysJan 20, 1961 46 years, 177 daysNov 22, 1963[b] NaN Nov 22, 1963 46 years, 177 days
35 36 Lyndon B. Johnson Aug 27, 1908 55 years, 87 daysNov 22, 1963 60 years, 146 daysJan 20, 1969 4 years, 2 days Jan 22, 1973 64 years, 148 days
36 37 Richard Nixon Jan 9, 1913 56 years, 11 daysJan 20, 1969 61 years, 212 daysAug 9, 1974[e] 19 years, 256 days Apr 22, 1994 81 years, 103 days
37 38 Gerald Ford Jul 14, 1913 61 years, 26 daysAug 9, 1974 63 years, 190 daysJan 20, 1977 29 years, 340 days Dec 26, 2006 93 years, 165 days
38 39 Jimmy Carter Oct 1, 1924 52 years, 111 daysJan 20, 1977 56 years, 111 daysJan 20, 1981 38 years, 175 days (living) 94 years, 286 days
39 40 Ronald Reagan Feb 6, 1911 69 years, 349 daysJan 20, 1981 77 years, 349 daysJan 20, 1989 15 years, 137 days Jun 5, 2004 93 years, 120 days
40 41 George H. W. Bush Jun 12, 1924 64 years, 222 daysJan 20, 1989 68 years, 222 daysJan 20, 1993 25 years, 314 days Nov 30, 2018 94 years, 171 days
41 42 Bill Clinton Aug 19, 1946 46 years, 154 daysJan 20, 1993 54 years, 154 daysJan 20, 2001 18 years, 175 days (living) 72 years, 329 days
42 43 George W. Bush Jul 6, 1946 54 years, 198 daysJan 20, 2001 62 years, 198 daysJan 20, 2009 10 years, 175 days (living) 73 years, 8 days
43 44 Barack Obama Aug 4, 1961 47 years, 169 daysJan 20, 2009 55 years, 169 daysJan 20, 2017 2 years, 175 days (living) 57 years, 344 days
In [22]:
pres_df_1.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 44 entries, 0 to 43
Data columns (total 8 columns):
 #   Column                     Non-Null Count  Dtype 
---  ------                     --------------  ----- 
 0   #                          44 non-null     int64 
 1   President                  44 non-null     object
 2   Born                       44 non-null     object
 3   Age atstart of presidency  44 non-null     object
 4   Age atend of presidency    44 non-null     object
 5   Post-presidencytimespan    36 non-null     object
 6   Died                       44 non-null     object
 7   Age                        44 non-null     object
dtypes: int64(1), object(7)
memory usage: 2.9+ KB

Data Set (pres_df_1) Exploration and Required Transformations:

In our analysis of pres_df_1, we identified several necessary transformations and data handling tasks:

  1. Rename the columns whose names are missing a space (for example 'Age atstart of presidency').
  2. Remove the redundant '#' column, since the data frame already has an index.
  3. Split the names of the presidents into first and last names to enable better analysis. We will do this after merging the datasets.
  4. Convert the 'Born' column from text to datetime format for accurate time-based analysis.
  5. Add the missing rows for Presidents Trump and Biden.
  6. Split the 'Age at start of presidency' column into two separate columns, one for the age and one for the date.
  7. Split the 'Age at end of presidency' column into two separate columns, one for the age and one for the date.
  8. Handle missing values in the 'Post-presidency timespan' column by indicating whether the president died in office or is still living, based on information from the website:
    • https://potus.com/presidential-facts/time-after-presidency/.
  9. Convert the 'Died' column to datetime format for accurate time-based analysis.
  10. Convert the remaining time-related columns, which are currently strings: dates to datetime, and ages and timespans to timedelta.

By performing these necessary transformations, we ensure the dataset's accuracy, consistency, and completeness, paving the way for robust data integration and insightful analysis.
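
The fused age-and-date strings in the 'Age at start of presidency' and 'Age at end of presidency' columns can also be separated in a single vectorized step. The following is a minimal sketch, assuming a small hand-copied sample of the values shown in the table above; the names `sample`, `pattern`, and `parts` are illustrative and do not appear elsewhere in this notebook.

```python
import pandas as pd

# Hand-copied sample of the fused "age + date" strings from the table above.
sample = pd.Series([
    "57 years, 353 daysMar 4, 1809",
    "68 years, 54 days Apr 4, 1841[b]",
    "51 years, 6 daysApr 4, 1841",
])

# Named groups become the column names of the frame returned by str.extract.
pattern = (
    r"(?P<age>\d+ years?, \d+ days?)"        # e.g. "57 years, 353 days"
    r"\s*"                                    # optional space before the date
    r"(?P<date>[A-Za-z]{3} \d{1,2}, \d{4})"   # e.g. "Mar 4, 1809"
)
parts = sample.str.extract(pattern)

# Footnote markers such as "[b]" fall outside both groups and are dropped.
parts["date"] = pd.to_datetime(parts["date"], format="%b %d, %Y")
print(parts)
```

With this approach the word "days" stays inside the age string, so it does not need to be appended again afterwards.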

In [23]:
# Task 0: Rename the columns whose names are missing a space
pres_df_1 = pres_df_1.rename(columns={
    'Age atstart of presidency': 'Age at start of presidency',
    'Age atend of presidency': 'Age at end of presidency',
    'Post-presidencytimespan': 'Post-presidency timespan',
})
In [24]:
# Task 1: Remove the redundant index column '#'
# (columns= already names the axis, so axis=1 is not needed)
pres_df_1 = pres_df_1.drop(columns=['#'])
In [25]:
# Task 3: Split the 'Age at start of presidency' column into separate columns for age and date

import re

# Extract the age (e.g. "57 years, 67 days") and the date (e.g. "Apr 30, 1789") from the fused string
def split_age_and_date(row):
    age_date = row['Age at start of presidency']
    age_match = re.search(r'\d+ years?, \d+ days?', age_date)
    date_match = re.search(r'[A-Za-z]{3} \d{1,2}, \d{4}', age_date)
    age = age_match.group() if age_match else ""
    date = date_match.group() if date_match else ""
    # Keys match the target column names used in the assignment below
    return pd.Series({'Age at Start of Presidency': age.strip(), 'Start Date of presidency': date.strip()})

pres_df_1[['Age at Start of Presidency', 'Start Date of presidency']] = pres_df_1.apply(split_age_and_date, axis=1)

# Keep the age text before the word 'days' (e.g. '57 years, 67 '); ' days' is appended again in a later cell
pres_df_1['Start Age of presidency'] = (pres_df_1['Age at start of presidency'].str.split('days', expand=True))[0]

# Drop the original fused column and the intermediate age column
pres_df_1 = pres_df_1.drop(['Age at Start of Presidency','Age at start of presidency'], axis=1)

# Display the updated DataFrame
pres_df_1.head()
Out[25]:
President Born Age at end of presidency Post-presidency timespan Died Age Start Date of presidency Start Age of presidency
0 George Washington Feb 22, 1732[a] 65 years, 10 daysMar 4, 1797 2 years, 285 days Dec 14, 1799 67 years, 295 days Apr 30, 1789 57 years, 67
1 John Adams Oct 30, 1735[a] 65 years, 125 daysMar 4, 1801 25 years, 122 days Jul 4, 1826 90 years, 247 days Mar 4, 1797 61 years, 125
2 Thomas Jefferson Apr 13, 1743[a] 65 years, 325 daysMar 4, 1809 17 years, 122 days Jul 4, 1826 83 years, 82 days Mar 4, 1801 57 years, 325
3 James Madison Mar 16, 1751[a] 65 years, 353 daysMar 4, 1817 19 years, 116 days Jun 28, 1836 85 years, 104 days Mar 4, 1809 57 years, 353
4 James Monroe Apr 28, 1758 66 years, 310 daysMar 4, 1825 6 years, 122 days Jul 4, 1831 73 years, 67 days Mar 4, 1817 58 years, 310
In [26]:
# Task 4: Split the 'Age at end of presidency' column at the word 'days' into the age (left part) and the date (right part)
pres_df_1[['End of Presidency Age', 'End of Presidency Date']] = pres_df_1['Age at end of presidency'].str.split('days', expand=True)

# Remove leading and trailing whitespaces from 'End of Presidency Age' and 'End of Presidency Date' columns
pres_df_1['End of Presidency Age'] = pres_df_1['End of Presidency Age'].str.strip()
pres_df_1['End of Presidency Date'] = pres_df_1['End of Presidency Date'].str.strip()
pres_df_1 = pres_df_1.drop('Age at end of presidency',axis=1)
pres_df_1
Out[26]:
President Born Post-presidency timespan Died Age Start Date of presidency Start Age of presidency End of Presidency Age End of Presidency Date
0 George Washington Feb 22, 1732[a] 2 years, 285 days Dec 14, 1799 67 years, 295 days Apr 30, 1789 57 years, 67 65 years, 10 Mar 4, 1797
1 John Adams Oct 30, 1735[a] 25 years, 122 days Jul 4, 1826 90 years, 247 days Mar 4, 1797 61 years, 125 65 years, 125 Mar 4, 1801
2 Thomas Jefferson Apr 13, 1743[a] 17 years, 122 days Jul 4, 1826 83 years, 82 days Mar 4, 1801 57 years, 325 65 years, 325 Mar 4, 1809
3 James Madison Mar 16, 1751[a] 19 years, 116 days Jun 28, 1836 85 years, 104 days Mar 4, 1809 57 years, 353 65 years, 353 Mar 4, 1817
4 James Monroe Apr 28, 1758 6 years, 122 days Jul 4, 1831 73 years, 67 days Mar 4, 1817 58 years, 310 66 years, 310 Mar 4, 1825
5 John Quincy Adams Jul 11, 1767 18 years, 356 days Feb 23, 1848 80 years, 227 days Mar 4, 1825 57 years, 236 61 years, 236 Mar 4, 1829
6 Andrew Jackson Mar 15, 1767 8 years, 96 days Jun 8, 1845 78 years, 85 days Mar 4, 1829 61 years, 354 69 years, 354 Mar 4, 1837
7 Martin Van Buren Dec 5, 1782 21 years, 142 days Jul 24, 1862 79 years, 231 days Mar 4, 1837 54 years, 89 58 years, 89 Mar 4, 1841
8 William H. Harrison Feb 9, 1773 NaN Apr 4, 1841 68 years, 54 days Mar 4, 1841 68 years, 23 68 years, 54 Apr 4, 1841[b]
9 John Tyler Mar 29, 1790 16 years, 320 days Jan 18, 1862 71 years, 295 days Apr 4, 1841 51 years, 6 54 years, 340 Mar 4, 1845
10 James K. Polk Nov 2, 1795 103 days Jun 15, 1849 53 years, 225 days Mar 4, 1845 49 years, 122 53 years, 122 Mar 4, 1849
11 Zachary Taylor Nov 24, 1784 NaN Jul 9, 1850 65 years, 227 days Mar 4, 1849 64 years, 100 65 years, 227 Jul 9, 1850[b]
12 Millard Fillmore Jan 7, 1800 21 years, 4 days Mar 8, 1874 74 years, 60 days Jul 9, 1850 50 years, 183 53 years, 56 Mar 4, 1853
13 Franklin Pierce Nov 23, 1804 12 years, 218 days Oct 8, 1869 64 years, 319 days Mar 4, 1853 48 years, 101 52 years, 101 Mar 4, 1857
14 James Buchanan Apr 23, 1791 7 years, 89 days Jun 1, 1868 77 years, 39 days Mar 4, 1857 65 years, 315 69 years, 315 Mar 4, 1861
15 Abraham Lincoln Feb 12, 1809 NaN Apr 15, 1865 56 years, 62 days Mar 4, 1861 52 years, 20 56 years, 62 Apr 15, 1865[b]
16 Andrew Johnson Dec 29, 1808 6 years, 149 days Jul 31, 1875 66 years, 214 days Apr 15, 1865 56 years, 107 60 years, 65 Mar 4, 1869
17 Ulysses S. Grant Apr 27, 1822 8 years, 141 days Jul 23, 1885 63 years, 87 days Mar 4, 1869 46 years, 311 54 years, 311 Mar 4, 1877
18 Rutherford B. Hayes Oct 4, 1822 11 years, 319 days Jan 17, 1893 70 years, 105 days Mar 4, 1877 54 years, 151 58 years, 151 Mar 4, 1881
19 James A. Garfield Nov 19, 1831 NaN Sep 19, 1881 49 years, 304 days Mar 4, 1881 49 years, 105 49 years, 304 Sep 19, 1881[b]
20 Chester A. Arthur Oct 5, 1829 1 year, 259 days Nov 18, 1886 57 years, 44 days Sep 19, 1881 51 years, 349 55 years, 150 Mar 4, 1885
21 Grover Cleveland Mar 18, 1837 4 years, 0 days[c] Jun 24, 1908 71 years, 98 days Mar 4, 1885 47 years, 351 51 years, 351 Mar 4, 1889
22 Benjamin Harrison Aug 20, 1833 8 years, 9 days Mar 13, 1901 67 years, 205 days Mar 4, 1889 55 years, 196 59 years, 196 Mar 4, 1893
23 Grover Cleveland Mar 18, 1837 11 years, 112 days[d] Jun 24, 1908 71 years, 98 days Mar 4, 1893 55 years, 351 59 years, 351 Mar 4, 1897
24 William McKinley Jan 29, 1843 NaN Sep 14, 1901 58 years, 228 days Mar 4, 1897 54 years, 34 58 years, 228 Sep 14, 1901[b]
25 Theodore Roosevelt Oct 27, 1858 9 years, 308 days Jan 6, 1919 60 years, 71 days Sep 14, 1901 42 years, 322 50 years, 128 Mar 4, 1909
26 William H. Taft Sep 15, 1857 17 years, 4 days Mar 8, 1930 72 years, 174 days Mar 4, 1909 51 years, 170 55 years, 170 Mar 4, 1913
27 Woodrow Wilson Dec 28, 1856 2 years, 336 days Feb 3, 1924 67 years, 37 days Mar 4, 1913 56 years, 66 64 years, 66 Mar 4, 1921
28 Warren G. Harding Nov 2, 1865 NaN Aug 2, 1923 57 years, 273 days Mar 4, 1921 55 years, 122 57 years, 273 Aug 2, 1923[b]
29 Calvin Coolidge Jul 4, 1872 3 years, 307 days Jan 5, 1933 60 years, 185 days Aug 2, 1923 51 years, 29 56 years, 243 Mar 4, 1929
30 Herbert Hoover Aug 10, 1874 31 years, 230 days Oct 20, 1964 90 years, 71 days Mar 4, 1929 54 years, 206 58 years, 206 Mar 4, 1933
31 Franklin D. Roosevelt Jan 30, 1882 NaN Apr 12, 1945 63 years, 72 days Mar 4, 1933 51 years, 33 63 years, 72 Apr 12, 1945[b]
32 Harry S. Truman May 8, 1884 19 years, 341 days Dec 26, 1972 88 years, 232 days Apr 12, 1945 60 years, 339 68 years, 257 Jan 20, 1953
33 Dwight D. Eisenhower Oct 14, 1890 8 years, 67 days Mar 28, 1969 78 years, 165 days Jan 20, 1953 62 years, 98 70 years, 98 Jan 20, 1961
34 John F. Kennedy May 29, 1917 NaN Nov 22, 1963 46 years, 177 days Jan 20, 1961 43 years, 236 46 years, 177 Nov 22, 1963[b]
35 Lyndon B. Johnson Aug 27, 1908 4 years, 2 days Jan 22, 1973 64 years, 148 days Nov 22, 1963 55 years, 87 60 years, 146 Jan 20, 1969
36 Richard Nixon Jan 9, 1913 19 years, 256 days Apr 22, 1994 81 years, 103 days Jan 20, 1969 56 years, 11 61 years, 212 Aug 9, 1974[e]
37 Gerald Ford Jul 14, 1913 29 years, 340 days Dec 26, 2006 93 years, 165 days Aug 9, 1974 61 years, 26 63 years, 190 Jan 20, 1977
38 Jimmy Carter Oct 1, 1924 38 years, 175 days (living) 94 years, 286 days Jan 20, 1977 52 years, 111 56 years, 111 Jan 20, 1981
39 Ronald Reagan Feb 6, 1911 15 years, 137 days Jun 5, 2004 93 years, 120 days Jan 20, 1981 69 years, 349 77 years, 349 Jan 20, 1989
40 George H. W. Bush Jun 12, 1924 25 years, 314 days Nov 30, 2018 94 years, 171 days Jan 20, 1989 64 years, 222 68 years, 222 Jan 20, 1993
41 Bill Clinton Aug 19, 1946 18 years, 175 days (living) 72 years, 329 days Jan 20, 1993 46 years, 154 54 years, 154 Jan 20, 2001
42 George W. Bush Jul 6, 1946 10 years, 175 days (living) 73 years, 8 days Jan 20, 2001 54 years, 198 62 years, 198 Jan 20, 2009
43 Barack Obama Aug 4, 1961 2 years, 175 days (living) 57 years, 344 days Jan 20, 2009 47 years, 169 55 years, 169 Jan 20, 2017
In [27]:
# Task: Remove the footnote markers [b] and [e] from the 'End of Presidency Date' column
pres_df_1['End of Presidency Date'] = pres_df_1['End of Presidency Date'].str.replace(r'\[b\]|\[e\]', '', regex=True)
In [28]:
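# Task 5: Fill missing values in 'Post-presidency timespan' with 'Died in Office'
# Assign the result back instead of calling fillna(inplace=True) on a column selection
pres_df_1['Post-presidency timespan'] = pres_df_1['Post-presidency timespan'].fillna('Died in Office')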
In [29]:
# Task: Remove the [b], [c] and [d] from the 'Post-presidency timespan' column
pres_df_1['Post-presidency timespan'] = pres_df_1['Post-presidency timespan'].str.replace(r'\[b\]|\[c\]|\[d\]', '', regex=True)
In [30]:
# Task: Remove the footnote markers [a] and [b] from the 'Born' column
pres_df_1['Born'] = pres_df_1['Born'].str.replace(r'\[b\]|\[a\]', '', regex=True)
In [31]:
# Task : Add the word "days" to 'Start Age of Presidency' and 'End of Presidency Age'
pres_df_1['Start Age of presidency'] = pres_df_1['Start Age of presidency'] + " days"
pres_df_1['End of Presidency Age'] = pres_df_1['End of Presidency Age'] + " days"
In [32]:
# Task 6: Add missing data for Presidents Trump and Biden
# DataFrame.append was removed in pandas 2.0, so the new rows are added with pd.concat.
# Dates use the same 'Mon D, YYYY' format as the existing rows.
new_rows = pd.DataFrame([
    {'President': 'Donald J. Trump', 'Born': 'Jun 14, 1946', 'Start Age of presidency': '70 years, 220 days',
     'Post-presidency timespan': 'Living', 'Died': '(living)', 'Start Date of presidency': 'Jan 20, 2017',
     'End of Presidency Age': '74 years, 222 days', 'End of Presidency Date': 'Jan 20, 2021', 'Age': '70 years, 220 days'},
    {'President': 'Joe Biden', 'Born': 'Nov 20, 1942', 'Start Age of presidency': '77 years, 62 days',
     'Post-presidency timespan': 'Living', 'Died': '(living)', 'Start Date of presidency': 'Jan 20, 2021',
     'End of Presidency Age': '81 years, 62 days', 'End of Presidency Date': 'Nov 5, 2024', 'Age': '78 years, 61 days'},
])
pres_df_1 = pd.concat([pres_df_1, new_rows], ignore_index=True)
In [33]:
# Task 7: Convert the date columns to datetime
# errors='coerce' turns entries that are not dates, such as '(living)', into NaT
date_columns = ['Born', 'Died', 'Start Date of presidency', 'End of Presidency Date']
pres_df_1[date_columns] = pres_df_1[date_columns].apply(pd.to_datetime, errors='coerce')
In [34]:
# Task 7 (continued): Convert the age and timespan strings to timedelta
# Handles 'N years, M days', 'M days', 'Died in Office', and 'Living'
def convert_years_days_to_timedelta(value):
    if pd.isna(value):
        return pd.NaT
    if 'Died' in value:
        return value  # Keep "Died in Office" as a text label
    if 'Living' in value:
        return pd.NaT  # No completed timespan yet
    parts = value.split(', ')
    years = int(parts[0].split()[0]) if 'year' in parts[0] else 0
    # The days are the second part when years are present, otherwise the only part
    days = int(parts[1].split()[0]) if len(parts) > 1 and 'day' in parts[1] else int(parts[0].split()[0])
    # Approximation: every year is counted as 365 days, so leap days are ignored
    return pd.Timedelta(days=years*365 + days)

# Convert Age, Start Age of presidency, End of Presidency Age, and Post-presidency timespan.
# 'Post-presidency timespan' keeps the 'Died in Office' label, so it remains an object column.
timedelta_columns = ['Age', 'Start Age of presidency', 'End of Presidency Age', 'Post-presidency timespan']
pres_df_1[timedelta_columns] = pres_df_1[timedelta_columns].applymap(convert_years_days_to_timedelta)
In [35]:
pres_df_1
Out[35]:
President Born Post-presidency timespan Died Age Start Date of presidency Start Age of presidency End of Presidency Age End of Presidency Date
0 George Washington 1732-02-22 1015 days 00:00:00 1799-12-14 24750 days 1789-04-30 20872 days 23735 days 1797-03-04
1 John Adams 1735-10-30 9247 days 00:00:00 1826-07-04 33097 days 1797-03-04 22390 days 23850 days 1801-03-04
2 Thomas Jefferson 1743-04-13 6327 days 00:00:00 1826-07-04 30377 days 1801-03-04 21130 days 24050 days 1809-03-04
3 James Madison 1751-03-16 7051 days 00:00:00 1836-06-28 31129 days 1809-03-04 21158 days 24078 days 1817-03-04
4 James Monroe 1758-04-28 2312 days 00:00:00 1831-07-04 26712 days 1817-03-04 21480 days 24400 days 1825-03-04
5 John Quincy Adams 1767-07-11 6926 days 00:00:00 1848-02-23 29427 days 1825-03-04 21041 days 22501 days 1829-03-04
6 Andrew Jackson 1767-03-15 3016 days 00:00:00 1845-06-08 28555 days 1829-03-04 22619 days 25539 days 1837-03-04
7 Martin Van Buren 1782-12-05 7807 days 00:00:00 1862-07-24 29066 days 1837-03-04 19799 days 21259 days 1841-03-04
8 William H. Harrison 1773-02-09 Died in Office 1841-04-04 24874 days 1841-03-04 24843 days 24874 days 1841-04-04
9 John Tyler 1790-03-29 6160 days 00:00:00 1862-01-18 26210 days 1841-04-04 18621 days 20050 days 1845-03-04
10 James K. Polk 1795-11-02 103 days 00:00:00 1849-06-15 19570 days 1845-03-04 18007 days 19467 days 1849-03-04
11 Zachary Taylor 1784-11-24 Died in Office 1850-07-09 23952 days 1849-03-04 23460 days 23952 days 1850-07-09
12 Millard Fillmore 1800-01-07 7669 days 00:00:00 1874-03-08 27070 days 1850-07-09 18433 days 19401 days 1853-03-04
13 Franklin Pierce 1804-11-23 4598 days 00:00:00 1869-10-08 23679 days 1853-03-04 17621 days 19081 days 1857-03-04
14 James Buchanan 1791-04-23 2644 days 00:00:00 1868-06-01 28144 days 1857-03-04 24040 days 25500 days 1861-03-04
15 Abraham Lincoln 1809-02-12 Died in Office 1865-04-15 20502 days 1861-03-04 19000 days 20502 days 1865-04-15
16 Andrew Johnson 1808-12-29 2339 days 00:00:00 1875-07-31 24304 days 1865-04-15 20547 days 21965 days 1869-03-04
17 Ulysses S. Grant 1822-04-27 3061 days 00:00:00 1885-07-23 23082 days 1869-03-04 17101 days 20021 days 1877-03-04
18 Rutherford B. Hayes 1822-10-04 4334 days 00:00:00 1893-01-17 25655 days 1877-03-04 19861 days 21321 days 1881-03-04
19 James A. Garfield 1831-11-19 Died in Office 1881-09-19 18189 days 1881-03-04 17990 days 18189 days 1881-09-19
20 Chester A. Arthur 1829-10-05 624 days 00:00:00 1886-11-18 20849 days 1881-09-19 18964 days 20225 days 1885-03-04
21 Grover Cleveland 1837-03-18 1460 days 00:00:00 1908-06-24 26013 days 1885-03-04 17506 days 18966 days 1889-03-04
22 Benjamin Harrison 1833-08-20 2929 days 00:00:00 1901-03-13 24660 days 1889-03-04 20271 days 21731 days 1893-03-04
23 Grover Cleveland 1837-03-18 4127 days 00:00:00 1908-06-24 26013 days 1893-03-04 20426 days 21886 days 1897-03-04
24 William McKinley 1843-01-29 Died in Office 1901-09-14 21398 days 1897-03-04 19744 days 21398 days 1901-09-14
25 Theodore Roosevelt 1858-10-27 3593 days 00:00:00 1919-01-06 21971 days 1901-09-14 15652 days 18378 days 1909-03-04
26 William H. Taft 1857-09-15 6209 days 00:00:00 1930-03-08 26454 days 1909-03-04 18785 days 20245 days 1913-03-04
27 Woodrow Wilson 1856-12-28 1066 days 00:00:00 1924-02-03 24492 days 1913-03-04 20506 days 23426 days 1921-03-04
28 Warren G. Harding 1865-11-02 Died in Office 1923-08-02 21078 days 1921-03-04 20197 days 21078 days 1923-08-02
29 Calvin Coolidge 1872-07-04 1402 days 00:00:00 1933-01-05 22085 days 1923-08-02 18644 days 20683 days 1929-03-04
30 Herbert Hoover 1874-08-10 11545 days 00:00:00 1964-10-20 32921 days 1929-03-04 19916 days 21376 days 1933-03-04
31 Franklin D. Roosevelt 1882-01-30 Died in Office 1945-04-12 23067 days 1933-03-04 18648 days 23067 days 1945-04-12
32 Harry S. Truman 1884-05-08 7276 days 00:00:00 1972-12-26 32352 days 1945-04-12 22239 days 25077 days 1953-01-20
33 Dwight D. Eisenhower 1890-10-14 2987 days 00:00:00 1969-03-28 28635 days 1953-01-20 22728 days 25648 days 1961-01-20
34 John F. Kennedy 1917-05-29 Died in Office 1963-11-22 16967 days 1961-01-20 15931 days 16967 days 1963-11-22
35 Lyndon B. Johnson 1908-08-27 1462 days 00:00:00 1973-01-22 23508 days 1963-11-22 20162 days 22046 days 1969-01-20
36 Richard Nixon 1913-01-09 7191 days 00:00:00 1994-04-22 29668 days 1969-01-20 20451 days 22477 days 1974-08-09
37 Gerald Ford 1913-07-14 10925 days 00:00:00 2006-12-26 34110 days 1974-08-09 22291 days 23185 days 1977-01-20
38 Jimmy Carter 1924-10-01 14045 days 00:00:00 NaT 34596 days 1977-01-20 19091 days 20551 days 1981-01-20
39 Ronald Reagan 1911-02-06 5612 days 00:00:00 2004-06-05 34065 days 1981-01-20 25534 days 28454 days 1989-01-20
40 George H. W. Bush 1924-06-12 9439 days 00:00:00 2018-11-30 34481 days 1989-01-20 23582 days 25042 days 1993-01-20
41 Bill Clinton 1946-08-19 6745 days 00:00:00 NaT 26609 days 1993-01-20 16944 days 19864 days 2001-01-20
42 George W. Bush 1946-07-06 3825 days 00:00:00 NaT 26653 days 2001-01-20 19908 days 22828 days 2009-01-20
43 Barack Obama 1961-08-04 905 days 00:00:00 NaT 21149 days 2009-01-20 17324 days 20244 days 2017-01-20
44 Donald J. Trump 1946-06-14 NaT NaT 25770 days 2017-01-20 25770 days 27232 days 2021-01-20
45 Joe Biden 1942-11-20 NaT NaT 28531 days 2021-01-20 28167 days 29627 days 2024-11-05
In [36]:
pres_df_1.dtypes
Out[36]:
President                            object
Born                         datetime64[ns]
Post-presidency timespan             object
Died                         datetime64[ns]
Age                         timedelta64[ns]
Start Date of presidency     datetime64[ns]
Start Age of presidency     timedelta64[ns]
End of Presidency Age       timedelta64[ns]
End of Presidency Date       datetime64[ns]
dtype: object

Here are the steps we took to clean the pres_df_1 DataFrame and what they achieved:

Data Cleaning Steps:¶

  1. Convert Date Columns to DateTime: We converted the 'Born', 'Died', 'Start Date of presidency', and 'End of Presidency Date' columns to the datetime data type using pd.to_datetime with errors='coerce', so entries that are not dates, such as '(living)', become NaT.

  2. Convert Years and Days to Timedelta: We defined a function convert_years_days_to_timedelta to handle entries of the form 'number years, number days', 'number days', 'Died in Office', and 'Living'. The function parses the years and days from the string and builds a Timedelta, counting each year as 365 days. We applied this function to the 'Age', 'Start Age of presidency', 'End of Presidency Age', and 'Post-presidency timespan' columns using applymap.

  3. Handling Missing Data: In the 'Post-presidency timespan' column, missing values were first filled with 'Died in Office', which the conversion keeps as a text label, while 'Living' entries become pd.NaT. Because the column mixes text and Timedelta values, its dtype remains object.
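
If a purely numeric duration column is preferred, the 'Died in Office' label can be moved into its own flag column. This is only a sketch under that assumption, run on a toy frame that mimics the three kinds of values present after the conversion; the `toy` frame and the 'Died in office' column name are hypothetical and not part of the notebook.

```python
import pandas as pd

# Toy frame mimicking the converted column: a Timedelta, the text label,
# and NaT for a living former president.
toy = pd.DataFrame({
    "President": ["James K. Polk", "Zachary Taylor", "Donald J. Trump"],
    "Post-presidency timespan": [pd.Timedelta(days=103), "Died in Office", pd.NaT],
})

col = toy["Post-presidency timespan"]

# Hypothetical flag column: True where the text label was kept.
toy["Died in office"] = col.map(lambda v: isinstance(v, str) and "Died" in v)

# Coerce everything that is not a duration to NaT, giving a timedelta column.
toy["Post-presidency timespan"] = pd.to_timedelta(col, errors="coerce")

print(toy.dtypes)
```

Keeping the label in a boolean column means the duration column supports arithmetic and aggregation directly, at the cost of one extra column.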

Achievements:¶

  • Converted date columns to DateTime data type: The date columns 'Born', 'Died', 'Start Date of presidency', and 'End of Presidency Date' are now datetime64 columns, making it easier to perform date-related operations.

  • Converted years and days to Timedelta: The columns 'Age', 'Start Age of presidency', and 'End of Presidency Age' are now timedelta64 columns, which allows calculations and comparisons of durations. 'Post-presidency timespan' holds Timedelta values alongside the 'Died in Office' label and therefore remains an object column. Because each year is counted as 365 days, the durations are approximate.

  • Handled Missing Data: In the 'Post-presidency timespan' column, presidents who died in office are labelled 'Died in Office', and 'Living' entries are represented by pd.NaT.

Overall, the data in pres_df_1 is now cleaned and well-prepared for further analysis and insights.
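
The conversion function counts every year as 365 days, so leap days are not included in the resulting durations. Since both endpoints are available as dates, an exact duration can instead be obtained by subtracting them. A minimal sketch using George Washington's dates from the table above; the variable names are illustrative.

```python
import pandas as pd

born = pd.Timestamp("1732-02-22")
start = pd.Timestamp("1789-04-30")

# Approximation used by the conversion: "57 years, 67 days" with 365-day years.
approx = pd.Timedelta(days=57 * 365 + 67)

# Exact elapsed time between the two calendar dates, leap days included.
exact = start - born

print(approx.days)            # 20872
print(exact.days)             # 20887
print((exact - approx).days)  # 15 leap days fall between the two dates
```

On the full data frame the same idea would be a column subtraction such as pres_df_1['Start Date of presidency'] - pres_df_1['Born'].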

1.2 First Ladies' Data: pres_df_2

In [37]:
pres_df_2
Out[37]:
Unnamed: 0 relation name president born death age_of_death marriage_date
0 0 Husband Martha Dandridge George Washington June 13, 1731 May 22, 1802 70.0 January 6, 1759
1 1 Husband Abigail Smith John Adams November 22, 1744 October 28, 1818 73.0 October 25, 1764
2 2 Father Martha Jefferson Thomas Jefferson September 27, 1772 October 10, 1836 64.0 NaN
3 3 Husband Dolley Payne James Madison May 20, 1768 July 12, 1849 81.0 September 14, 1794
4 4 Husband Elizabeth Kortright James Monroe June 30, 1768 September 23, 1830 62.0 February 16, 1786
5 5 Husband Louisa Catherine Johnson John Quincy Adams February 12, 1775 May 15, 1852 77.0 July 26, 1797
6 6 Uncle Emily Donelson Andrew Jackson June 1, 1807 December 19, 1836 29.0 NaN
7 7 Father-in-law Sarah Yorke Andrew Jackson July 16, 1803 August 23, 1887 84.0 NaN
8 8 Father-in-law Sarah Angelica Singleton Martin Van Buren February 13, 1818 December 29, 1877 59.0 NaN
9 9 Husband Anna Tuthill Symmes William Henry Harrison July 25, 1775 February 25, 1864 88.0 November 22, 1795
10 10 Father-in-law Jane Irwin William Henry Harrison July 23, 1804 May 11, 1846 41.0 NaN
11 11 Husband Letitia Christian John Tyler November 12, 1790 September 10, 1842 51.0 March 29, 1813
12 12 Father-in-law Elizabeth Priscilla Cooper John Tyler June 14, 1816 December 29, 1889 73.0 NaN
13 13 Husband Julia Gardiner John Tyler May 4, 1820 July 10, 1889 69.0 June 26, 1844
14 14 Husband Sarah Childress James K. Polk September 4, 1803 August 14, 1891 87.0 January 1, 1824
15 15 Husband Margaret Mackall Smith Zachary Taylor September 21, 1788 August 14, 1852 63.0 June 21, 1810
16 16 Husband Abigail Powers Millard Fillmore March 13, 1798 March 30, 1853 55.0 February 5, 1826
17 17 Husband Jane Means Appleton Franklin Pierce March 12, 1806 December 2, 1863 57.0 November 19, 1834
18 18 Uncle Harriet Rebecca Lane James Buchanan May 9, 1830 July 3, 1903 73.0 NaN
19 19 Husband Mary Ann Todd Abraham Lincoln December 13, 1818 July 16, 1882 63.0 November 4, 1842
20 20 Husband Eliza McCardle Andrew Johnson October 4, 1810 January 15, 1876 65.0 May 17, 1827
21 21 Husband Julia Boggs Dent Ulysses S. Grant January 26, 1826 December 14, 1902 76.0 August 22, 1848
22 22 Husband Lucy Ware Webb Rutherford B. Hayes August 28, 1831 June 25, 1889 57.0 December 30, 1852
23 23 Husband Lucretia Rudolph James A. Garfield April 19, 1832 March 14, 1918 85.0 November 11, 1858
24 24 Brother Mary Arthur McElroy Chester A. Arthur July 5, 1841 January 8, 1917 75.0 NaN
25 25 Brother Rose Elizabeth Cleveland Grover Cleveland June 13, 1846 November 22, 1918 72.0 NaN
26 26 Husband Frances Clara Folsom Grover Cleveland July 21, 1864 October 29, 1947 83.0 June 2, 1886
27 27 Husband Caroline Lavinia Scott Benjamin Harrison October 1, 1832 October 25, 1892 60.0 October 20, 1853
28 28 Father Mary Scott Harrison Benjamin Harrison April 3, 1858 October 28, 1930 72.0 NaN
29 29 Husband Frances Clara Folsom Grover Cleveland July 21, 1864 October 29, 1947 83.0 June 2, 1886
30 30 Husband Ida Saxton William McKinley June 8, 1847 May 26, 1907 59.0 January 25, 1871
31 31 Husband Edith Kermit Carow Theodore Roosevelt August 6, 1861 September 30, 1948 87.0 December 2, 1886
32 32 Husband Helen Louise Herron William H. Taft June 2, 1861 May 22, 1943 81.0 June 19, 1886
33 33 Husband Ellen Louise Axson Woodrow Wilson May 15, 1860 August 6, 1914 54.0 June 24, 1885
34 34 Father Margaret Woodrow Wilson Woodrow Wilson April 16, 1886 February 12, 1944 57.0 NaN
35 35 Husband Edith Bolling Woodrow Wilson October 15, 1872 December 28, 1961 89.0 December 18, 1915
36 36 Husband Florence Mabel Kling Warren G. Harding August 15, 1860 November 21, 1924 64.0 July 8, 1891
37 37 Husband Grace Anna Goodhue Calvin Coolidge January 3, 1879 July 8, 1957 78.0 October 4, 1905
38 38 Husband Lou Henry Herbert Hoover March 29, 1874 January 7, 1944 69.0 February 10, 1899
39 39 Husband Anna Eleanor Roosevelt Franklin D. Roosevelt October 11, 1884 November 7, 1962 78.0 March 17, 1905
40 40 Husband Elizabeth Virginia "Bess" Wallace Harry S. Truman February 13, 1885 October 18, 1982 97.0 June 28, 1919
41 41 Husband Mamie Geneva Doud Dwight D. Eisenhower November 14, 1896 November 1, 1979 82.0 July 1, 1916
42 42 Husband Jacqueline "Jackie" Lee Bouvier John F. Kennedy July 28, 1929 May 19, 1994 64.0 September 12, 1953
43 43 Husband Claudia Alta "Lady Bird" Taylor Lyndon B. Johnson December 22, 1912 July 11, 2007 94.0 November 17, 1934
44 44 Husband Thelma "Pat" Catherine Ryan Richard Nixon March 16, 1912 June 22, 1993 81.0 June 21, 1940
45 45 Husband Elizabeth "Betty" Ann Bloomer Gerald Ford April 8, 1918 July 8, 2011 93.0 October 15, 1948
46 46 Husband Eleanor Rosalynn Smith Jimmy Carter August 18, 1927 NaN NaN August 18, 1927
47 47 Husband Nancy Davis Ronald Reagan July 6, 1921 March 6, 2016 94.0 March 4, 1952
48 48 Husband Barbara Pierce George H. W. Bush June 8, 1925 April 17, 2018 92.0 January 6, 1945
49 49 Husband Hillary Diane Rodham Bill Clinton October 26, 1947 NaN NaN October 26, 1947
50 50 Husband Laura Lane Welch George W. Bush November 4, 1946 NaN NaN November 4, 1946
51 51 Husband Michelle LaVaughn Robinson Barack Obama January 17, 1964 NaN NaN January 17, 1964
52 52 Husband Melanija Knavs Donald Trump April 26, 1970 NaN NaN January 22, 2005
In [38]:
pres_df_2.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 53 entries, 0 to 52
Data columns (total 8 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Unnamed: 0     53 non-null     int64  
 1   relation       53 non-null     object 
 2   name           53 non-null     object 
 3   president      53 non-null     object 
 4   born           53 non-null     object 
 5   death          48 non-null     object 
 6   age_of_death   48 non-null     float64
 7   marriage_date  42 non-null     object 
dtypes: float64(1), int64(1), object(6)
memory usage: 3.4+ KB

In the analysis of pres_df_2, the following observations were made:

  1. Redundant Index: The dataset contains an additional index column that duplicates the row numbers. It is recommended to remove this redundant index column as it does not provide any useful information.

  2. Column Name Change: The column names in pres_df_2 are short and lowercase (for example "name" and "born") and do not make clear that they describe the First Lady rather than the president. We rename them so that they are descriptive and easily understandable.

  3. Date Conversion: The columns "born," "death," and "marriage_date" contain dates stored as strings (object dtype). To facilitate further analysis, we convert these columns to datetime data types.

To summarize the required data cleaning steps for pres_df_2:

  1. Drop the redundant index column.
  2. Rename columns for improved clarity and consistency.
  3. Convert the "born," "death," and "marriage_date" columns to datetime data types.

By implementing these data cleaning steps, we can enhance the usability and integrity of the pres_df_2 dataset for further analysis.

Note: The pres_df_2 dataset describes the First Ladies, not the Presidents. The "born," "death," "age_of_death," and "marriage_date" columns all refer to the First Lady, and the "relation" column gives the President's relationship to her (for example, Husband, Father, or Uncle). The renamed columns below make this explicit for exploratory analysis and visualization.

In [39]:
# Task 1: Drop the redundant index column
pres_df_2 = pres_df_2.drop(columns='Unnamed: 0')
In [40]:
# Task 2: Rename the columns
pres_df_2.rename(columns={
    'relation': 'Relation to President',
    'name': 'First Lady Name',
    'president': 'President',
    'born': 'Date of Born, First Lady',
    'death': 'Date of Death, First Lady',
    'age_of_death': 'Age at Death, First Lady',
    'marriage_date': 'Date of Marriage'
}, inplace=True)
In [41]:
# Task 3: Convert the 'Date of Born, First Lady', 'Date of Death, First Lady', and 'Date of Marriage' columns to datetime
date_columns = ['Date of Born, First Lady', 'Date of Death, First Lady', 'Date of Marriage']
pres_df_2[date_columns] = pres_df_2[date_columns].apply(pd.to_datetime)
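
Once the date columns are real datetimes, the stored age at death can be cross-checked against the birth and death dates. The following is a minimal sketch, assuming a four-row sample copied from the raw table above; `sample` and the helper `completed_years` are hypothetical names, not part of the notebook.

```python
import pandas as pd

# Hypothetical sample: four rows copied from the raw pres_df_2 output above.
sample = pd.DataFrame({
    "name": ['Elizabeth "Betty" Ann Bloomer', "Eleanor Rosalynn Smith",
             "Nancy Davis", "Barbara Pierce"],
    "born": ["April 8, 1918", "August 18, 1927", "July 6, 1921", "June 8, 1925"],
    "death": ["July 8, 2011", None, "March 6, 2016", "April 17, 2018"],
    "age_of_death": [93.0, None, 94.0, 92.0],
})

# Parse the date strings; a missing value becomes NaT.
born = pd.to_datetime(sample["born"])
death = pd.to_datetime(sample["death"])


def completed_years(start, end):
    """Number of full years between two datetime Series (NaN where end is NaT)."""
    years = end.dt.year - start.dt.year
    # Subtract one year when the end date falls before the anniversary of the start date.
    before_anniversary = (end.dt.month < start.dt.month) | (
        (end.dt.month == start.dt.month) & (end.dt.day < start.dt.day)
    )
    return years - before_anniversary.astype(int)


computed = completed_years(born, death)

# Rows where a computed age exists but disagrees with the stored one.
mismatch = sample[computed.notna() & (computed != sample["age_of_death"])]
print(mismatch)
```

An empty `mismatch` frame means the stored ages agree with the dates for the sampled rows.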
In [42]:
pres_df_2
Out[42]:
Relation to President First Lady Name President Date of Born, First Lady Date of Death, First Lady Age at Death, First Lady Date of Marriage
0 Husband Martha Dandridge George Washington 1731-06-13 1802-05-22 70.0 1759-01-06
1 Husband Abigail Smith John Adams 1744-11-22 1818-10-28 73.0 1764-10-25
2 Father Martha Jefferson Thomas Jefferson 1772-09-27 1836-10-10 64.0 NaT
3 Husband Dolley Payne James Madison 1768-05-20 1849-07-12 81.0 1794-09-14
4 Husband Elizabeth Kortright James Monroe 1768-06-30 1830-09-23 62.0 1786-02-16
5 Husband Louisa Catherine Johnson John Quincy Adams 1775-02-12 1852-05-15 77.0 1797-07-26
6 Uncle Emily Donelson Andrew Jackson 1807-06-01 1836-12-19 29.0 NaT
7 Father-in-law Sarah Yorke Andrew Jackson 1803-07-16 1887-08-23 84.0 NaT
8 Father-in-law Sarah Angelica Singleton Martin Van Buren 1818-02-13 1877-12-29 59.0 NaT
9 Husband Anna Tuthill Symmes William Henry Harrison 1775-07-25 1864-02-25 88.0 1795-11-22
10 Father-in-law Jane Irwin William Henry Harrison 1804-07-23 1846-05-11 41.0 NaT
11 Husband Letitia Christian John Tyler 1790-11-12 1842-09-10 51.0 1813-03-29
12 Father-in-law Elizabeth Priscilla Cooper John Tyler 1816-06-14 1889-12-29 73.0 NaT
13 Husband Julia Gardiner John Tyler 1820-05-04 1889-07-10 69.0 1844-06-26
14 Husband Sarah Childress James K. Polk 1803-09-04 1891-08-14 87.0 1824-01-01
15 Husband Margaret Mackall Smith Zachary Taylor 1788-09-21 1852-08-14 63.0 1810-06-21
16 Husband Abigail Powers Millard Fillmore 1798-03-13 1853-03-30 55.0 1826-02-05
17 Husband Jane Means Appleton Franklin Pierce 1806-03-12 1863-12-02 57.0 1834-11-19
18 Uncle Harriet Rebecca Lane James Buchanan 1830-05-09 1903-07-03 73.0 NaT
19 Husband Mary Ann Todd Abraham Lincoln 1818-12-13 1882-07-16 63.0 1842-11-04
20 Husband Eliza McCardle Andrew Johnson 1810-10-04 1876-01-15 65.0 1827-05-17
21 Husband Julia Boggs Dent Ulysses S. Grant 1826-01-26 1902-12-14 76.0 1848-08-22
22 Husband Lucy Ware Webb Rutherford B. Hayes 1831-08-28 1889-06-25 57.0 1852-12-30
23 Husband Lucretia Rudolph James A. Garfield 1832-04-19 1918-03-14 85.0 1858-11-11
24 Brother Mary Arthur McElroy Chester A. Arthur 1841-07-05 1917-01-08 75.0 NaT
25 Brother Rose Elizabeth Cleveland Grover Cleveland 1846-06-13 1918-11-22 72.0 NaT
26 Husband Frances Clara Folsom Grover Cleveland 1864-07-21 1947-10-29 83.0 1886-06-02
27 Husband Caroline Lavinia Scott Benjamin Harrison 1832-10-01 1892-10-25 60.0 1853-10-20
28 Father Mary Scott Harrison Benjamin Harrison 1858-04-03 1930-10-28 72.0 NaT
29 Husband Frances Clara Folsom Grover Cleveland 1864-07-21 1947-10-29 83.0 1886-06-02
30 Husband Ida Saxton William McKinley 1847-06-08 1907-05-26 59.0 1871-01-25
31 Husband Edith Kermit Carow Theodore Roosevelt 1861-08-06 1948-09-30 87.0 1886-12-02
32 Husband Helen Louise Herron William H. Taft 1861-06-02 1943-05-22 81.0 1886-06-19
33 Husband Ellen Louise Axson Woodrow Wilson 1860-05-15 1914-08-06 54.0 1885-06-24
34 Father Margaret Woodrow Wilson Woodrow Wilson 1886-04-16 1944-02-12 57.0 NaT
35 Husband Edith Bolling Woodrow Wilson 1872-10-15 1961-12-28 89.0 1915-12-18
36 Husband Florence Mabel Kling Warren G. Harding 1860-08-15 1924-11-21 64.0 1891-07-08
37 Husband Grace Anna Goodhue Calvin Coolidge 1879-01-03 1957-07-08 78.0 1905-10-04
38 Husband Lou Henry Herbert Hoover 1874-03-29 1944-01-07 69.0 1899-02-10
39 Husband Anna Eleanor Roosevelt Franklin D. Roosevelt 1884-10-11 1962-11-07 78.0 1905-03-17
40 Husband Elizabeth Virginia "Bess" Wallace Harry S. Truman 1885-02-13 1982-10-18 97.0 1919-06-28
41 Husband Mamie Geneva Doud Dwight D. Eisenhower 1896-11-14 1979-11-01 82.0 1916-07-01
42 Husband Jacqueline "Jackie" Lee Bouvier John F. Kennedy 1929-07-28 1994-05-19 64.0 1953-09-12
43 Husband Claudia Alta "Lady Bird" Taylor Lyndon B. Johnson 1912-12-22 2007-07-11 94.0 1934-11-17
44 Husband Thelma "Pat" Catherine Ryan Richard Nixon 1912-03-16 1993-06-22 81.0 1940-06-21
45 Husband Elizabeth "Betty" Ann Bloomer Gerald Ford 1918-04-08 2011-07-08 93.0 1948-10-15
46 Husband Eleanor Rosalynn Smith Jimmy Carter 1927-08-18 NaT NaN 1927-08-18
47 Husband Nancy Davis Ronald Reagan 1921-07-06 2016-03-06 94.0 1952-03-04
48 Husband Barbara Pierce George H. W. Bush 1925-06-08 2018-04-17 92.0 1945-01-06
49 Husband Hillary Diane Rodham Bill Clinton 1947-10-26 NaT NaN 1947-10-26
50 Husband Laura Lane Welch George W. Bush 1946-11-04 NaT NaN 1946-11-04
51 Husband Michelle LaVaughn Robinson Barack Obama 1964-01-17 NaT NaN 1964-01-17
52 Husband Melanija Knavs Donald Trump 1970-04-26 NaT NaN 2005-01-22
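
In the output above, rows 46, 49, 50, and 51 show a marriage date identical to the birth date, which cannot be correct and looks like an artifact of the source data. A minimal sketch of how such rows could be flagged, assuming a three-row sample copied from the table; replacing the suspect values with NaT is one possible choice, not necessarily the one this study makes.

```python
import pandas as pd

# Hypothetical sample: three rows copied from the cleaned pres_df_2 output above.
sample = pd.DataFrame({
    "First Lady Name": ["Nancy Davis", "Hillary Diane Rodham", "Melanija Knavs"],
    "Date of Born, First Lady": pd.to_datetime(["1921-07-06", "1947-10-26", "1970-04-26"]),
    "Date of Marriage": pd.to_datetime(["1952-03-04", "1947-10-26", "2005-01-22"]),
})

# A marriage date equal to the birth date is treated as suspect.
suspect = sample["Date of Marriage"] == sample["Date of Born, First Lady"]
print(sample.loc[suspect, "First Lady Name"].tolist())

# Assumption: mark the suspect values as missing rather than keep a wrong date.
cleaned = sample.copy()
cleaned.loc[suspect, "Date of Marriage"] = pd.NaT
```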


1.3 Historical Presidents Physical Data (More) : pres_df_3

In [43]:
# Set option to display all columns
pd.set_option('display.max_columns', None)
pres_df_3.head()
Out[43]:
order name height_cm height_in weight_kg weight_lb body_mass_index body_mass_index_range birth_day birth_month birth_year birth_date birthplace birth_state death_day death_month death_year death_date death_age astrological_sign term_begin_day term_begin_month term_begin_year term_begin_date term_end_day term_end_month term_end_year term_end_date presidency_begin_age presidency_end_age political_party corrected_iq
0 1 George Washington 188 74.0 79.4 175 22.5 Normal 22 2 1732 22-02-1732 Westmoreland County Virginia 14.0 12.0 1799.0 14-12-1799 67.0 Pisces 30 4 1789 30-04-1789 4.0 3.0 1797.0 04-03-1797 57 65.0 Unaffiliated 140.0
1 2 John Adams 170 67.0 83.9 185 29.0 Overweight 30 10 1735 30-10-1735 Braintree Massachusetts 4.0 7.0 1826.0 04-07-1826 90.0 Scorpio 4 3 1797 04-03-1797 4.0 3.0 1801.0 04-03-1801 61 65.0 Federalist 155.0
2 3 Thomas Jefferson 189 74.5 82.1 181 23.0 Normal 13 4 1743 13-04-1743 Shadwell Virginia 4.0 7.0 1826.0 04-07-1826 83.0 Aries 4 3 1801 04-03-1801 4.0 3.0 1809.0 04-03-1809 57 65.0 Democratic-Republican 160.0
3 4 James Madison 163 64.0 55.3 122 20.8 Normal 16 3 1751 16-03-1751 Port Conway Virginia 28.0 6.0 1836.0 28-06-1836 85.0 Pisces 4 3 1809 04-03-1809 4.0 3.0 1817.0 04-03-1817 57 65.0 Democratic-Republican 160.0
4 5 James Monroe 183 72.0 85.7 189 25.6 Overweight 28 4 1758 28-04-1758 Monroe Hall Virginia 4.0 7.0 1831.0 04-07-1831 73.0 Taurus 4 3 1817 04-03-1817 4.0 3.0 1825.0 04-03-1825 58 66.0 Democratic-Republican 139.0
In [44]:
pres_df_3.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 45 entries, 0 to 44
Data columns (total 32 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   order                  45 non-null     object 
 1   name                   45 non-null     object 
 2   height_cm              45 non-null     int64  
 3   height_in              45 non-null     float64
 4   weight_kg              45 non-null     float64
 5   weight_lb              45 non-null     int64  
 6   body_mass_index        45 non-null     float64
 7   body_mass_index_range  45 non-null     object 
 8   birth_day              45 non-null     int64  
 9   birth_month            45 non-null     int64  
 10  birth_year             45 non-null     int64  
 11  birth_date             45 non-null     object 
 12  birthplace             45 non-null     object 
 13  birth_state            45 non-null     object 
 14  death_day              39 non-null     float64
 15  death_month            39 non-null     float64
 16  death_year             39 non-null     float64
 17  death_date             39 non-null     object 
 18  death_age              39 non-null     float64
 19  astrological_sign      45 non-null     object 
 20  term_begin_day         45 non-null     int64  
 21  term_begin_month       45 non-null     int64  
 22  term_begin_year        45 non-null     int64  
 23  term_begin_date        45 non-null     object 
 24  term_end_day           44 non-null     float64
 25  term_end_month         44 non-null     float64
 26  term_end_year          44 non-null     float64
 27  term_end_date          44 non-null     object 
 28  presidency_begin_age   45 non-null     int64  
 29  presidency_end_age     44 non-null     float64
 30  political_party        45 non-null     object 
 31  corrected_iq           42 non-null     float64
dtypes: float64(12), int64(9), object(11)
memory usage: 11.4+ KB
In [45]:
pres_df_1
Out[45]:
President Born Post-presidency timespan Died Age Start Date of presidency Start Age of presidency End of Presidency Age End of Presidency Date
0 George Washington 1732-02-22 1015 days 00:00:00 1799-12-14 24750 days 1789-04-30 20872 days 23735 days 1797-03-04
1 John Adams 1735-10-30 9247 days 00:00:00 1826-07-04 33097 days 1797-03-04 22390 days 23850 days 1801-03-04
2 Thomas Jefferson 1743-04-13 6327 days 00:00:00 1826-07-04 30377 days 1801-03-04 21130 days 24050 days 1809-03-04
3 James Madison 1751-03-16 7051 days 00:00:00 1836-06-28 31129 days 1809-03-04 21158 days 24078 days 1817-03-04
4 James Monroe 1758-04-28 2312 days 00:00:00 1831-07-04 26712 days 1817-03-04 21480 days 24400 days 1825-03-04
5 John Quincy Adams 1767-07-11 6926 days 00:00:00 1848-02-23 29427 days 1825-03-04 21041 days 22501 days 1829-03-04
6 Andrew Jackson 1767-03-15 3016 days 00:00:00 1845-06-08 28555 days 1829-03-04 22619 days 25539 days 1837-03-04
7 Martin Van Buren 1782-12-05 7807 days 00:00:00 1862-07-24 29066 days 1837-03-04 19799 days 21259 days 1841-03-04
8 William H. Harrison 1773-02-09 Died in Office 1841-04-04 24874 days 1841-03-04 24843 days 24874 days 1841-04-04
9 John Tyler 1790-03-29 6160 days 00:00:00 1862-01-18 26210 days 1841-04-04 18621 days 20050 days 1845-03-04
10 James K. Polk 1795-11-02 103 days 00:00:00 1849-06-15 19570 days 1845-03-04 18007 days 19467 days 1849-03-04
11 Zachary Taylor 1784-11-24 Died in Office 1850-07-09 23952 days 1849-03-04 23460 days 23952 days 1850-07-09
12 Millard Fillmore 1800-01-07 7669 days 00:00:00 1874-03-08 27070 days 1850-07-09 18433 days 19401 days 1853-03-04
13 Franklin Pierce 1804-11-23 4598 days 00:00:00 1869-10-08 23679 days 1853-03-04 17621 days 19081 days 1857-03-04
14 James Buchanan 1791-04-23 2644 days 00:00:00 1868-06-01 28144 days 1857-03-04 24040 days 25500 days 1861-03-04
15 Abraham Lincoln 1809-02-12 Died in Office 1865-04-15 20502 days 1861-03-04 19000 days 20502 days 1865-04-15
16 Andrew Johnson 1808-12-29 2339 days 00:00:00 1875-07-31 24304 days 1865-04-15 20547 days 21965 days 1869-03-04
17 Ulysses S. Grant 1822-04-27 3061 days 00:00:00 1885-07-23 23082 days 1869-03-04 17101 days 20021 days 1877-03-04
18 Rutherford B. Hayes 1822-10-04 4334 days 00:00:00 1893-01-17 25655 days 1877-03-04 19861 days 21321 days 1881-03-04
19 James A. Garfield 1831-11-19 Died in Office 1881-09-19 18189 days 1881-03-04 17990 days 18189 days 1881-09-19
20 Chester A. Arthur 1829-10-05 624 days 00:00:00 1886-11-18 20849 days 1881-09-19 18964 days 20225 days 1885-03-04
21 Grover Cleveland 1837-03-18 1460 days 00:00:00 1908-06-24 26013 days 1885-03-04 17506 days 18966 days 1889-03-04
22 Benjamin Harrison 1833-08-20 2929 days 00:00:00 1901-03-13 24660 days 1889-03-04 20271 days 21731 days 1893-03-04
23 Grover Cleveland 1837-03-18 4127 days 00:00:00 1908-06-24 26013 days 1893-03-04 20426 days 21886 days 1897-03-04
24 William McKinley 1843-01-29 Died in Office 1901-09-14 21398 days 1897-03-04 19744 days 21398 days 1901-09-14
25 Theodore Roosevelt 1858-10-27 3593 days 00:00:00 1919-01-06 21971 days 1901-09-14 15652 days 18378 days 1909-03-04
26 William H. Taft 1857-09-15 6209 days 00:00:00 1930-03-08 26454 days 1909-03-04 18785 days 20245 days 1913-03-04
27 Woodrow Wilson 1856-12-28 1066 days 00:00:00 1924-02-03 24492 days 1913-03-04 20506 days 23426 days 1921-03-04
28 Warren G. Harding 1865-11-02 Died in Office 1923-08-02 21078 days 1921-03-04 20197 days 21078 days 1923-08-02
29 Calvin Coolidge 1872-07-04 1402 days 00:00:00 1933-01-05 22085 days 1923-08-02 18644 days 20683 days 1929-03-04
30 Herbert Hoover 1874-08-10 11545 days 00:00:00 1964-10-20 32921 days 1929-03-04 19916 days 21376 days 1933-03-04
31 Franklin D. Roosevelt 1882-01-30 Died in Office 1945-04-12 23067 days 1933-03-04 18648 days 23067 days 1945-04-12
32 Harry S. Truman 1884-05-08 7276 days 00:00:00 1972-12-26 32352 days 1945-04-12 22239 days 25077 days 1953-01-20
33 Dwight D. Eisenhower 1890-10-14 2987 days 00:00:00 1969-03-28 28635 days 1953-01-20 22728 days 25648 days 1961-01-20
34 John F. Kennedy 1917-05-29 Died in Office 1963-11-22 16967 days 1961-01-20 15931 days 16967 days 1963-11-22
35 Lyndon B. Johnson 1908-08-27 1462 days 00:00:00 1973-01-22 23508 days 1963-11-22 20162 days 22046 days 1969-01-20
36 Richard Nixon 1913-01-09 7191 days 00:00:00 1994-04-22 29668 days 1969-01-20 20451 days 22477 days 1974-08-09
37 Gerald Ford 1913-07-14 10925 days 00:00:00 2006-12-26 34110 days 1974-08-09 22291 days 23185 days 1977-01-20
38 Jimmy Carter 1924-10-01 14045 days 00:00:00 NaT 34596 days 1977-01-20 19091 days 20551 days 1981-01-20
39 Ronald Reagan 1911-02-06 5612 days 00:00:00 2004-06-05 34065 days 1981-01-20 25534 days 28454 days 1989-01-20
40 George H. W. Bush 1924-06-12 9439 days 00:00:00 2018-11-30 34481 days 1989-01-20 23582 days 25042 days 1993-01-20
41 Bill Clinton 1946-08-19 6745 days 00:00:00 NaT 26609 days 1993-01-20 16944 days 19864 days 2001-01-20
42 George W. Bush 1946-07-06 3825 days 00:00:00 NaT 26653 days 2001-01-20 19908 days 22828 days 2009-01-20
43 Barack Obama 1961-08-04 905 days 00:00:00 NaT 21149 days 2009-01-20 17324 days 20244 days 2017-01-20
44 Donald J. Trump 1946-06-14 NaT NaT 25770 days 2017-01-20 25770 days 27232 days 2021-01-20
45 Joe Biden 1942-11-20 NaT NaT 28531 days 2021-01-20 28167 days 29627 days 2024-11-05

After analyzing pres_df_3, we identified the following tasks:

  1. Delete the redundant date columns: birth_day, birth_month, birth_year, birth_date, death_day, death_month, death_year, death_date, term_begin_day, term_begin_month, term_begin_year, term_begin_date, term_end_day, term_end_month, term_end_year, term_end_date. The day, month, and year columns repeat what the combined date columns contain, and the birth, death, and term dates are also available in pres_df_1.
  2. Rename columns so that they match the other datasets (for example, "name" becomes "President").
  3. Keep the "order" column since it contains the sequence of presidents.
  4. Convert each column to an appropriate data type (for example, "order" is stored as object and should be numeric).
  5. Correct the "name" column by using the known names of the presidents.
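
Task 5 can be approached by comparing the names against a reference list and mapping the spellings that differ. The sketch below is an illustration only: it uses a few spellings that differ between pres_df_1 and pres_df_4 as shown in this notebook, and the `name_fixes` mapping is a hypothetical hand-written dictionary.

```python
import pandas as pd

# Reference spellings (as they appear in pres_df_1) and variant spellings
# (as they appear in pres_df_4); only a small hypothetical sample.
reference = pd.Series(["John Adams", "Martin Van Buren",
                       "Rutherford B. Hayes", "William H. Taft"])
other = pd.Series(["John Adams", "Martin van Buren",
                   "Rutherford Hayes", "William Taft"])

# Names that have no exact match in the reference list.
unmatched = sorted(set(other) - set(reference))
print(unmatched)

# Hypothetical manual mapping from variant spelling to reference spelling.
name_fixes = {
    "Martin van Buren": "Martin Van Buren",
    "Rutherford Hayes": "Rutherford B. Hayes",
    "William Taft": "William H. Taft",
}
harmonized = other.replace(name_fixes)
```

Listing the unmatched names first keeps the manual mapping small and makes it easy to review.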
In [46]:
# Task 1: Delete the redundant columns
columns_to_drop = ['birth_day', 'birth_month', 'birth_year', 'birth_date',
                   'death_day', 'death_month', 'death_year', 'death_date',
                   'term_begin_day', 'term_begin_month', 'term_begin_year', 'term_begin_date',
                   'term_end_day', 'term_end_month', 'term_end_year', 'term_end_date']

pres_df_3.drop(columns=columns_to_drop, inplace=True)
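
Before a drop like the one above, it can be worth confirming that the split day, month, and year columns really agree with the combined date column. A minimal sketch, assuming a three-row sample copied from the pres_df_3 head and the day-month-year string format shown there:

```python
import pandas as pd

# Hypothetical sample: first three rows of the birth columns in pres_df_3.
sample = pd.DataFrame({
    "birth_day": [22, 30, 13],
    "birth_month": [2, 10, 4],
    "birth_year": [1732, 1735, 1743],
    "birth_date": ["22-02-1732", "30-10-1735", "13-04-1743"],
})

# The combined column uses day-month-year order, so the format is given explicitly.
parsed = pd.to_datetime(sample["birth_date"], format="%d-%m-%Y")

# True where all three components match the parsed date.
consistent = (
    (parsed.dt.day == sample["birth_day"])
    & (parsed.dt.month == sample["birth_month"])
    & (parsed.dt.year == sample["birth_year"])
)
print(consistent.all())
```

Note that dropping the combined date columns as well removes all dates from pres_df_3, so later steps would need to take them from pres_df_1.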
In [47]:
# Task 2: Rename the columns
column_mapping = {
    'name': 'President'
}
pres_df_3.rename(columns=column_mapping, inplace=True)
In [48]:
# Task 4: Convert data types to the correct types; missing values stay as NaN

# Convert 'order' to numeric. Values that cannot be parsed become NaN (errors='coerce'),
# and a column that contains NaN stays float64 even with downcast='integer'.
pres_df_3['order'] = pd.to_numeric(pres_df_3['order'], errors='coerce', downcast='integer')

# info() already reports the next three columns as float64;
# the casts below only make the intended dtype explicit.
pres_df_3['death_age'] = pres_df_3['death_age'].astype(float)

pres_df_3['presidency_end_age'] = pres_df_3['presidency_end_age'].astype(float)

pres_df_3['corrected_iq'] = pres_df_3['corrected_iq'].astype(float)
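
If "order" should display as whole numbers even when some values fail to parse, pandas' nullable integer dtype is one option. This is a sketch under assumptions: the value "22 & 24" is a hypothetical example of an unparseable entry, since the actual contents of the column are not shown here.

```python
import pandas as pd

# Hypothetical raw values; "22 & 24" stands in for an entry that cannot be parsed.
raw = pd.Series(["1", "2", "22 & 24", "45"], dtype="object")

# With a NaN present, downcast='integer' cannot apply and the result stays float64.
as_float = pd.to_numeric(raw, errors="coerce", downcast="integer")
print(as_float.dtype)

# The nullable 'Int64' dtype keeps integers and represents the gap as <NA>.
as_int = pd.to_numeric(raw, errors="coerce").astype("Int64")
print(as_int.dtype)
```

Whether a coerced value should stay missing or be filled in by hand depends on what the original entry meant, so it is worth inspecting the rows that became missing.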
In [49]:
pres_df_3.head()
Out[49]:
order President height_cm height_in weight_kg weight_lb body_mass_index body_mass_index_range birthplace birth_state death_age astrological_sign presidency_begin_age presidency_end_age political_party corrected_iq
0 1.0 George Washington 188 74.0 79.4 175 22.5 Normal Westmoreland County Virginia 67.0 Pisces 57 65.0 Unaffiliated 140.0
1 2.0 John Adams 170 67.0 83.9 185 29.0 Overweight Braintree Massachusetts 90.0 Scorpio 61 65.0 Federalist 155.0
2 3.0 Thomas Jefferson 189 74.5 82.1 181 23.0 Normal Shadwell Virginia 83.0 Aries 57 65.0 Democratic-Republican 160.0
3 4.0 James Madison 163 64.0 55.3 122 20.8 Normal Port Conway Virginia 85.0 Pisces 57 65.0 Democratic-Republican 160.0
4 5.0 James Monroe 183 72.0 85.7 189 25.6 Overweight Monroe Hall Virginia 73.0 Taurus 58 66.0 Democratic-Republican 139.0


1.4 U.S. Presidents Dataset (1) : pres_df_4

In [50]:
pres_df_4
Out[50]:
No. Name Birthplace Birthday Life Height Children Religion Higher Education Occupation Military Service Term Party Vice President Previous Office Economy Foreign Affairs Military Activity Other Events Legacy
0 1 George Washington Pope's Creek, VA 22-Feb 1732-1799 1.88 0 Episcopalian None Plantation Owner, Soldier Commander-in-Chief of the Continental Army in... 1789-1797 None, Federalist John Adams Commander-in-Chief [' Hamilton established BUS', '1792 Coinage Ac... ['1793 Neutrality in the France-Britain confli... ['1794 Whiskey Rebellion'] ['1791 Bill of Rights', '1792 Post Office foun... He is universally regarded as one of the great...
1 2 John Adams Braintree, MA 30-Oct 1735-1826 1.70 5 Unitarian Harvard Lawyer, Farmer none 1797-1801 Federalist Thomas Jefferson 1st Vice President of USA ['1798 Progressive land value tax of up to 1% ... ['1797 the XYZ Affair: a bribe of French agent... ['1798–1800 The Quasi war. Undeclared naval wa... ['1798 Alien & Sedition Act to silence critics... One of the most experienced men ever to become...
2 3 Thomas Jefferson Goochland County, VA 13-Apr 1743-1826 1.89 6 unaffiliated Christian College of William and Mary Inventor,Lawyer, Architect Colonel of Virginia militia (without real mili... 1801-1809 Democratic-Republican Aaron Burr, George Clinton 2nd Vice President of USA ['1807 Embargo Act forbidding foreign trade in... ['1805 Peace Treaty with Tripoli. Piracy stopp... ['1801-05 Naval operation against Tripoli and ... ['1803 The Louisiana purchase', '1804 12th Ame... Probably the most intelligent man ever to occ...
3 4 James Madison Port Conway, VA 16-Mar 1751-1836 1.63 0 Episcopalian Princeton Plantation Owner, Lawyer Colonel of Virginia militia (without real mili... 1809-1817 Democratic-Republican George Clinton, Elbridge Gerry Secretary of State [' The first U.S. protective tariff was impose... ['1814 The Treaty of Ghent ends the War of 1812'] ['1811 Tippecanoe battle (Harrison vs. Chief T... ['1811 Cumberland Road construction starts (fi... His leadership in the War of 1812 was particul...
4 5 James Monroe Monroe Hall, VA 28-Apr 1758-1831 1.83 2 Episcopalian College of William and Mary Plantation Owner, Lawyer Major of the Continental Army 1817-1825 Democratic-Republican Daniel Tompkins Secretary of War ['1819 Panic of 1819 (too much land speculatio... ['1823 Monroe Doctrine', '1818 49th parallel s... ['1817 1st Seminole war against Seminole India... ['1819 Florida ceded to US', "1820 Missouri Co... His presidency contributed to national defense...
5 6 John Quincy Adams Braintree, MA 11-Jul 1767-1848 1.70 4 Unitarian Harvard Lawyer, Diplomat none 1825-1829 Democratic-Republican John Calhoun Secretary of State [' "Internal improvements" program (roads, por... ['Unsuccessful attempt to purchase Texas from ... ['None'] [' Accused for "corrupt bargain" to obtain Cla... He had been an excellent Secretary of State, m...
6 7 Andrew Jackson Waxhaw, NC 15-Mar 1767-1845 1.85 0 Presbyterian None Soldier, Lawyer Major General of U.S. Army 1829-1837 Democratic John Calhoun, Martin van Buren Military Governor of Florida ['1832 The Bank War. Veto for rechartering of ... [' Texas wins independence'] ['1836 Alamo. 6000 Mexicans defeat 190 America... ['1830 Indian Removal Act', "1832 South Caroli... Historians see in him both the best and the wo...
7 8 Martin van Buren Kinderhook, NY 5-Dec 1782-1862 1.68 4 Dutch Reformed None Lawyer none 1837-1841 Democratic Richard Johnson 8th Vice President of USA ['1837 The Panic of 1837. Financial crisis & d... [' Recognition of Republic of Texas; annex avo... ['1838 2nd Seminole war against Seminole India... ['1838 "The Trail of Tears". Indians’ relocati... An able man, but always regarded more as a shr...
8 9 William H. Harrison Charles City County, VA 9-Feb 1773-1841 1.73 1 Episcopalian Hampden-Sydney College Soldier Major General of U.S. Army 1841 Whig John Tyler Minister to Colombia ['None'] ['None'] ['None'] ['1841 Delivered the longest inaugural address... none
9 10 John Tyler Charles City County, VA 29-Mar 1790-1862 1.83 1 Episcopalian College of William and Mary Lawyer Captain of Virginia militia 1841-1845 Whig, No Party none 10th Vice President of USA ['Economic crisis initiated by the Panic of 18... ['1842 Webster–Ashburton Treaty settles border... ['1842 End of the 2nd Seminole war'] ['1841 His cabinet resigned after he vetoed ba... His presidency is held in low esteem but score...
10 11 James K. Polk Pineville, NC 2-Nov 1795-1849 1.73 0 Presbyterian University of North Carolina Lawyer, Plantation Owner Colonel of Tennessee militia 1845-1849 Democratic George Dallas Governor of Tennessee ['1846 Walker Tariff. Taxes reduced and fixed'] ['1846 Agreement with Britain over Oregon. Bot... ['1846 American-Mexican war. Mexico city captu... ['1846 A large crack in the Liberty Bell.', '1... Polk added more territory than had any other p...
11 12 Zachary Taylor Barboursville, VA 24-Nov 1784-1850 1.73 6 Episcopalian None Soldier Major General U.S. Army 1849-1850 Whig Millard Fillmore Major General, U.S. Army ['None'] ['1850 Clayton–Bulwer Treaty with Britain: no ... ['None'] [' The question of extending slavery to the ne... His blunt manner and unsophisticated style han...
12 13 Millard Fillmore Moravia, NY 7-Jan 1800-1874 1.75 2 Unitarian None Lawyer Major - Union Continentals (Home Guard) , NY m... 1850-1853 Whig none 12th Vice President of USA ['Expanding trade while limiting American comm... [' Commodore Matthew C. Perry was sent to open... ['None'] ['1850 Compromise of 1850 and Fugitive Slave A... Honest and hardworking but a pompous, colorles...
13 14 Franklin Pierce Hillsborough, NH 23-Nov 1804-1869 1.78 3 Episcopalian Bowdoin College Lawyer Brigadier Gen. of Volunteers 1853-1857 Democratic William King Senator (NH) 1837-42 ['Reforming the Treasury'] ['1854 Ostend Manifesto. Crisis over a leaked ... ['None'] ['1853 Gadsden Purchase. Land from Mexico.', '... As president, he made many divisive decisions ...
14 15 James Buchanan Cove Gap, PA 23-Apr 1791-1868 1.83 0 Presbyterian Dickinson College Lawyer, Diplomat Private - U.S. Army 1857-1861 Democratic John Breckinridge Minister to the UK ['1857 Tariff of 1857. Reduction. North compla... ['Strengthening the influence of the United St... ['1857 Utah War: 2500 soldiers were sent to ou... ['1857 Dred Scott decision: States can decide ... His administration was dominated by fighting b...
15 16 Abraham Lincoln Hardin County, KY 12-Feb 1809-1865 1.93 4 unaffiliated Christian None Land Surveyor, Lawyer Captain of State militia 1861-1865 Republican Hannibal Hamlin, Andrew Johnson Congressman (Illinois) ['None'] ['Lincoln left the diplomatic issues in the ha... ['1863-1865 Civil War'] ['1863 Emancipation Proclamation, freeing slav... The greatest U.S. President. He won the Civil ...
16 17 Andrew Johnson Raleigh, NC 29-Dec 1808-1875 1.78 5 unaffiliated Christian None Tailor Brigadier General of Volunteers - military gov... 1865-1869 National Union none 16th Vice President of USA ['Reconstruction plan in the south'] ['1867 Treaty with Russia.'] ['None'] ['1865 Amnesty', '1867 Reconstruction Act & Of... His conflict with Congress and his impeachment...
17 18 Ulysses S. Grant Point Pleasant, OH 27-Apr 1822-1885 1.73 4 Methodist U.S. Military Academy Soldier (General of the Army) General of the Army 1869-1877 Republican Schuyler Colfax, Henry Wilson Commanding General of Army ['1873 Depression & financial crisis', ' Resum... ['1871 Treaty of Washington', '1875 Free trade... ['1876 Battle of the Little Bighorn. Gen. Cust... ['1871 Civil Service', '1870-71 Enforcement Ac... An excellent general but a mediocre politician...
18 19 Rutherford Hayes Delaware, OH 4-Oct 1822-1893 1.73 8 Methodist Kenyon College, Harvard Lawyer Major General of Volunteers 1877-1881 Republican William Wheeler Governor of Ohio ['1878 Bland-Allison Act - Treasury buys silve... ['1877 Granted the Army the power to pursue ba... ['1877 Bear Paw Battle against Nez Perce India... ['1877 Reconstruction end. Army withdrew from ... An effective president, ending military occup...
19 20 James Garfield Moreland Hills, OH 19-Nov 1831-1881 1.83 7 Disciples of Christ Williams College School Teacher, Minister, Soldier Major General of Volunteers 1881 Republican Chester Arthur Congressman (Ohio) ['1881 Refinance of national debt'] ['Call for a Pan-American conference to mediat... ['None'] ['1881 On July 2, he was shot by Charles Juliu... In the 4 months before he was shot, he did not...
20 21 Chester Arthur Fairfield, VT 5-Oct 1829-1886 1.88 3 Episcopalian Union College Customs Collector of NY port Quartermaster General of New York State militia 1881-1885 Republican none 20th Vice President of USA ['1885 Tariff of 1875 continued protectionist ... [' Treaty with Nicaragua to build a canal viol... ['Start of the "Steel Navy"'] ['1883 Pendleton Act: Civil hiring on merit'] Despite his reputation as a leading spoilsmen ...
21 22 Grover Cleveland Caldwell, NJ 18-Mar 1837-1908 1.80 5 Disciples of Christ None Sheriff, Lawyer, Teacher none 1885-1889 Democratic Thomas Hendricks Governor of New York [' Ended coinage based on silver', '1888 Mills... [' Refused to promote the previous administrat... ['1886 Apache leader Geronimo was chased & su... ['1886 Statue of Liberty', ' Curtailed largess... He won praise for his honesty, independence, i...
22 23 Benjamin Harrison North Bend, OH 20-Aug 1833-1901 1.68 3 Presbyterian Miami University Lawyer, Journalist Brigadier General of Vol. 1889-1893 Republican Levi Morton Senator (Indiana) ['1890 Pension Act - money to the veterans', '... ['1889 Formation of the Pan-American union'] ['1890 "Wounded knee" massacre. 150 Sioux Indi... ['1889 Opening of Oklahoma to 20,000 settlers'... He was an effective leader but the economy de...
23 24 Grover Cleveland Caldwell, NJ 18-Mar 1837-1908 1.80 5 Presbyterian None Sheriff, Lawyer, Teacher none 1893-1897 Democratic Adlai Stevenson 22nd President of USA ['1893 Panic of 1893 and depression.', '1893 S... ['1895 Controversy with Britain over Venezuela... ['First ships of a navy capable of offensive a... ['1893 Pullman strike.'] His reforms made him an icon for conservatives...
24 25 William McKinley Niles, OH 29-Jan 1843-1901 1.70 2 Methodist Allegheny College,Albany Law Lawyer Brevet Major of Volunteers in Civil War 1897-1901 Republican Garret Hobart , Th. Roosevelt Governor of Ohio ['1897 Dingley Tariff. Highest ever.', '1900 G... ['1899 Treaty of Paris. U.S. becomes a colonia... ['1898 Sinking of USS Maine', '1898 Spanish-US... ['1898 Yellow Journalism (Hyped Maine)', '1898... His leadership and his actions affected profou...
25 26 Theodore Roosevelt New York City, NY 27-Oct 1858-1919 1.78 6 Dutch Reformed Harvard, Columbia Public Official, Rancher, Author Colonel 1901-1909 Republican Charles Fairbanks 25th Vice President of USA ['1907 Panic of 1907 ("Roosevelt Panic")', '19... ['1903 Orchestrated Panama independence. Panam... [' In spite of his militaristic attitudes, pea... [' Conservation becomes an issue. Creation of ... The first modern American president and one of...
26 27 William Taft Cincinnati, OH 15-Sep 1857-1930 1.83 3 Unitarian University of Cincinnati, Yale Judge, Dean of Law School none 1909-1913 Republican James Sherman 10th Chief Justice of USA ['1909 Payne-Aldrich Tariff. Unpopular. Duties... [" 'Dollar Diplomacy' ; State dept. coordinate... ['1912 2500 troops were sent to Nicaragua to p... [' Record antitrust suits', '1912 New states: ... A good administrator but without exceptional p...
27 28 Woodrow Wilson Staunton, VA 28-Dec 1856-1924 1.80 3 Presbyterian Princeton, J. Hopkins Professor, Political scientist none 1913-1921 Democratic Thomas Marshall Governor of New Jersey ['1913 Federal Reserve Act', '1913 Underwood T... ['1919 Treaty of Versailles after WW I, "14 po... ['1915 Occupation of Dominican Rep.', '1916 US... ['1916 Child labor curtailed', '1916 Federal F... Effective leadership in instituting a progress...
28 29 Warren Harding Blooming Grove, OH 2-Nov 1865-1923 1.83 0 Baptist Ohio Central College Newspaper Publisher/Editor none 1921-1923 Republican Calvin Coolidge Senator (Ohio) [' Tax cuts for the rich and the end of antitr... ['1921 Knox–Porter Resolution: official end o... ['1923 Posey War (a small conflict with Americ... ['1921 Federal Highway Act - the age of the "... Presided over one of the most corrupt administ...
29 30 Calvin Coolidge Plymouth, VT 4-Jul 1872-1933 1.78 2 Congregationalist Amherst College Lawyer, Banker none 1923-1929 Republican Charles Dawes 29th Vice President of USA [' "Roaring Twenties"-Rapid economic growth', ... ['1928 Kellogg-Briand Pact (renouncement of wa... ['1928 Clark Memorandum - concerned the United... ['1924 Immigration Act limits immigrants from ... A competent administrator and a shrewd politic...
30 31 Herbert C. Hoover West Branch, IA 10-Aug 1874-1964 1.80 2 Society of Friends (Quaker) Stanford University Engineer none 1929-1933 Republican Charles Curtis Secretary of Commerce ['1929 Stock Market Crash', ' The Great Depres... ['1932 Stimson Doctrine: US would not recogniz... ['He thrice threatened intervention in the Dom... ['1932 Reconstruction Finance Corporation to p... A qualified executive who failed to provide ef...
31 32 Franklin Roosevelt Hyde Park, NY 30-Jan 1882-1945 1.88 6 Episcopalian Harvard, Columbia Lawyer none 1933-1945 Democratic John Garner , Henry Wallace, Truman Governor of New York ['1933 Glass-Steagall Act to protect bank acco... ['1935 Lend-Lease Act, allowing US to aid All... ['1941 Pearl Harbor', '1941-45 World War II'] ['1933 First 100 days legislation frenzy', '19... The longest and one of the most acclaimed pres...
32 33 Harry S Truman Lamar, MO 8-May 1884-1972 1.75 1 Baptist None Farmer, Men'S Clothing Retailer Colonel - U.S. Army 1945-1953 Democratic Alben Barkley 34th Vice President of USA ['1946 Veto on Taft-Hartley Act regulating st... ['1945 Potsdam Conference', '1947 Truman Doctr... ['1945 Atomic bombs', '1945 End of WW II', '19... ['1945 Fair Deal: health care, civil rights et... Unexpectedly a very efficient replacement. Sha...
33 34 Dwight Eisenhower Denison, TX 14-Oct 1890-1969 1.78 2 Presbyterian U.S. Military Academy Soldier, General General of the Army 1953-1961 Republican Richard Milhous Nixon Sup. Allied Commander Europe ['1956 Federal-Aid Highway Act - National high... ['1954 Geneva Conference (SEATO)', '1956 Suez ... ['1953 End of Korean War', '1958 USA troops in... [' Alaska and Hawaii admitted as states', '195... After a glorious military career, as president...
34 35 John F. Kennedy Brookline, MA 29-May 1917-1963 1.83 3 Roman Catholic Harvard, Stanford U.S. Navy Officer, Author Lieutenant - U.S. Navy 1961-1963 Democratic Lyndon Johnson Senator ( MA) [' “New Frontier”: Tax reduction and other ref... ['1961 Vienna Summit', '1961 Alliance for Pro... ['1963 “Advisers” attached to the South Vietn... ['1961 Peace Corps program', '1961 "Moon race"... His youth, vigor, and style brought a fresh ai...
35 36 Lyndon Johnson Stonewall, TX 27-Aug 1908-1973 1.92 2 Disciples of Christ Texas State, Georgetown Teacher, Public Official Commander - U.S. Navy 1963-1969 Democratic Hubert Humphrey 37th Vice President of USA ['1964 Revenue Act & Economic Opportunity Act... ['1968 Paris Peace Talks'] ['1965 Gulf of Tonkin Resolution - president g... ['1964 The Civil Rights Act', '1964 Great Soci... Passed his Great Society domestic programs and...
36 37 Richard Nixon Yorba Linda, CA 9-Jan 1913-1994 1.80 2 Society of Friends (Quaker) Whittier College, Duke Law Lawyer, Public Official Commander - U.S. Navy 1969-1974 Republican Spiro Agnew , Gerald R. Ford 36th Vice President of USA ['1973 OPEC embargo & Oil crisis'] ['1971 Nixon visits China; "Ping Pong diplomac... ['1970 Expansion of war to Cambodia and Laos'... ['1969 Moon landing', '1970 Environment Act', ... Although he ended U.S. involvement in the Viet...
37 38 Gerald R. Ford Omaha, NE 14-Jul 1913- 2006 1.83 4 Episcopalian University of Michigan, Yale Lawyer, Public Official Lt. Commander -U.S. Navy 1974-1977 Republican Nelson Rockefeller 40th Vice President of USA [' Recession & Inflation. The worst economy si... ['1975 Evacuation of US embassy in Saigon', '1... ['1974 Official end of the Vietnam War', '1975... ['1974 Granted a pardon to Nixon.', '1975 Air... A congressional president whose historic role ...
38 39 Jimmy Carter Plains, GA 1-Oct 1924- 1.77 4 Baptist US Naval Academy Navy Officer, Peanut Farmer Lieutenant - U.S. Navy 1977-1981 Democratic Walter Mondale Governor of Georgia ['1979 Beer market deregulation', '1978 Airlin... ['1979 Camp-David Accords between Israel and E... ['1980 Revoked the Sino-American Mutual Defens... [' Pardoned Vietnam War draft evaders', ' Ene... Intelligent and hardworking but a DC outsider....
39 40 Ronald Reagan Tampico, IL 6-Feb 1911- 2004 1.85 4 Christian Church Eureka College Actor, Union leaser Captain- U.S. Army 1981-1989 Republican George H. W. Bush Governor of California [' "Reaganomics": tax cuts, gov’t downsizing',... ['1983 Strategic Defense Initiative ("Star War... ['1983 241 Marines, of a multinational force, ... ['1981 Assassination attempt by John W. Hinkle... While his aptitude for the job was often quest...
40 41 George H. W. Bush Milton, MA 12-Jun 1924- 2018 1.88 6 Episcopalian Yale University Businessman (Oil) Lieutenant-U.S. Navy 1989-1993 Republican James Danforth Quayle 43rd Vice President of USA [' Increased taxes despite his campaign promis... ['1989 Berlin Wall falls.', '1991 Dissolution ... ['1989-90 Panama invasion. Noriega arrested', ... ['1990 Americans with Disabilities Act', '1990... He took few domestic initiatives and the econo...
41 42 Bill Clinton Hope, AR 19-Aug 1946- 1.88 1 Baptist Georgetown, Yale, Oxford Lawyer, Law Lecturer none 1993-2001 Democratic Al Gore Governor of Arkansas ['1993-2001 Sustained economic growth and succ... ['1993 Oslo accords; Isr./PLO', '1995 Dayton B... ['1993 Mogadishu Battle : 2 Black Hawks down, ... ["1993 “Don't ask, don't tell”- gays in the mi... The longest period of peacetime economic expan...
42 43 George W. Bush New Haven, CT 6-Jul 1946- 1.82 2 Methodist Yale, Harvard Businessman (Oil, Baseball) Lieutenant - Air Force 2001-2009 Republican Richard Cheney Governor of Texas ['2001, 2003 Bush Tax cuts', '2008 Financial c... [' Iraq’ s "Weapons of mass destruction" hoax... ['2001 War against the Afghanistan Talibans', ... ['2001 9/11', '2001 Patriot Act', '2002 “no ch... He left as one of the least popular and most d...
43 44 Barack Obama Honolulu, HI 4-Aug 1961- 1.87 2 unaffiliated Christian Columbia, Harvard Law Professor none 2009-2017 Democratic Joseph Biden Senator (Illinois) ['2009 Economic Stimulus: Signed $787 bn for R... [" 'Leading from behind' stance in Mid East g... ['2011 Death of Osama bin Laden.', '2011 Iraq ... ['2010 Healthcare reform: Affordable Care Act ... Important changes on healthcare, education, c...
44 45 Donald Trump Queens, New York, NY 14-Jun 1946- 1.88 5 Presbyterian Fordham, Pennsylvania Real estate none 2017-2021 Republican Michael R. Pence none [' Permanent cuts to the corporate tax rate, f... [' US abandon Paris Climate Accord and WHO', '... ['2019 US Space Force is founded.'] ['2020 Covid-19 pandemic', ' Impeached twice',... Polarizing leadership style, controversial pol...
45 46 Joe Biden Scranton, PA 20-Nov 1942- 1.82 3 Roman Catholic Syracuse University College Lawyer none 2021- Democratic Kamala Harris 47th Vice President of USA ['Postal Service Reform Act of 2022', 'Signed ... ['2022-2023 Significant U.S. Military Assistan... ['Withdrawal from Afghanistan', '2022 Countert... ['Biden pledged to double climate funding to d... none
In [51]:
pres_df_4.dtypes
Out[51]:
No.                    int64
Name                  object
Birthplace            object
Birthday              object
Life                  object
Height               float64
Children               int64
Religion              object
Higher Education      object
Occupation            object
Military Service      object
Term                  object
Party                 object
Vice President        object
Previous Office       object
Economy               object
Foreign Affairs       object
Military Activity     object
Other Events          object
Legacy                object
dtype: object
In [52]:
# Print every Legacy entry in full to understand the free-text columns
# (Economy, Foreign Affairs, Military Activity, Other Events, Legacy)
for legacy in pres_df_4['Legacy']:
    print(legacy)
    print()
He is universally regarded as one of the greatest figures in U.S. history. “First in war, first in peace, and first in the hearts of his country”

One of the most experienced men ever to become President. Played a major role in the movement for independence. By the end of his term, he was unpopular, respected but not beloved.

Probably  the most intelligent man ever to occupy the White House. Of broad interests and activity, he exerted an immense influence on the future of the new nation.

His leadership in the War of 1812 was particularly inept. But  the young nation emerged united and strong, and Madison  enjoyed tremendous popularity and respect during his last years.

His presidency contributed to national defense and security. The Monroe Doctrine became a landmark in American foreign policy. His time in office was called the "Era of Good Feeling".

He had been an excellent Secretary of State, maybe the best in the history of the U.S. But as a President he was not allowed by a hostile Congress to be successful.

Historians see in him both the best and the worst of the new Republic. Associated with the movement toward increased popular participation in government, the "Jacksonian democracy".

An able man, but always regarded more as a shrewd politician and a manipulator. His Presidency was a failure. It was marked by the financial crisis, the Panic of 1837. He was called "Martin Van Ruin".

none

His presidency is held in low esteem but scored a victory, the Texas annexation.  Expelled from his party while in office and without followers was powerless and yet effective and rather underrated

Polk added more territory than had any other president except Thomas Jefferson and made U.S. a coast-to-coast nation.  He was one of the greatest presidents.

His blunt manner and unsophisticated style handicapped him as president. Because of his short tenure, Taylor is not considered to have strongly influenced the U.S.

Honest and hardworking but a pompous, colorless individual who rose far beyond his ability. The Compromise of 1850 preserved the Union for a while  but destroyed his career.

As president, he made many divisive decisions which were widely criticized and earned him a reputation as one of the worst presidents in U.S. history.

His administration was dominated by fighting between pro-and antislavery forces. Few presidents have entered office with more experience, and few have so decisively failed. Probably the worst president.

The greatest U.S. President. He won the Civil War and  preserved the Union. His twin policies, emancipation of slaves and reconciliation of North and South, were his greatest legacies.

His conflict with Congress and his impeachment weakened the Presidency for decades. He is  considered one the worst American presidents. But he got Alaska. And Reconstruction was a major policy…

An excellent general but a mediocre politician. He won the Civil War, but his Presidency was rather a failure with scandals and economic depression. History has been rather unfair to him.

An  effective president, ending military occupation of southern states; reforming the civil service, putting the country back on the gold standard, and starting the Gilded Age: enormous growth with serious social unrest.

In the 4 months before he was shot, he did not accomplish much. He served the second-shortest term of any President.
He endured Congress pressure on  executive appointments.

Despite his reputation as a leading spoilsmen in American politics, he proved to be a dignified and able administrator. A little-known presidency but no duty was neglected in his tenure and no problem alarmed the nation.

He won praise for his honesty, independence, integrity, and commitment to the principles of classical liberalism. He relentlessly fought political corruption, patronage, and bossism.

 He was an effective leader but the economy deteriorated. Inflation, joblessness  and labor unrest marked his presidency. His lackluster personality made his administration seem colorless.

His reforms made him an icon for conservatives but his efforts to stem economic depression were not successful, and the conservative means he used to settle internal industrial conflicts were unpopular.

His leadership and his actions affected profoundly the future of the USA. 
His victory on the Spanish-American war transformed the Presidency into an office of world leadership.

The first modern American president and one of the most dynamic and popular. He radically reformed the government and changed the political system. One of the top 5 presidents.

A good administrator but without exceptional political and leadership skills. He failed to rise adequately to the challenges of the times, despite his many strong qualities.

Effective leadership in instituting a progressive domestic program. His foreign policies were marked by victory in World War I and passionate promotion of the League of Nations.

Presided over one of the most corrupt administrations. Very popular as president, he was later regarded as one of the worst presidents. Hardworking though. He pushed a pro-business agenda.

A competent administrator and a shrewd politician.Very popular in a period of rapid growth and prosperity but perhaps too complacent and inactive despite signs of a Depression.

A qualified executive who failed to provide effective leadership in the most severe crisis. He could not halt and manage the Great Depression. His beliefs did not allow him to take drastic steps.

The longest and one of the most acclaimed presidencies in American history. He led the United States, with absolute success, out of the Great Depression and later in victory in the World War II.

Unexpectedly a very efficient replacement. Shaped the world after WW II, more than any other man.  He led  the successful transition from wartime to peacetime economy. Unpopular in the end, today he is ranked amongst the top presidents.

After a glorious military career, as president, he negotiated the end of the Korean War and pursued moderate policies. He presided over a period of growth and prosperity, at the peak of the Cold War.

His youth, vigor, and style brought a fresh air in the presidency. He revived the New Deal and Fair Deal programs  and continued containment to prevent  spread of communism (Vietnam, Cuba). His assassination shocked the world.

Passed his Great Society domestic programs and pushed "War on Poverty". He escalated U.S. involvement in Vietnam, sending more than 500,000 troops to fight. He left US deeply divided.

Although he ended U.S. involvement in the Vietnam War and won diplomatic agreements with the Soviet Union and China, he is remembered for Watergate and as the only president who resigned from office.

A congressional president whose historic role was to mop up -effectively- the dregs of the two most damaging episodes in the history of the modern White House: Watergate  and  Vietnam.

Intelligent and hardworking but a DC outsider.He pursued foreign policy with emphasis on  human rights and peace. He lost a 2nd term because of the Panama  Canal Treaty, the prolonged Iran hostage crisis and the stagnant economy.

While his aptitude for the job was often questioned, he was always very popular. Reaganomics stimulated growth but USA became the largest debtor. His confrontational policies  with the Soviets ended the Cold War shortly after he left office.

He took few domestic initiatives and the economy had problems, but was successful in foreign affairs, deposing Panama’s dictator Noriega and fighting the Gulf War in Iraq. The Cold War was ended in his watch.

The longest period of peacetime economic expansion in USA history. A not so distant period before the WTC fell, before U.S. troops bogged down in Iraq, before recession. And also before the scandals and the bitter partisan battles of the 1990s.

He left as one of the least popular and most divisive presidents in American history. The Iraq war, the bungled response to Hurricane Katrina, the 2008 economic crisis, has brought the worst collapse in America's reputation since WW II.

 Important changes on healthcare, education, climate. finance. Obamacare was a defining issue. Economy bounced back but growth was anemic. Foreign policy left the world more insecure.

Polarizing leadership style, controversial policies, and challenges to democratic institutions, resulting in a divided nation and impeachment.

none

We noticed the following in pres_df_4:

  1. Drop the "No." column, since the dataframe already has an index.
  2. Drop columns that already exist in other dataframes: Birthday, Life, Height. (Birthplace and Party also overlap with other dataframes, but they are kept for now.)
  3. Understand the content of these columns: Military Service, Previous Office, Economy, Foreign Affairs, Military Activity, Other Events, Legacy.

Based on these observations, we can proceed with the following tasks:

In [53]:
# Task 1: Drop the existing index column
pres_df_4.drop(columns=['No.'], inplace=True)

# Task 2: Drop columns that we already have in other dataframes
pres_df_4.drop(columns=['Birthday', 'Life', 'Height'], inplace=True)

Task 3: Understand the content of the remaining columns

  1. Military Service: Information about the president's military service, if any.
  2. Previous Office: Information about any previous political offices held by the president.
  3. Economy: Information about the president's economic policies or achievements.
  4. Foreign Affairs: Information about the president's actions and policies in foreign affairs.
  5. Military Activity: Information about the president's involvement in military activities or wars.
  6. Other Events: Any other significant events or actions during the presidency.
  7. Legacy: Information about the president's lasting impact or legacy.
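
The Economy, Foreign Affairs, Military Activity, and Other Events cells look like Python lists in the output above, but they may be stored as plain strings. A minimal sketch for turning such cells into real lists, assuming that storage format; `parse_list_cell` and `sample` are hypothetical names used only for illustration:

```python
import ast
import pandas as pd

def parse_list_cell(cell):
    """Return a list of events from a cell that may hold a stringified list."""
    if isinstance(cell, list):
        return cell
    # Treat missing values and the literal placeholder "none" as no events
    if not isinstance(cell, str) or cell.strip().lower() in ("", "none"):
        return []
    try:
        parsed = ast.literal_eval(cell)
    except (ValueError, SyntaxError):
        return [cell.strip()]  # free text: keep it as a single item
    if isinstance(parsed, list):
        return [str(item).strip() for item in parsed]
    return [str(parsed)]

# Cell values copied from complete entries in the output above
sample = pd.Series([
    "['1968 Paris Peace Talks']",
    "['2019 US Space Force is founded.']",
    "none",
])
parsed = sample.apply(parse_list_cell)
event_counts = parsed.apply(len)
print(event_counts.tolist())
```

Applied to pres_df_4, the same function could be passed to `.apply` on each of the four list-like columns before counting or exploding events.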
In [54]:
# Task 4: Rename the column "Name" to "President"
pres_df_4.rename(columns={'Name': 'President'}, inplace=True)
In [55]:
pres_df_4.loc[pres_df_4['President'] == "George Washington", 'Party'] = "Unaffiliated"
pres_df_4.loc[pres_df_4['President'] == "John Tyler", 'Party'] = "Whig"

In this code, we addressed the tasks as follows:

  1. We dropped the redundant numbering column 'No.'.
  2. We removed columns that already exist in other dataframes: 'Birthday', 'Life', and 'Height'. 'Birthplace' and 'Party' were kept.
  3. We described the remaining columns: 'Military Service', 'Previous Office', 'Economy', 'Foreign Affairs', 'Military Activity', 'Other Events', and 'Legacy'. Any further cleaning or transformation of these columns requires inspecting their actual values.
  4. We renamed 'Name' to 'President' and set the 'Party' value for George Washington ("Unaffiliated") and John Tyler ("Whig").

pres_df_4 is now ready for further exploration and analysis: the redundant columns are gone and the remaining columns are documented.

¶

1.5 U.S. Presidents Popular Vote Percentage Dataset (1) : pres_df_5

In [57]:
pres_df_5.head()
Out[57]:
year name party term salary position_title
0 1789 Washington,George Unaffiliated First 25000 PRESIDENT OF THE UNITED STATES
1 1790 Washington,George Unaffiliated First 25000 PRESIDENT OF THE UNITED STATES
2 1791 Washington,George Unaffiliated First 25000 PRESIDENT OF THE UNITED STATES
3 1792 Washington,George Unaffiliated First 25000 PRESIDENT OF THE UNITED STATES
4 1793 Washington,George Unaffiliated Second 25000 PRESIDENT OF THE UNITED STATES
In [58]:
pres_df_5.dtypes
Out[58]:
year               int64
name              object
party             object
term              object
salary             int64
position_title    object
dtype: object

We observed the following in pres_df_5, a stand-alone dataset that will not be merged with the other datasets:

  1. The column "name" should be renamed to "President", and its format should change from "lastname,firstname,suffix" to the format used in the other datasets.

Note: The new format for the "President" column is "[First Name] [Middle Initial]. [Last Name]", for example "Barack H. Obama". The function below keeps the first given name in full, abbreviates a second given name to its initial, and does not append a suffix (Jr., Sr., I, II, III, IV) even when one is present.

In [ ]:
# Convert "Last,First Middle[,Suffix]" to "First M. Last"
def modify_name(name):
    name_parts = [part.strip() for part in name.split(',')]
    last_name = name_parts[0]
    given_names = name_parts[1].split()
    first_name = given_names[0]

    # A suffix (name_parts[2]), if present, is intentionally not appended
    if len(given_names) > 1:
        # Abbreviate the second given name to its initial
        middle_initial = given_names[1][0]
        return f"{first_name} {middle_initial}. {last_name}"
    return f"{first_name} {last_name}"

# Apply the modify_name function to the 'name' column
pres_df_5['President'] = pres_df_5['name'].apply(modify_name)

# Drop the original 'name' column
pres_df_5.drop(columns=['name'], inplace=True)
In [61]:
pres_df_5[pres_df_5['year']==2016]
Out[61]:
year party term salary position_title President
227 2016 Democratic Second 400000 PRESIDENT OF THE UNITED STATES Barack H. Obama
424 2016 Democratic Second 230700 VICE PRESIDENT OF THE UNITED STATES Joseph R. Biden
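
If suffixes ever need to be preserved, a suffix-aware variant could look like the sketch below. It is an illustration, not the function used in this notebook: `reformat_name` is a hypothetical name, the input strings are made-up examples of the "Last,First Middle,Suffix" pattern, and unlike `modify_name` it abbreviates every given name after the first.

```python
def reformat_name(raw):
    """Convert 'Last,First Middle[,Suffix]' to 'First M. Last[, Suffix]'."""
    parts = [p.strip() for p in raw.split(",")]
    last = parts[0]
    given = parts[1].split() if len(parts) > 1 else []
    suffix = parts[2] if len(parts) > 2 else ""
    # First given name in full, remaining given names as initials
    tokens = given[:1] + [f"{g[0]}." for g in given[1:]]
    name = " ".join(tokens + [last])
    # Append the suffix after a comma only when one exists
    return f"{name}, {suffix}" if suffix else name

# Hypothetical inputs following the pattern seen in the 'name' column
for raw in ["Washington,George", "Obama,Barack H", "Carter,James Earl,Jr."]:
    print(reformat_name(raw))
```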

¶

1.6 Most Common Names of U.S. Presidents (1789-2021) (1) : pres_df_6

In [62]:
pres_df_6
Out[62]:
Unnamed: 0 S.No. start end president prior party vice
0 0 1 April 30, 1789 March 4, 1797 George Washington Commander-in-Chief of the Continental Army ... Nonpartisan [13] John Adams
1 1 2 March 4, 1797 March 4, 1801 John Adams 1st Vice President of the United States Federalist Thomas Jefferson
2 2 3 March 4, 1801 March 4, 1809 Thomas Jefferson 2nd Vice President of the United States Democratic- Republican Aaron Burr
3 3 4 March 4, 1809 March 4, 1817 James Madison 5th United States Secretary of State (1801–... Democratic- Republican George Clinton
4 4 5 March 4, 1817 March 4, 1825 James Monroe 7th United States Secretary of State (1811–... Democratic- Republican Daniel D. Tompkins
5 5 6 March 4, 1825 March 4, 1829 John Quincy Adams 8th United States Secretary of State (1817–... Democratic- Republican John C. Calhoun
6 6 7 March 4, 1829 March 4, 1837 Andrew Jackson U.S. Senator ( Class 2 ) from Tennessee ... Democratic John C. Calhoun
7 7 8 March 4, 1837 March 4, 1841 Martin Van Buren 8th Vice President of the United States Democratic Richard Mentor Johnson
8 8 9 March 4, 1841 April 4, 1841 William Henry Harrison United States Minister to Colombia (1828–1829) Whig John Tyler
9 9 10 April 4, 1841 March 4, 1845 John Tyler 10th Vice President of the United States Whig April 4, 1841 – September 13, 1841 Office vacant
10 10 11 March 4, 1845 March 4, 1849 James K. Polk 9th Governor of Tennessee (1839–1841) Democratic George M. Dallas
11 11 12 March 4, 1849 July 9, 1850 Zachary Taylor Major General of the 1st Infantry Regiment ... Whig Millard Fillmore
12 12 13 July 9, 1850 March 4, 1853 Millard Fillmore 12th Vice President of the United States Whig Office vacant
13 13 14 March 4, 1853 March 4, 1857 Franklin Pierce Brigadier General of the 9th Infantry Unit... Democratic William R. King
14 14 15 March 4, 1857 March 4, 1861 James Buchanan United States Minister to the Court of St J... Democratic John C. Breckinridge
15 15 16 March 4, 1861 April 15, 1865 Abraham Lincoln U.S. Representative for Illinois' 7th Distri... Republican ( National Union ) [i] Hannibal Hamlin
16 16 17 April 15, 1865 March 4, 1869 Andrew Johnson 16th Vice President of the United States National Union [i] ( Democratic ) [j] Office vacant
17 17 18 March 4, 1869 March 4, 1877 Ulysses S. Grant Commanding General of the U.S. Army ( 1864–... Republican Schuyler Colfax
18 18 19 March 4, 1877 March 4, 1881 Rutherford B. Hayes 29th & 32nd Governor of Ohio (1868–1872 & 1... Republican William A. Wheeler
19 19 20 March 4, 1881 September 19, 1881 James A. Garfield U.S. Representative for Ohio's 19th District... Republican Chester A. Arthur
20 20 21 September 19, 1881 March 4, 1885 Chester A. Arthur 20th Vice President of the United States Republican Office vacant
21 21 22 March 4, 1885 March 4, 1889 Grover Cleveland 28th Governor of New York (1883–1885) Democratic Thomas A. Hendricks
22 22 23 March 4, 1889 March 4, 1893 Benjamin Harrison U.S. Senator ( Class 1 ) from Indiana (... Republican Levi P. Morton
23 23 24 March 4, 1893 March 4, 1897 Grover Cleveland 22nd President of the United States (1885–1... Democratic Adlai Stevenson
24 24 25 March 4, 1897 September 14, 1901 William McKinley 39th Governor of Ohio (1892–1896) Republican Garret Hobart
25 25 26 September 14, 1901 March 4, 1909 Theodore Roosevelt 25th Vice President of the United States Republican Office vacant
26 26 27 March 4, 1909 March 4, 1913 William Howard Taft 42nd United States Secretary of War (1904–1... Republican James S. Sherman
27 27 28 March 4, 1913 March 4, 1921 Woodrow Wilson 34th Governor of New Jersey (1911–1913) Democratic Thomas R. Marshall
28 28 29 March 4, 1921 August 2, 1923 Warren G. Harding U.S. Senator ( Class 3 ) from Ohio (191... Republican Calvin Coolidge
29 29 30 August 2, 1923 March 4, 1929 Calvin Coolidge 29th Vice President of the United States Republican Office vacant
30 30 31 March 4, 1929 March 4, 1933 Herbert Hoover 3rd United States Secretary of Commerce (19... Republican Charles Curtis
31 31 32 March 4, 1933 January 20, 1941 Franklin D. Roosevelt 44th Governor of New York ( 1929–1932 ) Democratic John Nance Garner
32 32 33 April 12, 1945 January 20, 1953 Harry S. Truman 34th Vice President of the United States Democratic Office vacant
33 33 34 January 20, 1953 January 20, 1961 Dwight D. Eisenhower Supreme Allied Commander Europe ( 1949–1952 ) Republican Richard Nixon
34 34 35 January 20, 1961 November 22, 1963 John F. Kennedy U.S. Senator ( Class 1 ) from Massachuset... Democratic Lyndon B. Johnson
35 35 36 November 22, 1963 January 20, 1969 Lyndon B. Johnson 37th Vice President of the United States Democratic Office vacant
36 36 37 January 20, 1969 August 9, 1974 Richard Nixon 36th Vice President of the United States (1... Republican Spiro Agnew
37 37 38 August 9, 1974 January 20, 1977 Gerald Ford 40th Vice President of the United States Republican Office vacant
38 38 39 January 20, 1977 January 20, 1981 Jimmy Carter 76th Governor of Georgia (1971–1975) Democratic Walter Mondale
39 39 40 January 20, 1981 January 20, 1989 Ronald Reagan 33rd Governor of California ( 1967–1975 ) Republican George H. W. Bush
40 40 41 January 20, 1989 January 20, 1993 George H. W. Bush 43rd Vice President of the United States Republican Dan Quayle
41 41 42 January 20, 1993 January 20, 2001 Bill Clinton 40th & 42nd Governor of Arkansas (1979–1981... Democratic Al Gore
42 42 43 January 20, 2001 January 20, 2009 George W. Bush 46th Governor of Texas ( 1995–2000 ) Republican Dick Cheney
43 43 44 January 20, 2009 NaN Barack Obama U.S. Senator ( Class 3 ) from Illinois ... Democratic Joe Biden
44 44 45 January 20, 2017 -- Donald Trump Chairman of The Trump Organization ( 1971–... Republican Mike Pence
In [63]:
pres_df_6.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 45 entries, 0 to 44
Data columns (total 8 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Unnamed: 0  45 non-null     int64 
 1   S.No.       45 non-null     int64 
 2   start       45 non-null     object
 3   end         44 non-null     object
 4   president   45 non-null     object
 5   prior       45 non-null     object
 6   party       45 non-null     object
 7   vice        45 non-null     object
dtypes: int64(2), object(6)
memory usage: 2.9+ KB

Note: On closer examination, pres_df_6 duplicates information that we already have in other datasets (term dates, prior office, party, and vice president). There is no need to retain all of its columns, since doing so would only add redundancy. We will focus instead on integrating data points that add new information to the analysis.

¶

1.7 USA Economic Growth Dataset : pres_df_7

In [64]:
pres_df_7.head()
Out[64]:
Year GDP GDP per capita (in US$ PPP) GDP (in Bil. US$nominal) GDP per capita (in US$ nominal) GDP growth % Inflation rate % Unemployment % Government debt (in % of GDP) Presidents
0 1981 3207.0 13948.7 3207.0 13948.7 2.50% 10.40% 7.60% 31.00% Ronald Reagan
1 1982 3343.8 14405.0 3343.8 14405.0 -1.80% 6.20% 9.70% 34.00% Ronald Reagan
2 1983 3634.0 15513.7 3634.0 15513.7 4.60% 3.20% 9.60% 37.00% Ronald Reagan
3 1984 4037.7 17086.4 4037.7 17086.4 7.20% 4.40% 7.50% 38.00% Ronald Reagan
4 1985 4339.0 18199.3 4339.0 18199.3 4.20% 3.50% 7.20% 41.00% Ronald Reagan
In [65]:
pres_df_7['Presidents'].value_counts()
Out[65]:
Ronald Reagan     8
Bill Clinton      8
George W. Bush    8
Barack Obama      8
George Bush       4
Donald Trump      4
Joe Biden         3
Name: Presidents, dtype: int64

We will handle this dataset separately because it covers only the presidents from Ronald Reagan onward (yearly records starting in 1981). The number of yearly records available for each president is:

  1. Ronald Reagan: 8 records
  2. Bill Clinton: 8 records
  3. George W. Bush: 8 records
  4. Barack Obama: 8 records
  5. George Bush (George H. W. Bush in the other datasets): 4 records
  6. Donald Trump: 4 records
  7. Joe Biden: 3 records

Because each president has only a few yearly records, we will analyze each administration's records individually and draw conclusions specific to that administration.
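
The percentage columns are displayed with a trailing "%" sign, which suggests they are stored as strings. A minimal sketch of converting them to numbers before any per-president summary, assuming that storage format; the small `demo` frame reuses the first two rows of the `head()` output above, and only two of the percentage columns are shown:

```python
import pandas as pd

# First two rows from the head() output above (subset of columns)
demo = pd.DataFrame({
    "Year": [1981, 1982],
    "GDP growth %": ["2.50%", "-1.80%"],
    "Unemployment %": ["7.60%", "9.70%"],
    "Presidents": ["Ronald Reagan", "Ronald Reagan"],
})

pct_cols = ["GDP growth %", "Unemployment %"]
for col in pct_cols:
    # Strip the percent sign, then convert; unparseable values become NaN
    demo[col] = pd.to_numeric(demo[col].str.rstrip("%"), errors="coerce")

# Align the name with the spelling used in the other datasets (no-op here)
demo["Presidents"] = demo["Presidents"].replace({"George Bush": "George H. W. Bush"})

# Average of each indicator per president
summary = demo.groupby("Presidents")[pct_cols].mean()
print(summary)
```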

¶

1.8 U.S. GDP During Presidencies Dataset : pres_df_8

In [66]:
# Set the option to display all columns
pd.set_option('display.max_columns', None)

pres_df_8.head()
Out[66]:
Unnamed: 0 Year CPI GDPdeflator population.K realGDPperCapita executive war battleDeaths battleDeathsPMP Keynes unemployment unempSource fedReceipts fedOutlays fedSurplus fedDebt fedReceipts_pGDP fedOutlays_pGDP fedSurplus_pGDP fedDebt_pGDP
0 1610 1610 NaN NaN 0.350 NaN JamesI NaN NaN NaN 0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 1620 1620 NaN NaN 2.302 NaN JamesI NaN NaN NaN 0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 1630 1630 NaN NaN 4.646 NaN CharlesI NaN NaN NaN 0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 1640 1640 NaN NaN 26.634 NaN CharlesI NaN NaN NaN 0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 1650 1650 NaN NaN 50.368 NaN Cromwell NaN NaN NaN 0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
In [67]:
pres_df_8.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 265 entries, 0 to 264
Data columns (total 21 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   Unnamed: 0        265 non-null    int64  
 1   Year              265 non-null    int64  
 2   CPI               248 non-null    float64
 3   GDPdeflator       231 non-null    float64
 4   population.K      249 non-null    float64
 5   realGDPperCapita  231 non-null    float64
 6   executive         265 non-null    object 
 7   war               56 non-null     object 
 8   battleDeaths      248 non-null    float64
 9   battleDeathsPMP   232 non-null    float64
 10  Keynes            265 non-null    int64  
 11  unemployment      222 non-null    float64
 12  unempSource       222 non-null    object 
 13  fedReceipts       231 non-null    float64
 14  fedOutlays        231 non-null    float64
 15  fedSurplus        231 non-null    float64
 16  fedDebt           232 non-null    float64
 17  fedReceipts_pGDP  230 non-null    float64
 18  fedOutlays_pGDP   230 non-null    float64
 19  fedSurplus_pGDP   230 non-null    float64
 20  fedDebt_pGDP      231 non-null    float64
dtypes: float64(15), int64(3), object(3)
memory usage: 43.6+ KB

We noticed the following points in pres_df_8:

  1. We will treat this dataset separately because its records start in 1610, long before the first presidency, so the "executive" column also contains English rulers (for example JamesI, CharlesI, and Cromwell).
  2. We need to delete the column named "Unnamed: 0": it is an unnecessary index column that duplicates "Year".
In [68]:
# Task 2: Drop the "Unnamed: 0" column, which is an unnecessary index column.
# drop() returns a new dataframe, so assign the result back to keep the change.
pres_df_8 = pres_df_8.drop(columns=['Unnamed: 0'])
pres_df_8
Out[68]:
Year CPI GDPdeflator population.K realGDPperCapita executive war battleDeaths battleDeathsPMP Keynes unemployment unempSource fedReceipts fedOutlays fedSurplus fedDebt fedReceipts_pGDP fedOutlays_pGDP fedSurplus_pGDP fedDebt_pGDP
0 1610 NaN NaN 0.350 NaN JamesI NaN NaN NaN 0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 1620 NaN NaN 2.302 NaN JamesI NaN NaN NaN 0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 1630 NaN NaN 4.646 NaN CharlesI NaN NaN NaN 0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 1640 NaN NaN 26.634 NaN CharlesI NaN NaN NaN 0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 1650 NaN NaN 50.368 NaN Cromwell NaN NaN NaN 0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
260 2017 245.12 107.75 325220.000 55590.0 Trump NaN 0.0 0.0 0 4.358333 BLS 3316184.0 3981630.0 -665446.0 2.024490e+13 0.170239 0.204400 -0.034161 1.039287
261 2018 251.11 110.32 326949.000 56910.0 Trump NaN 0.0 0.0 0 3.891667 BLS 3329907.0 4109045.0 -779138.0 2.151606e+13 0.162219 0.200176 -0.037956 1.048173
262 2019 255.66 112.29 328527.000 57933.0 Trump NaN 0.0 0.0 0 3.675000 BLS 3463364.0 4446956.0 -983592.0 2.271940e+13 0.162047 0.208068 -0.046021 1.063015
263 2020 258.81 113.65 330152.000 55685.0 Trump NaN 0.0 0.0 0 8.091667 BLS 3421164.0 6553603.0 -3132439.0 2.694539e+13 0.163741 0.313664 -0.149923 1.289642
264 2021 270.97 NaN NaN NaN Biden NaN 0.0 NaN 0 5.358333 BLS 4047112.0 6822449.0 -2775337.0 2.842892e+13 NaN NaN NaN NaN

265 rows × 20 columns
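
Since the rows before 1789 describe English rulers rather than U.S. presidents, a presidency-focused analysis may want to filter them out. A minimal sketch, assuming 1789 (George Washington's first inauguration) as the cutoff; `demo` is a small stand-in using values from the output above, and `FIRST_PRESIDENTIAL_YEAR` is a hypothetical constant:

```python
import pandas as pd

# Stand-in for pres_df_8 with a few rows copied from the output above
demo = pd.DataFrame({
    "Year": [1610, 1650, 2017, 2021],
    "executive": ["JamesI", "Cromwell", "Trump", "Biden"],
    "unemployment": [None, None, 4.358333, 5.358333],
})

FIRST_PRESIDENTIAL_YEAR = 1789  # assumption: start of the first presidency

# Keep only the years in which a U.S. president held office
us_only = demo[demo["Year"] >= FIRST_PRESIDENTIAL_YEAR].reset_index(drop=True)
print(us_only)
```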

¶

2. Data Integration

We will now integrate the data sets into a single, enriched presidential dataset. Data sets that describe the same presidents row by row will be merged to consolidate their information. Other data sets will remain separate tables, because they are organized differently (for example, one row per year) or cover only some presidents.

The plan below lists each data set and indicates whether it will be merged into the master presidential dataset or kept separate.

Data Set Integration Plan:

  1. Integrated Data Sets:
    • pres_df_1 (Basic Presidential Information)
    • pres_df_3 (Presidential Achievements)
    • pres_df_4 (Presidential Events and Activities)
  2. Standalone Data Sets:
    • pres_df_2 (First Ladies Information)
    • pres_df_5 (Additional Presidential Data)
    • pres_df_6 (Presidential Ratings and Approval)
    • pres_df_7 (Presidential Quotes)
    • pres_df_8 (Extended Presidential Information)

By following this integration plan, we aim to create a comprehensive and rich dataset that encompasses various aspects of each president's life, achievements, events, and ratings, while also maintaining the individuality of certain data sets that offer unique insights. This integrated dataset will serve as a valuable resource for further analysis and exploration of the history and legacy of the United States presidents.
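
The merges in this section align rows by index, which assumes every table lists the presidents in the same order. As a hedged alternative, the sketch below merges on a shared name column and asks pandas to verify that the relationship is one-to-one. The frames `left` and `right` and their contents are illustrative stand-ins, not the notebook's actual data sets.

```python
import pandas as pd

# Toy stand-ins for two presidential tables (illustrative values only).
left = pd.DataFrame({
    "President": ["George Washington", "John Adams"],
    "Born": ["1732-02-22", "1735-10-30"],
})
right = pd.DataFrame({
    # Deliberately listed in a different order than `left`.
    "President": ["John Adams", "George Washington"],
    "Party": ["Federalist", "Unaffiliated"],
})

# Merging on the key column aligns rows by name rather than by position.
# validate="one_to_one" raises an error if a name appears twice on either side.
merged = left.merge(right, on="President", how="inner", validate="one_to_one")
print(merged)
```

Merging on a key column also avoids the duplicated `President_x` / `President_y` columns that an index-based merge produces when both tables carry a `President` column.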

In [69]:
# Merge pres_df_1, pres_df_3, and pres_df_4 based on their indices
pres_merged_df = pres_df_1.merge(pres_df_3, left_index=True, right_index=True)
pres_merged_df = pres_merged_df.merge(pres_df_4, left_index=True, right_index=True)

# Check the result
pres_merged_df.head()
Out[69]:
President_x Born Post-presidency timespan Died Age Start Date of presidency Start Age of presidency End of Presidency Age End of Presidency Date order President_y height_cm height_in weight_kg weight_lb body_mass_index body_mass_index_range birthplace birth_state death_age astrological_sign presidency_begin_age presidency_end_age political_party corrected_iq President Birthplace Children Religion Higher Education Occupation Military Service Term Party Vice President Previous Office Economy Foreign Affairs Military Activity Other Events Legacy
0 George Washington 1732-02-22 1015 days 00:00:00 1799-12-14 24750 days 1789-04-30 20872 days 23735 days 1797-03-04 1.0 George Washington 188 74.0 79.4 175 22.5 Normal Westmoreland County Virginia 67.0 Pisces 57 65.0 Unaffiliated 140.0 George Washington Pope's Creek, VA 0 Episcopalian None Plantation Owner, Soldier Commander-in-Chief of the Continental Army in... 1789-1797 Unaffiliated John Adams Commander-in-Chief [' Hamilton established BUS', '1792 Coinage Ac... ['1793 Neutrality in the France-Britain confli... ['1794 Whiskey Rebellion'] ['1791 Bill of Rights', '1792 Post Office foun... He is universally regarded as one of the great...
1 John Adams 1735-10-30 9247 days 00:00:00 1826-07-04 33097 days 1797-03-04 22390 days 23850 days 1801-03-04 2.0 John Adams 170 67.0 83.9 185 29.0 Overweight Braintree Massachusetts 90.0 Scorpio 61 65.0 Federalist 155.0 John Adams Braintree, MA 5 Unitarian Harvard Lawyer, Farmer none 1797-1801 Federalist Thomas Jefferson 1st Vice President of USA ['1798 Progressive land value tax of up to 1% ... ['1797 the XYZ Affair: a bribe of French agent... ['1798–1800 The Quasi war. Undeclared naval wa... ['1798 Alien & Sedition Act to silence critics... One of the most experienced men ever to become...
2 Thomas Jefferson 1743-04-13 6327 days 00:00:00 1826-07-04 30377 days 1801-03-04 21130 days 24050 days 1809-03-04 3.0 Thomas Jefferson 189 74.5 82.1 181 23.0 Normal Shadwell Virginia 83.0 Aries 57 65.0 Democratic-Republican 160.0 Thomas Jefferson Goochland County, VA 6 unaffiliated Christian College of William and Mary Inventor,Lawyer, Architect Colonel of Virginia militia (without real mili... 1801-1809 Democratic-Republican Aaron Burr, George Clinton 2nd Vice President of USA ['1807 Embargo Act forbidding foreign trade in... ['1805 Peace Treaty with Tripoli. Piracy stopp... ['1801-05 Naval operation against Tripoli and ... ['1803 The Louisiana purchase', '1804 12th Ame... Probably the most intelligent man ever to occ...
3 James Madison 1751-03-16 7051 days 00:00:00 1836-06-28 31129 days 1809-03-04 21158 days 24078 days 1817-03-04 4.0 James Madison 163 64.0 55.3 122 20.8 Normal Port Conway Virginia 85.0 Pisces 57 65.0 Democratic-Republican 160.0 James Madison Port Conway, VA 0 Episcopalian Princeton Plantation Owner, Lawyer Colonel of Virginia militia (without real mili... 1809-1817 Democratic-Republican George Clinton, Elbridge Gerry Secretary of State [' The first U.S. protective tariff was impose... ['1814 The Treaty of Ghent ends the War of 1812'] ['1811 Tippecanoe battle (Harrison vs. Chief T... ['1811 Cumberland Road construction starts (fi... His leadership in the War of 1812 was particul...
4 James Monroe 1758-04-28 2312 days 00:00:00 1831-07-04 26712 days 1817-03-04 21480 days 24400 days 1825-03-04 5.0 James Monroe 183 72.0 85.7 189 25.6 Overweight Monroe Hall Virginia 73.0 Taurus 58 66.0 Democratic-Republican 139.0 James Monroe Monroe Hall, VA 2 Episcopalian College of William and Mary Plantation Owner, Lawyer Major of the Continental Army 1817-1825 Democratic-Republican Daniel Tompkins Secretary of War ['1819 Panic of 1819 (too much land speculatio... ['1823 Monroe Doctrine', '1818 49th parallel s... ['1817 1st Seminole war against Seminole India... ['1819 Florida ceded to US', "1820 Missouri Co... His presidency contributed to national defense...
In [70]:
# Rebuild the merged frame from pres_df_1 and pres_df_4 only, aligned on their indices (pres_df_3 is left out)
pres_merged_df = pres_df_1.merge(pres_df_4, left_index=True, right_index=True)
In [72]:
pres_merged_df.head()
Out[72]:
President_x Born Post-presidency timespan Died Age Start Date of presidency Start Age of presidency End of Presidency Age End of Presidency Date President_y Birthplace Children Religion Higher Education Occupation Military Service Term Party Vice President Previous Office Economy Foreign Affairs Military Activity Other Events Legacy
0 George Washington 1732-02-22 1015 days 00:00:00 1799-12-14 24750 days 1789-04-30 20872 days 23735 days 1797-03-04 George Washington Pope's Creek, VA 0 Episcopalian None Plantation Owner, Soldier Commander-in-Chief of the Continental Army in... 1789-1797 Unaffiliated John Adams Commander-in-Chief [' Hamilton established BUS', '1792 Coinage Ac... ['1793 Neutrality in the France-Britain confli... ['1794 Whiskey Rebellion'] ['1791 Bill of Rights', '1792 Post Office foun... He is universally regarded as one of the great...
1 John Adams 1735-10-30 9247 days 00:00:00 1826-07-04 33097 days 1797-03-04 22390 days 23850 days 1801-03-04 John Adams Braintree, MA 5 Unitarian Harvard Lawyer, Farmer none 1797-1801 Federalist Thomas Jefferson 1st Vice President of USA ['1798 Progressive land value tax of up to 1% ... ['1797 the XYZ Affair: a bribe of French agent... ['1798–1800 The Quasi war. Undeclared naval wa... ['1798 Alien & Sedition Act to silence critics... One of the most experienced men ever to become...
2 Thomas Jefferson 1743-04-13 6327 days 00:00:00 1826-07-04 30377 days 1801-03-04 21130 days 24050 days 1809-03-04 Thomas Jefferson Goochland County, VA 6 unaffiliated Christian College of William and Mary Inventor,Lawyer, Architect Colonel of Virginia militia (without real mili... 1801-1809 Democratic-Republican Aaron Burr, George Clinton 2nd Vice President of USA ['1807 Embargo Act forbidding foreign trade in... ['1805 Peace Treaty with Tripoli. Piracy stopp... ['1801-05 Naval operation against Tripoli and ... ['1803 The Louisiana purchase', '1804 12th Ame... Probably the most intelligent man ever to occ...
3 James Madison 1751-03-16 7051 days 00:00:00 1836-06-28 31129 days 1809-03-04 21158 days 24078 days 1817-03-04 James Madison Port Conway, VA 0 Episcopalian Princeton Plantation Owner, Lawyer Colonel of Virginia militia (without real mili... 1809-1817 Democratic-Republican George Clinton, Elbridge Gerry Secretary of State [' The first U.S. protective tariff was impose... ['1814 The Treaty of Ghent ends the War of 1812'] ['1811 Tippecanoe battle (Harrison vs. Chief T... ['1811 Cumberland Road construction starts (fi... His leadership in the War of 1812 was particul...
4 James Monroe 1758-04-28 2312 days 00:00:00 1831-07-04 26712 days 1817-03-04 21480 days 24400 days 1825-03-04 James Monroe Monroe Hall, VA 2 Episcopalian College of William and Mary Plantation Owner, Lawyer Major of the Continental Army 1817-1825 Democratic-Republican Daniel Tompkins Secretary of War ['1819 Panic of 1819 (too much land speculatio... ['1823 Monroe Doctrine', '1818 49th parallel s... ['1817 1st Seminole war against Seminole India... ['1819 Florida ceded to US', "1820 Missouri Co... His presidency contributed to national defense...

Achievements:¶

  1. Data Collection: We successfully collected data from multiple data sets containing information about presidents, first ladies, and historical events.

  2. Data Cleaning: We performed extensive data cleaning to handle missing values, correct data types, and standardize the format of names, dates, and time spans across the data sets.

  3. Data Integration: We integrated multiple data sets based on their indices, ensuring that the information is correctly aligned for each president.

  4. Standardization: We standardized the column names across all data sets, making it easier to analyze and compare the data.

  5. Time Span Conversion: We converted time spans expressed in years and days to Pandas timedelta data type, enabling better analysis of presidency durations and ages.

Problems Solved:

  1. Handling Missing Data: We dealt with missing data in various columns using appropriate methods like dropping or filling missing values.

  2. Data Type Conversion: We converted columns to their correct data types, such as converting date-related columns to datetime data type and time spans to timedelta data type.

  3. Name Formatting: We standardized the format of names in the data sets, making them consistent and easy to read.

  4. Data Alignment: We merged data sets based on their indices to ensure that the information is accurately aligned for each president.

  5. Data Exclusion: We excluded irrelevant data sets and columns to focus on relevant information for analysis.

Overall, the data cleaning and integration process has resulted in a unified and organized data set, ready for further analysis and exploration. The standardized data will enable us to gain valuable insights into historical presidencies, their events, and the context surrounding them.
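
The time span conversion listed above can be sketched with the standard library alone. This is a minimal illustration that assumes the raw strings look like "2 years, 285 days"; the notebook's real input format is not shown in this section, so the pattern and the helper name `parse_span` are hypothetical.

```python
import re
from datetime import timedelta

# Hypothetical input format: "<N> years, <M> days" (either part may be absent).
_SPAN = re.compile(r"(?:(\d+)\s*years?)?,?\s*(?:(\d+)\s*days?)?")

def parse_span(text, days_per_year=365.25):
    """Return a timedelta for strings like '2 years, 285 days', or None if unparseable."""
    match = _SPAN.fullmatch(text.strip())
    if not match or not any(match.groups()):
        return None
    years = int(match.group(1) or 0)
    days = int(match.group(2) or 0)
    return timedelta(days=years * days_per_year + days)

span = parse_span("2 years, 285 days")
not_a_span = parse_span("Died in Office")
```

Entries that do not match the pattern come back as `None`, so special values can be handled separately instead of being silently coerced.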

3-5. Exploratory Data Analysis, Data Visualization & In-Depth Data Analysis¶

Presidents and Their Birthplaces by Political Party¶

This interactive data visualization places each U.S. president on a timeline by date of birth, with points colored by political party. Although the title mentions birthplaces, the axes show birth date (x) and president name (y); geographic birthplaces are mapped in the following section.

Visual Elements:¶

  1. Scatter Points: Each president is represented by a scatter point on the graph, placed according to their birth date (x-axis) and their name (y-axis).

  2. Colors: Presidents belonging to the same political party are assigned the same color. The color key on the right side of the graph helps identify each political party.

  3. Hover Information: When hovering over a scatter point, the reader is presented with a tooltip that provides detailed information about the president, including their name, political party, and date of birth.

Insights:¶

  • The graph allows readers to easily observe clusters of presidents born in similar time periods. These clusters might represent specific political eras or historical events.

  • The distribution of presidents across different political parties can be easily compared. For example, the dominance of certain parties during specific time periods may become apparent.

  • Readers can spot presidents whose birth dates fall outside the main clusters. Patterns in geographic birthplace are better read from the map in the following section.

  • The dark theme background provides an aesthetically pleasing appearance and ensures that the graph is visually engaging.

Interactive Functionality:¶

  • The reader can use the hover feature to obtain detailed information about each president by simply moving the cursor over the scatter points.

  • The legend on the right side of the graph can be used to highlight or hide specific political parties, allowing for a clearer view of each party's representation in different time periods.

Overall, this data visualization offers an intuitive and visually appealing way to explore when U.S. presidents were born and how they are distributed across political parties. The interactive nature of the graph encourages readers to discover interesting patterns and insights based on their own exploration.
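
For the legend to hide or highlight a whole party with one click, it helps to build one trace per party instead of one per president. The sketch below shows only the grouping step, using toy rows with the same column names as the plotting cell; `trace_specs` is a hypothetical intermediate structure, not something the notebook defines.

```python
import pandas as pd

# Toy data with the column names the plotting cell uses (first rows of the merged table).
df = pd.DataFrame({
    "President_x": ["George Washington", "John Adams", "Thomas Jefferson", "James Madison"],
    "Born": pd.to_datetime(["1732-02-22", "1735-10-30", "1743-04-13", "1751-03-16"]),
    "Party": ["Unaffiliated", "Federalist", "Democratic-Republican", "Democratic-Republican"],
})

# One dictionary per party, in order of first appearance.
# Each could be unpacked into a single scatter trace, giving one legend entry per party.
trace_specs = [
    {"name": party, "x": group["Born"].tolist(), "y": group["President_x"].tolist()}
    for party, group in df.groupby("Party", sort=False)
]
```

Each spec maps directly onto `go.Scatter(x=..., y=..., name=..., mode='markers')`, so the legend would list each party once.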

In [243]:
import plotly.graph_objects as go
import pandas as pd

# Define colors for each political_party with a dark theme
colors = {
    'Democratic': '#00BFFF',
    'Republican': '#FF6347',
    'Whig': '#00FF00',
    'Federalist': '#9932CC',
    'Democratic-Republican': '#FFD700',
    'National Union': '#00CED1',
    'Unaffiliated': '#A9A9A9'
}

# Create one scatter trace per president, colored by party
scatter_traces = []
for index, row in pres_merged_df.iterrows():
    party = row['Party']
    scatter_trace = go.Scatter(
        x=[row['Born']],
        y=[row['President_x']],
        mode='markers',
        # Fall back to gray for any party missing from the color dictionary
        marker=dict(color=colors.get(party, 'gray'), size=10, opacity=0.8),
        name=party,
        hovertext=f'President: {row["President_x"]}<br>Party: {party}<br>Born: {row["Born"]}'
    )
    scatter_traces.append(scatter_trace)

# Create the figure layout with a dark background
layout = go.Layout(
    title='Presidents and Their Birthplaces by Political Party',
    xaxis=dict(title='Date of Birth'),
    yaxis=dict(title='President'),
    hovermode='closest',
    showlegend=True,
    template='plotly_dark',  # Use the dark theme template
)

# Create the figure and add the traces and layout
fig = go.Figure(data=scatter_traces, layout=layout)

# Increase the plot size
fig.update_layout(width=1400, height=1400)

# Show only the per-trace hovertext in the tooltip
# (the original call passed an undefined variable, hover_text)
fig.update_traces(hoverinfo='text')

# Show the plot
fig.show()

Birthplaces of U.S. Presidents by Political Party¶

The goal of this interactive map is to visualize the birthplaces of U.S. Presidents and their political affiliations. Each president's birthplace is geocoded to a latitude and longitude and drawn as a scatter point on a map of the United States, colored by that president's political party. The states themselves are not shaded.

By examining this map, we can gain insights into the distribution of U.S. Presidents' birthplaces and how they correlate with political party affiliations. The map provides an engaging way to explore historical data and understand the geographical patterns associated with past Presidents' backgrounds.

Hovering over each scatter point reveals the President's name, birthplace, birth date, and political party. Point colors distinguish Unaffiliated, Federalist, Democratic-Republican, Democratic, and Republican presidents; any other party is drawn in gray.

This visualization aims to offer a comprehensive view of U.S. Presidential history, allowing users to explore and analyze the geographic and political aspects of the nation's leadership over time.
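
Because several presidents may share a birthplace string and geocoding services are slow and rate-limited, it can help to look up each distinct place only once. The sketch below is a minimal illustration of that idea; `geocode_unique` and `fake_lookup` are hypothetical helpers, and the coordinates are rough values chosen for the example.

```python
def geocode_unique(places, lookup):
    """Call `lookup` once per distinct place and return {place: result}."""
    cache = {}
    for place in places:
        if place not in cache:
            cache[place] = lookup(place)
    return cache

calls = []

# Hypothetical stand-in for a real geocoder call; it records every lookup it receives.
def fake_lookup(place):
    calls.append(place)
    known = {"Braintree, MA": (42.2, -71.0)}  # rough coordinates, for illustration
    return known.get(place)

coords = geocode_unique(["Braintree, MA", "Braintree, MA", "Nowhere, ZZ"], fake_lookup)
```

In the notebook, `lookup` would be the geocoder's own function, and the resulting dictionary could be applied to the `Birthplace` column with `Series.map`.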

In [73]:
import plotly.graph_objects as go
import pandas as pd
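from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter

# Geocode the birthplaces to get latitude and longitude.
# Nominatim's public service allows at most one request per second, so throttle the calls.
geolocator = Nominatim(user_agent='presidents_map')
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)
pres_merged_df['location'] = pres_merged_df['Birthplace'].apply(geocode)
# Birthplaces that could not be geocoded get None for both coordinates
pres_merged_df['latitude'] = pres_merged_df['location'].apply(lambda loc: loc.latitude if loc else None)
pres_merged_df['longitude'] = pres_merged_df['location'].apply(lambda loc: loc.longitude if loc else None)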

# Define colors for each political_party
colors = {
    'Unaffiliated': '#A9A9A9',
    'Federalist': '#9932CC',
    'Democratic-Republican': '#FFD700',
    'Democratic': '#00BFFF',
    'Republican': '#FF6347',
}

# Create a scatter trace for each president
scatter_points = []
for _, row in pres_merged_df.iterrows():
    scatter_point = go.Scattergeo(
        locationmode='USA-states',
        lon=[row['longitude']],
        lat=[row['latitude']],
        text=f'President: {row["President_x"]}<br>Born: {row["Born"]}<br>Birthplace: {row["Birthplace"]}<br>Party: {row["Party"]}',
        marker=dict(
            size=10,
            color=colors.get(row['Party'], 'gray'),
            opacity=0.8,
            line=dict(width=1, color='white')
        ),
        name=row['Party'],
        hoverinfo='text'
    )
    scatter_points.append(scatter_point)

# Create the layout for the map with a larger size
layout = go.Layout(
    title='Birthplaces of U.S. Presidents by Political Party',
    geo=dict(
        scope='usa',
        projection=dict(type='albers usa'),
        showland=True,
        landcolor='rgb(250, 250, 250)',
        subunitcolor='rgb(217, 217, 217)',
        countrycolor='rgb(217, 217, 217)',
        showlakes=True,
        lakecolor='rgb(255, 255, 255)',
        showsubunits=True,
        showcountries=True,
        resolution=50,
        lonaxis=dict(range=[-130, -60]),
        lataxis=dict(range=[20, 50])
    ),
    width=1400,  # Set the width of the map to 1400 pixels
    height=800,  # Set the height of the map to 800 pixels
)

# Create the figure and add the scatter points and layout
fig = go.Figure(data=scatter_points, layout=layout)

# Show the interactive map
fig.show()

Post-presidency Timespan of U.S. Presidents¶

The goal of this interactive bar chart is to showcase the post-presidency timespan of U.S. Presidents in years. Each bar represents a President, and its length corresponds to the number of years they lived after the end of their presidency. The bars are color-coded to differentiate different categories of post-presidency timespans.

Presidents with a recorded post-presidency timespan are drawn in light red. The code also reserves dark red for Presidents who died in office and light gray for living Presidents. As written, however, every entry that is not a timespan is converted to a missing value before plotting, so those Presidents have no bar height and the dark red branch is never reached.

This visualization allows us to compare the post-presidency longevity of different U.S. Presidents and gain insights into their life spans after serving as the nation's leaders. By exploring the bar chart, we can identify Presidents who had relatively long post-presidency lives and those who passed away shortly after their time in office.

Hovering over each bar will provide additional details, including the name of the President and the exact duration of their post-presidency timespan in years. This interactive visualization offers a comprehensive view of the post-presidency periods of U.S. Presidents, providing a fascinating perspective on the historical context of presidential life spans.
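
To keep the three categories distinguishable, one option is to label each raw entry before it is converted to a timedelta. The sketch below assumes the raw column holds timedelta-like strings, the literal text "Died in Office", or missing values; that mix is inferred from the comments in the cell below, and the names `raw`, `classify`, and `status` are illustrative.

```python
import pandas as pd

# Assumed raw values: timedelta-like strings, the literal "Died in Office", or missing.
raw = pd.Series(["1015 days", "Died in Office", None, "9247 days"])

def classify(value):
    """Label a raw entry before any conversion discards the text."""
    if pd.isna(value):
        return "living"
    if "days" in str(value):
        return "lived after office"
    return "died in office"

status = raw.map(classify)

# Convert only the genuine timespans; everything else becomes NaT and then NaN years.
years = pd.to_timedelta(raw.where(status == "lived after office")).dt.days / 365.25
```

The `status` labels could then drive the bar colors, while `years` supplies the bar heights.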

In [289]:
import plotly.graph_objects as go
import pandas as pd

# Function to convert a value to timedelta or None for special cases
def convert_to_timedelta(value):
    if pd.notna(value) and 'days' in str(value):
        return pd.to_timedelta(value)
    return None

# Convert the "Post-presidency timespan" column to timedeltas where possible
pres_merged_df['Post-presidency timespan'] = pres_merged_df['Post-presidency timespan'].apply(convert_to_timedelta)

# Calculate the number of days in the "Post-presidency timespan" column
pres_merged_df['Post-presidency days'] = pres_merged_df['Post-presidency timespan'].dt.days

# Convert the number of days to years
pres_merged_df['Post-presidency years'] = pres_merged_df['Post-presidency days'] / 365.25  # Account for leap years

# Sort the DataFrame by "Post-presidency days" in descending order
pres_merged_df.sort_values(by='Post-presidency days', ascending=False, inplace=True)

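# Choose a bar color from the converted timespan value.
# Note: convert_to_timedelta() has already turned every non-timedelta entry
# (including "Died in Office") into a missing value, so the 'red' branch is not reached here.
def get_bar_color(value):
    if pd.notna(value):
        if isinstance(value, pd.Timedelta):
            return 'lightcoral'  # Light red for a valid timespan
        else:
            return 'red'  # Intended for "Died in Office"
    else:
        return 'lightgray'  # Gray for missing values (died in office or still living)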

# Create the bar chart
fig = go.Figure()

fig.add_trace(go.Bar(
    x=pres_merged_df['President_x'],
    y=pres_merged_df['Post-presidency years'],
    marker=dict(color=[get_bar_color(value) for value in pres_merged_df['Post-presidency timespan']]),
))

# Customize the layout
fig.update_layout(
    title='Post-presidency Timespan of U.S. Presidents (in Years)',
    xaxis_title='President',
    yaxis_title='Years',
    hovermode='y',
    width=1200,  # Set the width of the chart to 1200 pixels
    height=600,  # Set the height of the chart to 600 pixels
)

# Hide the numbers on the bars
fig.update_traces(texttemplate=None, textposition='outside')

# Rotate the x-axis labels for better readability
fig.update_xaxes(tickangle=45)

# Show the interactive bar chart
fig.show()

Age of U.S. Presidents by Political Party¶

The goal of this interactive bar chart is to visualize the ages of U.S. Presidents based on their political party affiliations. The chart allows users to explore and compare the ages at which Presidents from different parties passed away. The x-axis represents the names of the Presidents, while the y-axis shows their ages at the time of their death.

To use this interactive chart, simply select a political party from the dropdown menu. The chart will then display the ages of the Presidents belonging to the chosen party, sorted from oldest to youngest.

Each bar in the chart corresponds to a U.S. President, and all bars share the color assigned to the selected party: light grey for Unaffiliated, purple for Federalist, gold for Democratic-Republican, blue for Democratic, and red for Republican. Any other party is drawn in a neutral grey.

By hovering the mouse cursor over each bar, users can view additional details, including the President's name and exact age at the time of death.

This visualization provides a dynamic and user-friendly way to explore historical data and understand the distribution of ages among U.S. Presidents across different political parties. It enables users to gain insights into the life spans of Presidents within their respective parties and discover any potential patterns or trends related to their ages.
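
Dividing a day count by 365 slightly overstates age because it ignores leap days. Where whole years are wanted, age can be counted the way birthdays are. The sketch below uses only the standard library; `completed_years` is a hypothetical helper, and the dates are taken from the first two rows of the merged table.

```python
from datetime import date

def completed_years(born, on):
    """Whole years elapsed between two dates, counted the way birthdays are."""
    before_birthday = (on.month, on.day) < (born.month, born.day)
    return on.year - born.year - before_birthday

washington = completed_years(date(1732, 2, 22), date(1799, 12, 14))
adams = completed_years(date(1735, 10, 30), date(1826, 7, 4))
print(washington, adams)  # 67 90
```

These results agree with the `death_age` column shown in the first merge output.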


In [76]:
import pandas as pd
import plotly.graph_objects as go
from ipywidgets import interact, widgets

# Assuming you have a DataFrame called 'pres_merged_df' containing the data
# with columns: 'President_x', 'Born', 'Died', and 'Party'

# Calculate each president's age at death in years and store it in 'Age'
# (presidents without a death date end up with NaN)
pres_merged_df['Born'] = pd.to_datetime(pres_merged_df['Born'])
pres_merged_df['Died'] = pd.to_datetime(pres_merged_df['Died'])
pres_merged_df['Age'] = (pres_merged_df['Died'] - pres_merged_df['Born']).dt.days / 365.25  # Account for leap years

colors = {
    'Unaffiliated': '#A9A9A9',
    'Federalist': '#9932CC',
    'Democratic-Republican': '#FFD700',
    'Democratic': '#00BFFF',
    'Republican': '#FF6347',
}

# Function to plot the bar chart
def plot_age_by_party(selected_party):
    filtered_df = pres_merged_df[pres_merged_df['Party'] == selected_party]
    sorted_df = filtered_df.sort_values(by='Age', ascending=False)

    fig = go.Figure()
    fig.add_trace(go.Bar(
        x=sorted_df['President_x'],
        y=sorted_df['Age'],
        marker=dict(color=colors.get(selected_party, '#808080'))  # Use grey if color not found
    ))
    
    fig.update_layout(
        xaxis_tickangle=-45,
        title=f'Age of Presidents in {selected_party} Party (Sorted)',
        xaxis_title='President',
        yaxis_title='Age at Death',
        hovermode='x'
    )
    
    fig.show()

# Get unique political parties from the DataFrame
parties = pres_merged_df['Party'].unique()

# Create an interactive dropdown menu to select the party
interact(plot_age_by_party, selected_party=widgets.Dropdown(options=parties));

Important: In case the interactive visualization isn't visible, you can refer to the images provided below. Please note that the current setup requires the code to be hosted on a server, a step that will be implemented in the future.

image.png image.png

Age of U.S. Presidents at the Start of Presidency¶

The interactive horizontal bar chart visualizes the ages of U.S. Presidents at the beginning of their presidential terms. Each bar represents a President, and the data is sorted by age at the start of the presidency before plotting. The chart is color-coded based on the political party to which each President belonged, making it easy to distinguish between parties.

To create this visualization, we utilized data from a DataFrame containing information about U.S. Presidents, including their birth dates ('Born') and the start dates of their presidencies ('Start Date of presidency'). By calculating the age at the beginning of each President's term, we were able to sort the data and construct the horizontal bar chart.
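
The merged table already carries a `Start Age of presidency` column expressed in days, so the same ages can be obtained without recomputing date differences. The sketch below is one way to do that, with two values copied from the first rows of the merged output; the variable names are illustrative.

```python
import pandas as pd

# Values copied from the first two rows of the merged table shown earlier.
start_age = pd.Series(
    pd.to_timedelta(["20872 days", "22390 days"]),
    index=["George Washington", "John Adams"],
)

# Dividing by a Timedelta yields plain floats (years), with no intermediate .dt.days step.
age_years = start_age / pd.Timedelta(days=365.25)

# Ascending sort puts the youngest president first.
youngest_first = age_years.sort_values()
```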

In [77]:
import pandas as pd
import plotly.express as px

# Assuming you have a DataFrame called 'pres_merged_df' containing the data
# with columns: 'President_x', 'Born', 'Start Date of presidency', and 'Party'

# Calculate the age of each president when they started their presidency
pres_merged_df['Age_at_Start'] = (pres_merged_df['Start Date of presidency'] - pres_merged_df['Born']).dt.days / 365.25  # Account for leap years

# Sort the DataFrame by age at the start of the presidency in descending order (oldest first)
sorted_df = pres_merged_df.sort_values(by='Age_at_Start', ascending=False)

# Define colors for each political party
colors = {
    'Unaffiliated': '#A9A9A9',
    'Federalist': '#9932CC',
    'Democratic-Republican': '#FFD700',
    'Democratic': '#00BFFF',
    'Republican': '#FF6347',
}

# Create the horizontal bar chart using Plotly Express
fig = px.bar(
    sorted_df,
    x='Age_at_Start',
    y='President_x',
    color='Party',
    title='Age of U.S. Presidents at the Start of Presidency (Sorted)',
    labels={'Age_at_Start': 'Age at Start of Presidency', 'President_x': 'President'},
    color_discrete_map=colors,
)

# Customize the appearance of the chart
fig.update_layout(yaxis_title=None, xaxis_tickangle=-45, hovermode='y', plot_bgcolor='rgba(0, 0, 0, 0)',height = 1400)

# Show the chart
fig.show()

Pie Chart: Number of Children of U.S. Presidents¶

The pie chart displays the distribution of the number of children among U.S. Presidents. Each slice of the pie represents a specific number of children, ranging from 0 to 8. The chart is designed with a dark background using the 'plotly_dark' template, providing a visually appealing contrast to the colorful segments.

The colors in the pie chart are carefully selected to ensure clarity and visual appeal. Each color corresponds to a different number of children, allowing for easy interpretation of the data. Additionally, the names of the categories are displayed outside the chart, providing a clear understanding of the number of children associated with each slice.

The pie chart is an intuitive and concise way to visualize the distribution of children among U.S. Presidents, making it easier to identify the most and least common number of children among the leaders of the nation.
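
By default `value_counts()` orders categories by frequency, which can make the slice labels harder to scan. The sketch below shows one way to order them by number of children instead; the `children` values are illustrative and not the notebook's actual column.

```python
import pandas as pd

# Illustrative values, not the notebook's actual 'Children' column.
children = pd.Series([0, 5, 6, 0, 2, 2, 2])

# value_counts() orders by frequency; sort_index() reorders by number of children.
counts = children.value_counts().sort_index()

# Singular label for exactly one child, plural otherwise.
labels = [f"{n} child" if n == 1 else f"{n} children" for n in counts.index]
```

When passing these to `go.Pie`, setting `sort=False` keeps the slices in this order rather than re-sorting them by size.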

In [78]:
import pandas as pd
import plotly.graph_objects as go

# Assuming you have a DataFrame called 'pres_merged_df' containing the data
# with columns: 'President_x' and 'Children'

# Create the pie chart data
children_counts = pres_merged_df['Children'].value_counts()

# Function to generate custom text labels for the pie chart
def custom_label(val):
    if val == 1:
        return f'{val} child'
    else:
        return f'{val} children'

# Apply the custom text labels to the index of children_counts
labels_with_children = children_counts.index.map(custom_label)

# Create the pie chart trace
pie_chart_trace = go.Pie(
    labels=labels_with_children,
    values=children_counts.values,
    textinfo='percent+label',
    textfont=dict(size=12, color='white'),  # Set text color to white for better visibility
)

# Create the layout with a dark background using the plotly_dark template
layout = go.Layout(
    title='Number of Children of U.S. Presidents',
    template='plotly_dark',  # Set the plot template to a dark theme
)

# Create the figure and display the pie chart
fig = go.Figure(data=[pie_chart_trace], layout=layout)

# Customize the appearance of the chart
fig.update_layout(height = 800)

fig.show()

Bar Chart: Number of Children of U.S. Presidents¶

The bar chart illustrates the number of children for each U.S. President. Two versions follow: the first keeps the DataFrame's existing order, and the second sorts the presidents from fewest to most children. The bars are vertical, with presidents on the x-axis and the number of children on the y-axis, allowing for a straightforward comparison between presidents.

To enhance visibility and readability, the names of the presidents are rotated at an angle of -45 degrees along the x-axis. This arrangement ensures that the names are legible even when there are numerous data points.

The chart is set against a dark background using the 'plotly_dark' template, which not only provides an appealing visual design but also ensures that the bars stand out. All bars share a single color (dodger blue), and the number of children is printed above each bar.

By examining the bar chart, readers can quickly compare family sizes across U.S. Presidents, providing insight into the family lives of the country's leaders.
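
The cells below draw every bar in one color. If party coloring is wanted here as in the earlier charts, a per-bar color list could be derived from the `Party` column. This is a minimal sketch with illustrative rows; "Example President" and "Some Other Party" are made-up placeholders that show the fallback.

```python
import pandas as pd

# Subset of the color dictionary used elsewhere in the notebook.
colors = {
    "Unaffiliated": "#A9A9A9",
    "Federalist": "#9932CC",
    "Democratic-Republican": "#FFD700",
}

# Illustrative rows using the notebook's column names.
df = pd.DataFrame({
    "President_x": ["George Washington", "John Adams", "Example President"],
    "Party": ["Unaffiliated", "Federalist", "Some Other Party"],
})

# Map each party to its color; parties missing from the dictionary fall back to gray.
bar_colors = df["Party"].map(colors).fillna("gray").tolist()
```

The resulting list can be passed as `marker=dict(color=bar_colors)` when building the bar trace.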

In [79]:
import pandas as pd
import plotly.graph_objects as go

# Assuming you have a DataFrame called 'pres_merged_df' containing the data
# with columns: 'President_x' and 'Children'

# Keep the DataFrame's current order; the next cell applies the sort by number of children
sorted_df = pres_merged_df

# Create the bar chart trace
bar_chart_trace = go.Bar(
    x=sorted_df['President_x'],
    y=sorted_df['Children'],
    text=sorted_df['Children'],
    textposition='outside',  # Display the number of children above the bars
    textfont=dict(size=12, color='white'),  # Set text color to white for better visibility
    marker=dict(color='dodgerblue'),  # Set the color of the bars
)

# Create the layout with a dark background using the plotly_dark template
layout = go.Layout(
    title='Number of Children of U.S. Presidents',
    xaxis=dict(title='President', tickangle=-45),  # Rotate tick labels by -45 degrees
    yaxis=dict(title='Number of Children'),
    template='plotly_dark',  # Set the plot template to a dark theme
)

# Create the figure and display the bar chart
fig = go.Figure(data=[bar_chart_trace], layout=layout)

# Customize the appearance of the chart
fig.update_layout(height = 600)

fig.show()
In [80]:
import pandas as pd
import plotly.graph_objects as go

# Assuming you have a DataFrame called 'pres_merged_df' containing the data
# with columns: 'President_x' and 'Children'

# Sort the DataFrame by the number of children in ascending order
sorted_df = pres_merged_df.sort_values(by='Children', ascending=True)

# Create the bar chart trace
bar_chart_trace = go.Bar(
    x=sorted_df['President_x'],
    y=sorted_df['Children'],
    text=sorted_df['Children'],
    textposition='outside',  # Display the number of children above the bars
    textfont=dict(size=12, color='white'),  # Set text color to white for better visibility
    marker=dict(color='dodgerblue'),  # Set the color of the bars
)

# Create the layout with a dark background using the plotly_dark template
layout = go.Layout(
    title='Number of Children of U.S. Presidents',
    xaxis=dict(title='President', tickangle=-45),  # Rotate tick labels by -45 degrees
    yaxis=dict(title='Number of Children'),
    template='plotly_dark',  # Set the plot template to a dark theme
)

# Create the figure and display the bar chart
fig = go.Figure(data=[bar_chart_trace], layout=layout)

# Customize the appearance of the chart
fig.update_layout(height = 600)

fig.show()

¶

Bar Chart - U.S. Presidents by Religion and Political Party

The bar chart shows the U.S. Presidents who share a given religious affiliation, grouped by political party. Each bar corresponds to one President, labeled with the President's name and colored by political party.

The chart offers interactivity, allowing users to select a specific religion from the dropdown menu to view the Presidents who followed that particular religion and their corresponding political parties. The chart uses a dark background theme and custom colors for each party, ensuring a visually engaging experience.

The x-axis of the bar chart shows the political parties, and the y-axis displays the names of the Presidents. The chart is oriented horizontally to allow for better readability of the President's names on the bars. Hovering over each bar reveals additional information about the President's party affiliation and religion, providing valuable insights into the religious diversity among U.S. Presidents over time.

In [81]:
import pandas as pd
from ipywidgets import interact, widgets

# Assuming you have a DataFrame called 'pres_merged_df' containing the data
# with columns: 'President_x' and 'Religion'

# Function to display the list of presidents following a specific religion
def presidents_by_religion(selected_religion):
    presidents_list = pres_merged_df[pres_merged_df['Religion'] == selected_religion]['President_x'].tolist()
    if len(presidents_list) > 0:
        presidents = ', '.join(presidents_list)
        print(f"The U.S. Presidents following {selected_religion} are: {presidents}.")
    else:
        print(f"No U.S. Presidents were found following {selected_religion}.")

# Get unique religions from the DataFrame
religions = pres_merged_df['Religion'].unique()

# Create an interactive dropdown menu to select the religion
interact(presidents_by_religion, selected_religion=widgets.Dropdown(options=religions));

Important: In case the interactive visualization isn't visible, you can refer to the images provided below. Please note that the current setup requires the code to be hosted on a server, a step that will be implemented in the future.

image.png image.png

In [82]:
import pandas as pd
import plotly.express as px
from ipywidgets import interact, widgets

# Assuming you have a DataFrame called 'pres_merged_df' containing the data
# with columns: 'President_x', 'Religion', and 'Party'

# Function to plot the bar chart
def plot_presidents_by_religion(selected_religion):
    filtered_df = pres_merged_df[pres_merged_df['Religion'] == selected_religion]
    fig = px.bar(
        filtered_df,
        x='President_x',
        y='Party',
        title=f'U.S. Presidents Following {selected_religion}',
        labels={'President_x': 'President', 'Party': 'Political Party'},
        orientation='h',  # Horizontal bar chart
        template='plotly_dark',  # Set the plot template to a dark theme
        color='Party',  # Color based on the party
        color_discrete_map={
            'Unaffiliated': '#A9A9A9',
            'Federalist': '#9932CC',
            'Democratic-Republican': '#FFD700',
            'Democratic': '#00BFFF',
            'Republican': '#FF6347',
            'National Union': 'gray',
            'Whig':'orange'
            
        }  # Custom colors for each party
    )
    fig.update_layout(xaxis_tickangle=-45, hovermode='x')
    fig.show()

# Get unique religions from the DataFrame
religions = pres_merged_df['Religion'].unique()

# Create an interactive dropdown menu to select the religion
interact(plot_presidents_by_religion, selected_religion=widgets.Dropdown(options=religions));

Important: In case the interactive visualization isn't visible, you can refer to the images provided below. Please note that the current setup requires the code to be hosted on a server, a step that will be implemented in the future.

image.png image.png

In [83]:
import pandas as pd
import plotly.express as px
from ipywidgets import interact, widgets

# Assuming you have a DataFrame called 'pres_merged_df' containing the data
# with columns: 'President_x', 'Religion', and 'Party'

# Function to plot the bar chart
def plot_presidents_by_religion(selected_religion):
    filtered_df = pres_merged_df[pres_merged_df['Religion'] == selected_religion]
    fig = px.bar(
        filtered_df,
        x='Party',
        y='President_x',
        title=f'U.S. Presidents Following {selected_religion}',
        labels={'Party': 'Political Party', 'President_x': 'President'},
        orientation='h',  # Horizontal bar chart
        template='plotly_dark',  # Set the plot template to a dark theme
        color='Party',  # Color based on the party
        color_discrete_map={
            'Unaffiliated': '#A9A9A9',
            'Federalist': '#9932CC',
            'Democratic-Republican': '#FFD700',
            'Democratic': '#00BFFF',
            'Republican': '#FF6347',
            'National Union': 'gray',
            'Whig':'orange'
        }  # Custom colors for each party
    )
    fig.update_layout(xaxis_tickangle=-45, hovermode='x')
    fig.show()

# Get unique religions from the DataFrame
religions = pres_merged_df['Religion'].unique()

# Create an interactive dropdown menu to select the religion
interact(plot_presidents_by_religion, selected_religion=widgets.Dropdown(options=religions));

Important: In case the interactive visualization isn't visible, you can refer to the images provided below. Please note that the current setup requires the code to be hosted on a server, a step that will be implemented in the future.

image.png image.png

¶

Pie Chart - Number of U.S. Presidents by Religion

The pie chart displays the distribution of U.S. Presidents based on their religious affiliations. Each segment of the pie represents a different religion, and the size of each segment corresponds to the number of U.S. Presidents who followed that religion. Hovering over a segment shows the underlying count.

The chart uses a dark background theme for a visually appealing look. Each segment in the chart is labeled with the respective religion and the percentage of Presidents belonging to that religion, which helps users quickly understand the religious makeup of the U.S. Presidents throughout history.
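
The percentage printed on each segment is that religion's share of all Presidents with a recorded religion. As a rough illustration of the arithmetic, here is a minimal sketch in plain Python; the `religions` list is a made-up sample, not the notebook's data.

```python
from collections import Counter

# Hypothetical sample; the real values come from pres_merged_df['Religion'].
religions = ["Episcopalian", "Presbyterian", "Episcopalian", "Methodist", "Episcopalian"]

counts = Counter(religions)      # occurrences per religion
total = sum(counts.values())     # number of presidents counted

# Share of each religion in percent, largest first (what the pie labels show).
shares = {name: round(100 * n / total, 1) for name, n in counts.most_common()}
print(shares)
```

With pandas, the same shares can be obtained with `pres_merged_df['Religion'].value_counts(normalize=True) * 100`.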

In [84]:
import pandas as pd
import plotly.graph_objects as go

# Assuming you have a DataFrame called 'pres_merged_df' containing the data
# with columns: 'President_x' and 'Religion'

# Create the pie chart data
religion_counts = pres_merged_df['Religion'].value_counts()

# Use the religion names themselves as slice labels
# (the earlier helper appended the word "religions" to every name)
labels_with_religions = religion_counts.index

# Create the pie chart trace
pie_chart_trace = go.Pie(
    labels=labels_with_religions,
    values=religion_counts.values,
    textinfo='percent+label',
    textfont=dict(size=12, color='white'),  # Set text color to white for better visibility
)

# Create the layout with a dark background using the plotly_dark template
layout = go.Layout(
    title='Religions of U.S. Presidents',
    template='plotly_dark',  # Set the plot template to a dark theme
)

# Create the figure and display the pie chart
fig = go.Figure(data=[pie_chart_trace], layout=layout)

# Customize the appearance of the chart
fig.update_layout(height=800)

fig.show()

¶

Higher Education of U.S. Presidents

This interactive tool allows you to explore the higher education background of U.S. Presidents in three different visualizations:

1. Word Cloud¶

The word cloud summarizes the 'Higher Education' column. All values are joined into a single string and passed to WordCloud, which splits the text into words (and frequent two-word combinations), so font size reflects how often a term such as an institution's name appears across all Presidents. Larger text indicates terms that occur in more entries.
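
If whole institution names should be counted instead of individual words, the frequencies can be computed first. The following is a minimal sketch under stated assumptions: the sample values are invented, and the `;` separator between institutions is a guess about how the column is stored.

```python
from collections import Counter

# Hypothetical 'Higher Education' values; the ";" separator is an assumption
# about how several institutions are stored in one cell.
higher_education = [
    "Harvard University",
    "College of William & Mary",
    "Harvard University; Yale University",
    None,  # missing value, skipped below
    "Yale University",
]

def institution_frequencies(values, sep=";"):
    """Count whole institution names rather than individual words."""
    counts = Counter()
    for value in values:
        if not value:
            continue
        for name in value.split(sep):
            name = name.strip()
            if name:
                counts[name] += 1
    return dict(counts)

freqs = institution_frequencies(higher_education)
print(freqs)
```

A dictionary like `freqs` can be passed to `WordCloud.generate_from_frequencies`, the same method the occupation word cloud in this notebook uses.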

In [85]:
import pandas as pd
from wordcloud import WordCloud
import matplotlib.pyplot as plt

# Assuming you have a DataFrame called 'pres_merged_df' containing the data
# with columns: 'President_x' and 'Higher Education'

# Concatenate all the 'Higher Education' values into a single string
education_text = ' '.join(pres_merged_df['Higher Education'].dropna())

# Create the Word Cloud with custom settings
wordcloud = WordCloud(
    width=1200,
    height=600,
    background_color='black',
    colormap='viridis',  # Choose a color map for the Word Cloud
    contour_color='steelblue',  # Set contour color for better visibility
    contour_width=2,  # Set contour width
    max_words=150,  # Set the maximum number of words in the Word Cloud
    prefer_horizontal=0.8,  # Set the ratio of horizontal to vertical words
).generate(education_text)

# Display the Word Cloud using matplotlib
plt.figure(figsize=(24, 12))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.title("Higher Education of U.S. Presidents - Word Cloud", fontsize=20, color='white')
plt.show()

2. Venn Diagram¶

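The Venn diagram contrasts Ivy League and Non-Ivy League institutions. The two sets are built from institution names rather than from Presidents: one set is the fixed list of eight Ivy League schools, and the other contains every 'Higher Education' value that does not exactly match one of them. As a result the two circles cannot overlap; showing Presidents who attended both types of institutions would require sets of Presidents instead.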

In [86]:
import pandas as pd
from matplotlib import pyplot as plt
from matplotlib_venn import venn2

# Assuming you have a DataFrame called 'pres_merged_df' containing the data
# with columns: 'President_x' and 'Higher Education'

# Filter out missing data (in a separate variable, so the shared DataFrame keeps every row)
edu_df = pres_merged_df.dropna(subset=['Higher Education'])

# Separate the educational institutions into two groups: Ivy League and Non-Ivy League
ivy_league = ['Harvard', 'Yale', 'Princeton', 'Columbia', 'Brown', 'Dartmouth', 'Cornell', 'University of Pennsylvania']
non_ivy_league = [edu for edu in edu_df['Higher Education'].values if edu not in ivy_league]

# Create the Venn Diagram
plt.figure(figsize=(10, 8))  # Set the figure size to make the Venn Diagram bigger
venn2(subsets=(set(ivy_league), set(non_ivy_league)), set_labels=('Ivy League', 'Non-Ivy League'))

# Add title and legend
plt.title('Educational Background of U.S. Presidents - Ivy League vs. Non-Ivy League', fontsize=16)
plt.legend(['Ivy League', 'Non-Ivy League'], fontsize=14)

# Display the Venn Diagram
plt.show()
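
For a diagram whose overlap counts Presidents who attended both kinds of institution, the two sets need to contain Presidents rather than institution names. A minimal sketch of that idea follows; the `records` list and its placeholder names are hypothetical, and it assumes each President's institutions are already split into a list and spelled exactly as in the Ivy League set.

```python
# Hypothetical rows: (president, institutions attended). In the notebook these
# would come from 'President_x' and a split 'Higher Education' column.
records = [
    ("President A", ["Harvard", "Some State College"]),
    ("President B", ["Yale"]),
    ("President C", ["Some State College"]),
]

IVY_LEAGUE = {"Harvard", "Yale", "Princeton", "Columbia", "Brown",
              "Dartmouth", "Cornell", "University of Pennsylvania"}

# Sets of presidents, not institutions, so the overlap is meaningful.
ivy_presidents = {p for p, schools in records
                  if any(s in IVY_LEAGUE for s in schools)}
non_ivy_presidents = {p for p, schools in records
                      if any(s not in IVY_LEAGUE for s in schools)}

both = ivy_presidents & non_ivy_presidents
print(sorted(ivy_presidents - both), sorted(both), sorted(non_ivy_presidents - both))
```

Passing `[ivy_presidents, non_ivy_presidents]` to `venn2` would then draw three regions: Ivy League only, both, and Non-Ivy League only.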

3. Interactive Tool¶

Using the interactive tool, you can select a U.S. President from the dropdown menu to discover their specific higher education background. Once selected, the tool will display the President's name along with their higher education institution in a visually appealing plot.

In [87]:
import pandas as pd
import plotly.graph_objects as go
from ipywidgets import interact, widgets

# Assuming you have a DataFrame called 'pres_merged_df' containing the data
# with columns: 'President_x' and 'Higher Education'

# Filter out missing data (in a separate variable, so the shared DataFrame keeps every row)
edu_df = pres_merged_df.dropna(subset=['Higher Education'])

# Create a dictionary to map presidents to their higher education values
president_higher_edu_dict = dict(zip(edu_df['President_x'], edu_df['Higher Education']))

# Function to plot the higher education value for a specific president
def plot_higher_education(president_name):
    higher_education = president_higher_edu_dict.get(president_name, 'Higher education data not available')

    # Create the plot
    fig = go.Figure()

    # Add a text annotation to display the higher education value
    fig.add_annotation(
        text=f"Higher Education: <span style='color: #2b9434;'>{higher_education}</span>",
        xref="paper",
        yref="paper",
        x=0.5,
        y=0.5,
        showarrow=False,
        font=dict(size=16),
        align='center',
    )

    # Update layout for better appearance
    fig.update_layout(
        title=f"Higher Education of U.S. President - {president_name}",
        xaxis=dict(visible=False),
        yaxis=dict(visible=False),
        width=1400,
        height=400,
        template='plotly_dark',  # Use dark theme
        margin=dict(t=100),
    )

    # Show the plot
    fig.show()

# Get unique president names from the DataFrame
president_names = pres_merged_df['President_x'].unique()

# Create an interactive dropdown menu to select the president
interact(plot_higher_education, president_name=widgets.Dropdown(options=president_names));

Important: In case the interactive visualization isn't visible, you can refer to the images provided below. Please note that the current setup requires the code to be hosted on a server, a step that will be implemented in the future.

image.png image.png

¶

Exploring Occupations of U.S. Presidents: A Visual Analysis

1. Word Cloud: What Were the Most Frequent Occupations of U.S. Presidents?¶

The Word Cloud visually presents the most frequent occupations held by U.S. Presidents throughout history. Each occupation's size within the cloud is proportional to its frequency, offering an immediate glimpse of the dominant professions among Presidents.

In [88]:
import pandas as pd
from wordcloud import WordCloud
import matplotlib.pyplot as plt

# Assuming you have a DataFrame called 'pres_merged_df' containing the data
# with column 'Occupation' and other relevant columns

# Split the values in the 'Occupation' column and create a new DataFrame with all the occupations
occupations_df = pres_merged_df['Occupation'].str.split(', ', expand=True)

# Flatten the DataFrame into a single Series and get the value counts
occupation_counts = occupations_df.melt(value_name='Occupation').groupby('Occupation').size()

# Sort the occupations by frequency in descending order
top_occupations = occupation_counts.sort_values(ascending=False).head(4)

# Create the Word Cloud with custom settings
wordcloud = WordCloud(
    width=1200,
    height=600,
    background_color='black',
    colormap='viridis',  # Choose a color map for the Word Cloud
    contour_color='steelblue',  # Set contour color for better visibility
    contour_width=2,  # Set contour width
    max_words=150,  # Set the maximum number of words in the Word Cloud
    prefer_horizontal=0.8,  # Set the ratio of horizontal to vertical words
).generate_from_frequencies(top_occupations)

# Display the Word Cloud using matplotlib
plt.figure(figsize=(24, 12))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.title("Top 4 Occupations of U.S. Presidents - Word Cloud", fontsize=20, color='white')
plt.show()

2. Bar Chart: How Do Occupations Vary Among U.S. Presidents?¶

The Bar Chart displays the distribution of occupations among U.S. Presidents, extracted by splitting the values in the 'Occupation' column. Each occupation is represented as a bar, and its height corresponds to its frequency. The chart allows us to observe the range of professions and understand which ones have been more prevalent throughout history.
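
The splitting and counting step can be sketched without pandas. The example below is only an illustration on invented rows (the President names are placeholders); it keeps each occupation paired with the President it came from, which is what the hover text relies on.

```python
from collections import Counter, defaultdict

# Hypothetical rows standing in for 'President_x' and 'Occupation'.
rows = [
    ("President A", "Lawyer, Farmer"),
    ("President B", "Lawyer"),
    ("President C", None),            # missing occupation, skipped
    ("President D", "Soldier, Lawyer"),
]

counts = Counter()
presidents_by_occupation = defaultdict(list)

for president, occupations in rows:
    if not occupations:
        continue
    for occupation in occupations.split(","):
        occupation = occupation.strip()
        counts[occupation] += 1
        presidents_by_occupation[occupation].append(president)

print(counts.most_common())
print(presidents_by_occupation["Lawyer"])
```

Iterating over (President, occupations) pairs together means a missing value in one row cannot shift the names attached to the rows after it.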

In [89]:
import pandas as pd
import plotly.express as px

# Assuming you have a DataFrame called 'pres_merged_df' containing the data
# with the column 'Occupation' and other relevant columns

# Split the values in the 'Occupation' column and create a new DataFrame with all the occupations
occupations_df = pres_merged_df['Occupation'].str.split(', ', expand=True)

# Create a list of presidents for each occupation
presidents_by_occupation = {}
for col in occupations_df.columns:
    # .items() keeps the original row label, so each occupation is matched to the right president
    for row_label, occupation in occupations_df[col].dropna().items():
        if occupation not in presidents_by_occupation:
            presidents_by_occupation[occupation] = []
        presidents_by_occupation[occupation].append(pres_merged_df.loc[row_label, 'President_x'])

# Get the frequency of each occupation
occupation_counts = occupations_df.melt(value_name='Occupation').groupby('Occupation').size()

# Sort the occupations by frequency in descending order
occupation_counts = occupation_counts.sort_values(ascending=False)

# Create the bar chart
fig = px.bar(
    x=occupation_counts.index,
    y=occupation_counts.values,
    labels={'x': 'Occupation', 'y': 'Frequency'},
    title='Frequency of Jobs of U.S. Presidents',
    color=occupation_counts.index,  # Use different colors for each occupation
    hover_name=occupation_counts.index,  # Show occupation names in the tooltip
    hover_data={"Presidents": [", ".join(presidents_by_occupation[occupation]) for occupation in occupation_counts.index]},
)

# Customize the appearance of the chart
fig.update_layout(
    xaxis_title="Occupation",
    yaxis_title="Frequency",
    xaxis_tickangle=-45,
    hoverlabel=dict(bgcolor="white", font_size=12),
    template='plotly_dark',  # Set the plot template to a dark theme
    height = 600,
)

# Display the interactive plot
fig.show()

3. Interactive Tool: What Was the Occupation of a Specific U.S. President?¶

With the Interactive Tool, you can select a U.S. President from the dropdown menu and discover their respective occupation. The tool offers an engaging way to explore individual Presidents' backgrounds and learn more about their professions before entering politics.

In [90]:
import pandas as pd
import plotly.graph_objects as go
from ipywidgets import interact, widgets

# Assuming you have a DataFrame called 'pres_merged_df' containing the data
# with columns: 'President_x' and 'Occupation'

# Filter out missing data (in a separate variable, so the shared DataFrame keeps every row)
occ_df = pres_merged_df.dropna(subset=['Occupation'])

# Create a dictionary to map presidents to their occupations
president_occupation_dict = dict(zip(occ_df['President_x'], occ_df['Occupation']))

# Function to plot the occupation value for a specific president
def plot_occupation(president_name):
    occupation = president_occupation_dict.get(president_name, 'Occupation data not available')

    # Create the plot
    fig = go.Figure()

    # Add a text annotation to display the occupation value
    fig.add_annotation(
        text=f"Occupation: <span style='color: #2b9434;'>{occupation}</span>",
        xref="paper",
        yref="paper",
        x=0.5,
        y=0.5,
        showarrow=False,
        font=dict(size=16),
        align='center',
    )

    # Update layout for better appearance
    fig.update_layout(
        title=f"Occupation of U.S. President - {president_name}",
        xaxis=dict(visible=False),
        yaxis=dict(visible=False),
        width=1400,
        height=400,
        template='plotly_dark',  # Use dark theme
        margin=dict(t=100),
    )

    # Show the plot
    fig.show()

# Get unique president names from the DataFrame
president_names = pres_merged_df['President_x'].unique()

# Create an interactive dropdown menu to select the president
interact(plot_occupation, president_name=widgets.Dropdown(options=president_names));

Important: In case the interactive visualization isn't visible, you can refer to the images provided below. Please note that the current setup requires the code to be hosted on a server, a step that will be implemented in the future.

image.png image.png

4. Interactive Tool to Identify Most Frequent Occupations by Party: Which Occupations Prevail Within Each Political Party?¶

This interactive tool lets you choose a political party from the dropdown menu and explore how often each occupation occurs among that party's Presidents. Comparing the charts for different parties shows which professional backgrounds were most common within each party.

In [91]:
import pandas as pd
import plotly.express as px
from ipywidgets import interact, widgets

# Assuming you have a DataFrame called 'pres_merged_df' containing the data
# with columns: 'President_x', 'Party', and 'Occupation'

# Filter out missing data (in a separate variable, so the shared DataFrame keeps every row)
party_occ_df = pres_merged_df.dropna(subset=['Occupation', 'Party'])

# Function to split occupations and plot how often each one occurs for a party
def get_most_frequent_occupation(selected_party):
    # Filter DataFrame based on the selected party
    party_df = party_occ_df[party_occ_df['Party'] == selected_party]

    # Split occupations and create a list of all occupation tokens
    all_occupations = [occupation.strip() for occupations in party_df['Occupation'] for occupation in occupations.split(',')]

    # Nothing to plot if the party has no recorded occupations
    if not all_occupations:
        print(f"No occupation data available for {selected_party}.")
        return

    # Count the occurrences of each occupation (value_counts sorts the most frequent first)
    occupation_counts = pd.Series(all_occupations).value_counts()

    # Create the bar plot
    fig = px.bar(
        occupation_counts,
        x=occupation_counts.index,
        y=occupation_counts.values,
        labels={'x': 'Occupation', 'y': 'Frequency'},
        title=f"Most Frequent Occupation for {selected_party} Party",
        color=occupation_counts.index,
    )

    # Rotate x-axis labels for better readability
    fig.update_layout(xaxis_tickangle=-45)

    # Show the plot
    fig.show()

# Get unique political parties from the DataFrame
parties = pres_merged_df['Party'].unique()

# Create an interactive dropdown menu to select the party
interact(get_most_frequent_occupation, selected_party=widgets.Dropdown(options=parties));

Important: In case the interactive visualization isn't visible, you can refer to the images provided below. Please note that the current setup requires the code to be hosted on a server, a step that will be implemented in the future.

image.png image.png

¶

Exploring U.S. Presidents' Military Service - TreeMap Visualization

Have you ever wondered about the diverse military experiences of U.S. Presidents? In this TreeMap visualization, we delve into the fascinating world of military service among our nation's leaders. The TreeMap offers an innovative and visually captivating way to understand the various military roles that Presidents have undertaken throughout history.

What is a TreeMap?¶

A TreeMap is a unique chart that presents hierarchical data in the form of nested rectangles. Each rectangle's size is proportional to the value of the data it represents, offering an intuitive visualization of the data's distribution.
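
To make the sizing rule concrete: each leaf carries a value, and a parent rectangle's area is the sum of its children's values. The sketch below illustrates this on hypothetical (service, President) pairs, each counted as one leaf with value 1; the names are placeholders rather than data from this study.

```python
from collections import defaultdict

# Hypothetical (service, president) pairs; each pair is one leaf with value 1.
leaves = [
    ("Army", "President A"),
    ("Army", "President B"),
    ("Navy", "President C"),
    ("Militia", "President A"),
]

# A parent rectangle's value is the sum of its children's values.
totals = defaultdict(int)
for service, _president in leaves:
    totals[service] += 1

grand_total = sum(totals.values())
for service, value in sorted(totals.items(), key=lambda kv: -kv[1]):
    # Fraction of the whole chart taken up by this service's rectangle.
    print(f"{service}: value {value}, {value / grand_total:.0%} of the area")
```

A President who served in more than one branch appears once under each branch, so the leaves count service entries rather than distinct Presidents.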

In [92]:
import pandas as pd
import plotly.express as px

# Assuming you have a DataFrame called 'pres_merged_df' containing the data
# with columns: 'President_x' and 'Military Service'

# Filter out missing data (in a separate variable, so the shared DataFrame keeps every row)
military_df = pres_merged_df.dropna(subset=['Military Service'])

# Split the military service data in each row and strip leading/trailing spaces
military_service_split = military_df['Military Service'].str.split(',').apply(lambda x: [service.strip() for service in x])

# Collect one record per (service, president) pair
# (DataFrame.append was removed in pandas 2.0, so the rows are gathered in a list first)
records = []
for president, service_list in zip(military_df['President_x'], military_service_split):
    for service in service_list:
        if service:
            records.append({'Service': service, 'President': president, 'Value': 1})

# Create a DataFrame to store the data for the TreeMap
data = pd.DataFrame(records, columns=['Service', 'President', 'Value'])

# Create the TreeMap
fig = px.treemap(data, path=['Service', 'President'], values='Value')

# Update layout for better appearance
fig.update_layout(
    title="Military Service of U.S. Presidents - TreeMap",
    margin=dict(t=100),
    template='plotly_dark',  # Use dark theme
    height = 600,
)

# Show the TreeMap
fig.show()

¶

Which Previous Offices Did U.S. Presidents Hold Before Their Presidency

Word Cloud - Most Frequent Previous Offices of U.S. Presidents¶

This Word Cloud shows the ten most frequent words in the 'Previous Office' column, that is, in the titles of the offices U.S. Presidents held before assuming the presidency. The text is split into individual tokens, so the size of each word reflects how often that word occurs across all office titles rather than how often a complete title occurs. A black background is used to provide a striking contrast to the vibrant colors of the Word Cloud. The Word Cloud offers a quick and intuitive overview of the career paths of U.S. Presidents before their presidency.

In [93]:
import pandas as pd
import nltk
from wordcloud import WordCloud
import matplotlib.pyplot as plt

# Assuming you have a DataFrame called 'pres_merged_df' containing the data
# with columns: 'President_x' and 'Previous Office'

# Download the 'punkt' tokenizer
nltk.download('punkt')

# Concatenate all the non-missing 'Previous Office' values into a single string
# (dropna() on the column leaves the shared DataFrame unchanged)
previous_office_text = ' '.join(pres_merged_df['Previous Office'].dropna())

# Split the text into individual words
words = nltk.word_tokenize(previous_office_text)

# Calculate the frequency of each word
word_freq = nltk.FreqDist(words)

# Get the most frequent words and their frequencies
most_common_words = word_freq.most_common(10)

# Create a dictionary to hold the most frequent words and their frequencies
wordcloud_data = dict(most_common_words)

# Create the Word Cloud with custom settings
wordcloud = WordCloud(
    width=1200,
    height=600,
    background_color='black',  # Set the background color to black
    colormap='viridis',  # Choose a color map for the Word Cloud
    contour_color='white',  # Set contour color for better visibility
    contour_width=2,  # Set contour width
    max_words=150,  # Set the maximum number of words in the Word Cloud
    prefer_horizontal=0.8,  # Set the ratio of horizontal to vertical words
).generate_from_frequencies(wordcloud_data)

# Display the Word Cloud using matplotlib
plt.figure(figsize=(12, 6))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.title("Most Frequent Previous Offices of U.S. Presidents - Word Cloud", fontsize=20, color='white')
plt.show()
[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\User\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
In [94]:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
from wordcloud import WordCloud

# Assuming you have a DataFrame called 'pres_merged_df' containing the data
# with columns: 'President_x' and 'Previous Office'

# Concatenate all the 'Previous Office' values into a single string
previous_office_text = ' '.join(pres_merged_df['Previous Office'].dropna())

# Read the mask image
mask_img = mpimg.imread(r'C:\Users\User\Desktop\GitHub-projects\projects\Data-Dives-Projects-Unleashed\images\course1_1\png\fantasy-2506830_1280.jpg')

# Create a WordCloud object with the custom shape mask
wordcloud = WordCloud(
    width=1200,
    height=600,
    background_color='black',  # Set the background color to black
    colormap='tab20c',  # Choose a custom color map for the Word Cloud
    contour_color='white',  # Set contour color for better visibility
    contour_width=2,  # Set contour width
    max_words=150,  # Set the maximum number of words in the Word Cloud
    prefer_horizontal=0.8,  # Set the ratio of horizontal to vertical words
    mask=mask_img,  # Use the custom mask for the Word Cloud
).generate(previous_office_text)

# Display the Word Cloud using matplotlib
plt.figure(figsize=(24, 12))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.title("Previous Offices of U.S. Presidents - Word Cloud", fontsize=20, color='white')
plt.show()

Sunburst Chart - Previous Offices of U.S. Presidents¶

The Sunburst Chart shows the offices U.S. Presidents held before the presidency. Every distinct value in the 'Previous Office' column becomes one sector around the center, and the size of the sector reflects how many Presidents held that office. All sectors share the same empty parent, so the chart has a single level rather than nested rings. The chart is interactive, allowing readers to hover over a sector to inspect it.

In [95]:
import pandas as pd
import plotly.graph_objects as go

# Assuming you have a DataFrame called 'pres_merged_df' containing the data
# with columns: 'President_x' and 'Previous Office'

# Create a DataFrame to store the frequency of each office
# (value_counts ignores missing values by default, so no separate dropna is needed)
office_counts = pres_merged_df['Previous Office'].value_counts().reset_index()
office_counts.columns = ['Previous Office', 'Count']

# Create a Sunburst Chart
fig = go.Figure(go.Sunburst(
    labels=office_counts['Previous Office'],
    parents=[''] * len(office_counts),  # Empty string as parent for all categories
    values=office_counts['Count'],
))

# Update the layout for better appearance
fig.update_layout(
    title='Previous Offices of U.S. Presidents - Sunburst Chart',
    margin=dict(t=50),
    height=800,
    uniformtext=dict(minsize=12, mode='hide'),
)

# Show the chart
fig.show()

¶

Interactive Economy Data Display Tool

This interactive tool provides a fascinating way to explore the economy-related data for different U.S. presidents. The tool is designed to analyze and present the 'Economy' column data from the DataFrame called 'pres_merged_df.'

How to Use the Tool¶

  1. Select a President: Start by using the dropdown menu to choose a president from the list of available names. The dropdown includes all the unique president names present in the dataset.

  2. Explore Economy Data: After selecting a president, the tool will display their respective economy data. It gathers the information from the 'Economy' column in the DataFrame.

  3. Data Availability: In case economy data for a specific president is unavailable, the tool gracefully informs users with a message indicating that the data is not available.
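
The three steps above amount to a lookup with a fallback. Here is a minimal sketch of that logic in plain Python; the `economy_by_president` mapping, its entries, and the helper name `economy_lines` are hypothetical stand-ins for the 'Economy' column.

```python
# Hypothetical mapping; the notebook reads this from the 'Economy' column.
economy_by_president = {
    "President A": "Tariff act passed, National bank chartered",
    "President B": None,   # row exists but the field is empty
}

def economy_lines(president_name, data=economy_by_president):
    """Return the economy notes as a list of lines, or a fallback message."""
    raw = data.get(president_name)
    if not raw:
        # Covers both an unknown president and an empty field.
        return ["Economy data not available"]
    return [item.strip() for item in raw.split(",") if item.strip()]

print(economy_lines("President A"))
print(economy_lines("President B"))
```

Treating an empty field the same way as a missing President keeps the dropdown from failing on rows that have no economy text.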

In [96]:
import pandas as pd
from ipywidgets import interact, widgets

# Assuming you have a DataFrame called 'pres_merged_df' containing the data
# with a column named 'Economy'

# Function to display the economy data for a specific president
def display_economy_data(president_name):
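    # Drop missing entries so a president without economy text falls through to the fallback message
    economy_data = pres_merged_df.loc[pres_merged_df['President_x'] == president_name, 'Economy'].dropna().values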
    if len(economy_data) == 0:
        economy_data = ['Economy data not available']
    else:
        # Split the economy data based on commas and create a list
        economy_data = [item.strip() for data in economy_data for item in data.split(',')]
    
    # Create a widget to display the economy data
    economy_widget = widgets.HTML(value='<br>'.join(economy_data))
    
    # Set the layout and style of the widget
    economy_widget.layout.overflow_x = 'hidden'
    economy_widget.layout.max_height = '500px'
    economy_widget.layout.overflow_y = 'auto'
    economy_widget.layout.border = '2px solid #ccc'
    economy_widget.layout.border_radius = '5px'
    economy_widget.layout.padding = '10px'
    economy_widget.layout.margin = '10px'
    economy_widget.layout.background = '#f9f9f9'
    
    # Display the widget
    display(economy_widget)

# Get unique president names from the DataFrame
president_names = pres_merged_df['President_x'].unique()

# Create an interactive dropdown menu to select the president
interact(display_economy_data, president_name=widgets.Dropdown(options=president_names));

Important: In case the interactive visualization isn't visible, you can refer to the images provided below. Please note that the current setup requires the code to be hosted on a server, a step that will be implemented in the future.

image.png image.png

¶

Exploring U.S. Presidents: Their Foreign Affairs

Word Cloud for Foreign Affairs¶

This Word Cloud shows the most prominent words in the "Foreign Affairs" entries recorded for U.S. Presidents. The more often a term appears, the larger it is drawn.

In [97]:
import pandas as pd
from wordcloud import WordCloud
import matplotlib.pyplot as plt

# Assuming you have a DataFrame called 'pres_merged_df' containing the data
# with a column named 'Foreign Affairs'

# Concatenate all the 'Foreign Affairs' values into a single string
foreign_affairs_text = ' '.join(pres_merged_df['Foreign Affairs'].dropna())

# Create the Word Cloud with custom settings
wordcloud = WordCloud(
    width=1200,
    height=600,
    background_color='white',  # Set the background color to white
    colormap='tab20',  # Choose a custom color map for the Word Cloud
    contour_color='black',  # Set contour color for better visibility
    contour_width=2,  # Set contour width
    max_words=150,  # Set the maximum number of words in the Word Cloud
    prefer_horizontal=0.8,  # Set the ratio of horizontal to vertical words
).generate(foreign_affairs_text)

# Display the Word Cloud using matplotlib
plt.figure(figsize=(24, 12))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.title("Foreign Affairs of U.S. Presidents - Word Cloud", fontsize=16, color='black')
plt.show()

Word Scatter Plot for Foreign Affairs¶

The Word Scatter Plot provides another way to explore the textual data on "Foreign Affairs" across U.S. Presidents. Each word is drawn as a labeled point whose position on both axes is its frequency, so the most frequent words sit toward the upper right of the diagonal. Hovering over a point shows the word and its count.

In [98]:
import string
from collections import Counter

import nltk
import pandas as pd
import plotly.express as px  # px.scatter is used below
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# Download the stopword list (a no-op if it is already present).
# Note: word_tokenize also needs NLTK's Punkt tokenizer data to be installed.
nltk.download('stopwords')

# Get the 'Foreign Affairs' text
foreign_affairs_text = ' '.join(pres_merged_df['Foreign Affairs'].dropna())

# Preprocess the text (You may need to customize this based on your data)

def preprocess_text(text):
    # Remove punctuation and convert to lowercase
    text = text.translate(str.maketrans('', '', string.punctuation)).lower()
    # Tokenize the text
    words = word_tokenize(text)
    # Remove stopwords
    stop_words = set(stopwords.words('english'))
    words = [word for word in words if word not in stop_words]
    return words

# Tokenize and preprocess the text
words = preprocess_text(foreign_affairs_text)

# Calculate word frequencies
word_freq = Counter(words)

# Create a DataFrame to store the word frequencies
word_freq_df = pd.DataFrame(word_freq.items(), columns=['Word', 'Frequency'])

# Sort the DataFrame by frequency in descending order
word_freq_df = word_freq_df.sort_values(by='Frequency', ascending=False)

# Create the Word Scatter Plot
fig = px.scatter(
    word_freq_df,
    x='Frequency',
    y='Frequency',
    text='Word',
    labels={'Frequency': 'Word Frequency'},
    title='Word Scatter Plot - Foreign Affairs',
    hover_name='Word',
    hover_data={'Frequency': True},
)

# Customize the appearance of the plot
fig.update_traces(textposition='top center', textfont_size=12)

fig.update_layout(height=600)
# Show the plot
fig.show()
[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\User\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
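
The preprocessing and counting steps above can also be sketched with the standard library alone, which avoids the NLTK downloads. This is only a minimal sketch, not the notebook's method: the tiny `STOP_WORDS` set, the `count_words` helper, and the sample sentence are illustrative assumptions, and unlike the cell above it keeps alphabetic tokens only (years and other digits are dropped).

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; NLTK's English list is far longer.
STOP_WORDS = {"the", "of", "and", "a", "to", "in", "with", "for", "on", "by"}

def count_words(text):
    """Lowercase the text, keep alphabetic tokens, drop stopwords, count the rest."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(token for token in tokens if token not in STOP_WORDS)

# Made-up sample text standing in for the joined 'Foreign Affairs' column.
sample = "Treaty with Spain. Treaty of Paris and the war with Britain."
word_freq = count_words(sample)

# Show the three most frequent remaining words
print(word_freq.most_common(3))
```

The resulting `Counter` can be passed to `pd.DataFrame(word_freq.items(), columns=['Word', 'Frequency'])` exactly as in the cell above.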

Interactive Tool for Exploring Foreign Affairs¶

This interactive tool enables readers to select a specific U.S. President and view their respective "Foreign Affairs" data. Upon choosing a president from the dropdown menu, a neat list is presented, containing the various matters related to foreign affairs that were documented during their tenure. The list is displayed as an ordered list for clarity, with the numbers indicating the order of occurrence.

In [99]:
import pandas as pd
from ipywidgets import interact, widgets
import re

# Assuming you have a DataFrame called 'pres_merged_df' containing the data
# with a column named 'Foreign Affairs'

# Function to display the 'Foreign Affairs' data for a specific president
def display_foreign_affairs_data(president_name):
    # Drop missing values so presidents without data get the fallback message
    foreign_affairs_data = pres_merged_df.loc[pres_merged_df['President_x'] == president_name, 'Foreign Affairs'].dropna().values
    if len(foreign_affairs_data) == 0:
        foreign_affairs_data = ['Foreign Affairs data not available']
    else:
        # Use regex to extract text between square brackets and split the text based on commas
        foreign_affairs_data = [item.strip() for data in foreign_affairs_data for item in re.findall(r'\[([^]]+)\]', data)[0].split(',')]

    # Remove empty strings and quotes from the list
    foreign_affairs_data = [item.replace("'", "").replace('"', '') for item in foreign_affairs_data if item]

    # Create an ordered list of the 'Foreign Affairs' data
    list_html = '<ol style="list-style-position: inside;">'
    for item in foreign_affairs_data:
        list_html += f'<li>{item}</li>'
    list_html += '</ol>'

    # Create a widget to display the 'Foreign Affairs' data as an ordered list
    foreign_affairs_widget = widgets.HTML(value=list_html)

    # Set the layout and style of the widget
    foreign_affairs_widget.layout.overflow_x = 'hidden'
    foreign_affairs_widget.layout.max_height = '500px'
    foreign_affairs_widget.layout.overflow_y = 'auto'
    foreign_affairs_widget.layout.border = '2px solid #ccc'
    foreign_affairs_widget.layout.border_radius = '5px'
    foreign_affairs_widget.layout.padding = '10px'
    foreign_affairs_widget.layout.margin = '10px'
    foreign_affairs_widget.layout.background = '#f9f9f9'

    # Display the widget
    display(foreign_affairs_widget)

# Get unique president names from the DataFrame
president_names = pres_merged_df['President_x'].unique()

# Create an interactive dropdown menu to select the president
interact(display_foreign_affairs_data, president_name=widgets.Dropdown(options=president_names));
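
One caveat with splitting on commas is that an entry which itself contains a comma is broken into several list items. Since the regex above assumes each value is the text form of a Python list, a possible alternative is to parse it with `ast.literal_eval`. The sketch below is an assumption about how that could look, not the notebook's implementation; `parse_event_list` and `to_ordered_list_html` are hypothetical helper names, and the sample string mirrors the list format printed elsewhere in this notebook.

```python
import ast
import html

def parse_event_list(raw):
    """Parse a string such as "['a', 'b, c']" into a list of stripped strings.

    Returns an empty list when the value is missing or is not a list literal.
    """
    if not isinstance(raw, str):
        return []
    try:
        parsed = ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return []
    if not isinstance(parsed, list):
        return []
    return [str(item).strip() for item in parsed if str(item).strip()]

def to_ordered_list_html(items):
    """Render items as an HTML ordered list, escaping special characters."""
    rows = "".join(f"<li>{html.escape(item)}</li>" for item in items)
    return f"<ol>{rows}</ol>"

# Sample value in the same format as the stored column text.
raw = "['1791 Bill of Rights', '1792, 1796 Kentucky & Tennessee joined the Union']"
items = parse_event_list(raw)
print(len(items))  # the entry containing a comma stays in one piece
print(to_ordered_list_html(items))
```

The HTML string returned by `to_ordered_list_html` could be passed to `widgets.HTML(value=...)` in place of the hand-built `list_html`.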

Important: In case the interactive visualization isn't visible, you can refer to the images provided below. Please note that the current setup requires the code to be hosted on a server, a step that will be implemented in the future.

image.png image.png image.png

In [100]:
import pandas as pd
from ipywidgets import interact, widgets, HTML
from ipywidgets.embed import embed_minimal_html
import re

# Assuming you have a DataFrame called 'pres_merged_df' containing the data
# with a column named 'Foreign Affairs'

# Function to display the 'Foreign Affairs' data for a specific president
def display_foreign_affairs_data(president_name):
    # Drop missing values so presidents without data get the fallback message
    foreign_affairs_data = pres_merged_df.loc[pres_merged_df['President_x'] == president_name, 'Foreign Affairs'].dropna().values
    if len(foreign_affairs_data) == 0:
        foreign_affairs_data = ['Foreign Affairs data not available']
    else:
        # Use regex to extract text between square brackets and split the text based on commas
        foreign_affairs_data = [item.strip() for data in foreign_affairs_data for item in re.findall(r'\[([^]]+)\]', data)[0].split(',')]

    # Remove empty strings and quotes from the list
    foreign_affairs_data = [item.replace("'", "").replace('"', '') for item in foreign_affairs_data if item]

    # Create an ordered list of the 'Foreign Affairs' data
    list_html = '<ol style="list-style-position: inside;">'
    for item in foreign_affairs_data:
        list_html += f'<li>{item}</li>'
    list_html += '</ol>'

    # Create a widget to display the 'Foreign Affairs' data as an ordered list
    foreign_affairs_widget = widgets.HTML(value=list_html)

    # Set the layout and style of the widget
    foreign_affairs_widget.layout.overflow_x = 'hidden'
    foreign_affairs_widget.layout.max_height = '500px'
    foreign_affairs_widget.layout.overflow_y = 'auto'
    foreign_affairs_widget.layout.border = '2px solid #ccc'
    foreign_affairs_widget.layout.border_radius = '5px'
    foreign_affairs_widget.layout.padding = '10px'
    foreign_affairs_widget.layout.margin = '10px'
    foreign_affairs_widget.layout.background = '#f9f9f9'

    # Display the widget
    display(foreign_affairs_widget)

    # Save the interactive widget to an HTML file
    embed_minimal_html('interactive_foreign_affairs_widget.html', views=[foreign_affairs_widget], title='Foreign Affairs Widget')

# Get unique president names from the DataFrame
president_names = pres_merged_df['President_x'].unique()

# Create an interactive dropdown menu to select the president
interact(display_foreign_affairs_data, president_name=widgets.Dropdown(options=president_names));

Important: In case the interactive visualization isn't visible, you can refer to the images provided below. Please note that the current setup requires the code to be hosted on a server, a step that will be implemented in the future.

image.png image.png image.png

¶

Explore U.S. Presidents' Historical Data

Question: What are the key aspects of U.S. presidents' historical data?

This interactive tool allows you to explore various aspects of U.S. presidents' historical data, including their Economy, Foreign Affairs, Military Activity, Other Events, and Legacy. Select a president from the dropdown menu, and you'll get an ordered list of events and activities associated with that president in each category.

Instructions:

  1. Select a president from the dropdown menu.
  2. Explore the historical data of the selected president in different categories.
  3. Check the Economy, Foreign Affairs, Military Activity, Other Events, and Legacy sections to learn more about each president's contributions and impact.
In [102]:
import pandas as pd
import ipywidgets as widgets
from ipywidgets import interact  # needed for the dropdown at the end of this cell
import re

# Assuming you have a DataFrame called 'pres_merged_df' containing the data
# with columns: 'President_x', 'Economy', 'Foreign Affairs', 'Military Activity', 'Other Events', and 'Legacy'

# Function to split data by commas and display the data for a specific president
def display_president_data(president_name):
    president_data = pres_merged_df.loc[pres_merged_df['President_x'] == president_name]
    
    # Function to split data by commas and create an ordered list
    def create_ordered_list(data_column):
        data_list = president_data[data_column].dropna().apply(lambda x: re.sub(r"[\[\]\"]", "", x)).tolist()
        data_list = [item.strip() for data in data_list for item in data.split(',')]
        data_list = [re.sub(r"[\"']+", "", item) for item in data_list]
        data_list = [f"{i+1}. {item}" for i, item in enumerate(data_list)]
        return data_list

    # Create ordered lists for each column's data
    economy_list = create_ordered_list('Economy')
    foreign_affairs_list = create_ordered_list('Foreign Affairs')
    military_activity_list = create_ordered_list('Military Activity')
    other_events_list = create_ordered_list('Other Events')
    legacy_list = create_ordered_list('Legacy')
    
    # Display the data in an interactive widget
    economy_widget = widgets.HTML(value=f"<b>Economy:</b><br>{'<br>'.join(economy_list)}")
    foreign_affairs_widget = widgets.HTML(value=f"<b>Foreign Affairs:</b><br>{'<br>'.join(foreign_affairs_list)}")
    military_activity_widget = widgets.HTML(value=f"<b>Military Activity:</b><br>{'<br>'.join(military_activity_list)}")
    other_events_widget = widgets.HTML(value=f"<b>Other Events:</b><br>{'<br>'.join(other_events_list)}")
    legacy_widget = widgets.HTML(value=f"<b>Legacy:</b><br>{'<br>'.join(legacy_list)}")
    
    # Create a tab widget to organize the data
    tab_contents = [economy_widget, foreign_affairs_widget, military_activity_widget, other_events_widget, legacy_widget]
    tab_titles = ['Economy', 'Foreign Affairs', 'Military Activity', 'Other Events', 'Legacy']
    tab = widgets.Tab()
    tab.children = tab_contents
    for i in range(len(tab_titles)):
        tab.set_title(i, tab_titles[i])
    
    # Display the tab widget
    display(tab)

# Get unique president names from the DataFrame
president_names = pres_merged_df['President_x'].unique()

# Create an interactive dropdown menu to select the president
interact(display_president_data, president_name=widgets.Dropdown(options=president_names));

Important: In case the interactive visualization isn't visible, you can refer to the images provided below. Please note that the current setup requires the code to be hosted on a server, a step that will be implemented in the future.

image.png image.png image.png

In [103]:
# understand Economy, Foreign Affairs, Military Activity, Other Events, Legacy
for i in range(0,46):
    print(pres_merged_df['Other Events'].loc[i])
    print()
['1791 Bill of Rights', '1792 Post Office founded.', '1792, 1796 Kentucky  & Tennessee joined the Union']

['1798 Alien & Sedition Act to silence critics; unpopular', '1800 Capital relocated to Washington DC', '1801 Nominated John Marshall chief justice of  U.S.']

['1803 The Louisiana purchase', '1804 12th Amendment changed Presidential election', '1804-06 Authorized Louis & Clark expedition']

['1811 Cumberland Road construction starts (first National Road)', '1817 Veto on  Bonus Bill  for funding States improvements']

['1819 Florida ceded to US', "1820 Missouri Compromise Slavery forbidden abv 36° 30'", '1820 In the election he received every electoral vote except one.']

[' Accused for "corrupt bargain" to obtain Clay\'s support in election', '1828 Baltimore/Ohio railroad']

['1830 Indian Removal Act', "1832 South Carolina's nullification crisis over taxes", '1835 "The Trail of Tears". Cherokees forced to move.']

['1838 "The Trail of Tears". Indians’ relocation, 4000 die', '1839 US vs. The Amistad: symbolic against slavery']

['1841 Delivered the longest inaugural address (105 min)', '1841 Contracted pneumonia and died in the White House one month later.']

['1841 His cabinet resigned after he vetoed banking bills', '1844 USS Princeton disaster. 8  died in Potomac,', '1845 Texas annexed followed by war with Mexico']

['1846 A large crack in the Liberty Bell.', '1848 California Gold rush']

[' The question of extending slavery to the new territories dominated', '1846 Did not approve the "Compromise of 1850"']

['1850 Compromise of 1850 and Fugitive Slave Act.']

['1853 Gadsden Purchase. Land from Mexico.', '1854 Kansas-Nebraska Act. Slavery Debate reheated.', ' "border ruffians" and "jayhawkers" clash in Kansas']

['1857 Dred Scott decision: States can decide on slavery', '1857 Mormons challenged federal authority in Utah.', '1860 Sth Carolina seceded. 7  states followed.']

['1863 Emancipation Proclamation, freeing slaves', '1863 Gettysburg Address', '1865 Assassinated by John Wilkes Booth']

['1865 Amnesty', '1867 Reconstruction Act & Office Tenure Act by Congress', ' Nebraska in the union', '1867 Purchase of Alaska', '1868 Impeachment']

['1871 Civil Service', '1870-71 Enforcement Acts broke Ku Klux Klan', '1875 Civil Rights Act', ' Scandals: Credit Mobilier, Tweed Ring, Whiskey Ring']

['1877 Reconstruction end. Army withdrew from the South', '1877 Railroad strikes and use of troops', '1877 Desert Land Act']

['1881 On July 2, he was shot by Charles Julius Guiteau.', '1881 Garfield died of blood poisoning on September 19.']

['1883 Pendleton Act: Civil hiring on merit']

['1886 Statue of Liberty', ' Curtailed largess of war veterans pensions', '1887 Anti-Polygamy Act', '1887 Dawes Severalty Act - destroys Indian governments']

['1889 Opening of Oklahoma to 20,000 settlers', '1889-90 6 states admitted to the Union', '1891 Forest Reserve Act; Forest reserves are public.']

['1893 Pullman strike.']

['1898 Yellow Journalism (Hyped Maine)', '1898 Hawaii annexed', '1901 On Sep 6, he was shot by an anarchist in Buffalo and died 8 days later.']

[' Conservation becomes an issue. Creation of National parks & forests', '1906 Pure Food & Drug Act - Meat Inspection Act: New Safety standards']

[' Record antitrust suits', '1912 New states: Arizona & New Mexico.', '1912 US dept. of Commerce created']

['1916 Child labor curtailed', '1916 Federal Farm Loan Act; cheap loans to farmers', '1920 Prohibition', '1920 19th Amendment, Women win the right to  vote']

['1921 Federal Highway Act  - the age of the "motor car"', '1922 Great Railway strike', ' Bureau of Veterans Affairs', ' Teapot Dome scandal and many others']

['1924 Immigration Act limits immigrants from South & East Europe', '1924 Snyder Act-Indians get citizenship', '1927 Mississippi flood']

['1932 Reconstruction Finance Corporation to provide business loans.', '1932 The "Bonus Army" incident. Veterans were killed.']

['1933 First 100 days legislation frenzy', '1933 1st New Deal: acts on relief, recovery, reform', '1935 2nd New Deal: WPA, Social Security,Labor support']

['1945 Fair Deal: health care, civil rights etc.', '1947 Pres. Succession Act', '1947 CIA established.', '1951 Dismissal of Gen. Douglas MacArthur']

[' Alaska and Hawaii admitted as states', '1957 Sent Troops to Little Rock to enforce integration', '1958 NASA established.', '1960 Civil Rights']

['1961 Peace Corps program', '1961 "Moon race" starts', '1963 "Washington March"', '1963 Assassinated In Dallas by Lee Harvey Oswald']

['1964 The Civil Rights Act', '1964 Great Society & War on Poverty programs', '1963-65 Miranda case', ' Urban riots / antiwar riots', '1968 M. Luther King killed']

['1969 Moon landing', '1970 Environment Act', '1973 Spiro Agnew resigned', '1973 Watergate scandal', '1974 Resigned']

['1974 Granted a pardon to  Nixon.', '1975 Airlift of 237,000 Vietnamese refugees']

[' Pardoned  Vietnam War draft evaders', ' Energy Department', ' Boycott of 1980 Olympics']

['1981 Assassination attempt by John W. Hinkley,', '1981 Fired 11,345 striking air traffic controllers.', '1986 War on Drugs']

['1990 Americans with Disabilities Act', '1990 Immigration Act; result: increase of legal immigration 40%']

["1993 “Don't ask, don't tell”- gays in the military.", ' Monica Lewinsky scandal & Impeachment']

['2001 9/11', '2001 Patriot Act', '2002 “no child left behind” law to improve education', '2005 Hurricane Katrina']

['2010 Healthcare reform: Affordable Care Act (Obamacare)', "2010 End of the “Don't ask, don't tell policy” for LBGT in the military", '2012 Same sex couples now have the right to be married.']

['2020 Covid-19 pandemic', ' Impeached twice', '2021 US Capitol was stormed by Trump supporters.']

['Biden pledged to double climate funding to developing countries by 2024']

¶

What are the Most Frequent U.S. President Names and Their Party Distribution?

To visualize the most common U.S. president names and the distribution of political parties associated with each name, we have created an interactive horizontal bar chart. The chart shows the frequency of occurrence for each president name and, when you hover over a bar, it displays the party names and their respective counts along with the percentage they represent for that specific president name.

In [104]:
import pandas as pd
import plotly.graph_objects as go
import re

# Assuming you have a DataFrame called 'pres_merged_df' containing the data
# with columns: 'President_x' and 'Party'

# Filter out NaN (float) values from the 'President_x' column
pres_merged_df = pres_merged_df.dropna(subset=['President_x'])

# Convert the 'President_x' column to string type
pres_merged_df['President_x'] = pres_merged_df['President_x'].astype(str)

# Split the names of the presidents into individual words using regex
pres_merged_df['President_x'] = pres_merged_df['President_x'].apply(lambda x: re.findall(r'\w+', x))

# Flatten the list of names
all_names = [name for sublist in pres_merged_df['President_x'] for name in sublist]

# Count the frequency of each name
name_freq = pd.Series(all_names).value_counts()

# Create a DataFrame to store president names and their frequencies
president_data = pd.DataFrame({'Name': name_freq.index, 'Frequency': name_freq.values})

# Filter out names with frequency less than 3
president_data = president_data[president_data['Frequency'] >= 3]

# Create a new column in the DataFrame to store party information
president_data['Party'] = None

# Function to get the party for each president name
def get_party(name):
    party = pres_merged_df.loc[pres_merged_df['President_x'].apply(lambda x: name in x), 'Party'].values
    return party[0] if len(party) > 0 else None

# Apply the function to get the party information for each president name
president_data['Party'] = president_data['Name'].apply(get_party)

# Check if the president_data DataFrame is not empty
if not president_data.empty:
    # Calculate the percentage of each party for each president name
    party_percentages = []
    for name, freq in zip(president_data['Name'], president_data['Frequency']):
        party_counts = pres_merged_df.loc[pres_merged_df['President_x'].apply(lambda x: name in x), 'Party'].value_counts()
        party_percentage = [f"{party}: {count} ({(count / freq * 100):.1f}%)" for party, count in party_counts.items()]
        party_percentages.append(', '.join(party_percentage))

    # Create the interactive horizontal bar plot using Plotly
    fig = go.Figure()

    fig.add_trace(
        go.Bar(
            x=president_data['Frequency'],
            y=president_data['Name'],
            name='Presidents',
            orientation='h',
            hovertext=party_percentages
        )
    )

    # Update layout for better appearance
    fig.update_layout(
        title="Frequency of U.S. President Names with Party Information",
        xaxis_title="Frequency",
        yaxis_title="President Names",
        hovermode='closest',
        barmode='stack',
        showlegend=False,  # No need to show the legend as there's only one trace
    )

    # Show the plot
    fig.show()
else:
    print("No data available for the selected condition.")
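
The counting logic above can also be sketched without overwriting `pres_merged_df['President_x']`, which the cell replaces with token lists. The following is a minimal standard-library sketch under stated assumptions: `rows` is a small illustrative sample standing in for the `President_x` and `Party` columns, and `party_breakdown` is a hypothetical helper name.

```python
import re
from collections import Counter, defaultdict

# Small illustrative sample standing in for the 'President_x' and 'Party' columns.
rows = [
    ("John Adams", "Federalist"),
    ("John Quincy Adams", "Democratic-Republican"),
    ("John Tyler", "Whig"),
    ("James Madison", "Democratic-Republican"),
]

name_counts = Counter()              # how often each name token occurs
party_by_name = defaultdict(Counter) # party tallies per name token

for full_name, party in rows:
    # Split the full name into word tokens, as the regex in the cell above does
    for token in re.findall(r"\w+", full_name):
        name_counts[token] += 1
        party_by_name[token][party] += 1

def party_breakdown(token):
    """Return each party's share (in percent) among presidents sharing this token."""
    total = name_counts[token]
    return {party: round(100 * n / total, 1) for party, n in party_by_name[token].items()}

print(name_counts["John"])
print(party_breakdown("Adams"))
```

Because the source column is only read, later cells that filter on the full president name keep working without rebuilding the DataFrame.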
In [105]:
# Rebuild pres_merged_df by merging pres_df_1 and pres_df_4 on their row index
pres_merged_df = pres_df_1.merge(pres_df_4, left_index=True, right_index=True)

¶

What is the distribution of the age at marriage for the First Ladies of the United States, and who were the Presidents they married?

The histogram below shows the distribution of the First Ladies' ages at marriage, with dashed lines marking the median, mean, and mode. The horizontal bar chart that follows shows each First Lady individually: the length of her bar indicates her age at the time of marriage, and the hover text gives the name of the President she married along with her age at marriage.

By exploring this visualization, we can gain insights into the age at which the First Ladies married and identify the Presidents they were married to.

In [107]:
import pandas as pd
import matplotlib.pyplot as plt


# Calculate the age at marriage for the First Lady
pres_df_2['Age at Marriage, First Lady'] = (pres_df_2['Date of Marriage'] - pres_df_2['Date of Born, First Lady']).dt.days // 365

# Plot the distribution of age at marriage
plt.figure(figsize=(20, 12))
plt.hist(pres_df_2['Age at Marriage, First Lady'], bins=20, edgecolor='black', alpha=0.7)

# Calculate and plot the median, mean, and mode
median_age = pres_df_2['Age at Marriage, First Lady'].median()
mean_age = pres_df_2['Age at Marriage, First Lady'].mean()
mode_age = pres_df_2['Age at Marriage, First Lady'].mode().values[0]

plt.axvline(median_age, color='red', linestyle='dashed', linewidth=2, label=f'Median Age: {median_age:.1f}')
plt.axvline(mean_age, color='green', linestyle='dashed', linewidth=2, label=f'Mean Age: {mean_age:.1f}')
plt.axvline(mode_age, color='blue', linestyle='dashed', linewidth=2, label=f'Mode Age: {mode_age:.1f}')

plt.xlabel('Age at Marriage of First Lady')
plt.ylabel('Frequency')
plt.title('Distribution of Age at Marriage of First Ladies')
plt.legend()
plt.grid(True)
plt.show()
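
Dividing the day count by 365 is a close approximation, but leap days accumulate and can push an age up by one year just before a birthday. A calendar-aware calculation could look like the sketch below; `age_on` is a hypothetical helper name and the dates are made up purely to show the boundary case, not taken from the dataset.

```python
from datetime import date

def age_on(birth, event):
    """Completed years between birth and event, honoring leap years."""
    years = event.year - birth.year
    # Subtract one year if the birthday has not yet occurred in the event year.
    if (event.month, event.day) < (birth.month, birth.day):
        years -= 1
    return years

# Hypothetical dates chosen only to show the boundary behavior.
born = date(1900, 6, 15)
day_before_birthday = date(1925, 6, 14)

exact_age = age_on(born, day_before_birthday)
approx_age = (day_before_birthday - born).days // 365

print(exact_age, approx_age)
```

For these dates the calendar-aware result is 24, while the `// 365` approximation gives 25. With pandas datetime columns, the same function could be applied row by row through `DataFrame.apply(..., axis=1)`.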
In [108]:
import plotly.graph_objects as go

# Assuming you have a DataFrame called 'pres_df_2' containing the data
# with columns: 'First Lady Name', 'Age at Marriage, First Lady', and 'President'

# Filter out rows with missing values in 'Age at Marriage, First Lady' column
pres_df_2 = pres_df_2.dropna(subset=['Age at Marriage, First Lady'])

# Create the horizontal bar chart using Plotly
fig = go.Figure()

# Add the bar data to the figure
fig.add_trace(go.Bar(
    y=pres_df_2['First Lady Name'],
    x=pres_df_2['Age at Marriage, First Lady'],
    orientation='h',
    text=pres_df_2.apply(lambda row: f"President: {row['President']} <br> First Lady: {row['First Lady Name']} <br> Age at Marriage: {row['Age at Marriage, First Lady']}", axis=1),
    hoverinfo='text',  # Show custom hover text
    marker=dict(color='skyblue'),
    opacity=0.8
))

# Update layout for better appearance
fig.update_layout(
    title="Age at Marriage, First Lady",
    xaxis_title="Age at Marriage (years)",
    yaxis_title="First Lady Name",
    showlegend=False,
    bargap=0.1,
    height=1400
)

# Show the plot
fig.show()

¶

Height vs. Weight with Body Mass Index and Political Party

Question: What is the relationship between the height and weight of U.S. Presidents, and how does their Body Mass Index (BMI) and political party affiliation play a role?

To explore this relationship, we have created a scatter plot that showcases the height and weight of U.S. Presidents. The size of each circle in the plot represents the Body Mass Index (BMI) of the respective President, while the color of the circle corresponds to their political party affiliation. The hover text provides additional details, including the President's name, height, weight, political party, Body Mass Index (BMI), and the Body Mass Index Range they fall into.

The plot allows us to observe any potential patterns or trends regarding the height, weight, BMI, and political party affiliations of U.S. Presidents.
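
For reference, BMI is weight in kilograms divided by the square of height in meters. The sketch below shows that formula together with the commonly used adult cutoffs (18.5, 25, and 30). It assumes the `body_mass_index` column was computed this way, which the notebook does not state; `bmi` and `bmi_range` are hypothetical helper names and the measurements are made up.

```python
def bmi(weight_kg, height_cm):
    """Body Mass Index: weight in kilograms divided by height in meters squared."""
    height_m = height_cm / 100
    return weight_kg / height_m ** 2

def bmi_range(value):
    """Map a BMI value to a category using the common adult cutoffs."""
    if value < 18.5:
        return "Underweight"
    if value < 25:
        return "Normal Weight"
    if value < 30:
        return "Overweight"
    return "Obese"

# Hypothetical measurements, not taken from the dataset.
value = bmi(80, 180)
print(round(value, 1), bmi_range(value))
```

Using contiguous cutoffs means every value falls into exactly one category, including values between 24.9 and 25 or between 29.9 and 30.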

In [109]:
import pandas as pd
import plotly.express as px

# Assuming you have a DataFrame called 'pres_df_3' containing the data
# with columns: 'President', 'height_cm', 'weight_kg', 'body_mass_index', and 'political_party'

# Filter out NaN (missing) values
pres_df_3 = pres_df_3.dropna(subset=['height_cm', 'weight_kg', 'body_mass_index', 'political_party'])

# Create a dictionary to map political parties to colors
party_colors = {
    'Democratic': 'blue',
    'Republican': 'red',
    # Add more parties and their corresponding colors here
}

# Map party names to colors using the dictionary
pres_df_3['Party Color'] = pres_df_3['political_party'].map(party_colors)

# Calculate the Body Mass Index (BMI) range for each person
def calculate_bmi_range(bmi):
    # Contiguous cutoffs, so values such as 24.95 or 29.95 are not misclassified
    if bmi < 18.5:
        return 'Underweight'
    elif bmi < 25:
        return 'Normal Weight'
    elif bmi < 30:
        return 'Overweight'
    else:
        return 'Obese'

#pres_df_3['body_mass_index_range'] = pres_df_3['body_mass_index'].apply(calculate_bmi_range)

# Create the hover text with the desired information
hover_text = pres_df_3.apply(
    lambda row: f"President: {row['President']}<br>"
                f"Height: {row['height_cm']} cm<br>"
                f"Weight: {row['weight_kg']} kg<br>"
                f"Party: {row['political_party']}<br>"
                f"Body Mass Index: {row['body_mass_index']}<br>"
                f"Body Mass Index Range: {row['body_mass_index_range']}",
    axis=1
)

# Create the scatter plot
fig = px.scatter(
    pres_df_3,
    x='height_cm',
    y='weight_kg',
    size='body_mass_index',
    color='political_party',
    color_discrete_map=party_colors,  # Assign colors to parties
    hover_name='President',  # Display the President name on hover
    custom_data=['height_cm', 'weight_kg', 'body_mass_index', 'political_party', 'body_mass_index_range'],
    title="Height vs. Weight with Body Mass Index and Political Party",
    labels={'height_cm': 'Height (cm)', 'weight_kg': 'Weight (kg)'},
)

# Update hover information
fig.update_traces(
    hovertemplate="<br>".join([
        "President: %{hovertext}",
        "Height: %{customdata[0]} cm",
        "Weight: %{customdata[1]} kg",
        "Party: %{customdata[3]}",
        "Body Mass Index: %{customdata[2]}",
        "Body Mass Index Range: %{customdata[4]}"
    ])
)

# Update the layout for better appearance
fig.update_layout(
    autosize=False,  # Turn off automatic sizing
    width=1500,       # Set the width of the plot
    height=700,      # Set the height of the plot
)

# Show the plot
fig.show()

¶

Which States Produced the Most U.S. Presidents?

How Many U.S. Presidents Were Born in Each State? - Political Affiliation Bar Chart¶

This bar chart shows the number of U.S. Presidents born in each state. Each bar is colored by the party most common among the Presidents born there, and bars fall back to green when that party has no assigned color. Hover over a bar to see the Presidents born in that state and each party's share. Together, these show which states contributed most to the nation's highest office and the party affiliations of the Presidents they produced.

In [110]:
import pandas as pd
import plotly.graph_objects as go

# Assuming you have a DataFrame called 'pres_df_3' containing the data
# with columns: 'birth_state' and 'political_party'

# Calculate the count of presidents born in each state
state_counts = pres_df_3['birth_state'].value_counts()

# Collect the hover text and color information for each state
hover_rows = []

# Define party colors
party_colors = {'Democrat': 'blue', 'Republican': 'red'}

# Iterate over each state and calculate the hover text and color information
for state in state_counts.index:
    state_rows = pres_df_3[pres_df_3['birth_state'] == state]
    presidents_in_state = state_rows['President']
    party_percentage = state_rows['political_party'].value_counts(normalize=True)
    # Format each party's share as a percentage (e.g. 66.7%)
    party_percentage_text = '<br>'.join([f"{party}: {percentage:.1%}" for party, percentage in party_percentage.items()])
    max_party = party_percentage.idxmax()
    color = party_colors.get(max_party, 'green')
    hover_rows.append({'State': state, 'Presidents': ', '.join(presidents_in_state), 'Party Percentage': party_percentage_text, 'Color': color})

# DataFrame.append was removed in pandas 2.0, so build the DataFrame once from the list
hover_text_data = pd.DataFrame(hover_rows, columns=['State', 'Presidents', 'Party Percentage', 'Color'])

# Create the bar chart using Plotly
fig = go.Figure()

fig.add_trace(
    go.Bar(
        x=state_counts.index,
        y=state_counts.values,
        hovertext=[f"Presidents: {presidents}<br>Party Percentage:<br>{party_percentage}" for presidents, party_percentage in zip(hover_text_data['Presidents'], hover_text_data['Party Percentage'])],
        hoverinfo='text',
        marker=dict(color=hover_text_data['Color']),
        name='Party',  # Legend name for the colors
    )
)

# Update layout for better appearance
fig.update_layout(
    title="Number of Presidents Born in Each State",
    xaxis_title="State",
    yaxis_title="Number of Presidents",
    hovermode='closest',
    barmode='stack',
    showlegend=True,  # Show the legend for the colors
)

# Show the plot
fig.show()

Which States Produced the Most U.S. Presidents? - Interactive Birthplace Map¶

Explore the birthplaces of U.S. Presidents and see which states have contributed the most to America's leadership. This interactive map shades each state by the number of Presidents born there, with a color legend for the counts. Click a state's marker to see its name and how many Presidents were born in it.

In [111]:
import pandas as pd
import folium
from folium import Choropleth
import geopandas as gpd

# Assumes a DataFrame called 'pres_df_3' with a 'birth_state' column
# holding the name of the state where each president was born

# Count presidents per birth state. The columns are named explicitly so the
# code does not depend on how a given pandas version labels value_counts() output
state_birth_counts = (
    pres_df_3['birth_state']
    .value_counts()
    .rename_axis('state')
    .reset_index(name='president_count')
)

# Shapefile location
states_shapefile = r'C:\Users\User\Desktop\GitHub-projects\projects\Data-Dives-Projects-Unleashed\Notebooks\course1\maps\ne_10m_admin_1_states_provinces.shp'

# Read the shapefile using geopandas with the correct encoding
gdf = gpd.read_file(states_shapefile, encoding='latin1')

# Inner merge on the state name: keeps only states that appear in the birth counts
merged_gdf = gdf.merge(state_birth_counts, left_on='name', right_on='state')

# Create a map centered on the U.S. using Folium
map_us = folium.Map(location=[37.0902, -95.7129], zoom_start=4)

# Add the choropleth map layer to the map
Choropleth(
    geo_data=merged_gdf,
    name='choropleth',
    data=state_birth_counts,
    columns=['state', 'president_count'],  # Key column, then value column
    key_on='feature.properties.name',  # Key for the GeoJSON data
    fill_color='YlGnBu',  # Color palette for the map
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Number of Presidents Born in State',
).add_to(map_us)

# Add a marker per state whose popup shows the state name and birth count
for _, row in merged_gdf.iterrows():
    folium.Marker(
        location=[row.geometry.centroid.y, row.geometry.centroid.x],
        popup=f"{row['name']}: {row['president_count']} Presidents Born",
        icon=folium.Icon(color='red', icon='info-sign')
    ).add_to(map_us)

# Display the map
map_us
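
An inner merge on the state name silently drops any state whose spelling in the presidents data does not exactly match the shapefile's `name` attribute, so such a state would simply be missing from the map. A quick set comparison before merging makes mismatches visible. This is a minimal sketch with hypothetical name lists; in the notebook the two inputs would presumably come from `pres_df_3['birth_state']` and `gdf['name']`, and `unmatched_names` is an illustrative helper.

```python
def unmatched_names(data_names, shape_names):
    """Return the names found in the data but not in the shapefile, sorted.

    The comparison is exact (case- and whitespace-sensitive), mirroring how
    a merge on a string key behaves.
    """
    return sorted(set(data_names) - set(shape_names))


# Hypothetical inputs for illustration only
data_names = ["Virginia", "Ohio", "New York ", "N. Carolina"]
shape_names = ["Virginia", "Ohio", "New York", "North Carolina"]

missing = unmatched_names(data_names, shape_names)
# repr() makes stray whitespace such as a trailing space easy to spot
print([repr(name) for name in missing])
```

An empty result suggests every birth state will find a polygon; anything else points to names that need cleaning before the merge.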

Analyzing U.S. Presidents' Intelligence and Cognitive Abilities¶

1. Horizontal Bar Charts:¶

Title: Which U.S. Presidents Had the Highest Corrected IQ?

Description: This horizontal bar chart visualizes the corrected IQ scores of U.S. Presidents, sorted from the highest score to the lowest. Each bar represents a president, and the length of the bar corresponds to that president's corrected IQ. The hover text displays the president's name, corrected IQ, and political party. The bars are color-coded by political party, with gray as the fallback for any party missing from the color mapping.

In [112]:
import pandas as pd
import plotly.graph_objects as go

# Assuming you have a DataFrame called 'pres_df_3' containing the data
# with columns 'President', 'corrected_iq', and 'political_party'

# Sort the DataFrame by 'corrected_iq' in descending order
pres_df_3_sorted = pres_df_3.sort_values(by='corrected_iq', ascending=False)

# Create a dictionary to map political parties to colors
party_colors = {
    'Unaffiliated': 'gray',
    'Federalist': 'darkblue',
    'Democratic-Republican': 'green',
    'Democrat': 'blue',
    'Whig': 'purple',
    'Republican': 'red',
    'National Union': 'orange',
}

# Create the horizontal bar chart using Plotly
fig = go.Figure()

fig.add_trace(go.Bar(
    x=pres_df_3_sorted['corrected_iq'],
    y=pres_df_3_sorted['President'],
    hovertext=pres_df_3_sorted.apply(  # hovertext keeps the labels in the tooltip instead of drawing them on the bars
        lambda row: f"{row['President']}<br>IQ: {row['corrected_iq']}<br>Party: {row['political_party']}",
        axis=1
    ),
    hoverinfo='text',
    marker=dict(color=pres_df_3_sorted['political_party'].map(party_colors).fillna('#AAAAAA')),
    orientation='h',
))

# Update layout for better appearance
fig.update_layout(
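    title="Corrected IQ Scores of U.S. Presidents, Highest to Lowest",
    title_font=dict(size=24),
    xaxis=dict(title="Corrected IQ"),
    # Plotly draws the first category at the bottom; reverse the axis so the highest IQ is on top
    yaxis=dict(title="President", autorange="reversed"),
    showlegend=False,
    width=1400,
    height=1700,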
)

# Show the plot
fig.show()

2. Box Plot:¶

Title: Analyzing the IQ Distribution of U.S. Presidents

Description: The box plot summarizes the distribution of corrected IQ scores among U.S. Presidents. The box spans the interquartile range (IQR), from the 25th percentile (Q1) to the 75th percentile (Q3), and the line inside the box marks the median. The whiskers extend to the most extreme scores that lie within 1.5 times the IQR of the box edges, and any scores beyond them are drawn as individual outlier points. This plot helps identify the central tendency and the spread of IQ scores among the presidents.

In [113]:
import pandas as pd
import plotly.express as px

# Assuming you have a DataFrame called 'pres_df_3' containing the data
# with a column 'corrected_iq' representing IQ scores of presidents

# Create the interactive box plot using Plotly
fig = px.box(
    pres_df_3,
    y='corrected_iq',
    title="IQ Scores of U.S. Presidents",
    labels={'corrected_iq': 'IQ Score'},
    hover_data={'corrected_iq': True},  # Display the IQ score on hover
)

# Update layout for better appearance
# (boxmode and boxgroupgap only affect grouped boxes, so they are omitted for a single box)
fig.update_layout(
    yaxis_title="IQ Score",
    showlegend=False,  # Hide the legend
)

# Show the plot
fig.show()
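
The quartiles, whisker limits, and outliers that a box plot displays can also be computed by hand, which is useful for checking what the figure shows. The sketch below uses only the standard library and a hypothetical list of scores, not the real `corrected_iq` column; `box_plot_summary` is an illustrative helper. Plotly's default quartile method may differ slightly from the interpolation used here, so small discrepancies with the rendered box are possible.

```python
import statistics


def box_plot_summary(values):
    """Compute the statistics a box plot displays, using the 1.5 * IQR rule."""
    # 'inclusive' treats the data as the full population and interpolates linearly
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr

    # Whiskers end at the most extreme values that still lie inside the fences
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = sorted(v for v in values if v < lower_fence or v > upper_fence)

    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


# Hypothetical scores for illustration only
scores = [118, 120, 122, 125, 127, 130, 132, 135, 160]

summary = box_plot_summary(scores)
for key, value in summary.items():
    print(f"{key}: {value}")
```

In this made-up sample the value 160 lies above the upper fence (Q3 + 1.5 × IQR = 147), so it would be drawn as an individual point while the upper whisker stops at 135.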

3. Radar Chart (Basic):¶

Title: Comparing U.S. Presidents' IQ Scores on a Radar Chart

Description: The radar chart arranges the presidents around a circle and plots each one's corrected IQ score. Each spoke represents a president, and the distance from the center to the data point corresponds to that president's corrected IQ. Because the code uses only the single `corrected_iq` column, this is a polar view of the same scores shown in the bar chart rather than a breakdown by cognitive category. The radial axis runs from the lowest to the highest score in the data, so the president with the lowest score sits at the center.

In [114]:
import pandas as pd
import plotly.graph_objects as go

# Assuming you have a DataFrame called 'pres_df_3' containing the data
# with a column 'corrected_iq' representing IQ scores of presidents

# Create the radar chart using Plotly
fig = go.Figure()

fig.add_trace(go.Scatterpolar(
    r=pres_df_3['corrected_iq'],
    theta=pres_df_3['President'],
    fill='toself',
    hovertext=pres_df_3['corrected_iq'],
    hoverinfo='text',
    line=dict(color='blue')
))

# Update layout for better appearance
fig.update_layout(
    polar=dict(
        radialaxis=dict(
            visible=True,
            range=[pres_df_3['corrected_iq'].min(), pres_df_3['corrected_iq'].max()],  # Series.min/max skip missing values
        ),
    ),
    showlegend=False,  # Hide the legend
    title="Corrected IQ Scores of U.S. Presidents (Radar View)",
    width = 1400,
    height = 1400,
)

# Show the plot
fig.show()

4. Radar Chart 2 (Advanced):¶

Title: U.S. Presidents' IQ Scores by Political Party

Description: This version of the radar chart adds each president as a separate trace, colored by political party. Each spoke represents a president, and the distance from the center corresponds to that president's corrected IQ on a fixed radial axis from 0 to 200. The hover text shows the president's name, corrected IQ, and party. As in the basic chart, only the `corrected_iq` column is plotted, so the chart compares overall scores across presidents and parties rather than separate cognitive dimensions.

In [115]:
import pandas as pd
import plotly.graph_objects as go

# Assuming you have a DataFrame called 'pres_df_3' containing the data
# with a column 'corrected_iq' representing IQ scores of presidents

# Create the radar chart using Plotly
fig = go.Figure()

# One trace per president so that each point can carry its own party color
for _, row in pres_df_3.iterrows():
    party_color = party_colors.get(row['political_party'], '#AAAAAA')
    fig.add_trace(go.Scatterpolar(
        r=[row['corrected_iq']],  # Single-point trace: one IQ value per president
        theta=[row['President']],
        mode='markers',  # A single point has no line or fill to draw
        marker=dict(color=party_color, size=8),  # Set the marker color explicitly; line color alone does not color the point
        hovertext=f"{row['President']}<br>IQ: {row['corrected_iq']}<br>Party: {row['political_party']}",
        hoverinfo='text',
        name=row['President']
    ))

# Update layout for better appearance
fig.update_layout(
    polar=dict(
        radialaxis=dict(
            visible=True,
            range=[0, 200],
        ),
    ),
    showlegend=False,  # One trace per president would make the legend unreadable
    title="Corrected IQ Scores of U.S. Presidents by Political Party",
    title_font=dict(size=24),
    width=1400,
    height=1400,
)

# Show the plot
fig.show()

6. Interpretation and Conclusion¶

This project examined U.S. presidents from several angles. After collecting, cleaning, and integrating the data, we analyzed and visualized patterns in the presidents' backgrounds, party affiliations, birthplaces, and corrected IQ scores. These views are descriptive: they show how such characteristics are distributed across presidents and parties, and they offer a starting point for examining the interplay between intelligence, leadership, and historical significance. By putting these patterns in context, we hope to shed light on the factors associated with presidential success and to support informed discussion of leadership in the nation's highest office.


📞 Contact Me¶

Feel free to reach out to me through the following platforms:

GitHub      LinkedIn      Kaggle      Medium      Website

Connect with me on these platforms to stay updated with my latest projects and articles. I look forward to connecting with you!