USA Lightning Strikes

Code language: Python

Libraries for operations: Pandas, NumPy, datetime

Libraries for plotting: Matplotlib (Pyplot), seaborn, Plotly

In this project, I explore the occurrence of lightning strikes in the United States. The analysis is divided into four parts, each with its own focus. A link to the Jupyter notebook for each part is provided under its description.


The analysis is divided as follows:

1987-2020 Comprehensive Analysis

  • Examine the range of total lightning strike counts for each year.
  • Identify outliers.
  • Plot the yearly totals on a scatterplot to visualize the outliers.
  • Further investigate the outliers to decide how to deal with them.
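The yearly-total and outlier steps above could look roughly like the sketch below. It assumes a daily table with hypothetical `date` and `number_of_strikes` columns, uses made-up numbers, and flags outliers with the common 1.5 × IQR rule; the notebook's actual schema and outlier criterion may differ.

```python
import pandas as pd

# Made-up daily records; `date` and `number_of_strikes` are hypothetical column names.
df = pd.DataFrame({
    "date": pd.to_datetime(["1987-06-01", "1988-06-01", "1989-06-01",
                            "1990-06-01", "1991-06-01", "1992-06-01"]),
    "number_of_strikes": [24_000_000, 23_000_000, 25_000_000,
                          26_000_000, 2_000_000, 24_500_000],
})

# Total strikes per calendar year.
yearly = (df.groupby(df["date"].dt.year)["number_of_strikes"]
            .sum()
            .rename_axis("year")
            .reset_index(name="total_strikes"))

# IQR rule: totals more than 1.5 * IQR beyond the quartiles count as outliers.
q1 = yearly["total_strikes"].quantile(0.25)
q3 = yearly["total_strikes"].quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = yearly[(yearly["total_strikes"] < lower) |
                  (yearly["total_strikes"] > upper)]
print(outliers)
```

The `yearly` frame can then go straight into a scatterplot (for example with seaborn) so the flagged years stand out visually.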

2016-2018 Analysis

  • Calculate weekly sums of lightning strikes and plot them on a bar graph.
  • Calculate quarterly lightning strike totals and plot them on bar graphs.
  • Bin the monthly strike totals into four categories (mild, scattered, heavy, or severe) and label-encode those categories.
  • Create a heatmap of the three years to give a high-level view of monthly lightning severity in a single diagram.
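A minimal sketch of the 2016-2018 aggregations, assuming the same hypothetical `date` and `number_of_strikes` columns and synthetic counts. The binning here uses quartiles via `pd.qcut`, which is one reasonable way to define the four severity levels, not necessarily the notebook's choice.

```python
import pandas as pd

# Synthetic daily counts for 2016-2018 (hypothetical column names).
dates = pd.date_range("2016-01-01", "2018-12-31", freq="D")
df = pd.DataFrame({"date": dates, "number_of_strikes": range(len(dates))})

# Weekly sums (pandas' "W" bins end on Sunday).
weekly = df.resample("W", on="date")["number_of_strikes"].sum()

# Quarterly totals keyed by period, e.g. 2016Q1.
quarterly = df.groupby(df["date"].dt.to_period("Q"))["number_of_strikes"].sum()

# Monthly totals with separate year and month columns.
monthly = (df.groupby([df["date"].dt.year.rename("year"),
                       df["date"].dt.month.rename("month")])["number_of_strikes"]
             .sum()
             .reset_index())

# Bin monthly totals into quartiles, then label-encode the categories as 0-3.
monthly["strike_level"] = pd.qcut(monthly["number_of_strikes"], 4,
                                  labels=["mild", "scattered", "heavy", "severe"])
monthly["strike_level_code"] = monthly["strike_level"].cat.codes

# Year x month grid of encoded levels, suitable input for a heatmap.
heatmap_data = monthly.pivot(index="year", columns="month",
                             values="strike_level_code")
print(heatmap_data)
```

Passing `heatmap_data` to `seaborn.heatmap` would then draw one row per year and one column per month.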

2018 Analysis

  • Find the days of the week with the most strikes in 2018.
  • Calculate the total number of strikes for each month in 2018 and plot them on a bar graph.
  • Find the locations with the greatest number of strikes within a single day in 2018.
  • Examine the locations that had the greatest number of days with at least one lightning strike in 2018.
  • Determine whether certain days of the week had more lightning strikes than others and plot the results on a box plot.
  • Calculate the number of weekly lightning strikes in 2018 and plot them on a bar graph.
  • Perform input validation to check the quality of the data: look for null values, missing dates, implausible daily strike counts for a location, and coordinates outside the expected geographic range.
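Several of the 2018 steps reduce to a few pandas operations. The sketch below uses a tiny made-up table with hypothetical `longitude`, `latitude`, and `number_of_strikes` columns. The contiguous-US bounding box and the plausible-count range are illustrative assumptions, not thresholds taken from the notebook.

```python
import pandas as pd

# Made-up 2018 records: one row per (date, location); column names are hypothetical.
df = pd.DataFrame({
    "date": pd.to_datetime(["2018-01-01", "2018-01-01", "2018-01-02",
                            "2018-01-04", "2018-01-05"]),
    "longitude": [-97.5, -80.1, -97.5, -104.9, -90.0],
    "latitude": [35.4, 25.8, 35.4, 39.7, 29.9],
    "number_of_strikes": [12, 3, 40, 1, 7],
})

# Strikes by day of week (total and average per record).
df["weekday"] = df["date"].dt.day_name()
by_weekday = df.groupby("weekday")["number_of_strikes"].agg(["sum", "mean"])

# Locations with the most strikes in a single day.
top_single_day = df.nlargest(3, "number_of_strikes")

# Locations with the most days that had at least one strike.
days_with_strikes = (df[df["number_of_strikes"] > 0]
                     .groupby(["longitude", "latitude"])["date"]
                     .nunique()
                     .sort_values(ascending=False))

# Validation: nulls, missing calendar dates, implausible counts, out-of-range coordinates.
null_counts = df.isna().sum()
full_range = pd.date_range(df["date"].min(), df["date"].max(), freq="D")
missing_dates = full_range.difference(pd.DatetimeIndex(df["date"].unique()))
# Assumed plausible range of daily strikes per location (illustrative only).
implausible = df[~df["number_of_strikes"].between(1, 100_000)]
# Assumed rough bounding box for the contiguous US (illustrative only).
in_bounds = df["longitude"].between(-125, -66) & df["latitude"].between(24, 50)
out_of_bounds = df[~in_bounds]

print(by_weekday, missing_dates, sep="\n")
```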

August 2018 Analysis

  • Merge two datasets with different variables into a single dataframe that contains all of the information from both.
  • Investigate missing data, since the two datasets may not have entries for the same locations on the same dates.
  • Plot missing data as a geographic scatter plot.
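One way to combine the two datasets and isolate the rows that don't match is an outer merge with `indicator=True`. This is a sketch under assumptions: the key columns and the second dataset's `other_metric` column are hypothetical placeholders, and the data is made up.

```python
import pandas as pd

# Two made-up August 2018 datasets sharing date and location keys (hypothetical columns).
totals = pd.DataFrame({
    "date": pd.to_datetime(["2018-08-01", "2018-08-01", "2018-08-02"]),
    "longitude": [-81.0, -97.5, -81.0],
    "latitude": [28.5, 35.4, 28.5],
    "number_of_strikes": [20, 5, 14],
})
second = pd.DataFrame({
    "date": pd.to_datetime(["2018-08-01", "2018-08-02"]),
    "longitude": [-81.0, -81.0],
    "latitude": [28.5, 28.5],
    "other_metric": [8, 6],  # hypothetical variable from the second dataset
})

# Outer merge keeps every row from both sides; `_merge` records where each came from.
merged = totals.merge(second, on=["date", "longitude", "latitude"],
                      how="outer", indicator=True)

# Rows present in only one dataset are the "missing data" to investigate.
missing = merged[merged["_merge"] != "both"]
print(missing)
```

Because the keys include floating-point coordinates, rows only match when the values are exactly equal. The `missing` rows could then be mapped with Plotly Express, for example `px.scatter_geo(missing, lat="latitude", lon="longitude", scope="usa")`.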