## Code language

Python

## Libraries for operations

Pandas, NumPy, datetime

## Libraries for plotting

Matplotlib (Pyplot), seaborn, Plotly

In this project I examine the occurrence of lightning strikes in the United States. The analysis is divided into four parts, each with its own focus, and the link to the Jupyter notebook for each part is provided under its description.

The four parts are as follows:

## 1987-2020 Comprehensive Analysis

- Examine the range of total lightning strike counts for each year.
- Identify outliers.
- Plot the yearly totals on a scatterplot to visualize the outliers.
- Further investigate the outliers to decide how to deal with them.
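
The outlier step above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it assumes the common 1.5 × IQR rule, the column names `year` and `number_of_strikes` are hypothetical, and the yearly totals are made-up placeholder values rather than real data.

```python
import pandas as pd

# Placeholder yearly totals (invented for illustration, not real counts).
yearly = pd.DataFrame({
    "year": [2015, 2016, 2017, 2018, 2019, 2020],
    "number_of_strikes": [
        37_000_000, 41_000_000, 35_000_000, 44_000_000, 200_000, 39_000_000
    ],
})

# Interquartile range of the yearly totals.
q1 = yearly["number_of_strikes"].quantile(0.25)
q3 = yearly["number_of_strikes"].quantile(0.75)
iqr = q3 - q1

# Assumed rule: anything beyond 1.5 * IQR from the quartiles is an outlier.
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr
yearly["is_outlier"] = ~yearly["number_of_strikes"].between(lower_limit, upper_limit)

outlier_years = yearly.loc[yearly["is_outlier"], "year"].tolist()
print(outlier_years)
```

The `is_outlier` flag can then be used to color the points of a Matplotlib or seaborn scatterplot of `year` against `number_of_strikes`, which makes the flagged years easy to spot before deciding how to handle them.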

## 2016-2018 Analysis

- Calculate weekly sums of lightning strikes and plot them on a bar graph.
- Calculate quarterly lightning strike totals and plot them on bar graphs.
- Perform label encoding to assign the monthly number of strikes to the following categories: mild, scattered, heavy, or severe.
- Create a heatmap of the three years so I can get a high-level understanding of monthly lightning severity from a simple diagram.
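
A rough sketch of these aggregation and encoding steps is shown below. It assumes a daily table with hypothetical columns `date` and `number_of_strikes`, uses randomly generated counts in place of the real data, and assumes that the four categories are assigned by quartile with `pd.qcut`; the notebook may use different thresholds.

```python
import numpy as np
import pandas as pd

# Synthetic daily counts for 2016-2018 (random placeholders, not real data).
rng = np.random.default_rng(0)
dates = pd.date_range("2016-01-01", "2018-12-31", freq="D")
df = pd.DataFrame({
    "date": dates,
    "number_of_strikes": rng.integers(0, 1000, size=len(dates)),
})

# Weekly sums (weeks ending on Sunday) and quarterly sums.
weekly = df.resample("W", on="date")["number_of_strikes"].sum()
quarterly = df.groupby(df["date"].dt.to_period("Q"))["number_of_strikes"].sum()

# Monthly totals per year.
df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month
monthly = df.groupby(["year", "month"], as_index=False)["number_of_strikes"].sum()

# Assumed encoding: split the monthly totals into quartiles and label them.
monthly["strike_level"] = pd.qcut(
    monthly["number_of_strikes"],
    4,
    labels=["mild", "scattered", "heavy", "severe"],
)
# Integer codes 0-3 follow the label order above.
monthly["strike_level_code"] = monthly["strike_level"].cat.codes

# Year-by-month grid of severity codes, ready for a heatmap.
heatmap_data = monthly.pivot(index="year", columns="month", values="strike_level_code")
print(heatmap_data)
```

The `weekly` and `quarterly` series can be plotted as bar graphs, and `heatmap_data` can be passed to `seaborn.heatmap` to draw the three-year severity grid.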

## 2018 Analysis

- Find the days of the week with the most strikes in 2018.
- Calculate the total number of strikes for each month in 2018 and plot them on a bar graph.
- Find the locations with the greatest number of strikes within a single day in 2018.
- Examine the locations that had the greatest number of days with at least one lightning strike in 2018.
- Determine whether certain days of the week had more lightning strikes than others and plot the results on a box-plot.
- Calculate the number of weekly lightning strikes in 2018 and plot them on a bar graph.
- Perform input validation to inspect the data and validate the quality of its contents, checking for null values, missing dates, a plausible range of daily lightning strikes in a location, and a geographical range that aligns with expectation.
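
The validation checks and the day-of-week grouping could look roughly like this. It is a sketch under stated assumptions: the columns `date`, `number_of_strikes`, `latitude`, and `longitude` are hypothetical names, the four rows are invented, and the bounding box is only an illustrative approximation of the contiguous United States.

```python
import pandas as pd

# Tiny invented sample; the real dataset covers all of 2018.
df = pd.DataFrame({
    "date": pd.to_datetime(["2018-01-01", "2018-01-02", "2018-01-04", "2018-01-04"]),
    "number_of_strikes": [12, 5, 30, 7],
    "latitude": [27.0, 29.5, 33.1, 41.2],
    "longitude": [-80.1, -95.3, -90.0, -101.7],
})

# 1. Null values per column.
null_counts = df.isnull().sum()

# 2. Calendar dates with no rows at all.
full_range = pd.date_range(df["date"].min(), df["date"].max(), freq="D")
missing_dates = full_range.difference(pd.DatetimeIndex(df["date"]))

# 3. Plausible range of daily strike counts per location.
strike_summary = df["number_of_strikes"].describe()
has_non_positive = (df["number_of_strikes"] <= 0).any()

# 4. Geographic range (illustrative bounds, an assumption of this sketch).
in_bounds = df["latitude"].between(24, 50) & df["longitude"].between(-125, -66)

# Total strikes per day of the week.
df["weekday"] = df["date"].dt.day_name()
strikes_by_weekday = df.groupby("weekday")["number_of_strikes"].sum()

print(null_counts, missing_dates, strike_summary, in_bounds.all(), sep="\n")
print(strikes_by_weekday)
```

For the box plot, the per-row `weekday` column can serve as the x-axis and `number_of_strikes` as the y-axis in `seaborn.boxplot`.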

## August 2018 Analysis

- Combine two datasets with distinct variables into a single dataframe that has all of the information from both datasets.
- Investigate missing data in case both datasets don’t have the same number of entries for the same locations on the same dates.
- Plot missing data as a geographic scatter plot.
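
One way to combine the two datasets and surface mismatched entries is an outer merge with pandas' `indicator` option, sketched below. The column names (`date`, `center_point_geom`, `number_of_strikes`, `state_code`) and all rows are hypothetical stand-ins, and the notebook may use a different join type.

```python
import pandas as pd

# First dataset: strike counts per location and date (invented rows).
strikes = pd.DataFrame({
    "date": pd.to_datetime(["2018-08-01", "2018-08-01", "2018-08-02"]),
    "center_point_geom": ["POINT(-81.9 22.3)", "POINT(-90.2 29.9)", "POINT(-81.9 22.3)"],
    "number_of_strikes": [48, 32, 17],
})

# Second dataset: a different variable for the same keys, with one entry absent.
zones = pd.DataFrame({
    "date": pd.to_datetime(["2018-08-01", "2018-08-01"]),
    "center_point_geom": ["POINT(-81.9 22.3)", "POINT(-90.2 29.9)"],
    "state_code": ["FL", "LA"],
})

# Outer merge keeps every row from both sides; `_merge` records its origin.
merged = strikes.merge(
    zones,
    on=["date", "center_point_geom"],
    how="outer",
    indicator=True,
)

# Rows present in only one of the two datasets.
unmatched = merged[merged["_merge"] != "both"]
print(unmatched)
```

After extracting latitude and longitude from the geometry column, the `unmatched` rows can be drawn on a map, for example with `plotly.express.scatter_geo`.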