Seismic analysis using Python - Part 1

CudGeo
4 min read · Sep 17, 2022

Reading and pre-processing the datasets

The aim is to analyse the relationship between worldwide earthquakes, tsunamis and tectonic plate boundaries. This aim will be met by completing the following objectives:
- Mapping all the affected areas,
- Counting earthquake occurrences across different magnitude ranges (see the sketch after this list),
- Assessing the severity of earthquakes,
- Mapping highly affected areas based on magnitude,
- Finding the month with the highest earthquake occurrences,
- Finding the year with the highest earthquake occurrences,
- Visualizing earthquakes and tsunamis.
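As a small preview of the magnitude-range objective, occurrences can be counted by binning the Magnitude column with pandas. This is a minimal sketch, assuming the Kaggle earthquake file introduced in section 2; the bin edges are illustrative:

import pandas as pd

earthquakes = pd.read_csv('database.csv')  # Kaggle earthquake dataset (1965-2016)

# Bin magnitudes into ranges and count the occurrences in each bin
bins = [5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 9.5]
labels = ['5.5-6.0', '6.0-6.5', '6.5-7.0', '7.0-7.5', '7.5-8.0', '8.0+']
ranges = pd.cut(earthquakes['Magnitude'], bins=bins, labels=labels, include_lowest=True)
print(ranges.value_counts().sort_index())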

The seismic analysis is divided into two parts: Part 1 covers reading and pre-processing the datasets; Part 2 will cover visualization techniques and damage grade prediction.

Photo by Jens Aber on Unsplash

1. Importing libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from mpl_toolkits.basemap import Basemap
import folium
from folium import plugins
import datetime
import plotly.express as px
import pandas_profiling

2. Importing datasets

Datasets:
- Earthquake data (1965–2016) downloaded from https://www.kaggle.com/usgs/earthquake-database
- Tectonic plates data downloaded from https://www.kaggle.com/cwthompson/tectonic-plate-boundaries
- Tsunami data (2000–2017) downloaded from https://www.kaggle.com/noaa/seismic-waves

Earthquakes data

Reading a comma-separated values (CSV) file into a DataFrame.

earthquakes = pd.read_csv('database.csv')

and visualizing the data using the plotly.express library.

fig = px.density_mapbox(earthquakes, lat='Latitude', lon='Longitude', z='Magnitude', radius=5,
                        center=dict(lat=0, lon=180), zoom=0,
                        mapbox_style="stamen-terrain", title='Earthquakes around the world')
fig.show()
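Note: the free "stamen-terrain" tiles worked without a token when this was written, but Stamen's tiles have since moved to Stadia Maps and may require an API key in newer plotly versions. If the basemap renders blank, a token-free style such as "open-street-map" is a drop-in substitute:

# Token-free fallback style if "stamen-terrain" no longer renders
fig = px.density_mapbox(earthquakes, lat='Latitude', lon='Longitude', z='Magnitude', radius=5,
                        center=dict(lat=0, lon=180), zoom=0,
                        mapbox_style="open-street-map", title='Earthquakes around the world')
fig.show()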

Tectonic plates data

Reading data from CSV,

tec_plates = pd.read_csv('Tectonicplates.csv')

creating a visualization of the tectonic plates using the Basemap toolkit:

fig = plt.figure(figsize=(14, 10), edgecolor='w')
m = Basemap(projection='cyl', resolution='c',
            llcrnrlat=-90, urcrnrlat=90,
            llcrnrlon=-180, urcrnrlon=180)
m.scatter(tec_plates['lon'], tec_plates['lat'], s=4, color='green')
m.drawcountries(color='gray', linewidth=1)
m.shadedrelief()
plt.title("View of Tectonic plates")
plt.show()
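Basemap is in maintenance-only mode and can be awkward to install; a roughly equivalent map can be drawn with Cartopy instead. This is a sketch, assuming the cartopy package is installed:

import matplotlib.pyplot as plt
import cartopy.crs as ccrs

fig = plt.figure(figsize=(14, 10))
ax = plt.axes(projection=ccrs.PlateCarree())
ax.set_global()
ax.stock_img()  # low-resolution natural-earth background, similar to shadedrelief()
ax.coastlines()
ax.scatter(tec_plates['lon'], tec_plates['lat'], s=4, color='green',
           transform=ccrs.PlateCarree())
plt.title("View of Tectonic plates")
plt.show()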

Tsunami data

Reading tsunami data,

tsunami = pd.read_csv('sources.csv')

importing the second table (waves.csv) from the same Kaggle source and selecting two columns for the analysis,

waves = pd.read_csv('waves.csv')
waves = waves[['SOURCE_ID', 'DISTANCE_FROM_SOURCE']]
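If the wave measurements later need to be tied back to their source events, the two tables can be joined on the shared SOURCE_ID key; a minimal sketch:

# Attach each wave record to its originating tsunami source event
tsunami_waves = tsunami.merge(waves, on='SOURCE_ID', how='left')
tsunami_waves.head()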

3. Preprocessing data for the analysis

Cleaning earthquake data

Selecting the relevant columns from the data frame,

earthquakes = earthquakes[['Date', 'Time', 'Latitude', 'Longitude', 'Depth', 'Magnitude', 'Type']]

and checking the lengths of the Date strings to spot any inconsistent formats.

lengths = earthquakes["Date"].str.len()
lengths.value_counts()

As we can see, the data frame contains three rows whose Date entries are 24 characters long (full timestamps rather than the expected 10-character MM/DD/YYYY format).

wrongdates_index = np.where(lengths == 24)[0]
print(wrongdates_index)

Row indices with wrong dates: [ 3378 7512 20650].

# Inspect the offending rows before overwriting them
earthquakes.loc[wrongdates_index]

# Replace the malformed entries with the correct dates and times
earthquakes.loc[3378, "Date"] = "02/23/1975"
earthquakes.loc[7512, "Date"] = "04/28/1985"
earthquakes.loc[20650, "Date"] = "03/13/2011"

earthquakes.loc[3378, "Time"] = "02:58:41"
earthquakes.loc[7512, "Time"] = "02:53:41"
earthquakes.loc[20650, "Time"] = "02:23:34"

All the wrong dates have been corrected.

lengths = earthquakes["Date"].str.len()
lengths.value_counts()

Creating a DateTime column from the date and time columns,

earthquakes['Datetime'] = earthquakes['Date'] + ' ' + earthquakes['Time']
earthquakes['Datetime'] = pd.to_datetime(earthquakes['Datetime'])
earthquakes.head()

extracting the year and month name from the Datetime column,

earthquakes['Year'] = earthquakes['Datetime'].dt.year
earthquakes['Month'] = earthquakes['Datetime'].dt.month_name()
earthquakes.head()

and selecting the final set of columns from the data frame and checking whether there are any NaN values.

earthquakes = earthquakes[['Datetime', 'Latitude', 'Longitude', 'Depth', 'Magnitude', 'Type', 'Year','Month', 'Date']]
earthquakes.head()
check_nan_in_df = earthquakes.isnull().any()
print(check_nan_in_df)

Cleaning tsunami data

The first step is to reset the index and select the relevant columns from the data frame.

tsunami.reset_index(drop=True, inplace=True)

tsunami = tsunami[['SOURCE_ID','YEAR', 'MONTH', 'DAY', 'HOUR', 'MINUTE', 'FOCAL_DEPTH', 'PRIMARY_MAGNITUDE','LATITUDE', 'LONGITUDE', 'COUNTRY', 'CAUSE']]

After that, dropping rows with missing values,

# The trailing .dropna() removes every remaining row containing NaN,
# so the subset argument mainly documents which columns matter most
tsunami = tsunami.dropna(subset=['YEAR', 'MONTH', 'DAY', 'HOUR', 'MINUTE']).dropna()

converting the time columns to integers (including YEAR, in case missing values caused it to be read as a float; otherwise the DATE string built below could contain e.g. '2011.0'),

tsunami.YEAR = tsunami.YEAR.astype(int)
tsunami.MONTH = tsunami.MONTH.astype(int)
tsunami.DAY = tsunami.DAY.astype(int)
tsunami.HOUR = tsunami.HOUR.astype(int)
tsunami.MINUTE = tsunami.MINUTE.astype(int)
tsunami.head()
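The same casts can be written as one vectorized call:

time_cols = ['YEAR', 'MONTH', 'DAY', 'HOUR', 'MINUTE']
tsunami[time_cols] = tsunami[time_cols].astype(int)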

creating a new column called cause_name with the categorized values,

causes = {0: 'Unknown',
          1: 'Earthquake',
          2: 'Questionable Earthquake',
          3: 'Earthquake and Landslide',
          4: 'Volcano and Earthquake',
          5: 'Volcano, Earthquake, and Landslide',
          6: 'Volcano',
          7: 'Volcano and Landslide',
          8: 'Landslide',
          9: 'Meteorological',
          10: 'Explosion',
          11: 'Astronomical Tide'}
tsunami['CAUSE_NAME'] = tsunami['CAUSE'].map(causes)
tsunami.head()
Source of categories: Historical Tsunami Database, National Centers for Environmental Information (NCEI).

picking data only for earthquake causes,

# .copy() avoids pandas' SettingWithCopyWarning when columns are added below
tsunami_type = tsunami[tsunami['CAUSE_NAME'] == 'Earthquake'].copy()

checking if there are any nans,

check_nan_in_df2 = tsunami_type.isnull().any()
print(check_nan_in_df2)

and finally creating a date column by combining the day, month and year columns.

cols=["MONTH","DAY","YEAR"]
tsunami_type['DATE'] = tsunami_type[cols].apply(lambda x: '/'.join(x.values.astype(str)), axis="columns")
tsunami_type = tsunami_type[['SOURCE_ID', 'DATE', 'PRIMARY_MAGNITUDE', 'COUNTRY', 'LATITUDE', 'LONGITUDE' ]]
tsunami_type.head()
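To mirror the Datetime column built for the earthquake data, the assembled DATE string can also be parsed into a real datetime; a small sketch:

# Optional: parse the M/D/YYYY string for easier filtering and date-based joins
tsunami_type['DATETIME'] = pd.to_datetime(tsunami_type['DATE'], format='%m/%d/%Y')
tsunami_type.head()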

The second part, i.e. visualization techniques and damage grade prediction, will be published shortly. :)
