Reading and pre-processing the datasets
The aim is to analyse the relationship between worldwide earthquakes, tsunamis and tectonic plate boundaries. This aim will be met by completing the following objectives:
- Mapping all the affected areas,
- Counting earthquake occurrences across different magnitude ranges,
- Assessing the severity of earthquakes,
- Mapping the most affected areas based on magnitude,
- Finding the month with the highest number of earthquake occurrences,
- Finding the year with the highest number of earthquake occurrences,
- Visualizing earthquakes and tsunamis.
The seismic analysis is divided into two parts: (1) reading and pre-processing the datasets, and (2) visualization techniques and damage grade prediction. This post covers the first part.
1. Importing libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from mpl_toolkits.basemap import Basemap
import folium
from folium import plugins
import datetime
import plotly.express as px
import pandas_profiling
2. Importing datasets
Datasets:
- Earthquake data (1965–2016) downloaded from https://www.kaggle.com/usgs/earthquake-database
- Tectonic plates data downloaded from https://www.kaggle.com/cwthompson/tectonic-plate-boundaries
- Tsunami data (2000–2017) downloaded from https://www.kaggle.com/noaa/seismic-waves
Earthquakes Data
Reading the comma-separated values (CSV) file into a DataFrame,
earthquakes = pd.read_csv('database.csv')
and visualizing the data with the plotly.express library:
fig = px.density_mapbox(earthquakes, lat='Latitude', lon='Longitude', z='Magnitude', radius=5,
                        center=dict(lat=0, lon=180), zoom=0,
                        mapbox_style="stamen-terrain", title='Earthquakes around the world')
fig.show()
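The pandas_profiling library imported above is not used anywhere else in this part; a minimal sketch of how it could provide a quick overview of the earthquake table (the report title and output file name below are illustrative) is:
# Quick exploratory report of the earthquakes DataFrame using pandas-profiling.
# The exact options available depend on the installed pandas-profiling version.
profile = pandas_profiling.ProfileReport(earthquakes, title="Earthquakes dataset overview")
profile.to_file("earthquakes_profile.html")  # writes a standalone HTML report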
Tectonic plates data
Reading data from CSV,
tec_plates = pd.read_csv('Tectonicplates.csv')
creating a visualization of the tectonic plates using the Basemap library:
fig = plt.figure(figsize=(14, 10), edgecolor='w')
m = Basemap(projection='cyl', resolution='c',
            llcrnrlat=-90, urcrnrlat=90,
            llcrnrlon=-180, urcrnrlon=180)
m.scatter(tec_plates['lon'], tec_plates['lat'], s=4, color='green')
m.drawcountries(color='gray', linewidth=1)
m.shadedrelief()
plt.title("View of tectonic plates")
plt.show()
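Since folium is also imported but not used in this part, an interactive version of the same plate-boundary map could be sketched as follows; this is an optional alternative rather than part of the original workflow, and plotting every boundary point as a separate marker may be slow for large files:
# Optional interactive alternative with folium: one small circle per boundary point.
plate_map = folium.Map(location=[0, 0], zoom_start=2)
for lat, lon in zip(tec_plates['lat'], tec_plates['lon']):
    folium.CircleMarker(location=[lat, lon], radius=1, color='green').add_to(plate_map)
plate_map  # in a notebook, displaying the object renders the interactive map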
Tsunami data
Reading tsunami data,
tsunami = pd.read_csv('sources.csv')
importing the second table (waves) from the same source and selecting two columns for the analysis,
waves = pd.read_csv('waves.csv')
waves = waves[['SOURCE_ID', 'DISTANCE_FROM_SOURCE']]
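The two wave columns are presumably kept so that they can later be joined back to the source table; a minimal sketch of such a join on SOURCE_ID (an assumption about how the columns will be used, not a step from the original analysis) is:
# Hypothetical left join attaching wave distances to their tsunami sources.
tsunami_waves = tsunami.merge(waves, on='SOURCE_ID', how='left')
tsunami_waves.head()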
3. Preprocessing data for the analysis
Cleaning earthquake data
Selecting columns from the data frame and
earthquakes = earthquakes[['Date', 'Time', 'Latitude', 'Longitude', 'Depth', 'Magnitude', 'Type']]
checking lengths of dates to see if there are any differences.
lengths = earthquakes["Date"].str.len()
lengths.value_counts()
As we can see, the data frame contains three rows with malformed dates: their Date strings are 24 characters long (full timestamps) instead of the expected 10-character MM/DD/YYYY format.
wrongdates_index = np.where(lengths == 24)[0]
print(wrongdates_index)
Row indices with wrong dates: [ 3378 7512 20650].
earthquakes.loc[wrongdates_index]
earthquakes.loc[3378, "Date"] = "02/23/1975"
earthquakes.loc[7512, "Date"] = "04/28/1985"
earthquakes.loc[20650, "Date"] = "03/13/2011"
earthquakes.loc[3378, "Time"] = "02:58:41"
earthquakes.loc[7512, "Time"] = "02:53:41"
earthquakes.loc[20650, "Time"] = "02:23:34"
All the wrong dates have been corrected.
lengths = earthquakes["Date"].str.len()
lengths.value_counts()
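The same corrections could also be made without hard-coding the replacement values; a minimal sketch, assuming the 24-character entries are full ISO timestamps (e.g. 1975-02-23T02:58:41.000Z), is:
# Parse the long ISO entries and re-split them into the Date/Time format used elsewhere.
mask = earthquakes["Date"].str.len() == 24
parsed = pd.to_datetime(earthquakes.loc[mask, "Date"])
earthquakes.loc[mask, "Date"] = parsed.dt.strftime("%m/%d/%Y")
earthquakes.loc[mask, "Time"] = parsed.dt.strftime("%H:%M:%S")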
Creating a DateTime column from the date and time columns,
earthquakes['Datetime'] = earthquakes['Date'] + ' ' + earthquakes['Time']
earthquakes['Datetime'] = pd.to_datetime(earthquakes['Datetime'])
earthquakes.head()
extracting year and month names from DateTime columns,
earthquakes['Year'] = earthquakes['Datetime'].dt.year
earthquakes['Month'] = earthquakes['Datetime'].dt.month_name()
earthquakes.head()
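These new columns feed directly into the objectives about the busiest month and year; a quick preview of how the counts could be obtained (the full analysis is left for the second part) is:
# Count earthquakes per year and per month using the newly created columns.
per_year = earthquakes['Year'].value_counts().sort_index()
per_month = earthquakes['Month'].value_counts()
print(per_year.idxmax(), per_month.idxmax())  # year and month with the most recorded events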
Next, selecting columns from the data frame and checking whether there are any NaN values.
earthquakes = earthquakes[['Datetime', 'Latitude', 'Longitude', 'Depth', 'Magnitude', 'Type', 'Year','Month', 'Date']]
earthquakes.head()
check_nan_in_df = earthquakes.isnull().any()
print(check_nan_in_df)
Cleaning tsunami data
The first step is to reset the index and select the columns needed from the data frame.
tsunami.reset_index(drop=True, inplace=True)
tsunami = tsunami[['SOURCE_ID','YEAR', 'MONTH', 'DAY', 'HOUR', 'MINUTE', 'FOCAL_DEPTH', 'PRIMARY_MAGNITUDE','LATITUDE', 'LONGITUDE', 'COUNTRY', 'CAUSE']]
After that, dropping rows with NaN values in the date and time columns (the trailing dropna() also removes any remaining rows with NaNs in other columns),
tsunami = tsunami.dropna(subset=['YEAR', 'MONTH', 'DAY', 'HOUR', 'MINUTE']).dropna()
converting columns to integers,
tsunami.MONTH = tsunami.MONTH.astype(int)
tsunami.DAY = tsunami.DAY.astype(int)
tsunami.HOUR = tsunami.HOUR.astype(int)
tsunami.MINUTE = tsunami.MINUTE.astype(int)
tsunami.head()
creating a new column called CAUSE_NAME with the categorized cause values,
causes = {0:'Unknown',
1:'Earthquake',
2:'Questionable Earthquake',
3:'Earthquake and Landslide',
4:'Volcano and Earthquake',
5:'Volcano, Earthquake, and Landslide',
6:'Volcano',
7:'Volcano and Landslide',
8:'Landslide',
9:'Meteorological',
10:'Explosion',
11:'Astronomical Tide'}
tsunami['CAUSE_NAME'] = tsunami['CAUSE'].map(causes)
tsunami.head()
Source of categories: Historical Tsunami Database (National Centers for Environmental Information).
picking data only for earthquake causes,
tsunami_type = tsunami[tsunami['CAUSE_NAME'] == 'Earthquake'].copy()  # .copy() avoids SettingWithCopyWarning when adding the DATE column below
checking if there are any NaNs,
check_nan_in_df2 = tsunami_type.isnull().any()
print(check_nan_in_df2)
and finally creating a date column by combining the month, day, and year columns.
cols=["MONTH","DAY","YEAR"]
tsunami_type['DATE'] = tsunami_type[cols].apply(lambda x: '/'.join(x.values.astype(str)), axis="columns")
tsunami_type = tsunami_type[['SOURCE_ID', 'DATE', 'PRIMARY_MAGNITUDE', 'COUNTRY', 'LATITUDE', 'LONGITUDE' ]]
tsunami_type.head()
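As a side note, a true datetime column could also be built for the tsunami events instead of the slash-joined string, since pandas can assemble timestamps from year/month/day/hour/minute columns; a minimal sketch using the earlier tsunami frame (which still holds those columns) is:
# Assemble a datetime from the component columns; pandas expects lower-case names.
ts_components = tsunami[['YEAR', 'MONTH', 'DAY', 'HOUR', 'MINUTE']].rename(columns=str.lower)
tsunami['DATETIME'] = pd.to_datetime(ts_components)
tsunami[['SOURCE_ID', 'DATETIME']].head()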
The second part, i.e. visualization techniques and damage grade prediction, will be published shortly. :)