This notebook investigates the Taranaki Well Logs dataset, which is open-sourced by IBM Research and freely available for download on the IBM Developer Data Asset Exchange: Taranaki Basin Curated Well Logs Dataset. This notebook can also be found on Watson Studio: Taranaki Basin Curated Well Logs Notebook.
This dataset contains details about a set of oil wells located in the Taranaki Basin. The Taranaki Basin covers an area of about 330,000 square kilometers, both onshore and offshore along the west coast of New Zealand. It is the country's main region for oil exploration and production, with over 400 wells drilled to date. The basin consists of sedimentary rocks dated from the Late Cretaceous to the present, overlying Paleozoic and Mesozoic basement rocks.
The data was curated from two sources, the New Zealand Petroleum & Minerals Online Exploration Database (data.nzpam.govt.nz) and Petlab (pet.gns.cri.nz), and serves to characterize the Taranaki Basin. In particular, the data maps important tectonic regions in the basin and the various formations in those regions. We used geological reports to identify formation markers, spreadsheets to find well header and drilling deviation information, and LAS files to characterize a reasonable set of well log annotations. The curated dataset consists of 407 wells, containing the main geophysical well logs and reported geological formations in true vertical depth.
The data was then prepared, processed, and cleaned from various source files into final CSV files containing the well logs, the coordinates of the wells, and the corresponding labels.
import requests
import tarfile
import os
Let's download the dataset from the Data Asset Exchange Cloud Object Storage bucket and extract the tarball.
# Download the dataset
fname = 'taranaki-basin-curated-well-logs.tar.gz'
url = 'https://dax-cdn.cdn.appdomain.cloud/dax-taranaki-basin-curated-well-logs/1.0.0/'
download_link = url + fname
r = requests.get(download_link, allow_redirects=True)
with open(fname, 'wb') as f:
    f.write(r.content)
# Confirm the download: HTTP status, content type, and the file on disk
print(r.status_code)
print(r.headers.get('content-type'))
print(os.listdir('.'))
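Before extracting, it's worth failing fast if the request did not succeed (a small optional guard, not part of the original flow):
# Stop early if the download did not return HTTP 200
assert r.status_code == 200, 'Download failed with status ' + str(r.status_code)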
# Extract the dataset
with tarfile.open(fname) as tar:
    tar.extractall()
# Verify the file was extracted properly
data_path = "taranaki-basin-curated-well-logs"
os.path.exists(data_path)
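To see what the tarball actually contains, we can list the extracted directory; the two files used below are logs.csv and coords.csv (any other files shown are left untouched):
# List the extracted files; logs.csv and coords.csv are used below
print(sorted(os.listdir(data_path)))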
# Load the dataset into the notebook with pandas
import pandas as pd
# Read the well logs and the well coordinates from their respective CSV files
logs_df = pd.read_csv(data_path + '/logs.csv')
coords_df = pd.read_csv(data_path + '/coords.csv')
# Preview the first 5 rows of the logs data
logs_df.head()
# Preview the first 5 rows of the coords data
coords_df.head()
# Inspect the column data types of both dataframes
logs_df.dtypes
coords_df.dtypes
# Check that both datasets contain the same set of well names
print(sorted(logs_df['WELLNAME'].unique()) == sorted(coords_df['WELL_NAME'].unique()))
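If the comparison above ever printed False, a set difference would show which wells appear in only one of the two files (a minimal diagnostic sketch):
# Wells present in one dataframe but not the other (both empty if the names match)
logs_wells = set(logs_df['WELLNAME'].unique())
coords_wells = set(coords_df['WELL_NAME'].unique())
print(logs_wells - coords_wells)
print(coords_wells - logs_wells)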
# Count the number of data points for each well
logs_df.pivot_table(index=['WELLNAME'], aggfunc='size')
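To get a feel for how unevenly the samples are distributed across wells, the same per-well counts can be sorted (an optional sketch):
# Wells with the most and the fewest log samples
counts = logs_df.pivot_table(index=['WELLNAME'], aggfunc='size').sort_values(ascending=False)
print(counts.head())
print(counts.tail())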
# Merge the logs and coordinates dataframes on well name
merged_df = pd.merge(left=logs_df, right=coords_df, left_on='WELLNAME', right_on='WELL_NAME', how='outer')
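Since coords.csv is expected to hold one row per well, the outer merge should preserve the row count of logs_df. A quick sanity check (a minimal sketch, assuming one coordinate row per well):
# With one coordinate row per well, the merge should neither add nor drop rows
print(len(logs_df), len(merged_df))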
# Rename the overlapping X, Y, Z columns to mark which dataframe they came from
merged_df = merged_df.rename(columns={'X_x': 'X_logs', 'Y_x': 'Y_logs', 'Z': 'Z_logs', 'X_y': 'X_coords', 'Y_y': 'Y_coords'})
merged_df.head(10)
merged_df.dtypes
# Compare the X and Y locations from both datasets
XY_df = merged_df[['X_logs', 'X_coords', 'Y_logs', 'Y_coords']]
XY_df.head(10)
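Eyeballing the first rows only goes so far; comparing the maximum absolute differences tells us whether the two coordinate sources agree numerically (a minimal sketch, assuming both files use the same coordinate system):
# Largest absolute disagreement between the two coordinate sources
print((XY_df['X_logs'] - XY_df['X_coords']).abs().max())
print((XY_df['Y_logs'] - XY_df['Y_coords']).abs().max())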