This notebook investigates the Taranaki Well Logs dataset, which is open-sourced by IBM Research and freely available for download on the IBM Developer Data Asset Exchange: Taranaki Basin Curated Well Logs Dataset. This notebook can also be found on Watson Studio: Taranaki Basin Curated Well Logs Notebook.
This dataset contains details about a set of oil wells located in the Taranaki Basin. The Taranaki Basin covers an area of about 330,000 square kilometers, both onshore and offshore along the west coast of New Zealand. It is the country's main region for oil exploration and production, with over 400 wells drilled to date. The basin consists of sedimentary rocks dated from the Late Cretaceous to the present, overlying Paleozoic and Mesozoic basement rocks.
The data was curated from two sources, the New Zealand Petroleum & Minerals Online Exploration Database (data.nzpam.govt.nz) and Petlab (pet.gns.cri.nz), and serves to characterize the Taranaki Basin. In particular, the data maps important tectonic regions in the basin and the various formations in those regions. We used geological reports to identify formation markers, spreadsheets to find well header and drilling deviation information, and LAS files to characterize a reasonable set of well log annotations. The curated dataset consists of 407 wells, containing the main geophysical well logs and reported geological formations in true vertical depth.
The data was then prepared, processed, and cleaned from various source files into final CSV files containing the well logs, the coordinates of the wells, and the corresponding labels.
import requests
import tarfile
import os
Let's download the dataset from the Data Asset Exchange Cloud Object Storage bucket and extract the tarball.
# Download the dataset
fname = 'taranaki-basin-curated-well-logs.tar.gz'
url = 'https://dax-cdn.cdn.appdomain.cloud/dax-taranaki-basin-curated-well-logs/1.0.0/'
download_link = url + fname
r = requests.get(download_link, allow_redirects=True)
with open(fname, 'wb') as f:
    f.write(r.content)
# Confirm the download: HTTP status, content type, and the file on disk
print(r.status_code)
print(r.headers.get('content-type'))
print(os.listdir('.'))
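Before extracting, it's worth failing fast if the request did not succeed (a small optional guard, not part of the original flow):
# Stop early if the download did not return HTTP 200
assert r.status_code == 200, 'Download failed with status ' + str(r.status_code)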
# Extract the dataset
with tarfile.open(fname) as tar:
    tar.extractall()
# Verify the file was extracted properly
data_path = "taranaki-basin-curated-well-logs"
os.path.exists(data_path)
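To see what the tarball actually contains, we can list the extracted directory; the two files used below are logs.csv and coords.csv (any other files shown are left untouched):
# List the extracted files; logs.csv and coords.csv are used below
print(sorted(os.listdir(data_path)))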
# Load the dataset into the notebook with pandas
import pandas as pd
# Read the well logs and the well coordinates from their respective CSV files
logs_df = pd.read_csv(data_path + '/logs.csv')
coords_df = pd.read_csv(data_path + '/coords.csv')
# Preview the first 5 rows of the logs data
logs_df.head()
# Preview the first 5 rows of the coords data
coords_df.head()
# Inspect the column data types of both dataframes
logs_df.dtypes
coords_df.dtypes
# Check that both datasets contain the same set of well names
print(sorted(logs_df['WELLNAME'].unique()) == sorted(coords_df['WELL_NAME'].unique()))
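If the comparison above ever printed False, a set difference would show which wells appear in only one of the two files (a minimal diagnostic sketch):
# Wells present in one dataframe but not the other (both empty if the names match)
logs_wells = set(logs_df['WELLNAME'].unique())
coords_wells = set(coords_df['WELL_NAME'].unique())
print(logs_wells - coords_wells)
print(coords_wells - logs_wells)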
# Count the number of data points for each well
logs_df.pivot_table(index=['WELLNAME'], aggfunc='size')
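To get a feel for how unevenly the samples are distributed across wells, the same per-well counts can be sorted (an optional sketch):
# Wells with the most and the fewest log samples
counts = logs_df.pivot_table(index=['WELLNAME'], aggfunc='size').sort_values(ascending=False)
print(counts.head())
print(counts.tail())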
# Merge the logs and coordinates dataframes on well name
merged_df = pd.merge(left=logs_df, right=coords_df, left_on='WELLNAME', right_on='WELL_NAME', how='outer')
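Since coords.csv is expected to hold one row per well, the outer merge should preserve the row count of logs_df. A quick sanity check (a minimal sketch, assuming one coordinate row per well):
# With one coordinate row per well, the merge should neither add nor drop rows
print(len(logs_df), len(merged_df))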
# Rename the overlapping X, Y, Z columns to mark which dataframe they came from
merged_df = merged_df.rename(columns={'X_x': 'X_logs', 'Y_x': 'Y_logs', 'Z': 'Z_logs', 'X_y': 'X_coords', 'Y_y': 'Y_coords'})
merged_df.head(10)
merged_df.dtypes
# Compare the X and Y locations from both datasets
XY_df = merged_df[['X_logs', 'X_coords', 'Y_logs', 'Y_coords']]
XY_df.head(10)
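Eyeballing the first rows only goes so far; comparing the maximum absolute differences tells us whether the two coordinate sources agree numerically (a minimal sketch, assuming both files use the same coordinate system):
# Largest absolute disagreement between the two coordinate sources
print((XY_df['X_logs'] - XY_df['X_coords']).abs().max())
print((XY_df['Y_logs'] - XY_df['Y_coords']).abs().max())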