Taranaki Basin Curated Well Logs

This notebook explores the Taranaki Basin Curated Well Logs dataset, which was open-sourced by IBM Research and is freely available for download from the IBM Developer Data Asset Exchange: Taranaki Basin Curated Well Logs Dataset. This notebook is also available on Watson Studio: Taranaki Basin Curated Well Logs Notebook.

This dataset contains details about a set of oil wells located in the Taranaki Basin. The basin covers about 330,000 square kilometers, both onshore and offshore along the west coast of New Zealand. It is the main region for oil exploration and production in New Zealand, with over 400 wells drilled to date. The basin consists of sedimentary rocks dating from the Late Cretaceous to the present, overlying the Paleozoic and Mesozoic basement rocks.

The data was curated from two sources that together characterize the Taranaki Basin: the New Zealand Petroleum & Minerals Online Exploration Database (data.nzpam.govt.nz) and Petlab (pet.gns.cri.nz). In particular, the data served to map the important tectonic regions in the basin and the various formations within them. We used geological reports to identify formation markers, spreadsheets to find well header and drilling deviation information, and LAS files to characterize a reasonable set of well log annotations. The curated dataset consists of 407 wells with the main geophysical well logs and the reported geological formations in true vertical depth.

The data was then prepared, processed, and cleaned from various files into a final CSV file containing the well logs, the coordinates of the wells, and the corresponding labels.

In [1]:
import requests
import tarfile
import os

Download and Extract the Dataset

Let's download the dataset from the Data Asset Exchange Cloud Object Storage bucket and extract the tarball.

In [2]:
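# Download the dataset
fname = 'taranaki-basin-curated-well-logs.tar.gz'
url = 'https://dax-cdn.cdn.appdomain.cloud/dax-taranaki-basin-curated-well-logs/1.0.0/'
download_link = url + fname
r = requests.get(download_link, allow_redirects=True)
r.raise_for_status()  # fail early on an HTTP error instead of saving an error page
# Write the archive to disk; the context manager closes the file handle
with open(fname, 'wb') as f:
    n_bytes = f.write(r.content)
n_bytes  # number of bytes written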
Out[2]:
228743380
In [3]:
print(r.status_code)
print(os.listdir('.'))
print(r.headers.get('content-type'))
200
['taranaki-basin-curated-well-logs', 'taranaki-basin-curated-well-logs.tar.gz']
application/x-gtar
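The archive is roughly 229 MB (228,743,380 bytes were written above), and `r.content` holds the whole response in memory before it reaches disk. A minimal sketch of a streamed alternative is shown here. The helper names `write_chunks` and `download_file`, the 1 MiB chunk size, and the 60-second timeout are illustrative assumptions and are not part of the original notebook.

```python
import requests


def write_chunks(chunks, dest):
    """Write an iterable of byte chunks to dest and return the byte count."""
    total = 0
    with open(dest, 'wb') as f:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                total += f.write(chunk)
    return total


def download_file(url, dest, chunk_size=1 << 20):
    """Stream url to dest in chunk_size pieces (hypothetical helper)."""
    with requests.get(url, stream=True, allow_redirects=True, timeout=60) as r:
        r.raise_for_status()  # stop on an HTTP error before writing anything
        return write_chunks(r.iter_content(chunk_size=chunk_size), dest)
```

Under these assumptions, `download_file(download_link, fname)` would stand in for the `requests.get` and `write` pair used in the download cell.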
In [4]:
# Extract the dataset; the context manager closes the archive afterwards
with tarfile.open(fname) as tar:
    tar.extractall()
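`extractall()` trusts the member paths stored in the archive. For an archive from a source you control this is usually fine, but a more defensive variant can check that every member resolves inside the destination before extracting. The sketch below is one way to do that; `safe_extract` is a hypothetical helper, and it only checks member names (it does not inspect symlink targets).

```python
import os
import tarfile


def safe_extract(tar_path, dest='.'):
    """Extract tar_path into dest, refusing members that would land outside it."""
    dest_root = os.path.realpath(dest)
    with tarfile.open(tar_path) as tar:
        for member in tar.getmembers():
            target = os.path.realpath(os.path.join(dest_root, member.name))
            # commonpath equals dest_root only when target is inside it
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f'unsafe path in archive: {member.name}')
        tar.extractall(dest_root)
```

Recent Python versions also offer extraction filters on `extractall`, which may be preferable where available.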
In [5]:
# Verify the file was extracted properly
data_path = "taranaki-basin-curated-well-logs"
os.path.exists(data_path)
Out[5]:
True
In [6]:
# Load the dataset into the notebook with pandas
import pandas as pd
# Paths are relative to the notebook's working directory
logs_df = pd.read_csv(os.path.join(data_path, 'logs.csv'))
coords_df = pd.read_csv(os.path.join(data_path, 'coords.csv'))
/opt/conda/envs/Python36/lib/python3.6/site-packages/IPython/core/interactiveshell.py:3020: DtypeWarning: Columns (21) have mixed types. Specify dtype option on import or set low_memory=False.
  interactivity=interactivity, compiler=compiler, result=result)
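The DtypeWarning above refers to column 21. Counting from zero across the 22 columns of the logs table, that appears to be `FORMATION`, which mixes missing values with text labels. One way to avoid the warning is to declare that column's dtype when reading. The sketch below demonstrates the idea on a tiny in-memory stand-in for `logs.csv`; the formation label in it is a placeholder, not a value from the dataset.

```python
import io

import pandas as pd

# Tiny stand-in for logs.csv: FORMATION mixes an empty cell and a text label.
sample_csv = io.StringIO(
    "DEPT,WELLNAME,FORMATION\n"
    "27.5844,Ahuroa South B-1,\n"
    "28.0416,Ahuroa South B-1,Example Formation\n"  # placeholder label
)

# Declaring the dtype up front means pandas does not have to infer it per chunk.
sample_df = pd.read_csv(sample_csv, dtype={'FORMATION': str})
print(sample_df['FORMATION'].tolist())
```

For the real file the equivalent call would be `pd.read_csv(data_path + '/logs.csv', dtype={'FORMATION': str})`. Passing `low_memory=False`, as the warning itself suggests, is the other option.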
In [7]:
# Preview the first 5 lines of the logs data 
logs_df.head()
Out[7]:
      BS  CALI  DENS  DRHO  DTC   GR  NEUT  PEF  RESD  RESM  ...     TEMP  TENS             X            Y         Z     DEPT  ONSHORE  DIRSURVEY          WELLNAME  FORMATION
0  12.25   NaN   NaN   NaN  NaN  NaN   NaN  NaN   NaN   NaN  ...      NaN   NaN  424403.93838  399869.5574  229.4244  27.5844     True      False  Ahuroa South B-1        NaN
1  12.25   NaN   NaN   NaN  NaN  NaN   NaN  NaN   NaN   NaN  ...      NaN   NaN  424403.93838  399869.5574  229.5768  27.7368     True      False  Ahuroa South B-1        NaN
2  12.25   NaN   NaN   NaN  NaN  NaN   NaN  NaN   NaN   NaN  ...      NaN   NaN  424403.93838  399869.5574  229.7292  27.8892     True      False  Ahuroa South B-1        NaN
3  12.25   NaN   NaN   NaN  NaN  NaN   NaN  NaN   NaN   NaN  ...  33.0429   NaN  424403.93838  399869.5574  229.8816  28.0416     True      False  Ahuroa South B-1        NaN
4  12.25   NaN   NaN   NaN  NaN  NaN   NaN  NaN   NaN   NaN  ...  33.0429   NaN  424403.93838  399869.5574  230.0340  28.1940     True      False  Ahuroa South B-1        NaN

5 rows × 22 columns
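The first rows of the preview are mostly NaN, so it can help to quantify how sparse each log is. A minimal sketch follows, using a small made-up frame (`demo_df` and the `GR` reading in it are invented for illustration); the same expression could be applied to `logs_df`.

```python
import numpy as np
import pandas as pd

# Made-up frame shaped like a slice of the logs table.
demo_df = pd.DataFrame({
    'BS': [12.25, 12.25, 12.25, 12.25],
    'GR': [np.nan, np.nan, np.nan, 55.0],        # 55.0 is an invented reading
    'TEMP': [np.nan, np.nan, 33.0429, 33.0429],
})

# Fraction of missing values per column, most sparse first.
missing_fraction = demo_df.isna().mean().sort_values(ascending=False)
print(missing_fraction)
```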

In [8]:
# Preview the first 5 lines of the coords data 
coords_df.head()
Out[8]:
      WELL_NAME         LAT        LONG              X              Y
0  Supplejack-1  -39.160173  174.231151  415005.272386  411576.526872
1     Tariki-2B  -39.199475  174.365528  426576.126071  407175.416904
2      Kapuni-8  -39.431302  174.180483  410595.589204  381435.476294
3      Kapuni-1  -39.472414  174.175483  410160.200502  376864.629747
4      Radnor-1  -39.299885  174.266823  418044.867622  396034.751670
In [9]:
logs_df.dtypes
Out[9]:
BS           float64
CALI         float64
DENS         float64
DRHO         float64
DTC          float64
GR           float64
NEUT         float64
PEF          float64
RESD         float64
RESM         float64
RESS         float64
SP           float64
TEMP         float64
TENS         float64
X            float64
Y            float64
Z            float64
DEPT         float64
ONSHORE         bool
DIRSURVEY       bool
WELLNAME      object
FORMATION     object
dtype: object
In [10]:
coords_df.dtypes
Out[10]:
WELL_NAME     object
LAT          float64
LONG         float64
X            float64
Y            float64
dtype: object
In [11]:
# Check that both datasets contain the same set of well names
print(sorted(logs_df['WELLNAME'].unique()) == sorted(coords_df['WELL_NAME'].unique()))
True
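The comparison above returns a single boolean, which is enough when it is `True`. If it were `False`, set differences would show which names are missing on each side. The sketch below uses a hypothetical helper, `compare_names`, on illustrative inputs where one name carries a stray trailing space.

```python
def compare_names(left, right):
    """Return (only_in_left, only_in_right) as sorted lists."""
    left_set, right_set = set(left), set(right)
    return sorted(left_set - right_set), sorted(right_set - left_set)


# Illustrative inputs; 'Kapuni-8 ' has a trailing space on purpose.
only_logs, only_coords = compare_names(
    ['Kapuni-1', 'Kapuni-8 ', 'Radnor-1'],
    ['Kapuni-1', 'Kapuni-8', 'Radnor-1'],
)
print(only_logs, only_coords)
```

On the real data the call would be `compare_names(logs_df['WELLNAME'].unique(), coords_df['WELL_NAME'].unique())`, which should return two empty lists given the `True` result above.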
In [12]:
# Count the number of data points for each well
logs_df.pivot_table(index=['WELLNAME'], aggfunc='size')
Out[12]:
WELLNAME
Ahuroa South B-1       21932
Ahuroa South B-1ST1     5274
Ahuroa-1               21276
Ahuroa-1A               4334
Ahuroa-2               15153
Ahuroa-2A               5515
Ahuroa-3               16542
Ahuroa-4               16378
Ahuroa-5               16654
Ahuroa-5 ST1           11755
Albacore-1              6775
Amokura-1              26213
Amokura-2H             23711
Arakamu-1              15521
Arawa-1                19255
Ariki-1                28595
Awatea-1               20416
Beluga-1               26808
Bluff-1                 9199
Burgess-1              21358
Cape Egmont-1          11443
Cape Farewell-1         8396
Cardiff-1              33067
Cardiff-2              32080
Cardiff-2AST1          15282
Cheal B-2              11761
Cheal B-3              12293
Cheal-1                16217
Cheal-2                10216
Cheal-A3X              12034
                       ...  
Turi-1                 25275
Urenui-1               24094
Waihapa-1              23485
Waihapa-1A              8988
Waihapa-1B              2139
Waihapa-2              21184
Waihapa-3              17601
Waihapa-4              20256
Waihapa-5              21135
Waihapa-6              20964
Waihapa-6A             13775
Waihapa-7A              5777
Waihapa-8              23592
Waihapa-H1             18355
Waihi-1                18890
Waihi-1A               23960
Waimanu-1              33664
Wainui-1               22460
Waitui-1               32559
Warea Deep-1           17736
Warea-1                18967
Wawiri-1                8681
West Cape-1            21257
Wharehuia-1            19416
Windsor-1               7978
Windsor-2               6883
Windsor-3A              7240
Wingrove-1              5482
Wingrove-2              8681
Witiora-1              26769
Length: 407, dtype: int64
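The same per-well counts can also be obtained with `groupby(...).size()`, and `describe()` then gives a compact summary of how the counts are distributed. A short sketch on a toy frame follows; `toy_logs` and its values are invented for illustration.

```python
import pandas as pd

# Toy stand-in for logs_df with three wells of different lengths.
toy_logs = pd.DataFrame({
    'WELLNAME': ['Kapuni-1'] * 3 + ['Radnor-1'] * 2 + ['Turi-1'],
    'DEPT': [1.0, 2.0, 3.0, 1.0, 2.0, 1.0],
})

# One row count per well, then summary statistics over those counts.
counts = toy_logs.groupby('WELLNAME').size()
print(counts.sort_values(ascending=False))
print(counts.describe())
```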
In [13]:
merged_df = pd.merge(left=logs_df, right=coords_df, left_on='WELLNAME', right_on='WELL_NAME', how='outer')
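Each well appears many times in the logs and should appear once in the coordinates, so this is a many-to-one join. `pd.merge` can check that assumption with `validate`, and `indicator=True` records whether each row matched on both sides. The sketch below uses toy frames; the `DEPT` values are invented, and the two latitudes are taken from the coords preview.

```python
import pandas as pd

toy_logs = pd.DataFrame({'WELLNAME': ['Kapuni-1', 'Kapuni-1', 'Radnor-1'],
                         'DEPT': [10.0, 10.5, 12.0]})  # invented depths
toy_coords = pd.DataFrame({'WELL_NAME': ['Kapuni-1', 'Radnor-1'],
                           'LAT': [-39.472414, -39.299885]})

checked = pd.merge(
    toy_logs, toy_coords,
    left_on='WELLNAME', right_on='WELL_NAME',
    how='outer',
    validate='many_to_one',  # raises MergeError if a well repeats in toy_coords
    indicator=True,          # adds a _merge column: both / left_only / right_only
)
print(checked['_merge'].value_counts())
```

With an outer join, any `left_only` or `right_only` rows would point to wells present in only one of the two tables.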
In [14]:
# Rename the X, Y, and Z columns to show which dataset each came from
merged_df = merged_df.rename(columns={'X_x': 'X_logs', 'Y_x': 'Y_logs', 'Z': 'Z_logs', 'X_y': 'X_coords', 'Y_y': 'Y_coords'})
merged_df.head(10)
Out[14]:
      BS  CALI  DENS  DRHO  DTC   GR  NEUT  PEF  RESD  RESM  ...     DEPT  ONSHORE  DIRSURVEY          WELLNAME  FORMATION         WELL_NAME         LAT        LONG       X_coords       Y_coords
0  12.25   NaN   NaN   NaN  NaN  NaN   NaN  NaN   NaN   NaN  ...  27.5844     True      False  Ahuroa South B-1        NaN  Ahuroa South B-1  -39.265242  174.340586  424403.938384  399869.557401
1  12.25   NaN   NaN   NaN  NaN  NaN   NaN  NaN   NaN   NaN  ...  27.7368     True      False  Ahuroa South B-1        NaN  Ahuroa South B-1  -39.265242  174.340586  424403.938384  399869.557401
2  12.25   NaN   NaN   NaN  NaN  NaN   NaN  NaN   NaN   NaN  ...  27.8892     True      False  Ahuroa South B-1        NaN  Ahuroa South B-1  -39.265242  174.340586  424403.938384  399869.557401
3  12.25   NaN   NaN   NaN  NaN  NaN   NaN  NaN   NaN   NaN  ...  28.0416     True      False  Ahuroa South B-1        NaN  Ahuroa South B-1  -39.265242  174.340586  424403.938384  399869.557401
4  12.25   NaN   NaN   NaN  NaN  NaN   NaN  NaN   NaN   NaN  ...  28.1940     True      False  Ahuroa South B-1        NaN  Ahuroa South B-1  -39.265242  174.340586  424403.938384  399869.557401
5  12.25   NaN   NaN   NaN  NaN  NaN   NaN  NaN   NaN   NaN  ...  28.3464     True      False  Ahuroa South B-1        NaN  Ahuroa South B-1  -39.265242  174.340586  424403.938384  399869.557401
6  12.25   NaN   NaN   NaN  NaN  NaN   NaN  NaN   NaN   NaN  ...  28.4988     True      False  Ahuroa South B-1        NaN  Ahuroa South B-1  -39.265242  174.340586  424403.938384  399869.557401
7  12.25   NaN   NaN   NaN  NaN  NaN   NaN  NaN   NaN   NaN  ...  28.6512     True      False  Ahuroa South B-1        NaN  Ahuroa South B-1  -39.265242  174.340586  424403.938384  399869.557401
8  12.25   NaN   NaN   NaN  NaN  NaN   NaN  NaN   NaN   NaN  ...  28.8036     True      False  Ahuroa South B-1        NaN  Ahuroa South B-1  -39.265242  174.340586  424403.938384  399869.557401
9  12.25   NaN   NaN   NaN  NaN  NaN   NaN  NaN   NaN   NaN  ...  28.9560     True      False  Ahuroa South B-1        NaN  Ahuroa South B-1  -39.265242  174.340586  424403.938384  399869.557401

10 rows × 27 columns

In [15]:
merged_df.dtypes
Out[15]:
BS           float64
CALI         float64
DENS         float64
DRHO         float64
DTC          float64
GR           float64
NEUT         float64
PEF          float64
RESD         float64
RESM         float64
RESS         float64
SP           float64
TEMP         float64
TENS         float64
X_logs       float64
Y_logs       float64
Z_logs       float64
DEPT         float64
ONSHORE         bool
DIRSURVEY       bool
WELLNAME      object
FORMATION     object
WELL_NAME     object
LAT          float64
LONG         float64
X_coords     float64
Y_coords     float64
dtype: object
In [16]:
# Compare the X and Y locations from both datasets
XY_df = merged_df[['X_logs', 'X_coords', 'Y_logs', 'Y_coords']]
XY_df.head(10)
Out[16]:
         X_logs       X_coords       Y_logs       Y_coords
0  424403.93838  424403.938384  399869.5574  399869.557401
1  424403.93838  424403.938384  399869.5574  399869.557401
2  424403.93838  424403.938384  399869.5574  399869.557401
3  424403.93838  424403.938384  399869.5574  399869.557401
4  424403.93838  424403.938384  399869.5574  399869.557401
5  424403.93838  424403.938384  399869.5574  399869.557401
6  424403.93838  424403.938384  399869.5574  399869.557401
7  424403.93838  424403.938384  399869.5574  399869.557401
8  424403.93838  424403.938384  399869.5574  399869.557401
9  424403.93838  424403.938384  399869.5574  399869.557401
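In these ten rows the two sources agree up to rounding in the printed digits. Rather than inspecting rows by eye, the comparison can be made numerically with a tolerance. A minimal sketch follows, using the values shown above; the absolute tolerance of `1e-3` (in the coordinates' own units) is an assumption, not something the dataset specifies.

```python
import numpy as np

# X values copied from the first rows of the comparison table.
x_logs = np.array([424403.93838, 424403.93838])
x_coords = np.array([424403.938384, 424403.938384])

# rtol=0 makes the comparison purely absolute; atol=1e-3 is an assumed tolerance.
matches = np.isclose(x_logs, x_coords, rtol=0.0, atol=1e-3)
print(matches.all(), np.abs(x_logs - x_coords).max())
```

Applied to the merged frame, the same check would read `np.isclose(merged_df['X_logs'], merged_df['X_coords'], rtol=0.0, atol=1e-3).all()`, and likewise for the Y columns.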