3W Dataset

To the best of its authors' knowledge, this is the first realistic and public dataset with rare undesirable real events in oil wells that can be readily used as a benchmark dataset for development of machine learning techniques related to inherent difficulties of actual data.

For more information about the theory behind this dataset, refer to the paper A realistic and public dataset with rare undesirable real events in oil wells published in the Journal of Petroleum Science and Engineering (link here). Specific challenges (benchmarks) that practitioners and researchers can use together with the 3W dataset are defined and proposed in this paper.

Preamble

This is the supporting repository for the paper mentioned above. If you use the 3W dataset or the proposed benchmarks, please cite the paper and the 3W dataset:

@article{VARGAS2019106223,
title = "A realistic and public dataset with rare undesirable real events in oil wells",
journal = "Journal of Petroleum Science and Engineering",
volume = "181",
pages = "106223",
year = "2019",
issn = "0920-4105",
doi = "https://doi.org/10.1016/j.petrol.2019.106223",
url = "http://www.sciencedirect.com/science/article/pii/S0920410519306357",
author = "Ricardo Emanuel Vaz Vargas and Celso José Munaro and Patrick Marques Ciarelli and André Gonçalves Medeiros and Bruno Guberfain do Amaral and Daniel Centurion Barrionuevo and Jean Carlos Dias de Araújo and Jorge Lins Ribeiro and Lucas Pierezan Magalhães",
keywords = "Fault detection and diagnosis, Oil well monitoring, Abnormal event management, Multivariate time series classification",
abstract = "Detection of undesirable events in oil and gas wells can help prevent production losses, environmental accidents, and human casualties and reduce maintenance costs. The scarcity of measurements in such processes is a drawback due to the low reliability of instrumentation in such hostile environments. Another issue is the absence of adequately structured data related to events that should be detected. To contribute to providing a priori knowledge about undesirable events for diagnostic algorithms in offshore naturally flowing wells, this work presents an original and valuable dataset with instances of eight types of undesirable events characterized by eight process variables. Many hours of expert work were required to validate historical instances and to produce simulated and hand-drawn instances that can be useful to distinguish normal and abnormal actual events under different operating conditions. The choices made during this dataset's preparation are described and justified, and specific benchmarks that practitioners and researchers can use together with the published dataset are defined. This work has resulted in two relevant contributions. A challenging public dataset that can be used as a benchmark for the development of (i) machine learning techniques related to inherent difficulties of actual data, and (ii) methods for specific tasks associated with detecting and diagnosing undesirable events in offshore naturally flowing oil and gas wells. The other contribution is the proposal of the defined benchmarks."
}

Vargas, Ricardo; Munaro, Celso; Ciarelli, Patrick; Medeiros, André; Amaral, Bruno; Barrionuevo, Daniel; Araújo, Jean; Ribeiro, Jorge; Magalhães, Lucas (2019), “Data for: A Realistic and Public Dataset with Rare Undesirable Real Events in Oil Wells”, Mendeley Data, v1. http://dx.doi.org/10.17632/r7774rwc7v.1

If you have questions or want to contribute with this work, please drop me an email at ricardovvargas at gmail dot com.

3W Dataset's Structure

The 3W dataset consists of 1,984 CSV files structured as follows. Due to the limitation of GitHub, this dataset is kept in 7z files splited automatically and saved in the data directory. Before using 3W dataset, they must be decompressed. After that, the subdirectory names are the instances' labels. Each file represents one instance. The filename reveals its source. All files are standardized as follow. There are one observation per line and one series per column. Columns are separated by commas and decimals are separated by periods. The first column contains timestamps, the last one reveals the observations' labels, and the other columns are the Multivariate Time Series (MTS) (i.e. the instance itself).

3W Dataset's Overview

A 3W Dataset's general presentation with some quantities and statistics is available here.

Experiments for Proposed Benchmarks

The results of the following experiments can be used as baselines.

Bechmark 1: Impact of Using Simulated and Hand-Drawn Instances (code and results here).
Bechmark 2: Anomaly Detection (code and results here).

Name		Name	Last commit message	Last commit date
Latest commit History 22 Commits
data		data
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
demo_1_benchmark_impact_of_using_simulated_and_hand-drawn_instances.ipynb		demo_1_benchmark_impact_of_using_simulated_and_hand-drawn_instances.ipynb
demo_2_benchmark_anomaly_detection.ipynb		demo_2_benchmark_anomaly_detection.ipynb
overview.ipynb		overview.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

3W Dataset

Preamble

3W Dataset's Structure

3W Dataset's Overview

Experiments for Proposed Benchmarks

About

Releases

Packages

Languages

License

LFL077/3w_dataset

Folders and files

Latest commit

History

Repository files navigation

3W Dataset

Preamble

3W Dataset's Structure

3W Dataset's Overview

Experiments for Proposed Benchmarks

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages