Supported/Described Version(s): pm4py 2.7.11.11

This documentation assumes that the reader has a basic understanding of process mining and Python concepts.

# Handling Event Data

## Importing IEEE XES Files

IEEE XES is a standard format describing how event logs are stored.
For more information about the format, please study the [IEEE XES Website](http://www.xes-standard.org).
A simple synthetic event log (`running-example.xes`) can be downloaded from [here](static/assets/examples/running-example.xes).
Note that several real event logs have been made available over the past few years.
You can find them [here](https://data.4tu.nl/search?q=:keyword:%20real%20life%20event%20logs).

The example code on the right shows how to import an event log stored in the IEEE XES format, given a file path to the log file.
The code fragment uses the standard importer (`iterparse`, described in a later paragraph).
Note that IEEE XES Event Logs are imported into a Pandas DataFrame object.

```python
import pm4py

if __name__ == "__main__":
    # Returns a Pandas DataFrame with one row per event.
    log = pm4py.read_xes('tests/input_data/running-example.xes')
```
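To make the iterparse-based importing strategy more concrete, the sketch below streams a tiny hand-written XES fragment with Python's standard-library `xml.etree.ElementTree.iterparse` and flattens it into one row per event, copying trace attributes onto each event with a `case:` prefix. It is a simplified illustration under our own assumptions (values stay strings, nested attributes get no special treatment, and trace attributes are assumed to precede the events). It is not PM4Py's actual importer, and the function name `read_xes_rows` is hypothetical.

```python
import io
import xml.etree.ElementTree as ET

# A tiny hand-written XES fragment (illustrative only; real files also
# declare extensions, globals, and classifiers).
XES = b"""<?xml version="1.0" encoding="UTF-8"?>
<log xes.version="1.0">
  <trace>
    <string key="concept:name" value="1"/>
    <event>
      <string key="concept:name" value="register request"/>
      <date key="time:timestamp" value="2020-04-22T04:55:00+00:00"/>
    </event>
    <event>
      <string key="concept:name" value="submit payment"/>
      <date key="time:timestamp" value="2020-04-22T05:03:00+00:00"/>
    </event>
  </trace>
</log>"""


def read_xes_rows(source):
    """Hypothetical helper: stream XES into flat rows (one dict per event).

    Simplified: values stay strings, list/container attributes are not
    handled specially, and trace attributes must precede the events.
    """
    rows, case_attrs, event_attrs, in_trace = [], {}, None, False
    for kind, elem in ET.iterparse(source, events=("start", "end")):
        tag = elem.tag.rsplit("}", 1)[-1]  # drop an XML namespace, if any
        if kind == "start":
            if tag == "trace":
                in_trace, case_attrs = True, {}
            elif tag == "event":
                event_attrs = {}
            continue
        key = elem.get("key")
        if tag == "event":
            rows.append({**case_attrs, **event_attrs})
            event_attrs = None
        elif tag == "trace":
            in_trace = False
            elem.clear()  # release the finished trace to keep memory low
        elif key is not None and event_attrs is not None:
            event_attrs[key] = elem.get("value")
        elif key is not None and in_trace:
            # Trace-level attribute: copied onto every event with a prefix.
            case_attrs["case:" + key] = elem.get("value")
    return rows


rows = read_xes_rows(io.BytesIO(XES))
```

Processing elements as they close and clearing finished traces keeps memory bounded for large files, which is the usual motivation for this style of parser.
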

## Importing CSV Files

Apart from the IEEE XES standard, many event logs are actually stored in a [CSV](https://en.wikipedia.org/wiki/Comma-separated_values) file.

In general, there are two ways to deal with CSV files in PM4Py:

- **Import the CSV into a [Pandas](https://pandas.pydata.org) [DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html#pandas.read_csv).**
  Most existing algorithms in PM4Py are coded to be flexible in terms of their input.
  If a certain event log object is provided that is not in the right form, we translate it to the appropriate form for you.
  Hence, after importing a DataFrame, most algorithms are directly able to work with the DataFrame.

- **Convert the CSV into an event log object** (similar to the result of the IEEE XES importer presented in the previous section).
  In this case, the first step is to import the CSV file using Pandas (as in the previous bullet) and subsequently convert it to the event log object.
  In the remainder of this section, we briefly highlight how to convert a Pandas DataFrame to an event log.
  Note that most algorithms use the same type of conversion in case a given event data object is not of the right type.

The example code on the right shows how to convert a CSV file into the PM4Py internal event data object types.
By default, the converter converts the DataFrame to an Event Log object (i.e., not an Event Stream).

```python
import pandas as pd
import pm4py

if __name__ == "__main__":
    dataframe = pd.read_csv('tests/input_data/running-example.csv', sep=',')
    # Tell PM4Py which columns hold the case id, activity, and timestamp.
    dataframe = pm4py.format_dataframe(dataframe, case_id='case:concept:name', activity_key='concept:name', timestamp_key='time:timestamp')
    event_log = pm4py.convert_to_event_log(dataframe)
```
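The call to `format_dataframe` maps the given columns onto the keys PM4Py works with (`case:concept:name`, `concept:name`, `time:timestamp`). The pandas-only sketch below approximates the core of that idea so it can be inspected without PM4Py installed: rename the columns, parse the timestamps, and order the events of each case by time. The helper name `standardize_columns` and the sample CSV are hypothetical, and the real function may perform additional steps that this sketch does not reproduce.

```python
import io

import pandas as pd

# A small CSV whose column names differ from PM4Py's standard keys.
CSV = """case,activity,timestamp
1,register request,2020-04-22 04:55:00
2,register request,2020-04-22 04:57:00
1,submit payment,2020-04-22 05:03:00
"""


def standardize_columns(df, case_id, activity_key, timestamp_key):
    """Hypothetical helper: rename to the XES-style keys, parse timestamps,
    and order the events of each case by time."""
    df = df.rename(columns={
        case_id: "case:concept:name",
        activity_key: "concept:name",
        timestamp_key: "time:timestamp",
    })
    df["case:concept:name"] = df["case:concept:name"].astype(str)
    df["time:timestamp"] = pd.to_datetime(df["time:timestamp"])
    return df.sort_values(["case:concept:name", "time:timestamp"]).reset_index(drop=True)


df = standardize_columns(pd.read_csv(io.StringIO(CSV)), "case", "activity", "timestamp")
```

After this step, the rows of case `1` are adjacent and in time order, which is what a later grouping into traces relies on.
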

Note that the example code above does not directly work in many cases. Let us consider a very simple example event log and assume it is stored as a `csv` file:

| CaseID | Activity         | Timestamp     | clientID |
|--------|------------------|---------------|----------|
| 1      | register request | 20200422T0455 | 1337     |
| 2      | register request | 20200422T0457 | 1479     |
| 1      | submit payment   | 20200422T0503 | 1337     |

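As a quick sanity check of the table, the sketch below loads it with Pandas and groups the events per `CaseID`. The compact timestamps such as `20200422T0455` are not in a form Pandas should be expected to guess, so we assume they encode year, month, day, a literal `T`, hour, and minute (format string `%Y%m%dT%H%M`). The grouping itself is plain Pandas, not a PM4Py call.

```python
import io

import pandas as pd

# The example table from above, as CSV text.
CSV = """CaseID,Activity,Timestamp,clientID
1,register request,20200422T0455,1337
2,register request,20200422T0457,1479
1,submit payment,20200422T0503,1337
"""

df = pd.read_csv(io.StringIO(CSV))
# Assumed layout of the compact timestamps: YYYYMMDD, a literal 'T', then HHMM.
df["Timestamp"] = pd.to_datetime(df["Timestamp"], format="%Y%m%dT%H%M")

# Group the rows (events) by case, listing activities in time order.
traces = (
    df.sort_values("Timestamp")
    .groupby("CaseID")["Activity"]
    .apply(list)
    .to_dict()
)
```

Case `1` thus consists of two events (`register request`, then `submit payment`), while case `2` has a single event.
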

In this small example table, we observe four columns: `CaseID`, `Activity`, `Timestamp`, and `clientID`.
When importing the data and converting it to an Event Log object, we aim to group all rows (events) with the same value in the `CaseID` column into one case.
Another interesting phenomenon in the example data is the fourth column, `clientID`.
The client ID is an attribute that does not change over the course of executing a process instance; i.e., it is a case-level attribute.
PM4Py allows us to specify that a column actually describes a case-level attribute (under the assumption that the attribute does not change during the execution of a process).

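Because marking a column as case-level rests on the assumption that its value does not change within a case, it can be worth checking that assumption before renaming the column. The helper below (`case_level_columns`, a hypothetical name) lists the columns that take at most one distinct value per case, using only Pandas.

```python
import io

import pandas as pd

# The example table from above, as CSV text.
CSV = """CaseID,Activity,Timestamp,clientID
1,register request,20200422T0455,1337
2,register request,20200422T0457,1479
1,submit payment,20200422T0503,1337
"""


def case_level_columns(df, case_col):
    """Return the columns that take at most one distinct value within each case."""
    # Count distinct values per case (NaN counted as a value of its own).
    per_case = df.groupby(case_col).nunique(dropna=False)
    return [col for col in per_case.columns if (per_case[col] <= 1).all()]


df = pd.read_csv(io.StringIO(CSV))
candidates = case_level_columns(df, "CaseID")
```

Here only `clientID` qualifies, which matches the reasoning above. On real data, a column that happens to be constant within every case of a sample is a candidate case-level attribute rather than proof of one.
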

The example code on the right shows how to convert the previously exemplified CSV data file.
After loading the CSV file of the example table, we rename the `clientID` column to `case:clientID` (the renaming itself is a Pandas operation, not a PM4Py one); the `case:` prefix marks the column as a case-level attribute.

```python
import pandas as pd
import pm4py

if __name__ == "__main__":
    dataframe = pd.read_csv('tests/input_data/running-example-transformed.csv', sep=',')
    # The 'case:' prefix marks clientID as a case-level attribute.
    dataframe = dataframe.rename(columns={'clientID': 'case:clientID'})
    dataframe = pm4py.format_dataframe(dataframe, case_id='CaseID', activity_key='Activity', timestamp_key='Timestamp')
    event_log = pm4py.convert_to_event_log(dataframe)
```
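To see why the `case:` prefix matters, the following plain-Python sketch groups flat rows into traces and stores every `case:`-prefixed column once per trace, with the prefix stripped, instead of on each event. The nested-dictionary layout and the function `group_into_traces` are our own simplification, not PM4Py's Event Log classes. Stripping the prefix mirrors, as we understand it, how trace attributes are named after conversion.

```python
from collections import defaultdict

CASE_PREFIX = "case:"

# Flat rows as they might look after renaming clientID to case:clientID.
rows = [
    {"CaseID": "1", "Activity": "register request", "case:clientID": "1337"},
    {"CaseID": "2", "Activity": "register request", "case:clientID": "1479"},
    {"CaseID": "1", "Activity": "submit payment", "case:clientID": "1337"},
]


def group_into_traces(rows, case_col):
    """Hypothetical helper: group flat rows into traces with separate
    case-level and event-level attributes."""
    traces = defaultdict(lambda: {"attributes": {}, "events": []})
    for row in rows:
        trace = traces[row[case_col]]
        event = {}
        for key, value in row.items():
            if key.startswith(CASE_PREFIX):
                # Case-level attribute: stored once per trace, prefix removed.
                trace["attributes"][key[len(CASE_PREFIX):]] = value
            elif key != case_col:
                event[key] = value
        trace["events"].append(event)
    return dict(traces)


log = group_into_traces(rows, "CaseID")
```

Case `1` ends up with a single `clientID` attribute and two events that carry only their own attributes.
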

## Converting Event Data

In this section, we describe how to convert event log objects from one object type to another.
There are three object types that we can switch between: Event Log, Event Stream, and DataFrame objects.
Please refer to the previous code snippet for an example of applying log conversion (applied when importing a CSV object).
Finally, note that most algorithms internally use the converters to handle an input event data object of any form.
In such cases, the default parameters are used.

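Conceptually, an Event Stream is a flat sequence of events, each carrying its case identifier, whereas an Event Log groups the events of each case into a trace. The plain-Python sketch below illustrates that difference with dictionaries and lists. It is our own simplification, not the internal representation PM4Py uses, and the function names `stream_to_log` and `log_to_stream` are hypothetical.

```python
# An event stream: a flat sequence of events, each carrying its case id.
stream = [
    {"case:concept:name": "1", "concept:name": "register request"},
    {"case:concept:name": "2", "concept:name": "register request"},
    {"case:concept:name": "1", "concept:name": "submit payment"},
]


def stream_to_log(stream, case_key="case:concept:name"):
    """Group a flat stream into {case id: [events]}, keeping event order per case."""
    log = {}
    for event in stream:
        log.setdefault(event[case_key], []).append(event)
    return log


def log_to_stream(log):
    """Flatten {case id: [events]} back into a single list of events."""
    return [event for events in log.values() for event in events]


log = stream_to_log(stream)
flat = log_to_stream(log)
```

Flattening a log yields the events grouped by case, so the original interleaving of cases in the stream is not recovered unless the events are re-sorted, for example by timestamp.
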

To convert from any object to an event log, the following method can be used:

```python
import pm4py

if __name__ == "__main__":
    dataframe = pm4py.read_xes('tests/input_data/running-example.xes')
    event_log = pm4py.convert_to_event_log(dataframe)
```

To convert from any object to an event stream, the following method can be used:

```python
import pm4py

if __name__ == "__main__":
    dataframe = pm4py.read_xes('tests/input_data/running-example.xes')
    event_stream = pm4py.convert_to_event_stream(dataframe)
```

To convert from any object to a DataFrame, the following method can be used:

```python
import pm4py

if __name__ == "__main__":
    dataframe = pm4py.read_xes('tests/input_data/running-example.xes')
    event_log = pm4py.convert_to_event_log(dataframe)
    # Convert the Event Log back into a DataFrame.
    dataframe = pm4py.convert_to_dataframe(event_log)
```

## Exporting IEEE XES Files

Exporting an Event Log object to an IEEE XES file is straightforward in PM4Py.
Consider the example code fragment on the right, which depicts this functionality.

```python
import pm4py

if __name__ == "__main__":
    dataframe = pm4py.read_xes('tests/input_data/running-example.xes')
    log = pm4py.convert_to_event_log(dataframe)
    pm4py.write_xes(log, 'exported.xes')
```

In the example, the `log` object is assumed to be an Event Log object.
The exporter also accepts an Event Stream or DataFrame object as input.
However, the exporter will first convert the given input object into an Event Log, using the standard conversion parameters.
Thus, if the user wants more control, it is advisable to apply the conversion to an Event Log prior to exporting.

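A plausible reason for converting to an Event Log first is that XES nests events inside traces, so the events must already be grouped by case before they can be written. The standard-library sketch below builds that nesting for a small hypothetical log. It writes only `concept:name` and `time:timestamp` and omits the extension, global, and classifier declarations a complete XES file normally contains; `log_to_xes` is a hypothetical name, not PM4Py's exporter.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory log: case id -> list of (activity, ISO timestamp).
log = {
    "1": [("register request", "2020-04-22T04:55:00+00:00"),
          ("submit payment", "2020-04-22T05:03:00+00:00")],
    "2": [("register request", "2020-04-22T04:57:00+00:00")],
}


def log_to_xes(log):
    """Build a minimal XES tree: <log> contains <trace>s, which contain <event>s."""
    root = ET.Element("log", {"xes.version": "1.0"})
    for case_id, events in log.items():
        trace = ET.SubElement(root, "trace")
        ET.SubElement(trace, "string", {"key": "concept:name", "value": case_id})
        for activity, timestamp in events:
            event = ET.SubElement(trace, "event")
            ET.SubElement(event, "string", {"key": "concept:name", "value": activity})
            ET.SubElement(event, "date", {"key": "time:timestamp", "value": timestamp})
    return ET.ElementTree(root)


tree = log_to_xes(log)
xml_text = ET.tostring(tree.getroot(), encoding="unicode")
# To write a file instead: tree.write("sketch.xes", encoding="utf-8", xml_declaration=True)
```
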

## Exporting Logs to CSV

To export an event log to a `csv` file, PM4Py uses Pandas.
Hence, an event log is first converted to a Pandas DataFrame, after which it is written to disk.

```python
import pm4py

if __name__ == "__main__":
    log = pm4py.convert_to_event_log(pm4py.read_xes('tests/input_data/running-example.xes'))
    dataframe = pm4py.convert_to_dataframe(log)
    dataframe.to_csv('exported.csv')
```
In case an event log object is provided that is not a DataFrame, i.e., an Event Log or Event Stream, the conversion is applied using the default parameter values, as presented in the [Converting Event Data](#converting-event-data) section.
Note that the CSV export itself offers no parameters for this conversion.
If more control over the conversion is needed, please convert to a DataFrame first, prior to exporting to CSV.

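Because the final step goes through Pandas' `DataFrame.to_csv`, its defaults apply. In particular, the row index is written as an extra unnamed first column unless `index=False` is passed. The self-contained sketch below uses a small hand-made DataFrame with PM4Py's standard column names to show the difference. It also shows a round trip that re-parses the timestamps and keeps the case identifiers as strings. Whether the index belongs in an exported file is a choice for the user.

```python
import io

import pandas as pd

# A small hand-made DataFrame using PM4Py's standard column names.
df = pd.DataFrame({
    "case:concept:name": ["1", "1", "2"],
    "concept:name": ["register request", "submit payment", "register request"],
    "time:timestamp": pd.to_datetime([
        "2020-04-22 04:55", "2020-04-22 05:03", "2020-04-22 04:57",
    ]),
})

# By default, to_csv also writes the row index as an unnamed first column.
with_index = df.to_csv()
# index=False keeps only the event data columns.
without_index = df.to_csv(index=False)

# Reading back: keep case ids as strings and re-parse the timestamps.
restored = pd.read_csv(
    io.StringIO(without_index),
    dtype={"case:concept:name": str},
    parse_dates=["time:timestamp"],
)
```
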

## I/O with Other File Types

At this moment, I/O of any format supported by Pandas (DataFrames) is implicitly supported.
As long as data can be loaded into a Pandas DataFrame, PM4Py can generally work with such files.

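As an illustration of this point, the sketch below loads event data stored as JSON records with Pandas' `read_json` and parses the timestamps explicitly. Other readers such as `read_parquet` or `read_excel` follow the same pattern but need optional dependencies (e.g., `pyarrow` or `openpyxl`). The JSON content is made up for the example.

```python
import io

import pandas as pd

# Event data stored as JSON records rather than CSV (illustrative input).
JSON = """[
  {"CaseID": 1, "Activity": "register request", "Timestamp": "2020-04-22T04:55:00"},
  {"CaseID": 2, "Activity": "register request", "Timestamp": "2020-04-22T04:57:00"},
  {"CaseID": 1, "Activity": "submit payment", "Timestamp": "2020-04-22T05:03:00"}
]"""

# Disable pandas' automatic date guessing and parse the column explicitly.
df = pd.read_json(io.StringIO(JSON), orient="records", convert_dates=False)
df["Timestamp"] = pd.to_datetime(df["Timestamp"])

# From here on, the DataFrame can be handled like the CSV-based ones, e.g.
# pm4py.format_dataframe(df, case_id='CaseID', activity_key='Activity', timestamp_key='Timestamp')
```
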