Commit

Added before and after to split recipe
javiber committed Apr 17, 2024
1 parent c434ab2 commit 15bce7e
Showing 1 changed file with 12 additions and 38 deletions.
50 changes: 12 additions & 38 deletions docs/src/recipes/split_timestamp.ipynb
@@ -92,42 +92,24 @@
"metadata": {},
"outputs": [],
"source": [
-"from datetime import datetime\n",
+"from datetime import datetime, timedelta\n",
"\n",
"# Define boundaries for train/validation/test\n",
-"train_until = datetime(2020, 3, 1).timestamp()\n",
-"val_until = datetime(2020, 5, 1).timestamp()"
-]
-},
-{
-"cell_type": "markdown",
-"id": "d9e540ec-3b8b-4ba0-a727-23c71ca56cd0",
-"metadata": {},
-"source": [
-"### 1. Convert the timestamps into a feature\n",
-"\n",
-"The `timestamps()` operator creates a single-feature `EventSet` with the unix timestamp of each event, keeping the indexes and samplings compatible with the original `EventSet`."
-]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"id": "92545fdc-ee14-4ea3-aa85-02b6ef252a56",
-"metadata": {},
-"outputs": [],
-"source": [
-"# Get the data timestamps as a feature\n",
-"sample_timestamps = sample_evset.timestamps()"
+"train_until = datetime(2020, 3, 1)\n",
+"val_until = datetime(2020, 5, 1)\n",
+"one_milisecond = timedelta(milliseconds=1)"
]
},
{
"cell_type": "markdown",
"id": "58d52487-3006-4bf9-9c4c-dad3d47db300",
"metadata": {},
"source": [
-"### 2. Split based on timestamps\n",
+"### Split based on timestamps\n",
"\n",
-"Now we compare the timestamps feature to the boundary timestamps of each subset. This will create boolean `EventSets` that can be passed directly to the `filter()` operator."
+"Using `before()` and `after()` we can split the dataset. Notice that both functions are \n",
+"non-inclusive, so we need to add one millisecond in one of the cases so that the \n",
+"values at the limit are included in one of the sets."
]
},
{
@@ -137,9 +119,9 @@
"metadata": {},
"outputs": [],
"source": [
-"train_evset = sample_evset.filter(sample_timestamps <= train_until)\n",
-"val_evset = sample_evset.filter((sample_timestamps > train_until) & (sample_timestamps <= val_until))\n",
-"test_evset = sample_evset.filter(sample_timestamps > val_until)"
+"train_evset = sample_evset.before(train_until + one_milisecond)\n",
+"val_evset = sample_evset.after(train_until).before(val_until + one_milisecond)\n",
+"test_evset = sample_evset.after(val_until)"
]
},
{
@@ -179,14 +161,6 @@
"source": [
"test_evset"
]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"id": "6349e115-f4f8-4ada-82c7-dbd0ee695e31",
-"metadata": {},
-"outputs": [],
-"source": []
}
],
"metadata": {
@@ -205,7 +179,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
-"version": "3.10.4"
+"version": "3.9.18"
}
},
"nbformat": 4,
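
The boundary handling in the new split can be checked without the library. The following is a minimal sketch in plain Python, not the notebook's `EventSet` API: the `before` and `after` helpers are hypothetical stand-ins that assume strict (non-inclusive) comparisons, as the notebook text states, and the sample timestamps are made up for illustration.

```python
from datetime import datetime, timedelta

train_until = datetime(2020, 3, 1)
val_until = datetime(2020, 5, 1)
one_millisecond = timedelta(milliseconds=1)


def before(events, limit):
    """Hypothetical stand-in: keep timestamps strictly earlier than limit."""
    return [t for t in events if t < limit]


def after(events, limit):
    """Hypothetical stand-in: keep timestamps strictly later than limit."""
    return [t for t in events if t > limit]


# Illustrative timestamps, including one exactly on each boundary.
events = [
    datetime(2020, 2, 1),
    train_until,
    datetime(2020, 4, 1),
    val_until,
    datetime(2020, 6, 1),
]

# Same shape as the notebook's split: the extra millisecond pulls the
# boundary timestamp into the earlier subset.
train = before(events, train_until + one_millisecond)
val = before(after(events, train_until), val_until + one_millisecond)
test = after(events, val_until)

# Every event lands in exactly one subset.
assert train + val + test == events
```

Under these assumptions, an event falling strictly between a boundary and the boundary plus one millisecond would appear in two subsets, so the sketch only partitions cleanly when timestamps are no finer than millisecond resolution.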
