A simple project for relative beginners at Scala working on data engineering tasks.
The task is to create a service that analyses play-by-play data to produce a summary of the game, known as a box score. The NBA has full data for this game, including box score, play-by-play data, and highlights, at https://watch.nba.com/game/20200306/MILLAL. Our box score is a bit simpler than the real box score, but you can look at the real one to get a feel for what we're after.
The play-by-play data is in a CSV file that we have included in the repository. The data was obtained from the sample log at https://www.bigdataball.com/nba-historical-playbyplay-dataset/.
- `EventService` - The main application you are to build up. Its job is to:
  - Read events until there are no more.
  - Collate a `BoxScore` from the events.
  - Post the `BoxScore` to the `BoxScoreService`, ensuring there are no errors.
- `DataSource` - Translates raw CSV data to a structured event format.
- `BoxScoreService` - Contains the definition of a `BoxScore` and a (fake) method to `post` it to a remote endpoint.
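
As a rough illustration of how the three `EventService` responsibilities fit together, here is a minimal sketch using only the standard library. The `Event` and `BoxScore` shapes, the `collate`, `post`, and `run` functions, and the failure rule in `post` are all hypothetical stand-ins, not the definitions in this repository; the real types will carry far more detail.

```scala
object PipelineSketch {
  // Hypothetical stand-ins for the project's real Event and BoxScore types.
  final case class Event(team: String, points: Int)
  final case class BoxScore(pointsByTeam: Map[String, Int])

  // Collate: fold over the events, keeping a running total per team.
  // foldLeft consumes the iterator until there are no more events.
  def collate(events: Iterator[Event]): BoxScore =
    events.foldLeft(BoxScore(Map.empty)) { (score, e) =>
      val soFar = score.pointsByTeam.getOrElse(e.team, 0)
      BoxScore(score.pointsByTeam.updated(e.team, soFar + e.points))
    }

  // Stand-in for the (fake) remote call. Rejecting an empty box score
  // is an assumption made purely so the sketch has a failure case.
  def post(score: BoxScore): Either[String, Unit] =
    if (score.pointsByTeam.isEmpty) Left("empty box score") else Right(())

  // Read, collate, then post; the caller sees either the error or the posted score.
  def run(events: Iterator[Event]): Either[String, BoxScore] = {
    val score = collate(events)
    post(score).map(_ => score)
  }
}
```

Returning an `Either` from `run` keeps the success and failure cases explicit, so the caller has to decide what to do with a failed post rather than ignore it.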
- Data Modeling
  - Examine the data under src/main/resources to get a sense of the structure and fields.
  - Decide how to encode the CSV rows as a data structure (algebraic data type). Create this type as `DataSource.Event`.
  - What methods are available to use from the `CsvReader`? How do they work? You can use the console, via `sbt console`, etc. The console (REPL) has tab-completion, which is very handy:

        > sbt console
        ...
        scala> import io.innerproduct.nba._
        import io.innerproduct.nba._
        scala> val reader = DataSource.reader
        reader: kantan.csv.CsvReader[kantan.csv.ReadResult[List[String]]] = kantan.codecs.resource.ResourceIterator$$anon$3@75331de9
        scala> reader.<TAB>
        ... lots of methods here ...
        scala> reader.next
        res2: kantan.csv.ReadResult[List[String]] = Right(List(GAME-ID, Season that this data sets belongs to., Game ...

  - Implement `DataSource.event()` to translate an event from a row of the CSV.
- Reading events.
  - In `EventService`, read all the available events. Decide how to handle errors during reading, if any.
  - Design the transformation of `DataSource.Event` values into the datatypes of `BoxScoreService`. How do you accumulate the necessary state? How do you validate the necessary inputs and outputs?
  - Sidebar: `Validated`.
- Post valid events.
  - Invoke `BoxScoreService.post` and handle successes and errors.
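
To make the modeling and validation steps more concrete, here is a minimal sketch of an event ADT and a row parser that accumulates every problem in a row instead of stopping at the first. Everything in it is an assumption for illustration: the two event cases, the four-column layout (team, player, event type, points), and the helper names do not reflect the real CSV, which has many more columns. The sketch uses `Either` with a list of errors from the standard library; the `Validated` mentioned in the sidebar is presumably `cats.data.Validated`, which offers the same error-accumulating behaviour without the hand-written `both` helper.

```scala
object EventSketch {
  // Hypothetical, heavily reduced ADT; the real DataSource.Event will differ.
  sealed trait Event
  final case class Shot(team: String, player: String, points: Int, made: Boolean) extends Event
  final case class Rebound(team: String, player: String) extends Event

  type Errors = List[String]

  private def nonEmpty(name: String, value: String): Either[Errors, String] =
    if (value.trim.nonEmpty) Right(value.trim) else Left(List(s"$name is empty"))

  private def int(name: String, value: String): Either[Errors, Int] =
    try Right(value.trim.toInt)
    catch { case _: NumberFormatException => Left(List(s"$name is not a number: '$value'")) }

  // Combine two results, keeping the errors from both sides when both fail.
  private def both[A, B](a: Either[Errors, A], b: Either[Errors, B]): Either[Errors, (A, B)] =
    (a, b) match {
      case (Right(x), Right(y)) => Right((x, y))
      case (Left(e1), Left(e2)) => Left(e1 ++ e2)
      case (Left(e), _)         => Left(e)
      case (_, Left(e))         => Left(e)
    }

  // Assumed column layout: team, player, event type, points.
  def event(row: List[String]): Either[Errors, Event] = row match {
    case List(team, player, kind, points) =>
      val who = both(nonEmpty("team", team), nonEmpty("player", player))
      kind match {
        case "shot" | "miss" =>
          val made = kind == "shot"
          both(who, int("points", points)).map { case ((t, p), pts) => Shot(t, p, pts, made) }
        case "rebound" =>
          who.map { case (t, p) => Rebound(t, p) }
        case other =>
          Left(List(s"unknown event type: '$other'"))
      }
    case _ =>
      Left(List(s"expected 4 columns, got ${row.length}"))
  }
}
```

For example, `EventSketch.event(List("", "Player A", "shot", "x"))` reports both the empty team and the non-numeric points in a single `Left`, which is the behaviour that makes accumulating validation useful when cleaning up a data file.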