Create Statistic / Aggregate Handlers #26

Closed
48 of 60 tasks
Maelstromeous opened this issue Jul 5, 2020 · 8 comments
Labels
do next Issues needing picking up next enhancement New feature or request p1 Show-stopping issues

@Maelstromeous
Member

Maelstromeous commented Jul 5, 2020

As part of #18 we now have a solid foundation where messages are being handled and will then be proxied off for processing. In effect the collection and validation of messages is now done.

The next step is to process the data into a useful format for the Alerts Statistics dataset itself.

In the old code, there were very crude implementations which enabled aggregation of statistics, e.g. how many kills a player got on a per-alert basis. This was done by simply incrementing a number in a database row. However, it was done in a very poor manner, which meant any changes to the database schema were a royal ballache, as the implementation was basically a vast set of update queries, each being a special snowflake.

Therefore, we now need to create Aggregate Handlers which update the appropriate sections of the data set when we receive relevant events. This is required to reduce the vast amount of processing otherwise needed to generate the statistics, due to the sheer volume of the data. E.g., when we get a PlayerDeath event, we need to do the following things:

  1. Validate the message (done)
  2. Create a record for that player for the particular alert
  3. Create a record (if it doesn't already exist) for the player for their Global Statistics (will explain why it's done here shortly)
  4. If records already exist, increment the statistic for that player.
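
A minimal sketch of steps 2 to 4, assuming an in-memory store in place of the real database. The names `DeathEvent`, `AggregateStore` and `handleDeath`, and every field name, are hypothetical and only illustrate the "create if missing, then increment" pattern; they are not the project's actual models.

```typescript
// Hypothetical event shape: field names are illustrative, not the project's schema.
interface DeathEvent {
  alertId: number;
  worldId: number;
  attackerId: string;
  victimId: string;
  isHeadshot: boolean;
}

interface PlayerStats {
  kills: number;
  deaths: number;
  headshots: number;
}

type Increments = Partial<PlayerStats>;

// In-memory stand-in for a database table that supports upsert + increment.
class AggregateStore {
  private readonly rows = new Map<string, PlayerStats>();

  // Creates the row if it is missing (steps 2 and 3), then increments it (step 4).
  upsertIncrement(key: string, inc: Increments): void {
    const row = this.rows.get(key) ?? { kills: 0, deaths: 0, headshots: 0 };
    row.kills += inc.kills ?? 0;
    row.deaths += inc.deaths ?? 0;
    row.headshots += inc.headshots ?? 0;
    this.rows.set(key, row);
  }

  get(key: string): PlayerStats | undefined {
    return this.rows.get(key);
  }
}

// Applies one already-validated death event to the per-alert and global stores.
// Suicides and team kills are deliberately ignored in this sketch.
function handleDeath(
  event: DeathEvent,
  alertStore: AggregateStore,
  globalStore: AggregateStore,
): void {
  const attackerInc: Increments = { kills: 1, headshots: event.isHeadshot ? 1 : 0 };
  const victimInc: Increments = { deaths: 1 };

  // Per-alert records are keyed by alert and player.
  alertStore.upsertIncrement(`${event.alertId}:${event.attackerId}`, attackerInc);
  alertStore.upsertIncrement(`${event.alertId}:${event.victimId}`, victimInc);

  // Global records are keyed by world and player (an assumption of this sketch).
  globalStore.upsertIncrement(`${event.worldId}:${event.attackerId}`, attackerInc);
  globalStore.upsertIncrement(`${event.worldId}:${event.victimId}`, victimInc);
}
```

Because every statistic goes through the same `upsertIncrement` call, a schema change would only touch the stats shape rather than a pile of hand-written update queries.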

There are a few things to consider with this approach:

  • Are we going to retain player weapon data? e.g. Maelstrome26 got 20 kills with the Lasher in Alert #101234
  • Are we wanting to aggregate an entire set of a player's statistics on a per-alert level or globally? Would be nice to do it on a per-alert level where in the frontend we can denote "Player Maelstrome26 killed 5 players in a Magrider this alert" etc.
  • Are we comfortable logging a lot of player-level statistics? It can be quite a lot of data.
  • I'm also not proposing we store every single event we have on a player. We must do aggregation, update the relevant stat, then drop the excess data. We're talking potentially millions of events per alert otherwise, which will fill up our disks very soon.
  • Do we wish to have a certain timeframe after which very detailed statistics (e.g. a player's weapons on a vehicle) are wiped to compress data sizes?

From memory, below are the aggregates that the old site used:

Alert level aggregates

Player specific

  • AlertPlayerAggregate - this holds general stats e.g. kills, deaths, TKs, suicides, headshots, etc
    • Model
    • Implemented in EventHandler(s)
  • AlertPlayerWeaponsAggregate - this holds per-alert per-player per-weapon statistics, mainly kills and headshots
    • Model
    • Implemented in EventHandler(s)
  • AlertPlayerVehiclesAggregate - this holds per-player per-vehicle statistics
    • Model
    • Implemented in EventHandler(s)
  • AlertPlayerClassAggregate - holds metrics per-player per-class (this is in the current code but not exposed anywhere)
    • Model
    • Implemented in EventHandler(s)

Outfit specific

  • AlertOutfitAggregate - this holds per-outfit per-alert stats, combat, facilitycontrol captures etc
    • Model
    • Implemented in EventHandler(s)

Potentially we could add AlertOutfitVehicle and AlertOutfitWeapon aggregates, but how useful they would be is debatable.

Alert aggregates

  • AlertWeaponAggregate - this holds total kills with that weapon per alert
    • Model
    • Implemented in EventHandler(s)
  • AlertVehicleAggregate - this holds total vehicle kills / deaths per alert
    • Model
    • Implemented in EventHandler(s)
  • AlertFactionCombatAggregate - holds combat statistics for each faction
    • Model
    • Implemented in EventHandler(s)
  • AlertClassAggregate - holds combat metrics for each class
    • Model
    • Implemented in EventHandler(s)

Global level aggregates

  • GlobalOutfitAggregate - this holds per-outfit stats globally
    • Model
    • Implemented in EventHandler(s)
  • GlobalPlayerAggregate - this holds per-player stats globally, including number of alerts involved etc
    • Model
    • Implemented in EventHandler(s)
  • GlobalPlayerWeaponAggregate - weapon use per-player "globally" (using world ID separation)
    • Model
    • Implemented in EventHandler(s)
  • GlobalWeaponAggregate - this holds per-weapon global stats
    • Model
    • Implemented in EventHandler(s)
  • GlobalVehicleAggregate - this holds per-vehicle global stats
    • Model
    • Implemented in EventHandler(s)
  • GlobalVehiclePlayerAggregate - vehicle use stats per-player globally
    • Model
    • Implemented in EventHandler(s)
  • GlobalFactionAggregate - stats per-faction globally
    • Model
    • Implemented in EventHandler(s)

Proposed new aggregates

The below was never in the original code (it was calculated from the API and cost CPU cycles to figure it out) and would be very nice to add.

  • AlertFacilityAggregate - holds the number of times a facility changes hands etc (this is currently calculated in API / frontend code)
    • Model
    • Implemented in EventHandler(s)
  • AlertExperienceAggregate - holds the experience types. We can choose which ones we want to track (e.g. medical)
    • Model
    • Implemented in EventHandler(s)
  • GlobalFacilityAggregate - holds captures and defenses for each facility ID
    • Model
    • Implemented in EventHandler(s)
  • GlobalClassAggregate - holds class metrics split by world for all servers
    • Model
    • Implemented in EventHandler(s)
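
As a rough sketch of how one event could feed several of the aggregates listed above, each aggregate can be a small handler registered against a dispatcher. Everything here (`KillEvent`, `AggregateHandler`, `CounterAggregate`, `AggregateDispatcher`, and the key formats) is hypothetical and assumed for illustration; only the aggregate names come from the lists above.

```typescript
// Hypothetical event shape, for illustration only.
interface KillEvent {
  alertId: number;
  worldId: number;
  attackerId: string;
  weaponId: number;
}

interface AggregateHandler<E> {
  readonly name: string;
  handle(event: E): void;
}

// Counts occurrences per key; stands in for an increment against a table row.
class CounterAggregate implements AggregateHandler<KillEvent> {
  readonly counts = new Map<string, number>();

  constructor(
    readonly name: string,
    private readonly keyOf: (event: KillEvent) => string,
  ) {}

  handle(event: KillEvent): void {
    const key = this.keyOf(event);
    this.counts.set(key, (this.counts.get(key) ?? 0) + 1);
  }
}

// Fans a validated event out to every aggregate registered for it.
class AggregateDispatcher<E> {
  private readonly handlers: AggregateHandler<E>[] = [];

  register(handler: AggregateHandler<E>): this {
    this.handlers.push(handler);
    return this;
  }

  dispatch(event: E): void {
    for (const handler of this.handlers) {
      handler.handle(event);
    }
  }
}

// Alert-level aggregates key on the alert; global ones key on the world.
const alertWeapon = new CounterAggregate(
  "AlertWeaponAggregate",
  (e) => `${e.alertId}:${e.weaponId}`,
);
const globalWeapon = new CounterAggregate(
  "GlobalWeaponAggregate",
  (e) => `${e.worldId}:${e.weaponId}`,
);
const alertPlayerWeapon = new CounterAggregate(
  "AlertPlayerWeaponsAggregate",
  (e) => `${e.alertId}:${e.attackerId}:${e.weaponId}`,
);

const killDispatcher = new AggregateDispatcher<KillEvent>()
  .register(alertWeapon)
  .register(globalWeapon)
  .register(alertPlayerWeapon);
```

Adding a new aggregate then means writing one handler and registering it, without touching the code that receives and validates events.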
@Maelstromeous Maelstromeous added the p2 Disruptive issues but not show stoppers label Jul 5, 2020
@Maelstromeous Maelstromeous added this to the Websocket revival milestone Jul 5, 2020
@Maelstromeous Maelstromeous added enhancement New feature or request p1 Show-stopping issues and removed p2 Disruptive issues but not show stoppers labels Jul 5, 2020
@microwavekonijn
Member

Seems good. What I would suggest is to look into a database setup where we have a performant database for the most accessed data (e.g. recent alerts, data that is displayed on the landing page; it might also be nice to look into splitting it into a read and a write database) and one that is used for long-term storage (e.g. older alerts). With such a setup we might be able to go further by storing more detailed alert data (e.g. the kill list), which is later dropped when the alert is moved into the archive database. It might also be an idea to only keep alerts older than a month if they took place during prime time.

Lastly, I just want to point out that the current way of accumulating data is mostly possible because of linearity. We might want to consider a system that is flexible enough to deal with non-linear data collection. It might be a good idea to talk to the data analysts about this before we commit to any approach.

P.S.: Is it an idea to create a survey to get an idea of how people used the old ps2alerts and what they expect from the new one?

@Maelstromeous Maelstromeous added the discussion Issues requiring extra discussion label Jul 5, 2020
@Neelesh99

This looks like a really good change; those aggregates will definitely enrich the statistics that we can report back, both to individual players and globally. The hybrid database topology is a good suggestion, but we need to implement it carefully: we want checks in place to ensure data integrity when data is transferred, especially in large amounts.

Once implemented though, it would certainly help us on the backend to pull data off the archival database for analytics while we use the performant one for player queries on the front end.

With respect to the linearity of data collection, I think you raise a really good point. Just recently Census was down for a few days. I'm not sure if the data was retrospectively available for that silent period, but if it were, we should be able to take advantage of it. I personally can't see any problems from an analytics side with non-linear collection, as long as it is used for data recovery rather than as our primary accumulation technique. With linear accumulation (assuming its availability) we can have live statistics, rolling averages, etc.

@Hailot
Contributor

Hailot commented Jul 8, 2020

Looks good so far. The aggregation is definitely something that is needed and will speed up data lookups a lot. However, I am a big fan of retaining raw data as well, at least for important/interesting things like Deaths and facility captures. It could provide some nice stats to display to users, if not now, then in the future.

@marci4
Contributor

marci4 commented Jul 8, 2020

If we keep the raw data, we should take a look at views to aggregate values via the database itself.

@Maelstromeous
Member Author

If we hold raw data we need to be very careful about the retention policy. I'm not going to allow a permanent collection of events; the dataset will be absolutely gigantic and it will consume gigabytes of block storage in a matter of days.

@microwavekonijn
Member

microwavekonijn commented Jul 8, 2020

I don't know if it is totally representative of the data generated by the event stream, but based on what I collected during a 7h period during prime time on EU across all servers, I found the following (based on the exported JSON files):

| Event | MB/h |
| --- | --- |
| BattleRankUp | 0.5 |
| FacilityControl | 0.7 |
| PlayerLogout | 1.3 |
| PlayerLogin | 1.4 |
| PlayerFacilityCapture | 2.3 |
| PlayerFacilityDefend | 3.8 |
| VehicleDestroy | 6.8 |
| AchievementEarned | 16.3 |
| SkillAdded | 17.0 |
| ItemAdded | 18.3 |
| Death | 51.8 |
| GainExperience | 307.6 |

In Mongo I was also able to store it at 65% of the size of the JSON files. With that, if you store 2h of data (so you have some room when it comes to alerts), that seems very reasonable.

Edit: Should be MB/h not per second :P

@Maelstromeous
Member Author

I'd say a 48 hour "ultra detail" retention is acceptable. Not GainExperience though, that will have to be aggregated unfortunately.
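
For a rough sense of scale, the figures quoted in this thread can be combined as below. This is back-of-the-envelope arithmetic only: it assumes the measured prime-time rates hold for the whole window (so it probably overestimates a full 48 hours) and that the 65% Mongo-to-JSON size ratio applies uniformly to every event type. The names `jsonMbPerHour`, `MONGO_SIZE_RATIO` and `storageMb` are made up for this sketch.

```typescript
// MB/h per event type, copied from the measurements above (JSON export sizes).
const jsonMbPerHour: Record<string, number> = {
  BattleRankUp: 0.5,
  FacilityControl: 0.7,
  PlayerLogout: 1.3,
  PlayerLogin: 1.4,
  PlayerFacilityCapture: 2.3,
  PlayerFacilityDefend: 3.8,
  VehicleDestroy: 6.8,
  AchievementEarned: 16.3,
  SkillAdded: 17.0,
  ItemAdded: 18.3,
  Death: 51.8,
  GainExperience: 307.6,
};

// Reported ratio of Mongo storage size to JSON export size.
const MONGO_SIZE_RATIO = 0.65;

// Estimated Mongo storage in MB for `hours` of raw events, skipping `exclude`.
function storageMb(
  rates: Record<string, number>,
  hours: number,
  exclude: string[] = [],
): number {
  let totalPerHour = 0;
  for (const name of Object.keys(rates)) {
    if (exclude.indexOf(name) === -1) {
      totalPerHour += rates[name];
    }
  }
  return totalPerHour * MONGO_SIZE_RATIO * hours;
}

// 2h of every event type: about 556 MB.
console.log(storageMb(jsonMbPerHour, 2).toFixed(0));
// 48h of everything except GainExperience: about 3750 MB.
console.log(storageMb(jsonMbPerHour, 48, ["GainExperience"]).toFixed(0));
```

Under these assumptions, dropping GainExperience removes roughly 72% of the raw volume (307.6 of 427.8 MB/h), which is why aggregating it makes the 48 hour window far cheaper.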

@Maelstromeous Maelstromeous pinned this issue Jul 10, 2020
@Maelstromeous Maelstromeous self-assigned this Jul 15, 2020
@Maelstromeous Maelstromeous added do next Issues needing picking up next and removed discussion Issues requiring extra discussion labels Jul 20, 2020
Maelstromeous pushed a commit that referenced this issue Jul 22, 2020
Maelstromeous pushed a commit that referenced this issue Jul 23, 2020
Maelstromeous pushed a commit that referenced this issue Jul 23, 2020
Maelstromeous pushed a commit that referenced this issue Jul 23, 2020
#26 - AlertPlayerAggregate & GlobalPlayerAggregate
Maelstromeous pushed a commit that referenced this issue Jul 23, 2020
#26 - AlertWeaponAggregate & GlobalWeaponAggregate
@Maelstromeous
Member Author

The remaining aggregates have been split into their own issues (#194, #195, #196, #197), so I'm closing this parent issue as we can track things better individually.

@Maelstromeous Maelstromeous unpinned this issue Nov 22, 2020

5 participants