[Data Product] Workflow Summarizer

Sowmya N Dixit edited this page Jan 28, 2019 · 1 revision

Summary

  • Type - Workflow Summary
  • Granularity - SESSION
  • Dimensions - sid, env, pdata, channel
  • Computation Level - Level 1 (computed from raw events)
  • Frequency - Runs Daily

Purpose

The Workflow Summary reports summary data and time spent on a per-session basis, which is then used to generate usage metrics. It will replace the Learner Session Summary, Genie Launch Summary, App Session Summary, Content Editor Session Summary and Textbook Session Summary.

Inputs

  • RAW telemetry events.

Output

ME_WORKFLOW_SUMMARY

Schema of the Workflow Summary:

{
    "eid": "ME_WORKFLOW_SUMMARY",
    "ets": long, // Event generation time in epoch
    "syncts": long, // Event sync time in epoch
    "ver": String, // telemetry version number
    "mid": String, // Unique message id.
    "uid": String, // User id of the app
    "context": {
        "pdata": {
            "id": "AnalyticsDataPipeline",
            "mod": "WorkflowSummarizer",
            "ver": "1.0"
        },
        "granularity": "SESSION",
        "date_range": {
            "from": long, // Time in milli-seconds - epoch format
            "to": long // Time in milli-seconds - epoch format
        },
        "rollup": { // Optional. Only 4 levels are allowed.
            "l1": "",
            "l2": "",
            "l3": "",
            "l4": ""
        },
        "cdata": [{ // Optional. correlation data
            "type":"", // Required. Used to indicate action that is being correlated
            "id": "" // Required. The correlation ID value
        }]
    },
    "object": { // Optional. Object which is the subject of the event.
        "id": , // Required. Id of the object. For ex: content id incase of content
        "type": , // Required. Type of the object. For ex: "Content", "Community", "User" etc.
        "ver": , // Optional. version of the object
        "rollup": { // Optional. Rollups to be computed of the object. Only 4 levels are allowed.
            "l1": "",
            "l2": "",
            "l3": "",
            "l4": ""
        }
    },
    "dimensions" : {
        "channel": String,
        "sid": String, // Session id
        "did": String, // Optional. Device id of the game play
        "pdata": { // Optional. Producer of the event
            "id": , // Required. unique id assigned to that component
            "pid": , // Optional. In case the component is distributed, then which instance of that component
            "ver":  // Optional. version number of the build
        },      
        "type": String, // Value could be app, session, player, etc. (`edata.type` value from first event)
        "mode": String // `edata.mode` value from first event
    },
    "tags": [String],
    "edata": {
        "eks": {
            "start_time": Long, // Epoch Timestamp of start. Retrieved from first event.
            "end_time": Long, // Epoch Timestamp of end. Retrieved from last event.
            "time_spent": Double, // Total time spent in seconds excluding idle time.
            "time_diff": Double, //  Diff between the last event and first event in seconds.
            "interact_events_count": Long, // Count of interact events
            "interact_events_per_min": Double, // Count of interact events per minute
            "telemetry_version": String, // Version of the telemetry: 1.0 or 2.0
            "env_summary": [{
                "env": String, // High level env within the app (content, domain, resources, community)
                "time_spent": Double, // Time spent per env
                "count": Long // count of times the environment has been visited
            }],
            "events_summary": [{
                "id": String, // event id such as START, END, INTERACT etc.
                "count": Long // Count of events.
            }],
            "page_summary": [{
                "id": String, // Page id
                "type": String, // type of page - view/edit
                "env": String, // env of page
                "time_spent": Double, // Time taken per page
                "visit_count": Long // Number of times each page was visited
            }],
            "mode": String, // Optional, present only when env = ContentPlayer. preview OR play
            "item_responses": [{ // Optional, present only when env = ContentPlayer
                "itemId": String, // qid passed in the ASSESS event
                "itype": String, // Item type fetched from Learning Platform - MCQ etc
                "ilevel": String, // Item level fetched from Learning Platform - EASY, MEDIUM, DIFFICULT etc
                "timeSpent": double, // Time spent on the item in seconds. Retrieved from ASSESS event
                "exTimeSpent": double, // Expected Time spent on the item in seconds. Fetched from Learning Platform
                "res": String, // Response of the item if the item is descriptive in nature. Retrieved from ASSESS event
                "exRes": String, // Expected response of the item. Fetched from Learning Platform
                "incRes": String, // Incorrect responses for the item. Fetched from Learning Platform
                "mmc": Array, // Missing micro concepts. Fetched from Learning Platform
                "mc": Array, // Micro concepts related to the item. Fetched from Learning Platform
                "score": int, // Score. Retrieved from ASSESS event
                "time_stamp": long, // Epoch Timestamp when the event is created. Retrieved from ASSESS event
                "maxScore": int, // Max score for the item. Fetched from Learning Platform
                "domain": String // Domain of the item
            }]
        }
    }
}
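
As an illustration of how a few of the `edata.eks` fields relate to the raw events, here is a minimal Python sketch. It is not the summarizer's actual code. It assumes each raw event is a dict with an `eid` and an `ets` in epoch milliseconds; the function name `basic_session_fields` is hypothetical.

```python
from collections import Counter


def basic_session_fields(events):
    """Derive a few edata.eks fields from one session's raw events.

    Assumption: each event is a dict with 'eid' (event id) and
    'ets' (epoch time in milliseconds).
    """
    ordered = sorted(events, key=lambda e: e["ets"])
    start_time = ordered[0]["ets"]   # taken from the first event
    end_time = ordered[-1]["ets"]    # taken from the last event
    counts = Counter(e["eid"] for e in ordered)
    return {
        "start_time": start_time,
        "end_time": end_time,
        # Difference between last and first event, in seconds.
        "time_diff": (end_time - start_time) / 1000.0,
        "interact_events_count": counts.get("INTERACT", 0),
        "events_summary": [
            {"id": eid, "count": n} for eid, n in sorted(counts.items())
        ],
    }


sample = [
    {"eid": "START", "ets": 1000},
    {"eid": "INTERACT", "ets": 3000},
    {"eid": "INTERACT", "ets": 4000},
    {"eid": "END", "ets": 6000},
]
fields = basic_session_fields(sample)
```

Note that `time_diff` is the raw span between the first and last event, whereas `time_spent` in the schema additionally excludes idle time.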

Config Parameters

  • idleTime - Time in seconds. If the time difference between two consecutive events is greater than idleTime, the time spent for that event is ignored. Defaults to 600 (seconds).
  • producerId - Producer Id of the job. Defaults to AnalyticsDataPipeline
  • modelId - Model ID. Defaults to WorkflowSummarizer
  • modelVersion - Model version. Defaults to 1.0
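
The effect of `idleTime` on the time spent computation can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the job's implementation: event timestamps are taken to be epoch milliseconds, and the function name `time_spent_seconds` is hypothetical.

```python
DEFAULT_IDLE_TIME = 600  # seconds, the documented default for idleTime


def time_spent_seconds(timestamps_ms, idle_time=DEFAULT_IDLE_TIME):
    """Sum the gaps between consecutive events, skipping idle gaps.

    Assumption: timestamps are epoch milliseconds. A gap larger than
    idle_time (in seconds) is treated as idle and not counted.
    """
    ordered = sorted(timestamps_ms)
    total = 0.0
    for prev, curr in zip(ordered, ordered[1:]):
        gap = (curr - prev) / 1000.0
        if gap <= idle_time:
            total += gap
    return total


# Two 60 s gaps, a 30 min idle gap, then a 30 s gap.
stamps = [0, 60_000, 120_000, 1_920_000, 1_950_000]
spent = time_spent_seconds(stamps)
```

With the default threshold, the 30 minute gap is dropped and only the three short gaps contribute, which mirrors the "30mins idle" case in the validation sample.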

Algorithm

  • Group By did
    • Launch Summary Computation
    • Group By channel AND/OR app_id & sid
      • Other Summary Computation
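
The two-level grouping above might look like the following Python sketch. It only shows the grouping, not the summary computation itself, and it assumes flattened event dicts with `did`, `channel`, `app_id` and `sid` keys; that flat shape and the function name `group_events` are hypothetical.

```python
from collections import defaultdict


def group_events(events):
    """Group events by did, then by (channel, app_id, sid) within each did.

    Assumption: events are flat dicts carrying 'did', 'channel',
    'app_id' and 'sid'. Real telemetry nests these fields.
    """
    by_device = defaultdict(list)
    for event in events:
        by_device[event["did"]].append(event)

    grouped = {}
    for did, device_events in by_device.items():
        # The launch summary would be computed here, per device.
        by_session = defaultdict(list)
        for event in device_events:
            key = (event["channel"], event["app_id"], event["sid"])
            by_session[key].append(event)
        # Other summaries would be computed per inner group.
        grouped[did] = dict(by_session)
    return grouped


events = [
    {"did": "did1", "channel": "c1", "app_id": "app", "sid": "s1"},
    {"did": "did1", "channel": "c1", "app_id": "app", "sid": "s2"},
    {"did": "did2", "channel": "c1", "app_id": "app", "sid": "s3"},
    {"did": "did1", "channel": "c1", "app_id": "app", "sid": "s1"},
]
grouped = group_events(events)
```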

Test Scenarios

  • Scenario 1: Generate 6 workflow summary events - 1 default app summary without START event, 1 player summary with did1 having END event, 1 session & 3 editor summaries with did2 without END events.

  • Scenario 2: Generate 3 workflow summaries - 2 app summaries & 1 session summary with the same did, channel & pdata.id, all without END events.

  • Scenario 3: Generate 5 workflow summaries with session breaking logic and idle time exclusion in the timespent computation, including 1 summary with item responses.

Validation

  • For the workflow sample from Diksha App below:
app start
	session start 
	____
	____
	session end 

	session start - (guest 1)
		resource start
			play start 
			
			play end
		worksheet end

		resource start
			play start 
			
			play end
		worksheet end

		resource start
			play start 
			
			play end
		worksheet end

		worksheet start
			play start 
				- 30mins idle
			play end
		worksheet end

app end
——
—— (other events)
Summary counts:
    - App Summaries: 3
    - Session Summaries: 3
    - Player Summaries: 5
    - Resource Summaries: 4
    - Worksheet Summaries: 2
  • For the workflow sample from Diksha Portal below (Staging):
session start
	____
	____
	play start 
			
	play end
	____
	____
		
	play start 
	____
	____
	____
session end (incomplete closing of the browser)	
Summary counts:
    - Session Summaries: 1
    - Player Summaries: 2
  • For the course consumption workflow sample from Diksha Portal below (Staging):
____ (some events)
____

	session start
	____
	____
		workflow start (course start)
		____
		____
			play start
			____
			____
			play end
		____
		____
		workflow end
	____
	session end

Summary counts:
    - App Summaries: 1
    - Session Summaries: 1
    - Workflow Summaries: 1
    - Player Summaries: 1