ML Custom Data Product for PII CSV #58
Replies: 20 comments 128 replies
-
@Shakthieshwari @Prateek-slokam If we were to draw a parallel with existing Diksha capabilities, would this be similar to the Course mentor getting access to the details of users who have given consent? In this context, the Course becomes a Program, and the details are those of users who are part of the Program? @reshmi-nair Given that consent reports are part of Lern, would this be part of Lern as well?
-
Latest comment by @SanthoshVasabhaktula in the #54 forum; continuing the conversation here.
@Shakthieshwari @aishwaryashikshalokam - I am not sure I clearly understand what is meant by filtering. Is it that the job will be submitted with a filter? If so, filters can be added during exhaust job submission and the data product can process them.
But if the file is to be accessed via the UI, I am not sure how a password-protected file can be displayed in the UI. Can anyone elaborate on this approach?
Another question - How is the consent information captured and recorded? Is the current consent API going to be used for it?
It is fine to capture user details when users join the program and to store them in Cassandra or any other transactional DB. However, the PII data should be encrypted before it is stored.
In addition, if this is a very specific use case intended to be used only in DIK, create the job in the DIK repository rather than in SB-LERN.
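To make the filter question concrete, here is a minimal Python sketch of a job request that carries a filter and of a data product applying it to rows. This is only an illustration under assumptions: `JobRequest`, `apply_filters`, the job id `program-user-exhaust`, and the field names `program_id` and `district` are hypothetical and are not taken from the actual exhaust API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List


@dataclass
class JobRequest:
    """Hypothetical exhaust job request; not the real request schema."""
    job_id: str
    filters: Dict[str, Any] = field(default_factory=dict)


def apply_filters(rows: Iterable[Dict[str, Any]], filters: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Keep only rows whose values equal every key/value pair in `filters`."""
    return [row for row in rows if all(row.get(k) == v for k, v in filters.items())]


# Illustrative rows; the field names are assumptions for this sketch.
rows = [
    {"user_id": "u1", "program_id": "p1", "district": "D1"},
    {"user_id": "u2", "program_id": "p1", "district": "D2"},
    {"user_id": "u3", "program_id": "p2", "district": "D1"},
]

# The filter travels with the job request and is applied by the data product.
request = JobRequest(job_id="program-user-exhaust", filters={"program_id": "p1", "district": "D1"})
selected = apply_filters(rows, request.filters)
```

An empty filter dictionary keeps every row, so a job submitted without filters would behave like an unfiltered exhaust.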
-
@SanthoshVasabhaktula Can we please get on a call for 30-45 minutes to review our design, either tomorrow or on Monday? Please let us know your availability. Thanks
-
@rhwarrier As we discussed, you will confirm to us which Building Block this data product will go into. I have updated this data product in the component list as well.
-
@SanthoshVasabhaktula As per our discussion, I have updated the doc: https://project-sunbird.atlassian.net/wiki/spaces/AN/pages/3266675098/Cassandra+Approach+for+PII+data Please review it and provide your approval. @rhwarrier Can you please confirm whether we can use the Sunbird-Lern Building Block for this data product? We could not discuss this in today's call. Thanks
-
@Shakthieshwari The data product can be part of Lern. You can initiate work once you close on the DC design review. Thanks.
-
@reshmi-nair @SanthoshVasabhaktula Can you please let us know which BB/Git repo this PII Python Flink job with Kubernetes will be moved to?
-
@kumarks1122
Thanks.
-
Hi @kumarks1122, we are working on Scala code to create a new Flink job in data-pipeline. Can we use this in our module? -CC @Shakthieshwari
-
Hi @kumarks1122, we have started implementing the data product for PII, similar to the UserInfoExhaust job. I am facing one issue: the dataframe is not saved to local disk when calling this function
Please find below the analysis I have done to debug this issue. Values passed to this function :-
Dataframe :-
+--------------------+--------------------------+------------------------------+-------------------------+----------------+---------------------+--------------------+--------------------+--------------+---------+--------+----------+-----------+--------------------+-------------+-------------+--------------------+
The saveToBlobStore function is present in the sunbird-analytics-core Git repo. I debugged it line by line using the debugger and print statements, but could not find any issue. I also tried executing the OnDemandDruidExhaustJob data product, which calls the same saveToBlobStore function in analytics-core, and there it does save the data to local disk. Attaching a screenshot from IntelliJ below for your reference ... Below is the model config :-
I am wondering what the issue could be in the PII data product. No error is raised, which is why we are unable to figure it out. Can you please help us out here? Thanks
-
@reshmi-nair The ML Data Product for PII is part of the 5.2 Sunbird Lern BB release. As per the timeline, we will be raising a PR by 10th March. Once the PR is raised and merged, what deployment process is followed, and where will the deployment happen? We have not contributed anything to the Lern BB so far, so we would like to know the deployment process and how testing will proceed. Can you please help us out here? Thanks
-
@Shakthieshwari Please refer to this PR for adding a new data product to Lern: PR link. We also need to add the job id to the below config file in the devops repo. If there are any private configurations, please add them to the 'release-5.2.0-lern' branch of the sunbird-devops-private repo.
-
Hi @reshmi-nair, as part of 5.2, the Sunbird Lern ML Data Product for PII has been created. To support it with a Cassandra table, an ML Data Pipeline Flink job has also been built. Can you please help us with this? Thanks
-
@reshmi-nair The ML PII Program UserInfoExhaust Data Product implementation is complete. Can you please review and merge the below PR? Changes have also been made in the devops repo; please find the PR below. I have pinged devops to merge that PR as well. Please review at the earliest. Thanks. Cc- @vijiurs
-
@Shakthieshwari I've set up some time tomorrow. Let's find a way to take this forward in a manner that works for all.
-
Hi @reshmi-nair, I have raised the PR in sunbird-utils; please find the link below. I have also created the release notes; please find the link below. -CC @vijiurs Thanks!
-
@reshmi-nair As per this design doc, https://docs.google.com/document/d/1Pq0oWdeSNWWw2mGQelXpGmj8b_NS5Q3ndt0folpj2r4/edit# can you please help us resolve the below mentioned queries :-
Also, once the code is pushed and the PR is merged for the UserInfoExhaust Data Product in the Sunbird-Lern BB from your end, please let us know. We will pull it into our local setup and make the changes for our PII Data Product, similar to UserInfoExhaust. Thanks
-
@reshmi-nair Can you please let the SL team know?
-
@AmiableAnil @kumarks1122 Can you please check?
-
@Shakthieshwari @aks30 Please plan to deploy the dependent program-related service in the Lern dev env for the ML changes in the data product.
-
Hi,
As part of the next release, 5.2, we are developing a PII feature for the Manage Learn use case, and as part of this we have a requirement for a CSV exhaust.
We are planning to create a new ML custom data product for PII CSV from Cassandra. Below are the details :-
The approach is detailed in the design doc; please check.
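As a rough illustration of the idea (not the design itself), the following Python sketch builds a CSV for one program from rows that stand in for records read from Cassandra, keeping only users who have given consent. Everything named here is an assumption for the sketch: `build_pii_csv`, the column names, and the `consent` value `ACTIVE` are hypothetical, and the actual table schema is whatever the design doc defines.

```python
import csv
import io
from typing import Dict, Iterable

# Hypothetical column names; the real schema is defined in the design doc.
CSV_COLUMNS = ["user_id", "program_id", "name", "email"]


def build_pii_csv(rows: Iterable[Dict[str, str]], program_id: str) -> str:
    """Return CSV text for users of one program who have given consent."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=CSV_COLUMNS,
        extrasaction="ignore",  # drop fields such as `consent` from the output
        lineterminator="\n",
    )
    writer.writeheader()
    for row in rows:
        # "ACTIVE" is an assumed consent status for illustration only.
        if row.get("program_id") == program_id and row.get("consent") == "ACTIVE":
            writer.writerow(row)
    return buffer.getvalue()


# Rows standing in for records read from a Cassandra table.
rows = [
    {"user_id": "u1", "program_id": "p1", "name": "Asha", "email": "asha@example.org", "consent": "ACTIVE"},
    {"user_id": "u2", "program_id": "p1", "name": "Ravi", "email": "ravi@example.org", "consent": "REVOKED"},
    {"user_id": "u3", "program_id": "p2", "name": "Meera", "email": "meera@example.org", "consent": "ACTIVE"},
]
csv_text = build_pii_csv(rows, "p1")
```

The sketch leaves out encryption of stored PII and password protection of the generated file, both of which are raised elsewhere in this thread.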
@SanthoshVasabhaktula @reshmi-nair @rhwarrier @kameshbhr Can you please let us know your availability this week for a review? Also, please let us know the plan for the 5.2 release in the Lern BB.
Cc- @aishwaryashikshalokam @Prateek-slokam @aks30 @vijiurs @kiranharidas187