Provide the possibility to split data into activity-based equally-sized windows #49
We just need this with the argument
With the changes in PR #96, the section is now here: https://github.com/clhunsen/codeface-extraction-r/blob/d421807df819ca2173421401dc9a198954b68774/util-split.R#L773
Also, the specific change needs to take place here: https://github.com/clhunsen/codeface-extraction-r/blob/d421807df819ca2173421401dc9a198954b68774/util-misc.R#L174
Currently, we have the possibility to split data in a time-based manner by specifying a time period or specific bins. However, we do not have the possibility to specify the number of windows. With this commit, we add this functionality. To implement this functionality, the functions 'split.data.time.based' and 'split.get.bins.time.based' both get a new parameter 'number.windows' and the function 'generate.date.sequence' a parameter 'length.out'. Additionally, adjust function documentation appropriately. This fixes se-sic#49. Signed-off-by: Claus Hunsen <[email protected]>
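To illustrate what splitting by a number of windows means for time-based splitting, here is a minimal Python sketch. It assumes that the first and last dates of the data are known and that the bins are simply `number_windows + 1` equally spaced boundaries between them, similar in spirit to R's `seq(from, to, length.out = ...)`. The names `generate_date_sequence` and `bins_to_windows` are hypothetical Python stand-ins, not the project's R functions.

```python
from datetime import datetime


def generate_date_sequence(start, end, number_windows):
    """Return number_windows + 1 equally spaced bin boundaries from start to end.

    Hypothetical analogue of a date sequence with a fixed length
    (length.out = number_windows + 1), not the project's implementation.
    """
    if number_windows < 1:
        raise ValueError("number_windows must be at least 1")
    if end < start:
        raise ValueError("end must not be before start")
    step = (end - start) / number_windows  # timedelta / int -> timedelta
    bins = [start + i * step for i in range(number_windows)]
    bins.append(end)  # pin the last boundary exactly to the end date
    return bins


def bins_to_windows(bins):
    """Pair consecutive boundaries into (start, end) windows."""
    return list(zip(bins[:-1], bins[1:]))


bins = generate_date_sequence(datetime(2016, 7, 12), datetime(2016, 7, 18), 3)
windows = bins_to_windows(bins)
```

Here, six days split into three windows gives boundaries on July 12, 14, 16, and 18. Pinning the last boundary to `end` avoids rounding drift when the span is not evenly divisible.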
Currently, we have the possibility to split data in a time-based manner by specifying a time period, specific bins, or the number of resulting time windows. However, we do not have the latter possibility for network splitting. With this commit, we add this functionality. To implement this functionality, the functions 'split.network.time.based' and 'split.networks.time.based' both get a new parameter to handle this functionality. Additionally, streamline other 'split.*' functions for uniformity regarding the 'number.windows' and 'sliding.window' parameters. Finally, adjust function documentation appropriately. This is a follow-up for commit 40974ba. Hopefully, this really fixes se-sic#49. Thanks to @bockthom for pointing this out in his review on PR se-sic#140. Signed-off-by: Claus Hunsen <[email protected]>
Currently, we have the possibility to split networks activity-based, either by specifying the number of edges per network or by specifying the number of windows.
However, we do not have this possibility for data-based splitting: there, we can only specify the number of commits or e-mails per window, but not the number of windows.
So, I suggest implementing a function that computes the activity amount per window from the desired number of windows. Example:
In the case of activity-based network splitting, `input.size` is the overall number of edges. In the case of activity-based data splitting, `input.size` is the overall number of commits or e-mails, respectively. So, both functions `split.data.activity.based` and `split.network.activity.based` should provide a parameter `number.windows`, and both should call the function `get.size.of.equally.sized.windows` defined above when the parameter `number.windows` is given.

[In addition, one could think of providing a function for determining equally sized windows also for time-based splitting. In that case, there is no difference between network-based and data-based time-based splitting: we only need the very first and the very last date in the data source to determine the time period that yields the desired number of equally sized windows. However, this will only make sense after #38 is closed.]
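A minimal Python sketch of one way to compute activity-based windows from a desired number of windows, assuming the items (edges, commits, or e-mails) are already sorted chronologically. The names `window_sizes` and `split_activity_based` are hypothetical and are not the signature of `get.size.of.equally.sized.windows`. Instead of a single window size, this sketch returns one size per window, so the result always has exactly `number_windows` windows whose sizes differ by at most one.

```python
def window_sizes(input_size, number_windows):
    """Distribute input_size items over number_windows windows.

    input_size is the total activity, e.g. the number of edges (network
    splitting) or the number of commits or e-mails (data splitting).
    """
    if number_windows < 1:
        raise ValueError("number_windows must be at least 1")
    if input_size < number_windows:
        raise ValueError("need at least one item per window")
    base, remainder = divmod(input_size, number_windows)
    # The first `remainder` windows take one extra item each.
    return [base + 1 if i < remainder else base for i in range(number_windows)]


def split_activity_based(items, number_windows):
    """Cut a chronologically sorted sequence into consecutive windows."""
    sizes = window_sizes(len(items), number_windows)
    windows, start = [], 0
    for size in sizes:
        windows.append(items[start:start + size])
        start += size
    return windows


sizes = window_sizes(10, 4)
windows = split_activity_based(list(range(10)), 4)
```

For 10 items and 4 windows, the sizes are 3, 3, 2, and 2. The simpler alternative of a single size `ceil(input.size / number.windows)` does not always hit the requested count: 10 items with 6 windows gives a size of 2 and thus only 5 windows.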