Study/file size #58
@MalloryJfeldman would have more insight on this, but I can tell you that the HRV and EDA output data from our pilot study (N = 67 individuals) add up to about 2-3 GB. Child HRV output data (N = 43) add < 1 GB. For comparison, the last study I managed had 150 families (nested data for parent-child pairs) at two time points for physio. Not all families had two parents involved, but you can see how the data multiply. And I am only talking about physio output data; many GB would be added if a user started bringing in other types (e.g., surveys, observational codes) after the psyphr_study was aggregated.
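To make that multiplication concrete, here is a back-of-envelope sketch in R. The per-person figure is derived from the pilot numbers quoted above; the family composition and time-point structure are assumptions, not measured data:

```r
# Back-of-envelope sizing (every figure is an assumption drawn from
# the numbers quoted above, not measured data).
pilot_gb      <- 2.5                 # ~2-3 GB for the adult pilot
pilot_n       <- 67                  # individuals in the pilot
gb_per_person <- pilot_gb / pilot_n  # ~0.037 GB of output per person

families   <- 150                    # families in the larger study
per_family <- 3                      # assume ~2 parents + 1 child
timepoints <- 2                      # physio at two time points

families * per_family * timepoints * gb_per_person  # ~34 GB of output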
Wow, looks like I need to do some extra thinking.
Right now, as I'm trying to figure out the best approach, I need to know some common characteristics of downstream analyses. Some detailed use cases would help. For example, what are some frequently used statistical models? Is modeling usually done for each and every subject, or across some kind of group-level summary?
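To seed the use-case discussion, here is one hypothetical example of the kind of downstream analysis being asked about (an assumption on my part, not something stated in this thread): a mixed-effects model fit across subjects on per-epoch HRV summaries, rather than a separate model per subject.

```r
# Hypothetical group-level use case with lme4 (assumed, for illustration).
library(lme4)

set.seed(1)
# Simulated data: one row per subject x epoch
hrv <- expand.grid(subject = factor(1:60), epoch = 1:10)
hrv$condition <- ifelse(hrv$epoch <= 5, "baseline", "task")
subj_effect   <- rnorm(60, sd = 5)
hrv$rmssd <- 50 - 8 * (hrv$condition == "task") +
  subj_effect[as.integer(hrv$subject)] + rnorm(nrow(hrv), sd = 4)

# Random intercept per subject; fixed effect of condition across the group
fit <- lmer(rmssd ~ condition + (1 | subject), data = hrv)
summary(fit)
```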
I think I miscalculated; see [here](https://media.discordapp.net/attachments/575140896249479184/601979176806776833/Screen_Shot_2019-07-19_at_21.29.02.png) for HRV output data for 67 individuals read and wrangled in R.
Maybe you were referring to raw ECG signals? That could make more sense.
Maybe. I thought I was checking the properties of only the output files. Oh well.
Hey, sorry I'm coming to this late. Our studies can generate up to ~2 GB in output files. Like I said, we never actually ran our experience sampling data through proprietary software, so I don't have a good sense of what that might look like (I think this study is not very representative, but I would suspect that if we did run our experience sampling data through Mindware, we would generate closer to 5-6 GB in output). In general, it's fairly typical to generate output files across 2-5 channels per person for sessions that last between 1 and 4 hours. So that's 2-5 output files per person, each containing summaries of physio data from 1-4 hours of recording. I'd say a typical sample is between 50 and 150 subjects, although people are pushing for more these days. For within-subject analyses these numbers can be lower.
Looking at "Mallory Pilot 1" here, out of 600+ MB of raw data comes only ~1 MB of output. I know we're only dealing with workbooks at the moment, but it makes me wonder: following the above ratio, would 2 GB of output be coming from 1.2 TB of input? Wow, that's massive!
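A quick arithmetic check of that extrapolation, treating the quoted 600 MB : 1 MB ratio as given:

```r
# Sanity check on the extrapolation above (all figures approximate).
raw_mb <- 600            # raw data in "Mallory Pilot 1"
out_mb <- 1              # wrangled output from the same pilot
ratio  <- raw_mb / out_mb            # ~600:1 raw-to-output

target_out_gb <- 2
target_out_gb * ratio / 1024         # ~1.17 TB of raw input implied
```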
We want `psyphr` to work on a normal laptop, which nowadays has somewhere between 4 and 12 GB of usable memory, and R normally should not use more than half of the total memory. Currently `read_study()` reads everything all at once, so a really big study can create a problem. If the problem exists, there are at least two ways to mitigate it:
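(The two mitigations referenced above were not captured in this copy of the issue.) One option that fits the stated constraint is to defer reading: index the study's files up front and load individual workbooks only on demand. A minimal sketch, assuming a study is a directory of .xlsx workbooks; `lazy_read_study()` and the class name are hypothetical, not psyphr's actual API:

```r
# Hypothetical lazy-reading sketch (names are assumptions, not psyphr's API).
lazy_read_study <- function(dir) {
  paths <- list.files(dir, pattern = "\\.xlsx$",
                      recursive = TRUE, full.names = TRUE)
  structure(
    list(
      paths    = paths,
      # Nothing is read until a workbook is actually requested:
      read_one = function(i) readxl::read_excel(paths[i])
    ),
    class = "lazy_psyphr_study"
  )
}

study <- lazy_read_study("path/to/study")
wb1   <- study$read_one(1)  # only this workbook is held in memory
```

With something like this, memory use scales with the workbooks actually touched rather than with the whole study.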
What is the likely total size of a study? I'm looking for a figure at about the 80th percentile, and I surely hope it will be small enough.