-
Notifications
You must be signed in to change notification settings - Fork 0
/
README.Rmd
268 lines (169 loc) · 9.32 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
---
output:
# html_document:
# css: github.css
github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
# projmaker ![](figures/logo.png)
## Overview
The _projmaker_ should help to setup a project (*create_project.cmd*) at
GeoInsights and clean-up when ready (*cleanup_project.cmd*).
In future projects, a common structure for folders and a common guideline for
file naming should be applied.
In general, these tools help you to setup a corpus (minimal set) of folders
structured according to a general workflow typical at GeoInsights.
Feel free to add or delete folders (e.g. *0111_regiograph_files*).
There should be ample room for individual modifications to create a best
structure for any given project.
Especially the top level folder structure should remain as proposed here.
Currently, the convention for naming project folders is
* for Market Data: *studyname_CTR_YYYY*, e.g. *superproject_CZE_2019*
* for Client Projects: *clientproject_CTR_YYYYMM*, e.g. *megaservice_CZE_201904*.
This means:
* no spaces, german umlauts, special characters, etc.
* *capital ISO3 country codes* (cf. https://en.wikipedia.org/wiki/ISO_3166-1_alpha-3).
If you have suggestions please contact
[Christoph Stepper](mailto:[email protected]) or
[Anja Waldmann](mailto:[email protected]).
## Current structure
At the moment, the directory tree for an initialized project looks like:
[//]: # (run in cmd: tree /f and copy output from cmd)
```bash
├───00_basedata # raw (delivered) data sets and data documentation; input only - i.e. never to be overwritten!
│ ├───001_data # the data itself
│ └───002_metadata # documentation for the data
├───01_analysis # anything to do with analysis/work-in-progress
│ ├───0101_data # intermediate data sets, e.g. results from individual analysis modules (tip: name subfolders corresponding to your R-scripts and save your data)
│ ├───0110_code # all code files, i.e. R code, py code, SAS code, etc.
│ │ ├───01100_r_functions # R functions necessary for projects (longer than a 3-liner), but not worth to be put into a GIpackage; sourced within scripts to avoid code repetition
│ │ ├───01101_r_scripts # R scripts for all analysis steps/modules, named in a comprehensible way (tip: number scrips in the order they need to be executed)
│ │ └───01102_r_markdowns # if applicable, R markdown files (eg. for documentations etc.)
│ ├───0120_figures # any static or interactive visualisations generated during the analysis
│ ├───0121_leaflets # any map visualisation generated during the analysis
│ ├───0130_checks # anything that is to be checked by people other than the analysis author, e.g. excel comparison files in purchasing power
│ └───0140_misc # place for things that somehow do not fit into any of the above, e.g. colour definitions for logos
├───02_docu # project documentation and final checks (Checkliste)
├───03_results # final results that are to be delivered to the client or that are to be pushed to our official products
└───04_presentations # things for kick-of/intermediate/final presentations
```
## General remarks
If you work with _R_, it is generally a good idea to set the __Workspace Options__ as follows:
1. Uncheck _Restore .RData into workspace at startup_
2. Set _Save workspace to .RData on exit:_ to __Never__
Additionally, I strongly advice you to uncheck _Always save history (even when not saving .RData)_.
Have you ever had a look into your saved *.Rhistory* files in any project? I didn't.
You can specify your settings in the _RStudio Options_ menu.
![](figures/workspace_settings_RStudio.png)
It is advisable to use _RStudio Projects_ when working in a project mostly done in R.
Doing so, you can use relative paths to navigate to files within the project and
the project keeps working when moved to another location (e.g. another drive).
See more below in section [Setup RStudio project](#setup-rstudio-project).
## Setup and workflow
### Setup
To use the *projmaker*, the easiest option would be to clone the repository to
your machine and use it from there.
If you only want to use one batch file, feel free to download it directly.
### Generate folder structure
To generate a valid folder structure, just follow these steps:
1. Copy the *create_project.cmd* into the root directory where the new project
should be locatad in.
2. Execute the file by left double-click.
3. Follow the user prompts to setup the folder structure.
+ Enter a valid name for the study, etc.
+ Create directory structure.
+ (Optionally) setup *RStudio Project*.
+ Delete *cmd* from the root directory (either automatically or manually).
Here are two examples for both cases:
* Market Data
![](figures/create_project_marketdata.png)
* Client Project
![](figures/create_project_clientproject.png)
### Setup RStudio project
When a new RStudio project is created,
* a project file (named _\*.Rproj_) is created within the project directory,
containing various project settings,
* a hidden directory (named _.Rproj.user_) is created, where project specific
temporary files are stored.
![](figures/proj_4.png)
If you want to version control your project with _Git_, navigate to the Project
Options and select _Git_ as Version control system.
Then a new git repository gets initialized (hidden _.git_ folder and
_.gitignore_ file in project working directory).
![](figures/project_options_1.png)
![](figures/project_options_2.png)
More info on working with RStudio projects can be found here (worth reading):
[Using Projects](https://support.rstudio.com/hc/en-us/articles/200526207-Using-Projects)
To setup your _RStudio_ project, you have two options:
#### 1. Directly create project file (named _\*.Rproj_) within *create_project.cmd*
This file contains a predefined set of project options:
```bash
Version: 1.0
RestoreWorkspace: Default
SaveWorkspace: Default
AlwaysSaveHistory: Default
EnableCodeIndexing: Yes
UseSpacesForTab: Yes
NumSpacesForTab: 2
Encoding: UTF-8
RnwWeave: Sweave
LaTeX: pdfLaTeX
AutoAppendNewline: Yes
StripTrailingWhitespace: Yes
```
I might change these settings, if you continue to keep the global options in
_RStudio_ untouched and save the *.Rhistory* files in your projects.
See section [General remarks](#general-remarks) above.
#### 2. Use the RStudio IDE and execute the following:
1. Navigate to _Project: (None)_ (topright in RStudio IDE) and click on _New Project_
![](figures/proj_1.png)
2. Select: Create Project in _Existing Directory_
![](figures/proj_2.png)
3. Navigate to your project directory to set this as project working directory
![](figures/proj_3.png)
### Clean-up empty directories
There might be some folders in your project directory tree which you did
not use while working on your project.
In order to keep our folder structure as tidy as possible, we'd advice you to
remove all empty folders after finishing your analysis.
You can use the *cleanup_project.cmd* to execute this task:
1. Copy the *cleanup_project.cmd* into the root directory of your project
(e.g. "P:/03_marketdata/CZE/superproject_CZE_2019").
2. Execute the file by left double-click.
3. Follow the user prompts to remove all empty directories (incl. subdirectories)
+ Delete *cmd* from the root directory (either automatically or manually)
Here's an examples for removing all empty directories:
![](figures/cleanup_project.png)
## Further Notes
### Spinning R files
If you want to spin R files (--> Markdown --> html) within a project, you should set the _R Markdown Option_ as follows:
* Evaluate chunks in directory: _Project_
![](figures/RMarkdown_settings_RStudio.png)
### Windows Context Menu
> NOTE: Make changes to the Windows *registry* with caution. Edits may harm the
behavior of your computer.
If you're tired of copying the batch files over and over again to the target
directories, you can add them to the _Right Click Menu_ of your file explorer.
Therefore, you need to tweak the *registry*. Simply add two new keys for the
batch files into the following root in your registry.
```
Computer\HKEY_CLASSES_ROOT\Directory\Background\shell
```
You can add them either manually by opening the *Registry Editor* or by
executing the provided *.reg* files (in the "reg" directory) - make sure that
the links are pointing to the right location where your batch files sit!
Check if the keys were correctly set in your registry:
1. Lauch __regedit.exe__ from the Start menu
![](figures/regedit.png)
2. Navigate to the __HKEY_CLASSES_ROOT\Directory\Background\shell__ key
and check if there are the two newly created keys *create_project* and
*cleanup_project*.
![](figures/registry_directory_background.png)
3. You can evaluate the given value for your key (Default)
string by double-clicking *(Default)* and see what's in the *Value data* field.
![](figures/registry_command_value.png)
Now you can easily open the batch scripts at any location. Just upen the
Right Click Context Menu (by right clicking in the background -
*not on a file or directory*) and select the task you want to run.
It gets executed in the current directory.
![](figures/right_click_menu.png)