From b48f3feff4f95b72035e7ce06a05e872b037ee2c Mon Sep 17 00:00:00 2001 From: KHT77300 Date: Tue, 10 Jan 2017 13:21:59 +0100 Subject: [PATCH 1/3] Solutions to Exercise 1. --- .../Exerciseset1/ExerciseSet1_KHT77300.Rmd | 234 ++++++++++++++++++ 1 file changed, 234 insertions(+) create mode 100644 Exercises/Exerciseset1/ExerciseSet1_KHT77300.Rmd diff --git a/Exercises/Exerciseset1/ExerciseSet1_KHT77300.Rmd b/Exercises/Exerciseset1/ExerciseSet1_KHT77300.Rmd new file mode 100644 index 00000000..d152ff46 --- /dev/null +++ b/Exercises/Exerciseset1/ExerciseSet1_KHT77300.Rmd @@ -0,0 +1,234 @@ + +--- +title: "Exercise Set 1" +author: "T. Evgeniou" +runtime: shiny +output: html_document +--- + + +
+ +The purpose of this exercise is to become familiar with: + +1. Basic statistics functions in R; +2. Simple matrix operations; +3. Simple data manipulations; +4. The idea of functions as well as some useful customized functions provided. + +While doing this exercise we will also see how to generate replicable and customizable reports. For this purpose the exercise uses the R Markdown capabilities (see [Markdown Cheat Sheet](https://www.rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf) or a [basic introduction to R Markdown](http://rmarkdown.rstudio.com/authoring_basics.html)). These capabilities allow us to create dynamic reports. For example today's date is `r Sys.Date()` (you need to see the .Rmd to understand that this is *not* a static typed-in date but it changes every time you compile the .Rmd - if the date changed of course). + +Before starting, make sure you have pulled the [exercise files](https://github.com/InseadDataAnalytics/INSEADAnalytics/tree/master/Exercises/Exerciseset1) on your github repository (if you pull the course github repository you also get the exercise set files automatically). Moreover, make sure you are in the directory of this exercise. Directory paths may be complicated, and sometimes a frustrating source of problems, so it is recommended that you use these R commands to find out your current working directory and, if needed, set it where you have the main files for the specific exercise/project (there are other ways, but for now just be aware of this path issue). For example, assuming we are now in the "MYDIRECTORY/INSEADAnalytics" directory, we can do these: + +```{r echo=TRUE, eval=FALSE, tidy=TRUE} +#getwd() + +#setwd("Exercises/Exerciseset1/") + +#list.files() +``` + +**Note:** you can always use the `help` command in Rstudio to find out about any R function (e.g. type `help(list.files)` to learn what the R function `list.files` does). + +Let's now see the exercise. 
+ +**IMPORTANT:** You should answer all questions by simply adding your code/answers in this document through editing the file ExerciseSet1.Rmd and then clicking on the "Knit HTML" button in RStudio. Once done, please post your .Rmd and html files in your github repository. + +
+
+ +### Exercise Data + +We download daily prices (open, high, low, close, and adjusted close) and volume data of publicly traded companies and markets from the web (e.g. Yahoo! or Google, etc). This is done by sourcing the file dataSet1.R as well as some helper functions in helpersSet1.R, which also installs a number of R libraries (hence the first time you run this code you will see a lot of red color text indicating the *download* and *installation* process): + +```{r eval = TRUE, echo=TRUE, error = FALSE, warning=FALSE,message=FALSE,results='asis'} +source("helpersSet1.R") +source("dataSet1.R") +``` + +For more information on downloading finance data from the internet as well as on finance related R tools see these starting points (there is a lot more of course available): + +* [Some finance data loading tools](http://www.r-bloggers.com/r-code-yahoo-finance-data-loading/) +* [Connecting directly to Bloomberg](http://www.r-bloggers.com/rblpapi-connecting-r-to-bloomberg/) +* [Some time series plot tools](http://www.r-bloggers.com/plotting-time-series-in-r-using-yahoo-finance-data/) +* [Various finance code links](https://cran.r-project.org/web/views/Finance.html) +* [More links](http://blog.revolutionanalytics.com/2013/12/quantitative-finance-applications-in-r.html) +* [Even more links](http://www.r-bloggers.com/financial-data-accessible-from-r-part-iv/) +* Of course endless available code (e.g. like this one that seems to [get companies' earnings calendars](https://github.com/gsee/qmao/blob/master/R/getCalendar.R)) + +#### Optional Question + +1. Can you find some interesting finance related R package or github repository? +**Your Answers here:** +
+
+ +
+
+ +### Part I: Statistics of S&P Daily Returns + +We have `r nrow(StockReturns)` days of data, starting from `r rownames(StockReturns)[1]` until `r tail(rownames(StockReturns),1)`. Here are some basic statistics about the S&P returns: + +1. The cumulative returns of the S&P index during this period is `r round(100*sum(StockReturns[,1]),1)`%. +2. The average daily returns of the S&P index during this period is `r round(100*mean(StockReturns[,1]),3)`%; +2. The standard deviation of the daily returns of the S&P index during this period is `r round(100*sd(StockReturns[,1]),3)`%; + +Here are returns of the S&P in this period (note the use of the helper function pnl_plot - defined in file helpersSet1.R): + +```{r echo=FALSE, comment=NA, warning=FALSE, message=FALSE,results='asis',fig.align='center', fig.height=4,fig.width= 6, fig=TRUE} +SPY = StockReturns[,"SPY"] +pnl_plot(SPY) +``` + +#### Questions + +1. Notice that the code also downloads the returns of Apple during the same period. Can you explain where this is done in the code (including the .R files used)? +2. What are the cumulative, average daily returns, and the standard deviation of the daily returns of Apple in the same period? +3. *(Extra points)* What if we want to also see the returns of another company, say Yahoo!, in the same period? Can you get that data and report the statistics for Yahoo!'s stock, too? + +**Your Answers here:** +
1. mytickers = c("SPY", "AAPL") +
2. The cumulative return of the AAPL stock during this period is 596.4%. The average daily return of the AAPL stock during this period is 0.148%. The standard deviation of the daily returns of the AAPL stock during this period is 2.39%. +
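The three statistics reported in answer 2 can be reproduced for any column of a returns matrix. The following is a minimal sketch, assuming the same simple (summed, non-compounded) definition of cumulative return used in the S&P summary above. The toy matrix `toy_returns` and the helper `return_stats` are hypothetical names for illustration and are not part of helpersSet1.R.

```r
# Toy daily returns for two tickers (made-up numbers, for illustration only)
toy_returns <- matrix(c(0.01, -0.02,  0.03, 0.00,
                        0.02,  0.01, -0.01, 0.04),
                      ncol = 2,
                      dimnames = list(NULL, c("SPY", "AAPL")))

# Summarise one column the same way the report summarises the S&P:
# summed return, mean daily return, and daily standard deviation, in percent
return_stats <- function(returns, ticker) {
  r <- returns[, ticker]
  c(cumulative_pct = round(100 * sum(r), 1),
    mean_daily_pct = round(100 * mean(r), 3),
    sd_daily_pct   = round(100 * sd(r), 3))
}

aapl_stats <- return_stats(toy_returns, "AAPL")
print(aapl_stats)
```

With the real data, the same call on `StockReturns` with the ticker `"AAPL"` would give the figures quoted in the answer, provided the column is named as in `mytickers`.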
+
+ +
+
+ +### Part II: Simple Matrix Manipulations + +For this part of the exercise we will do some basic manipulations of the data. First note that the data are in a so-called matrix format. If you run these commands in RStudio (use help to find out what they do) you will see how matrices work: + +```{r eval = FALSE, echo=TRUE} +class(StockReturns) +dim(StockReturns) +nrow(StockReturns) +ncol(StockReturns) +StockReturns[1:4,] +head(StockReturns,5) +tail(StockReturns,5) +``` + +We will now use an R function for matrices that is extremely useful for analyzing data. It is called *apply*. Check it out using help in R. + +For example, we can now quickly estimate the average returns of S&P and Apple (of course this can be done manually, too, but what if we had 500 stocks - e.g. a matrix with 500 columns?) and plot the returns of a 50-50 portfolio of S&P and Apple: + +```{r echo=FALSE, comment=NA, warning=FALSE, message=FALSE,results='asis',fig.align='center', fig=TRUE} +portfolio = apply(StockReturns,1,mean) +names(portfolio) <- rownames(StockReturns) +pnl_plot(portfolio) +``` + + +We can also transpose the matrix of returns to create a new "horizontal" matrix. Let's call this matrix (variable name) transposedData. We can do so using this command: `transposedData = t(StockReturns)`. + +#### Questions + +1. What R commands can you use to get the number of rows and number of columns of the new matrix called transposedData? +2. Based on the help for the R function *apply* (`help(apply)`), can you create again the portfolio of S&P and Apple and plot the returns in a new figure below? + +**Your Answers here:** +1. `nrow(transposedData)` and `ncol(transposedData)`. Typing `dim(transposedData)` will provide both. +2. +```{r echo=FALSE, comment=NA, warning=FALSE, message=FALSE,results='asis',fig.align='center', fig=TRUE} +transposedData = t(StockReturns) +portfolio2 = apply(transposedData,2,mean) +names(portfolio2) <- colnames(transposedData) +pnl_plot(portfolio2) +``` +
+
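The relation between `apply` on rows and `apply` on the columns of the transposed matrix can be checked on a small example. This is a sketch on made-up numbers; `toy_returns`, `portfolio_rows` and `portfolio_cols` are hypothetical names used only here.

```r
# Three days of made-up returns for two tickers
toy_returns <- matrix(c(0.01, -0.02, 0.03,
                        0.03,  0.00, 0.01),
                      ncol = 2,
                      dimnames = list(c("2017-01-03", "2017-01-04", "2017-01-05"),
                                      c("SPY", "AAPL")))

# 50-50 portfolio: average across tickers for each day (MARGIN = 1 means rows)
portfolio_rows <- apply(toy_returns, 1, mean)

# Same portfolio from the transposed matrix (MARGIN = 2 means columns)
transposed <- t(toy_returns)
portfolio_cols <- apply(transposed, 2, mean)

# After transposing, tickers are in rows and days are in columns
print(dim(transposed))
print(all.equal(portfolio_rows, portfolio_cols))
```

Because the days end up in the columns of the transposed matrix, the day labels of the resulting portfolio come from its column names rather than its row names.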
+ +
+
+ +### Part III: Reproducibility and Customization + +This is an important step and will get you to think about the overall process once again. + +#### Questions + +1. We want to re-do all this analysis with data since 2001-01-01: what change do we need to make in the code (hint: all you need to change is one line - exactly 1 number! - in data.R file), and how can you get the new exercise set with the data since 2001-01-01? +2. *(Extra Exercise)* Can you get the returns of a few companies and plot the returns of an equal weighted portfolio with those companies during some period you select? + +**Your Answers here:** +1. It's already this way. +startDate = "2001-01-01" +
+
+
+ +
+
+ +### Part IV: Read/Write .CSV files + +Finally, one can read and write data in .CSV files. For example, we can save the first 20 days of data for S&P and Apple in a file using the command: + +```{r eval = TRUE, echo=TRUE, comment=NA, warning=FALSE, message=FALSE,results='asis'} +write.csv(StockReturns[1:20,c("SPY","AAPL")], file = "twentydays.csv", row.names = TRUE) +``` + +Do not be surprised if the csv file suddenly appears in your directory! You can then read the data from the csv file using the read.csv command. For example, this will load the data from the csv file and save it in a new variable that now is called "myData": + +```{r eval = TRUE, echo=TRUE, comment=NA, warning=FALSE, message=FALSE,results='asis'} +myData <- read.csv(file = "twentydays.csv", header = TRUE, sep=";") +``` + +Try it! + +#### Questions + +1. Once you write and read the data as described above, what happens when you run this command in the console of the RStudio: `sum(myData != StockReturns[1:20,])` +2. *(Extra exercise)* What do you think will happen if you now run this command, and why: + +```{r eval = FALSE, echo=TRUE} +myData + StockReturns[1:40,] +``` + +**Your Answers here:** +1. The data is not usable as read, and there are errors. The result is "20"; on closer inspection, the file is comma-separated, so we need to replace the code with `myData2 <- read.csv(file = "twentydays.csv", header = TRUE, sep=",")` so that R can read the data properly. +
+
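The separator issue described in the answer can be reproduced on a tiny data frame. This is a minimal sketch using a temporary file; `toy`, `wrong` and `right` are hypothetical names, and `row.names = 1` is an assumption about how one might restore the date labels written by `write.csv`.

```r
# Two days of made-up returns with dates as row names
toy <- data.frame(SPY = c(0.01, -0.02), AAPL = c(0.02, 0.01),
                  row.names = c("2017-01-03", "2017-01-04"))

# write.csv always writes comma-separated values
csv_path <- tempfile(fileext = ".csv")
write.csv(toy, file = csv_path, row.names = TRUE)

# Wrong separator: each line is read as a single field, giving one column
wrong <- read.csv(csv_path, header = TRUE, sep = ";")

# Matching separator; the first column holds the row names written above
right <- read.csv(csv_path, header = TRUE, sep = ",", row.names = 1)

print(ncol(wrong))
print(ncol(right))
print(all.equal(as.matrix(right), as.matrix(toy)))
```

Reading with the separator that `write.csv` actually used recovers the original values, which is why the comparison in question 1 only behaves sensibly after the fix.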
+
+ +
+
+ +### Extra Question + +Can you now load another dataset from some CSV file and report some basic statistics about that data? + +
+ +### Creating Interactive Documents + +Finally, just for fun, one can add some interactivity in the report using [Shiny](http://rmarkdown.rstudio.com/authoring_shiny.html). All one needs to do is set the eval flag of the code chunk below (see the .Rmd file) to "TRUE", add the line "runtime: shiny" at the very beginning of the .Rmd file, make the markdown output to be "html_document", and then press "Run Document". + +```{r, eval=TRUE, echo = TRUE} +sliderInput("startdate", "Starting Date:", min = 1, max = length(portfolio), + value = 1) +sliderInput("enddate", "End Date:", min = 1, max = length(portfolio), + value = length(portfolio)) + +renderPlot({ + pnl_plot(portfolio[input$startdate:input$enddate]) +}) +``` + +
+ +
+
+ +### Endless explorations (optional homework) + +This is a [recent research article](http://poseidon01.ssrn.com/delivery.php?ID=851091091009083082092113118102076099034023058067019062072066007100008111081022102123034016097101060099003106125099002090116089026058012038004030005113111105079028059062024121067073126072090091089069014121102110107075029090001011087028011082124103085&EXT=pdf) that won an award in 2016. Can you implement a simple strategy as in Figure 1 of this paper? You may find these R commands useful: `names`, `which`, `str_sub`,`diff`,`as.vector`, `length`, `pmin`, `pmax`, `sapply`, `lapply`,`Reduce`,`unique`, `as.numeric`, `%in%` +![A Simple Trading Strategy](simpletrade.png) + +What if you also include information about bonds? (e.g. download the returns of the ETF with ticker "TLT") Is there any relation between stocks and bonds? + + +**Have fun** + From f0dfe4cec76e2cbacc53d481b2eb23c4e116a182 Mon Sep 17 00:00:00 2001 From: KHT77300 Date: Thu, 12 Jan 2017 13:38:30 +0100 Subject: [PATCH 2/3] terst --- CourseSessions/Sessions23/InteractiveFactorAnalysis.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/CourseSessions/Sessions23/InteractiveFactorAnalysis.Rmd b/CourseSessions/Sessions23/InteractiveFactorAnalysis.Rmd index d46e6cd2..4c74ac71 100644 --- a/CourseSessions/Sessions23/InteractiveFactorAnalysis.Rmd +++ b/CourseSessions/Sessions23/InteractiveFactorAnalysis.Rmd @@ -1,6 +1,6 @@ --- title: "Derived Attributes and Dimensionality Reduction: Interactive Tool" -author: "T. Evgeniou" +author: "T. 
Evgeniou 4" runtime: shiny output: html_document: From 0b9d60a1417058bd22ed5f417b95589134d1f88f Mon Sep 17 00:00:00 2001 From: KHT77300 Date: Mon, 30 Jan 2017 21:08:42 +0100 Subject: [PATCH 3/3] Submission with answers --- Exercises/Exerciseset2/ExerciseSet2_Kate.Rmd | 395 +++++++++++++++++++ 1 file changed, 395 insertions(+) create mode 100644 Exercises/Exerciseset2/ExerciseSet2_Kate.Rmd diff --git a/Exercises/Exerciseset2/ExerciseSet2_Kate.Rmd b/Exercises/Exerciseset2/ExerciseSet2_Kate.Rmd new file mode 100644 index 00000000..7242ee09 --- /dev/null +++ b/Exercises/Exerciseset2/ExerciseSet2_Kate.Rmd @@ -0,0 +1,395 @@ +--- +title: "Exercise Set 2: A $300 Billion Strategy" +author: "Kate Tsunoda" +output: html_document +--- + +
+ +The purpose of this exercise is to become familiar with: + +1. Some time series analysis tools; +2. Correlation matrices and principal component analysis (PCA) (see [readings of sessions 3-4](http://inseaddataanalytics.github.io/INSEADAnalytics/CourseSessions/Sessions23/FactorAnalysisReading.html)); +3. More data manipulation and reporting tools (including Google Charts). + +As always, while doing this exercise we will also see how to generate replicable and customizable reports. For this purpose the exercise uses the R Markdown capabilities (see [Markdown Cheat Sheet](https://www.rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf) or a [basic introduction to R Markdown](http://rmarkdown.rstudio.com/authoring_basics.html)). These capabilities allow us to create dynamic reports. For example today's date is `r Sys.Date()` (you need to see the .Rmd to understand that this is *not* a static typed-in date but it changes every time you compile the .Rmd - if the date changed of course). + +Before starting, make sure you have pulled the [exercise set 2 source code files](https://github.com/InseadDataAnalytics/INSEADAnalytics/tree/master/Exercises/Exerciseset2) on your github repository (if you pull the course github repository you also get the exercise set files automatically). Moreover, make sure you are in the directory of this exercise. Directory paths may be complicated, and sometimes a frustrating source of problems, so it is recommended that you use these R commands to find out your current working directory and, if needed, set it where you have the main files for the specific exercise/project (there are other ways, but for now just be aware of this path issue). 
For example, assuming we are now in the "Data Analytics R version/INSEADAnalytics" directory, we can do these: + +```{r echo=TRUE, eval=FALSE, tidy=TRUE} +#getwd() + +#setwd("Exercises/Exerciseset2/") + +#list.files() +``` + +**Note:** as always, you can use the `help` command in Rstudio to find out about any R function (e.g. type `help(list.files)` to learn what the R function `list.files` does). + +Let's now see the exercise. + +**IMPORTANT:** You should answer all questions by simply adding your code/answers in this document through editing the file ExerciseSet2.Rmd and then clicking on the "Knit HTML" button in RStudio. Once done, please post your .Rmd and html files in your github repository. + +
+ +### The Exercise: Introduction + +For this exercise we will use the Futures' daily returns to develop what is considered to be a *"classic" hedge fund trading strategy*, a **futures trend following strategy**. There is a lot written about this, so it is worth doing some online search about "futures trend following", or "Managed Futures", or "Commodity Trading Advisors (CTA)". There are about **[$300 billion](http://www.barclayhedge.com/research/indices/cta/Money_Under_Management.html)** invested in this strategy today, and it is considered to be one of the **oldest hedge fund strategies**. Some example links are: + +* [A fascinating report on 2 centuries of trend following from the CFM hedge - a $6 billion fund](https://www.trendfollowing.com/whitepaper/Two_Centuries_Trend_Following.pdf) +* [Another fascinating report on 1 century of trend following investing from AQR - a $130 billion fund](https://www.aqr.com/library/aqr-publications/a-century-of-evidence-on-trend-following-investing) +* [Wikipedia on CTAs](https://en.wikipedia.org/wiki/Commodity_trading_advisor) +* [Morningstar on CTAs](http://www.morningstar.co.uk/uk/news/69379/commodity-trading-advisors-(cta)-explained.aspx) +* [A report](http://perspectives.pictet.com/wp-content/uploads/2011/01/Trading-Strategies-Final.pdf) +* [Man AHL (a leading hedge fund on CTAs - among others) - an $80 billion fund](https://www.ahl.com) + +Of course there are also many starting points for developing such a strategy (for example [this R bloggers one](http://www.r-bloggers.com/system-from-trend-following-factors/) (also on [github](https://gist.github.com/timelyportfolio/2855303)), or the [turtle traders website](http://turtletrader.com) which has many resources). + +In this exercise we will develop our own strategy from scratch. + +*Note (given today's market conditions):* **Prices of commodities, like oil or gold, can be excellent indicators of the health of the economy and of various industries, as we will also see below**. 
+ +### Getting the Futures Data + +There are many ways to get futures data. For example, one can use the [Quandl package,](https://www.quandl.com/browse) or the [turtle traders resources,](http://turtletrader.com/hpd/) or (for INSEAD only) get data from the [INSEAD library finance data resources](http://sites.insead.edu/library/E_resources/ER_subject.cfm#Stockmarket) website. One has to pay attention on how to create continuous time series from underlying contracts with varying deliveries (e.g. see [here](https://www.quantstart.com/articles/Continuous-Futures-Contracts-for-Backtesting-Purposes) ). Using a combination of the resources above, we will use data for a number of commodities. + + +### Data description + +Let's load the data and see what we have. + +```{r echo=TRUE, eval=TRUE, comment=NA, warning=FALSE,error=FALSE, message=FALSE, prompt=FALSE, tidy=TRUE} +source("helpersSet2.R") +library(googleVis) +load("data/FuturesTrendFollowingData.Rdata") +``` + +
+We have data from `r head(rownames(futures_data),1)` to `r tail(rownames(futures_data),1)` of daily returns for the following `r ncol(futures_data)` futures: + +
+ +```{r echo=TRUE, eval=TRUE, comment=NA, warning=FALSE,error=FALSE, message=FALSE, prompt=FALSE, tidy=TRUE, results='asis'} +show_data = data.frame(colnames(futures_data)) +m1<-gvisTable(show_data,options=list(showRowNumber=TRUE,width=1920, height=min(400,27*(nrow(show_data)+1)),allowHTML=TRUE,page='disable')) +print(m1,'chart') +``` +
+ + + +### Basic data analysis + +Let's see how these are correlated. Let's also make it look nicer (than, say, what we did in Exercise Set 1), using [Google Charts](https://code.google.com/p/google-motion-charts-with-r/wiki/GadgetExamples) (see examples online, e.g. [examples](https://cran.r-project.org/web/packages/googleVis/vignettes/googleVis_examples.html) and the [R package used](https://cran.r-project.org/web/packages/googleVis/googleVis.pdf) ). The correlation matrix is as follows (note that the table is "dynamic": for example you can sort it based on each column by clicking on the column's header) + +
+ + +```{r echo=FALSE, comment=NA, warning=FALSE, message=FALSE, results='asis'} +show_data = data.frame(cbind(colnames(futures_data), round(cor(futures_data),2))) +m1<-gvisTable(show_data,options=list(width=1920, height=min(400,27*(nrow(show_data)+1)),allowHTML=TRUE)) +print(m1,'chart') +``` + +
+ +We see quite high correlations among some of the futures. Does it make sense? Why? Do you see some negative correlations? Do those make sense? +Given such high correlations, we can try to see whether there are some "principal components" (see [reading on dimensionality reduction](http://inseaddataanalytics.github.io/INSEADAnalytics/CourseSessions/Sessions23/FactorAnalysisReading.html)). This analysis can also indicate whether all futures (the global economy!) are driven by some common "factors" (let's call them **"risk factors"**). + +
+ +```{r echo=TRUE, eval=TRUE, comment=NA, warning=FALSE,error=FALSE, message=FALSE, prompt=FALSE, tidy=TRUE} +Variance_Explained_Table_results<-PCA(futures_data, graph=FALSE) +Variance_Explained_Table<-cbind(paste("component",1:ncol(futures_data),sep=" "),Variance_Explained_Table_results$eig) +Variance_Explained_Table<-as.data.frame(Variance_Explained_Table) +colnames(Variance_Explained_Table)<-c("Component","Eigenvalue", "Percentage_of_explained_variance", "Cumulative_percentage_of_explained_variance") +``` + +```{r echo=FALSE, comment=NA, warning=FALSE, message=FALSE, results='asis'} +show_data = data.frame(Variance_Explained_Table) +m1<-gvisTable(show_data,options=list(width=1920, height=min(400,27*(nrow(show_data)+1)),allowHTML=TRUE,page='disable'),formats=list(Eigenvalue="#.##",Percentage_of_explained_variance="#.##",Cumulative_percentage_of_explained_variance="#.##")) +print(m1,'chart') +``` +
+ +Here is the scree plot (see Sessions 3-4 readings): +
+ +```{r echo=TRUE, eval=TRUE, comment=NA, warning=FALSE,error=FALSE, message=FALSE, prompt=FALSE, tidy=TRUE} +eigenvalues <- Variance_Explained_Table[,2] +``` + +```{r Fig1, echo=FALSE, comment=NA, results='asis', message=FALSE, fig.align='center', fig=TRUE} +df <- cbind(as.data.frame(eigenvalues), c(1:length(eigenvalues)), rep(1, length(eigenvalues))) +colnames(df) <- c("eigenvalues", "components", "abline") +Line <- gvisLineChart(as.data.frame(df), xvar="components", yvar=c("eigenvalues","abline"), options=list(title='Scree plot', legend="right", width=900, height=600, hAxis="{title:'Number of Components', titleTextStyle:{color:'black'}}", vAxes="[{title:'Eigenvalues'}]", series="[{color:'green',pointSize:3, targetAxisIndex: 0}]")) +print(Line, 'chart') +``` + +
+ +Let's now see what the first 20 (**rotated**) principal components look like. Let's also use the *rotated* factors (note that these are not really the "principal components", as explained in the [reading on dimensionality reduction](http://inseaddataanalytics.github.io/INSEADAnalytics/CourseSessions/Sessions23/FactorAnalysisReading.html)) and not show any numbers less than 0.3 in absolute value, to avoid cluttering. Note again that you can sort the table according to any column by clicking on the header of that column. +
+ +```{r echo=TRUE, comment=NA, warning=FALSE, error=FALSE,message=FALSE,results='asis',tidy=TRUE} +corused = cor(futures_data[,apply(futures_data!=0,2,sum) > 10, drop=F]) +Rotated_Results<-principal(corused, nfactors=20, rotate="varimax",score=TRUE) +Rotated_Factors<-round(Rotated_Results$loadings,2) +Rotated_Factors<-as.data.frame(unclass(Rotated_Factors)) +colnames(Rotated_Factors)<-paste("Component",1:ncol(Rotated_Factors),sep=" ") + +sorted_rows <- sort(Rotated_Factors[,1], decreasing = TRUE, index.return = TRUE)$ix +Rotated_Factors <- Rotated_Factors[sorted_rows,] +Rotated_Factors[abs(Rotated_Factors) < 0.3]<-NA +``` + +```{r echo=FALSE, comment=NA, warning=FALSE, error=FALSE,message=FALSE,results='asis'} +show_data <- Rotated_Factors +show_data<-cbind(rownames(show_data),show_data) +colnames(show_data)<-c("Variables",colnames(Rotated_Factors)) +m1<-gvisTable(show_data,options=list(showRowNumber=TRUE,width=1220, height=min(400,27*(nrow(show_data)+1)),allowHTML=TRUE,page='disable')) +print(m1,'chart') +``` +
+ +#### Questions: + +1. How many principal components ("factors") do we need to explain at least 50% of the variance in this data? +2. What are the highest weights (in absolute value) of the first principal component portfolio above on the `r ncol(futures_data)` futures? +3. Can we interpret the first 10 components? How would you call these factors? +4. Can you now generate the principal components and scree plot using only: a) the pre-crisis bull market years (e.g. only using the data between November 1, 2002, and October 1, 2007)? b) the financial crisis years (e.g. only using the data between October 1, 2007 and March 1, 2009)? (Hint: you can select subsets of the data using for example the command `crisis_data = futures_data[as.Date(rownames(futures_data)) > "2007-10-01" & as.Date(rownames(futures_data)) < "2009-03-01", ]`) +5. Based on your analysis in question 4, please discuss any differences you observe about the futures returns during bull and bear markets. What implications may these results have? What do the results imply about how assets are correlated during bear years compared to bull years? +6. (Extra - optional) Can you create an interactive (shiny based) tool so that we can study how the "**risk factors**" change over time? (Hint: see [Exercise set 1](https://github.com/InseadDataAnalytics/INSEADAnalytics/blob/master/Exercises/Exerciseset1/ExerciseSet1.Rmd) and online resources on [Shiny](http://rmarkdown.rstudio.com/authoring_shiny.html) such as these [Shiny lessons](http://shiny.rstudio.com/tutorial/lesson1/). Note however that you may need to pay attention to various details e.g. about how to include Google Charts in Shiny tools - so keep this extra exercise for later!). + +
+ +**Your Answers here:** +
+ +1. Six components explain at least 50% of the variance in the data. + +2. The highest weight is 21.94. This is the most important component. + +3. The first 10 components represent profiles of factors. The correlations illustrate the strength of a relationship (positive or negative) between characteristics. Where correlations are significant, the model groups them together under a component, or a factor. These factors create a profile that can be used to describe the data more simply. Below are some suggested names: + +
+Component 1: Bonds +
+Component 2: Currencies +
+Component 3: Stockmarkets representing a fraction of futures contract, group a +
+Component 4: Stockmarkets representing a fraction of futures contract, group b +
+Component 5: Daily reference rate +
+Component 6: Oil +
+Component 7: Metals +
+Component 8: Agricultural commodities +
+Component 9: Precious metals +
+Component 10: Asian markets +
+ +4. Principal components and scree plot +a) Pre-crisis bull market years (e.g. between November 1, 2002 and October 1, 2007) +
+ +```{r echo=TRUE, comment=NA, warning=FALSE, error=FALSE,message=FALSE,results='asis',tidy=TRUE} +crisis_data = futures_data[as.Date(rownames(futures_data)) > "2002-11-01" & as.Date(rownames(futures_data)) < "2007-10-01", ] +corused = cor(crisis_data[,apply(crisis_data!=0,2,sum) > 10, drop=F]) +Rotated_Results<-principal(corused, nfactors=20, rotate="varimax",score=TRUE) +Rotated_Factors<-round(Rotated_Results$loadings,2) +Rotated_Factors<-as.data.frame(unclass(Rotated_Factors)) +colnames(Rotated_Factors)<-paste("Component",1:ncol(Rotated_Factors),sep=" ") + +sorted_rows <- sort(Rotated_Factors[,1], decreasing = TRUE, index.return = TRUE)$ix +Rotated_Factors <- Rotated_Factors[sorted_rows,] +Rotated_Factors[abs(Rotated_Factors) < 0.3]<-NA +``` + +```{r echo=FALSE, comment=NA, warning=FALSE, error=FALSE,message=FALSE,results='asis'} +show_data <- Rotated_Factors +show_data<-cbind(rownames(show_data),show_data) +colnames(show_data)<-c("Variables",colnames(Rotated_Factors)) +m1<-gvisTable(show_data,options=list(showRowNumber=TRUE,width=1220, height=min(400,27*(nrow(show_data)+1)),allowHTML=TRUE,page='disable')) +print(m1,'chart') +``` +
+ +```{r echo=TRUE, eval=TRUE, comment=NA, warning=FALSE,error=FALSE, message=FALSE, prompt=FALSE, tidy=TRUE} +eigenvalues <- PCA(crisis_data[,apply(crisis_data!=0,2,sum) > 10, drop=F], graph=FALSE)$eig[,1] +``` + +```{r Fig2, echo=FALSE, comment=NA, results='asis', message=FALSE, fig.align='center', fig=TRUE} +df <- cbind(as.data.frame(eigenvalues), c(1:length(eigenvalues)), rep(1, length(eigenvalues))) +colnames(df) <- c("eigenvalues", "components", "abline") +Line <- gvisLineChart(as.data.frame(df), xvar="components", yvar=c("eigenvalues","abline"), options=list(title='Scree plot', legend="right", width=900, height=600, hAxis="{title:'Number of Components', titleTextStyle:{color:'black'}}", vAxes="[{title:'Eigenvalues'}]", series="[{color:'green',pointSize:3, targetAxisIndex: 0}]")) +print(Line, 'chart') +``` + +
+ +b) Financial crisis years (between October 1, 2007 and March 1, 2009) +```{r echo=TRUE, comment=NA, warning=FALSE, error=FALSE,message=FALSE,results='asis',tidy=TRUE} +crisis_data2 = futures_data[as.Date(rownames(futures_data)) > "2007-10-01" & as.Date(rownames(futures_data)) < "2009-03-01", ] +corused = cor(crisis_data2[,apply(crisis_data2!=0,2,sum) > 10, drop=F]) +Rotated_Results<-principal(corused, nfactors=20, rotate="varimax",score=TRUE) +Rotated_Factors<-round(Rotated_Results$loadings,2) +Rotated_Factors<-as.data.frame(unclass(Rotated_Factors)) +colnames(Rotated_Factors)<-paste("Component",1:ncol(Rotated_Factors),sep=" ") + +sorted_rows <- sort(Rotated_Factors[,1], decreasing = TRUE, index.return = TRUE)$ix +Rotated_Factors <- Rotated_Factors[sorted_rows,] +Rotated_Factors[abs(Rotated_Factors) < 0.3]<-NA +``` + +```{r echo=FALSE, comment=NA, warning=FALSE, error=FALSE,message=FALSE,results='asis'} +show_data <- Rotated_Factors +show_data<-cbind(rownames(show_data),show_data) +colnames(show_data)<-c("Variables",colnames(Rotated_Factors)) +m1<-gvisTable(show_data,options=list(showRowNumber=TRUE,width=1220, height=min(400,27*(nrow(show_data)+1)),allowHTML=TRUE,page='disable')) +print(m1,'chart') +``` +
+ +```{r echo=TRUE, eval=TRUE, comment=NA, warning=FALSE,error=FALSE, message=FALSE, prompt=FALSE, tidy=TRUE} +eigenvalues <- PCA(crisis_data2[,apply(crisis_data2!=0,2,sum) > 10, drop=F], graph=FALSE)$eig[,1] +``` + +```{r Fig3, echo=FALSE, comment=NA, results='asis', message=FALSE, fig.align='center', fig=TRUE} +df <- cbind(as.data.frame(eigenvalues), c(1:length(eigenvalues)), rep(1, length(eigenvalues))) +colnames(df) <- c("eigenvalues", "components", "abline") +Line <- gvisLineChart(as.data.frame(df), xvar="components", yvar=c("eigenvalues","abline"), options=list(title='Scree plot', legend="right", width=900, height=600, hAxis="{title:'Number of Components', titleTextStyle:{color:'black'}}", vAxes="[{title:'Eigenvalues'}]", series="[{color:'green',pointSize:3, targetAxisIndex: 0}]")) +print(Line, 'chart') +``` +
+ +5. The components have changed both in composition and in assigned weights. Therefore, the results imply that assets are correlated differently during bear years as compared to bull years. + +
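One way to make the bull versus bear comparison in answers 1 and 5 concrete is to count how many components are needed to reach 50% of the variance in each period. The sketch below is an illustration only: it uses base R `eigen` on the correlation matrix instead of the `PCA` call used in the exercise, the data are simulated, and `components_needed`, `calm` and `stressed` are hypothetical names.

```r
# Smallest number of principal components whose cumulative share of
# variance reaches `threshold` (eigenvalues of the correlation matrix)
components_needed <- function(returns, threshold = 0.5) {
  eigenvalues <- eigen(cor(returns), symmetric = TRUE, only.values = TRUE)$values
  cumulative_share <- cumsum(eigenvalues) / sum(eigenvalues)
  which(cumulative_share >= threshold)[1]
}

set.seed(1)
n_days <- 500
n_assets <- 8

# "Calm" period: simulated assets that move independently
calm <- matrix(rnorm(n_days * n_assets), ncol = n_assets)

# "Stressed" period: one common factor is added to every asset (every column)
common_factor <- rnorm(n_days)
stressed <- 0.3 * matrix(rnorm(n_days * n_assets), ncol = n_assets) + common_factor

n_calm <- components_needed(calm)
n_stressed <- components_needed(stressed)
print(c(calm = n_calm, stressed = n_stressed))
```

On the real data one could pass `crisis_data` and `crisis_data2` (after dropping columns with almost no non-zero returns) to the same helper; a smaller count in the crisis period would be consistent with assets moving together more strongly.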
+ +### A Simple Futures Trend Following Strategy + +We can now develop a simple futures trend following trading strategy, as outlined in the papers in the Exercise Introduction above. There are about $300 billion invested in such strategies! Of course we cannot develop here a sophisticated product, but with some more work... + +We will do the following: + +1. Calculate a number of moving averages of different "window lengths" for each of the `r ncol(futures_data)` futures - there are [many](http://www.r-bloggers.com/stock-analysis-using-r/) so called [technical indicators](http://www.investopedia.com/active-trading/technical-indicators/) one can use. We will use the "moving average" function `ma` for this (try for example to see what this returns `ma(1:10,2)` ). +2. Add the signs (can also use the actual moving average values of course - try it!) of these moving averages (as if they "vote"), and then scale this sum across all futures so that the sum of their (of the sum across all futures!) absolute value across all futures is 1 (hence we invest $1 every day - you see why?). +3. Then invest every day in each of the `r ncol(futures_data)` an amount that is defined by the weights calculated in step 2, using however the weights calculated using data until 2 days ago (why 2 days and not 1 day?) - see the use of the helper function `shift` for this. +4. Finally see the performance of this strategy. + +Here is the code. +
+ +```{r echo=TRUE, eval=TRUE, comment=NA, warning=FALSE,error=FALSE, message=FALSE, prompt=FALSE, tidy=TRUE} +signal_used = 0*futures_data # just initialize the trading signal to be 0 +# Take many moving Average (MA) Signals and let them "vote" with their sign (+-1, e.g. long or short vote, for each signal) +MAfreq<-seq(10,250,by=20) +for (iter in 1:length(MAfreq)) + signal_used = signal_used + sign(apply(futures_data,2, function(r) ma(r,MAfreq[iter]))) +# Now make sure we invest $1 every day (so the sum of the absolute values of the weights is 1 every day) +signal_used = t(apply(signal_used,1,function(r) { + res = r + if ( sum(abs(r)) !=0 ) + res = r/sum(abs(r)) + res +})) +colnames(signal_used) <- colnames(futures_data) +# Now create the returns of the strategy for each futures time series +strategy_by_future <- scrub(shift(signal_used,2)*futures_data) # use the signal from 2 days ago +# finally, this is our futures trend following strategy +trading_strategy = apply(strategy_by_future,1,sum) +names(trading_strategy) <- rownames(futures_data) +``` + + +### Reporting the performance results + +Let's see how this strategy does: +
+
+ +```{r echo=FALSE, comment=NA, warning=FALSE, message=FALSE,results='asis',fig.align='center', fig.height=5,fig.width= 8, fig=TRUE} +pnl_plot(trading_strategy) +``` + +
+
+ +Here is how this strategy has performed during this period. +
+
+ +```{r echo=FALSE, comment=NA, warning=FALSE, message=FALSE, results='asis'} +show_data = data.frame(cbind(rownames(pnl_matrix(trading_strategy)), round(pnl_matrix(trading_strategy),2))) +m1<-gvisTable(show_data,options=list(width=1220, height=min(400,27*(nrow(show_data)+1)),allowHTML=TRUE)) +print(m1,'chart') +``` + +
+
+ +How does this compare with **existing CTA products** such as [this one from Societe Generale?](https://cib.societegenerale.com/fileadmin/indices_feeds/SG_CTA_Monthly_Report.pdf) (Note: one can easily achieve a correlation of more than 0.8 with this specific product - as well as with many other ones) + +![Compare our strategy with this product](societegenerale.png) + +
+ +#### Questions + +1. Can you describe in more detail what the code above does? +2. What happens if you use different moving average technical indicators in the code above? Please explore and report below the returns of a trading strategy you build. (Hint: check, for example, what the command line `MAfreq<-seq(10,250,by=20)` above does - but not only that of course, the possibilities are endless) + +
+ +**Your Answers here:** +
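+1. The code builds a trend following signal for each future. It computes moving averages of the futures data over 13 window lengths (10, 30, ..., 250 days, from `MAfreq<-seq(10,250,by=20)`), takes the sign of each one and adds these signs up, so that every window casts a long or short "vote". Each day's votes are then scaled so that the absolute weights sum to 1, the weights are lagged by 2 days with `shift`, multiplied by the futures data, and summed across futures to give the daily return of the strategy. + +2. Changing `MAfreq` changes which trends the signal responds to: shorter or more closely spaced windows make the signal react faster (a finer-grained analysis), while longer or fewer windows make it slower and smoother. This changes the daily weights and therefore the returns of the strategy. + +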
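One way to explore question 2 is to wrap the voting logic in a function and run it for several sets of window lengths. The sketch below is only an illustration under stated assumptions: it uses simulated returns rather than `futures_data`, and `ma_trailing`, `lag_rows` and `trend_strategy` are hypothetical stand-ins written in base R, since the course helpers `ma`, `shift` and `scrub` are not shown here and may behave differently.

```r
# Illustrative sketch on simulated data (NOT the course data or helpers).
set.seed(1)
n_days <- 500
n_futures <- 4
returns <- matrix(rnorm(n_days * n_futures, mean = 0.0002, sd = 0.01),
                  nrow = n_days,
                  dimnames = list(NULL, paste0("F", 1:n_futures)))

# Hypothetical helper: trailing mean of the last `window` values (NA until full)
ma_trailing <- function(x, window) {
  as.numeric(stats::filter(x, rep(1 / window, window), sides = 1))
}

# Hypothetical helper: push rows down by `lag` days, padding the top with NA
lag_rows <- function(m, lag) {
  rbind(matrix(NA, nrow = lag, ncol = ncol(m)),
        m[seq_len(nrow(m) - lag), , drop = FALSE])
}

trend_strategy <- function(returns, windows, lag = 2) {
  votes <- 0 * returns
  for (w in windows) {
    s <- sign(apply(returns, 2, ma_trailing, window = w))
    s[is.na(s)] <- 0                      # no vote until the window is full
    votes <- votes + s
  }
  total <- rowSums(abs(votes))
  weights <- votes / ifelse(total == 0, 1, total)  # absolute weights sum to 1
  pnl <- lag_rows(weights, lag) * returns          # trade on the lagged signal
  pnl[is.na(pnl)] <- 0
  rowSums(pnl)                                     # daily strategy return
}

# Candidate window sets to compare (the first mirrors MAfreq above)
candidates <- list(
  original = seq(10, 250, by = 20),
  short    = seq(5, 60, by = 5),
  long     = seq(100, 250, by = 50)
)
summary_table <- t(sapply(candidates, function(w) {
  r <- trend_strategy(returns, w)
  c(total_return = sum(r), daily_sd = sd(r))
}))
print(round(summary_table, 4))
```

Because the simulated returns contain no real trends, the numbers printed here say nothing about which windows work best. The point is only the structure: replacing `returns` with the actual futures data gives a quick side-by-side comparison of different choices of `MAfreq`.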
+ +### A class competition + +Now you have seen how to develop some trading strategies that hedge funds have been using for decades. Clearly this is only the very first step - as many of the online resources on technical indicators also suggest. Can you now explore more such strategies? How good a **futures trend following hedge fund strategy** can you develop? Let's call this... a **class competition**! Explore as much as you can and report your best strategy as we move along the course... + +Here is for example something that can be achieved relatively easily... +
+ +```{r echo=FALSE, comment=NA, warning=FALSE, message=FALSE,results='asis',fig.align='center', fig.height=5,fig.width= 8, fig=TRUE} +load("data/sample_strategy.Rdata") +pnl_plot(sample_strategy) +``` + +
+ +Here is how this strategy has performed during this period. +
+
+ +```{r echo=FALSE, comment=NA, warning=FALSE, message=FALSE, results='asis'} +show_data = data.frame(cbind(rownames(pnl_matrix(sample_strategy)), round(pnl_matrix(sample_strategy),2))) +m1<-gvisTable(show_data,options=list(width=1220, height=min(400,27*(nrow(show_data)+1)),allowHTML=TRUE)) +print(m1,'chart') +``` + +
+
+ +**Finally**: One can develop (shiny based) interactive versions of this report and deploy them using `shinyapps::deployApp('ExerciseSet2.Rmd')` (you need a [shinyapps.io](https://www.shinyapps.io) account for this). This is for example an [interactive version of this exercise.](https://inseaddataanalytics.shinyapps.io/ExerciseSet2/) + +
+
+ +As always, **have fun** + + + + +