Use our money to test your automated stock/FX/crypto trading strategies. All changes users make to our Python GitHub code are added to the repo, and then reflected in the live trading account that goes with it. You can also use that same code to trade with your own money. We have spent over 10 years developing automated trading strategies, and are open sourcing all of our programming (meaning it is free for anybody to use) in the hopes that users will help improve it. Here is some of what we have learned so far from doing automated trading:
- Backtests are great to use for developing strategies but are somewhat meaningless in predicting actual profits. The only way to know how well your strategy works is to trade it live.
- There are all sorts of issues that come up in live trading which are not reflected accurately in backtesting (or paper trading). For example, even if you do a good job estimating the brokerage fees and the bid/ask spread, they will almost always end up being worse in real trading. Limit orders can help with this slippage in live trading but are almost impossible to backtest because they would almost never get filled at the price the backtester shows.
- It is very hard to go from backtesting to live trading. Most backtesting systems do not support live trading, at least not without additional programming.
- Very few backtesting and trading programs support machine learning (using artificial intelligence to predict the price). They only allow you to create a strategy using the standard list of indicators (moving average, Bollinger bands, RSI, etc.).
- The best way to combine multiple strategies is to use machine learning (ML). ML automatically determines which strategies are best to use at which times, and can easily be retrained with new data. Otherwise, strategies that backtested well may work for a little while and then go stale and stop working. (See the sketch right after this list.)
- High frequency trading (HFT) is only worth pursuing if you are trading millions of dollars and are willing to spend thousands of dollars a month on quant technologies. This is not something we are interested in doing.
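To make the ML-combination point concrete, here is a minimal sketch (not our production code) of strategy signals being fed to a classifier as features; all column names and data below are placeholders:

```python
# Minimal sketch: let a classifier learn when each strategy's signal works.
# All inputs here are random placeholders, not real signals.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "ma_crossover": rng.choice([-1, 1], 1000),   # +1 = buy, -1 = sell
    "rsi_signal": rng.choice([-1, 1], 1000),
    "bollinger_signal": rng.choice([-1, 1], 1000),
})
y = rng.integers(0, 2, 1000)  # label: 1 = next bar closed up

split = int(len(X) * 0.75)    # chronological split; never shuffle time series
model = GradientBoostingClassifier().fit(X[:split], y[:split])
print("test accuracy:", accuracy_score(y[split:], model.predict(X[split:])))
# Retraining on new data is just another fit() call on a more recent window.
```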
There are 2 main ways to improve our existing ML trading algorithm:
- Increase the accuracy by trying new machine learning methods. For example, there are always new types of neural networks that could be tested, or new time series prediction libraries that could be used. That is similar to how the contest at Numerai works, but the problem with Numerai is that all of the data is anonymized and encrypted. You have no idea what financial instruments and indicators/features you are working with. So if you come up with something good, there is no easy way to apply it to real trading for yourself.
- Add new features to help the ML model learn better. Features can be new technical indicators or a complete strategy that gives buy and sell signals. The ML combines these new indicators and strategies with all of the existing ones to improve on the model it already has. This is similar to the contests at Kaggle.com, but Kaggle rarely deals with stocks, and when it does, it is still hard to apply the results to real trading.
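As a concrete illustration of adding features, here is a hedged sketch of how a TA-Lib indicator, or a whole strategy's buy/sell signal, becomes another column for the model (placeholder prices; assumes the TA-Lib C library and its Python wrapper are installed):

```python
# Sketch: turn a TA-Lib indicator, or a whole strategy's signal, into
# feature columns. Prices here are random placeholders.
import numpy as np
import pandas as pd
import talib

df = pd.DataFrame({"close": 100 + np.cumsum(np.random.randn(500))})
df["rsi_14"] = talib.RSI(df["close"].values, timeperiod=14)
df["sma_20"] = talib.SMA(df["close"].values, timeperiod=20)
# A complete strategy as a feature: its +1 (buy) / -1 (sell) / 0 (flat) signal.
df["rsi_signal"] = np.where(df["rsi_14"] < 30, 1,
                            np.where(df["rsi_14"] > 70, -1, 0))
```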
Our program includes:
- Backtester with stock, FX, and crypto data.
- 100+ indicators from TA-Lib (ta-lib.org) plus some we created ourselves, added as features. Also various time series features such as ARIMA, SARIMA, ARIMAX, and SARIMAX.
- Optimization of the 11,000+ indicator parameter combinations, using feature reduction to find which ones work best.
- Over 60 classification and regression algorithms and neural networks using our custom-made AutoML program.
- ML model parameter optimization using Skopt, genetic algorithm, or exhaustive search.
- Voting ensembles and stacking of algorithms to achieve higher accuracy.
- Tests using evolutionary algorithms (such as NEAT) and genetic programming (such as gplearn).
- Tests using reinforcement learning algorithms such as policy gradient, Q-learning, evolution strategy, actor-critic, curiosity Q-learning, and neuroevolution with novelty search.
- Over a dozen dimensionality reduction techniques for selecting the best features, such as PCA, RandomTreesEmbedding, LDA, SelectKBest, SelectFromModel, SymbolicTransformer, GeneticSelectionCV, t-SNE, RBFSampler, RFECV, FastICA, Isomap, SpectralEmbedding, FeatureAgglomeration, and LocallyLinearEmbedding.
- Coming Soon - Over 1000 new features from various MetaTrader indicators and strategies. These can be used on anything, not just FX.
- Coming Soon - Live trading. We have the ability to trade live right now, but we want to make the algorithm more profitable before we do that.
Below are the accuracy results (classification) on unseen test data using 4 years of hourly EURUSD data, with a 75%/25% train/test split:
[coming soon]
And here are the backtest results (exit trade at the end of each 1 hour bar, no commissions/slippage) on the unseen test set, using XGBoost:
[coming soon]
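Until those are posted, here is a minimal sketch of how that kind of bar-by-bar backtest is computed (random placeholder prices and predictions, no commissions or slippage):

```python
# Sketch: hold a one-bar long/short position based on each prediction.
import numpy as np

close = 100 + np.cumsum(np.random.randn(1000))     # placeholder hourly closes
preds = np.random.randint(0, 2, 999)               # model's "next bar up?" calls

bar_returns = np.diff(close) / close[:-1]
positions = np.where(preds == 1, 1, -1)            # long if up predicted, else short
strategy_returns = positions * bar_returns
print("total return: %.2f%%" % (100 * strategy_returns.sum()))
```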
Program Requirements:
Linux Server (we use Ubuntu 16.04)
Python 3.6
There are also many other smaller packages that you will need; see the requirements.txt file.
Our Program Consists of 4 Main Python Notebooks:
Data Handling and Feature Generation - datamodel_dev.ipynb - Loads the raw price data, preprocesses it (drop NaNs, normalization, etc.), balances the dataset (equal number of ups and downs), and adds new features (such as moving average, Bollinger Bands, time lags, etc.).
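A minimal sketch of the balancing step (the label column name here is hypothetical):

```python
# Sketch: downsample the majority class so up and down bars are equal.
import pandas as pd

def balance_ups_downs(df: pd.DataFrame, label_col: str = "direction") -> pd.DataFrame:
    ups = df[df[label_col] == 1]
    downs = df[df[label_col] == 0]
    n = min(len(ups), len(downs))
    return pd.concat([ups.sample(n, random_state=0),
                      downs.sample(n, random_state=0)]).sort_index()
```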
Feature Selection - Uses dimensionality reduction (such as PCA) to find the most useful of the 11,000+ features in the data file. This helps the ML learn better by letting it focus on what is really important instead of distracting it with useless noise. And using less data makes everything run much faster. We run over 20 different feature selection programs and then use XGBoost on the datasets they generate, to see which has the highest accuracy. This narrows it down from over 11,000 features to only the top 100.
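A sketch of that selection loop, abbreviated to two reducers and placeholder data:

```python
# Sketch: score each reduced dataset with XGBoost, keep the best reducer.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = np.random.rand(400, 200), np.random.randint(0, 2, 400)  # placeholders
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, shuffle=False)

reducers = {"pca100": PCA(n_components=100),
            "kbest100": SelectKBest(f_classif, k=100)}
scores = {}
for name, red in reducers.items():
    model = XGBClassifier().fit(red.fit_transform(X_tr, y_tr), y_tr)
    scores[name] = accuracy_score(y_te, model.predict(red.transform(X_te)))
print(max(scores, key=scores.get), scores)
```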
Algorithm Selection - Tests 60 different ML algorithms on the data to see which one gives the highest accuracy.
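A sketch of the loop, with just four of the candidates and placeholder data:

```python
# Sketch: fit each candidate algorithm and compare test accuracy.
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

X, y = np.random.rand(1000, 20), np.random.randint(0, 2, 1000)  # placeholders
split = int(len(X) * 0.75)
for model in [LogisticRegression(max_iter=1000), RandomForestClassifier(),
              ExtraTreesClassifier(), XGBClassifier()]:
    model.fit(X[:split], y[:split])
    print(type(model).__name__,
          round(accuracy_score(y[split:], model.predict(X[split:])), 4))
```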
HPO and Ensembles - Optimizes the parameters of the ML model (such as XGBoost) to increase accuracy. It then runs it 100 times to create an ensemble model, giving it more stability.
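A sketch of that step using Skopt's gp_minimize and a small seed-averaged ensemble (10 refits here instead of 100; data and parameter ranges are placeholders):

```python
# Sketch: Bayesian search over two XGBoost parameters, then average the
# predicted probabilities of several differently-seeded refits.
import numpy as np
from skopt import gp_minimize
from skopt.space import Integer, Real
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

X, y = np.random.rand(500, 20), np.random.randint(0, 2, 500)  # placeholders

def objective(params):
    depth, lr = params
    model = XGBClassifier(max_depth=depth, learning_rate=lr)
    return -cross_val_score(model, X, y, cv=3, scoring="accuracy").mean()

res = gp_minimize(objective, [Integer(2, 10), Real(0.01, 0.3)],
                  n_calls=20, random_state=0)
depth, lr = res.x

# Subsampling makes each seed's model slightly different, so averaging helps.
probs = np.mean(
    [XGBClassifier(max_depth=depth, learning_rate=lr, subsample=0.8,
                   colsample_bytree=0.8, random_state=s)
     .fit(X, y).predict_proba(X)[:, 1]
     for s in range(10)], axis=0)  # the real notebook uses 100 runs
```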
Note: The RL programming was done by Peter Chervenski (https://www.facebook.com/l.s.d.records), with assistance from Eric Borgos. The "Roadmap For The Future" part below is by Eric Borgos.
Suggested improvements for users to try:
- Use these programs to get new features:
TSFresh: https://github.com/blue-yonder/tsfresh
Cesium: https://github.com/cesium-ml/cesium
PyAF: https://github.com/antoinecarme/pyaf
TSLearn - https://github.com/rtavenar/tslearn
pyts: https://github.com/johannfaouzi/pyts
Time Series Feature Extraction Library: https://github.com/fraunhoferportugal/tsfel
Khiva: https://github.com/shapelets/khiva (C++) or https://github.com/shapelets/khiva-python (python bindings)
PyFTS: https://github.com/PYFTS/pyFTS
Genetic Discovery of Shapelets (GENDIS): https://github.com/IBCNServices/GENDIS
PyFlux: https://github.com/RJT1990/pyflux
Deep Learning for Time Series Classification: https://github.com/cauchyturing/UCR_Time_Series_Classification_Deep_Learning_Baseline
PMDARIMA: https://github.com/tgsmith61591/pmdarima
Stumpy: https://github.com/TDAmeritrade/stumpy
Feets: https://github.com/carpyncho/feets
Thalesians' Time Series Analysis (TSA) - https://github.com/thalesians/tsa (notebooks at https://github.com/thalesians/tsa/tree/master/src/jupyter/python) - Made specifically for stock time series analysis.
SLearn: https://github.com/mzoll/slearn
TimeNet: https://github.com/kirarenctaon/timenet
Signature Transforms: https://github.com/patrick-kidger/signatory
Also:
A) There is good code/info for ARCH, GARCH, and EGARCH: https://0xboz.github.io/blog/understand-and-model-cryptocurrencies-volatility-using-garch-variants/
B) If you need a faster version of Fourier Transform, see https://github.com/ShihuaHuang/Fast-Fourier-Transform
C) If you have not already used Dynamic Time Warping (DTW), here's info for it:
TSLearn - https://github.com/rtavenar/tslearn
pyts: https://github.com/johannfaouzi/pyts
See these articles about how it can be used in a nearest-neighbors way to find stock price patterns that are similar to the current one: https://systematicinvestor.wordpress.com/2012/01/20/time-series-matching-with-dynamic-time-warping/ and https://systematicinvestor.wordpress.com/2012/01/13/time-series-matching/
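Here is a sketch of that nearest-neighbors idea using tslearn (placeholder prices; a real test should use walk-forward splits):

```python
# Sketch: label each historical window by the next bar's direction, then
# classify the latest window by its DTW-nearest historical matches.
import numpy as np
from tslearn.neighbors import KNeighborsTimeSeriesClassifier

close = 100 + np.cumsum(np.random.randn(600))            # placeholder prices
window = 50
X = np.array([close[i:i + window] for i in range(len(close) - window)])
y = (close[window:] > close[window - 1:-1]).astype(int)  # next-bar direction

knn = KNeighborsTimeSeriesClassifier(n_neighbors=5, metric="dtw")
knn.fit(X[:-1], y[:-1])                                  # history
print("current window prediction:", knn.predict(X[-1:]))
```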
Other DTW programs:
https://github.com/wannesm/dtaidistance
https://github.com/markdregan/K-Nearest-Neighbors-with-Dynamic-Time-Warping
https://github.com/pierre-rouanet/dtw
https://github.com/fpetitjean/DBA
- Try these probabilistic modeling programs:
A) https://github.com/tensorflow/probability (Includes Edward)
B) http://pyro.ai/ (From Uber Labs)
C) https://brancher.org (see their time series module)
D) Pomegranate - https://github.com/jmschrei/pomegranate
- More programs to get features/predictions from:
1. SKTime:
https://github.com/alan-turing-institute/sktime/blob/master/examples/time_series_classification.ipynb
https://github.com/alan-turing-institute/sktime/blob/master/examples/forecasting.ipynb
https://github.com/alan-turing-institute/sktime/blob/master/examples/shapelet_transform.ipynb
1.5 Skits - https://github.com/EthanRosenthal/skits - Scikit-inspired time series.
2. Stock Prediction AI - https://github.com/borisbanushev/stockpredictionai
3. Facebook Prophet at https://facebook.github.io/prophet/ . Also see https://github.com/CollinRooney12/htsprophet and https://github.com/advaitsave/Introduction-to-Time-Series-forecasting-Python/blob/master/Time%20Series%20in%20Python.ipynb .
4. Statsmodels: Try all the methods at https://github.com/statsmodels/statsmodels
5. Shapelets (MLP): https://github.com/mohaseeb/shaplets-python
6. 11 methods from https://machinelearningmastery.com/time-series-forecasting-methods-in-python-cheat-sheet/
7. Deep Learning for Time Series Classification: https://github.com/hfawaz/dl-4-tsc - Various methods in Keras
8. ES-RNN - https://github.com/damitkwr/ESRNN-GPU (also maybe at https://github.com/M4Competition/M4-methods/tree/master/118%20-%20slaweks17 )
9. Keras: https://github.com/jaungiers/LSTM-Neural-Network-for-Time-Series-Prediction
10. Keras: https://github.com/BenjiKCF/Neural-Net-with-Financial-Time-Series-Data
11. Keras LSTM Fully Convolutional Networks for Time Series Classification : https://github.com/titu1994/LSTM-FCN
11.5 https://github.com/titu1994/MLSTM-FCN
12. HMM-LSTM: https://github.com/JINGEWU/Stock-Market-Trend-Analysis-Using-HMM-LSTM
13. https://teddykoker.com/2019/06/trading-with-reinforcement-learning-in-python-part-ii-application/
14. https://www.reddit.com/r/algotrading/comments/bwmji0/stockmlcloud_ready_trading_toolbot_for/
15. OgmaNEO2: https://ogma.ai/2019/06/ogmaneo2-and-reinforcement-learning/
16. See all the methods listed in this blog posting - Using the latest advancements in deep learning to predict stock price movements - https://towardsdatascience.com/aifortrading-2edd6fac689d
17. R-Transformer RNN: https://github.com/DSE-MSU/R-transformer
18. Temporal CNN: https://github.com/uchidalab/dtw-features-cnn
19. Temporal Causal Discovery Framework: https://github.com/M-Nauta/TCDF
20. Echo state networks:
https://github.com/lucapedrelli/DeepESN
https://github.com/kalekiu/easyesn
https://github.com/ahmedmdl/Dynamic_reservoir_keras (Keras version)
https://github.com/FilippoMB/Reservoir-Computing-framework-for-multivariate-time-series-classification
21. Time series models in Keras: https://github.com/vlawhern/arl-eegmodels
22. Unsupervised Time Series Method: https://github.com/White-Link/UnsupervisedScalableRepresentationLearningTimeSeries
23. Time series classification and clustering: https://github.com/alexminnaar/time-series-classification-and-clustering
24. Time series data augmentation: https://github.com/hfawaz/aaltd18 (Keras)
25. Time series anomaly detection: https://github.com/chickenbestlover/RNN-Time-series-Anomaly-Detection
26. Time series deep learning baseline: https://github.com/cauchyturing/UCR_Time_Series_Classification_Deep_Learning_Baseline
27. RNN autoencoder: https://github.com/RobRomijnders/AE_ts
28. GluonTS - https://github.com/awslabs/gluon-ts
29. Deep Time Series (Keras) - https://github.com/pipidog/DeepTimeSeries - RNN2Dense, Seq2Seq, Attention-Based, etc.
30. Dilated CNN with WaveNet - https://github.com/kristpapadopoulos/seriesnet
31. Keras Time Series Models: https://github.com/zhangxu0307/time-series-forecasting-keras - LSTM, GRU, RNN, MLP, SVR, ARIMA, time series decomposition
32. Pytorch time series models: https://github.com/zhangxu0307/time_series_forecasting_pytorch
33. SegLearn - https://github.com/dmbee/seglearn
34. Deep Learning Time Series - https://github.com/mb4310/Time-Series
35. Variational Autoencoder for Dimensionality Reduction of Time-Series - https://github.com/msmbuilder/vde
36. Singular Spectrum Analysis - https://github.com/kieferk/pymssa
37. Temporal Pattern Attention for Multivariate Time Series Forecasting - https://github.com/gantheory/TPA-LSTM
38. Dual-Stage Attention-Based Recurrent Neural Net for Time Series Prediction - https://github.com/Seanny123/da-rnn (blog posting about it at https://chandlerzuo.github.io/blog/2017/11/darnn )
39. Piecewise regression: https://github.com/DataDog/piecewise
40. Probabilistic Inference on Noisy Time Series (PINTS): https://github.com/pints-team/pints
41. Multivariate Anomaly Detection for Time Series Data with GANs - https://github.com/LiDan456/MAD-GANs
42. Wotan - https://github.com/hippke/wotan - Automagically remove trends from time-series data.
43. NOnLinear measures for Dynamical Systems (nolds; based on one-dimensional time series) - https://github.com/CSchoel/nolds
45. LSTNet - https://github.com/Vsooong/pattern_recognize - Long- and Short-term Time-series Network. Uses a Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN) to extract short-term local dependency patterns among variables and to discover long-term patterns in time series trends.
46. Indicators to add, if they are not already in TA-Lib:
A) https://github.com/joosthoeks/jhTAlib/tree/master/jhtalib
B) https://github.com/peerchemist/finta (use code at https://github.com/peerchemist/finta/blob/master/finta/finta.py)
C) https://tulipindicators.org/benchmark (use the Python bindings at https://github.com/cirla/tulipy or "Tulip Indicators works well with C++. Just compile the C code; you shouldn't have any problems." )
D) https://github.com/kylejusticemagnuson/pyti
47. Programs Like Skope (https://github.com/scikit-learn-contrib/skope-rules):
https://github.com/meelgroup/MLIC
https://github.com/christophM/rulefit - Regression only, but we could still use that.
https://github.com/alienJohny/Rules-Extraction-from-sklearn-DecisionTreeClassifier (C++)
48. Deep Anomaly Detection: https://github.com/KDD-OpenSource/DeepADoTS
49. Deep4Cast: https://github.com/MSRDL/Deep4Cast - WaveNet based.
50. LSTM Variational autoencoder: https://github.com/Danyleb/Variational-Lstm-Autoencoder
51. Deep Neural Network Ensembles for Time Series Classification - https://github.com/hfawaz/ijcnn19ensemble
52. RobustSTL: https://github.com/LeeDoYup/RobustSTL - A Robust Seasonal-Trend Decomposition Algorithm for Long Time Series
53. AntiCipy: https://github.com/sky-uk/anticipy
54. Autoencoders: https://github.com/hamaadshah/autoencoders_keras/blob/master/Python/autoencoders.ipynb (use the various autoencoders on the 2nd half of the page)
55. Use automatic feature engineering using GANs: https://github.com/hamaadshah/gan_public/blob/master/Python/gan.ipynb
56. Feature Engineering Wrapper (Few): https://github.com/lacava/few
57. LibFM (in Keras) - https://github.com/jfpuget/LibFM_in_Keras - He used this to generate new features in a Kaggle competition.
58. Metric Learning - https://github.com/scikit-learn-contrib/metric-learn. Metric Learning is explained at http://metric-learn.github.io/metric-learn/introduction.html . Also, the code for many more advanced methods is listed at https://github.com/kdhht2334/Survey_of_Deep_Metric_Learning .
59. knnFeat - https://github.com/upura/knnFeat and https://github.com/momijiame/gokinjo - Knn feature extraction.
60. Surrogate Assisted Feature Extraction (SAFE) - https://github.com/ModelOriented/SAFE
61. Feature Stuff: https://github.com/hiflyin/Feature-Stuff
62. GAN-Keras: https://github.com/hamaadshah/gan_keras - Automatic feature engineering using Generative Adversarial Networks.
63. AutoFeat: https://github.com/cod3licious/autofeat - Linear Regression Model with Automated Feature Engineering and Selection Capabilities.
64. Various Keras NNs for stocks: https://github.com/rosdyana/Going-Deeper-with-Convolutional-Neural-Network-for-Stock-Market-Prediction
66. WATTNet: https://github.com/Zymrael/wattnet-fx-trading - Learning to Trade FX with Hierarchical Spatio-Temporal Representations of Highly Multivariate Time Series
67. TabNet: https://github.com/titu1994/tf-TabNet
Also, see this big study, which shows that simple models such as ARIMA beat NNs for time series prediction: https://machinelearningmastery.com/findings-comparing-classical-and-machine-learning-methods-for-time-series-forecasting/
- Some strategies to consider adding, as features:
A) The ML version of their crypto strategy at https://github.com/achmand/ari5123_assignment/blob/master/src/algo_trading.ipynb shows big profits. What is most interesting is that in their backtests, using XGBoost with 7 features (such as moving average) did much better than the typical strategy of trading each of those indicators individually. ML was able to take a group of not-too-profitable indicators and use them in a smart way to make money.
B) RL Bitcoin trading with a profit - https://github.com/teddykoker/blog/blob/master/notebooks/trading-with-reinforcement-learning-in-python-part-two-application.ipynb Also read the first part of his article, about "gradient ascent", at https://github.com/teddykoker/blog/blob/master/notebooks/trading-with-reinforcement-learning-in-python-part-one-gradient-ascent.ipynb
C) Ernie Chan's Mean Reversion - https://github.com/teddykoker/blog/blob/master/notebooks/cross-sectional-mean-reversion-strategy-in-python-with-backtrader.ipynb
D) Andrew Clenow's Momentum - https://github.com/teddykoker/blog/blob/master/notebooks/momentum-strategy-from-stocks-on-the-move-in-python.ipynb
E) B and C above could also be tested with shorter bars.
F) Futures Momentum Investing - https://www.linkedin.com/pulse/implement-cta-less-than-10-lines-code-thomas-schmelzer/ - I assume this could apply to stocks/fx/crypto
G) Seahorse: https://github.com/fwu03/ML_Stock_Trading-Seahorse
H) Here's a pair trading strategy in Java: https://github.com/lukstei/trading-backtest
I) Another pair trading strategy: https://github.com/oskarringstrom/StatArbProject
J) There is a profitable RL strategy at https://github.com/sachink2010/AutomatedStockTrading-DeepQ-Learning/blob/master/Trading.ipynb
K) RL Trading Bot: https://github.com/ai-portfolio/deep_reinforcement_learning_stock_trading_bot/blob/master/deep_reinforcement_learning_stock_trading_bot.ipynb
L) RL Portfolio Management - https://github.com/ZhengyaoJiang/PGPortfolio - The paper they reference is really good.
M) Deep RL Trading - https://github.com/golsun/deep-RL-trading - The paper that goes with this has some great info about using a CNN vs RNN.
N) RL Options Trading - https://github.com/randli/Optimal-Trading-Price - They did not upload their code, but their 1 page PDF summary paper is interesting.
O) Personae: https://github.com/Ceruleanacg/Personae
P) Free-Lunch Saliency via Attention in Atari Agents - https://github.com/dniku/free-lunch-saliency
Q) RL Algos - https://github.com/xlnwel/model-free-algorithms
- Use Featuretools.com to generate new features:
FeatureTools methods (see https://github.com/Featuretools/featuretools and https://www.kdnuggets.com/2018/02/deep-feature-synthesis-automated-feature-engineering.html):
Create Entities: https://docs.featuretools.com/loading_data/using_entitysets.html - A parent entity made from the relationships between child entities.
Feature Primitives: TimeSincePrevious, Mean, Max, Min, Std, Skew
Aggregation primitives: These primitives take related instances as an input and output a single value. They are applied across a parent-child relationship in an entity set. E.g: Count, Sum, AvgTimeBetween
Transform primitives: These primitives take one or more variables from an entity as an input and output a new variable for that entity. They are applied to a single entity. E.g: Hour, TimeSincePrevious, Absolute.
Deep Feature Synthesis: https://docs.featuretools.com/automated_feature_engineering/afe.html - Stacking features to create new features.
To parallelize Featuretools, see https://medium.com/feature-labs-engineering/scaling-featuretools-with-dask-ce46f9774c7d and https://docs.featuretools.com/guides/parallel.html
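A minimal Deep Feature Synthesis sketch (featuretools 0.x API; the bars/symbols schema is a made-up example):

```python
# Sketch: build an EntitySet from bar data, derive a parent "symbols"
# entity, and let DFS stack aggregation/transform primitives into features.
import featuretools as ft
import pandas as pd

bars = pd.DataFrame({
    "bar_id": range(4),
    "symbol": ["EURUSD", "EURUSD", "GBPUSD", "GBPUSD"],
    "close": [1.10, 1.11, 1.29, 1.30],
    "time": pd.date_range("2019-01-01", periods=4, freq="H"),
})
es = ft.EntitySet(id="trading")
es = es.entity_from_dataframe(entity_id="bars", dataframe=bars,
                              index="bar_id", time_index="time")
es = es.normalize_entity(base_entity_id="bars", new_entity_id="symbols",
                         index="symbol")  # parent entity from a child column
features, defs = ft.dfs(entityset=es, target_entity="symbols",
                        agg_primitives=["mean", "max", "std", "skew"],
                        trans_primitives=["hour"])
print(features.columns.tolist())
```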
- Unsupervised Feature Extraction and Reduction:
PyDeep - https://github.com/MelJan/PyDeep - A machine learning / deep learning library with a focus on unsupervised learning. Has 25 different methods (PCA, ICA, etc.).
Fastknn - https://github.com/davpinto/fastknn - Unlike normal knn, this has a command knnExtract() that extracts features from the data, Kaggle style.
Scikit-Learn Unsupervised: http://scikit-learn.org/stable/unsupervised_learning.html - Gaussian mixture models, manifold learning, clustering, and decomposing signals in components (matrix factorization problems)
Ladder Networks: http://bair.berkeley.edu/blog/2018/01/23/kernels/ Code: https://github.com/search?l=Python&q=ladder+network&type=Repositories&utf8=%E2%9C%93
Robust Continuous Clustering - https://github.com/yhenon/pyrcc
SOINN(Self-Organizing Incremental Neural Network) - https://github.com/fukatani/soinn
Self Organizing Maps - https://github.com/search?p=1&q=%22Self+Organizing+Map%22&type=Repositories&utf8=%E2%9C%93
Paysage - https://github.com/drckf/paysage - A library for unsupervised learning and probabilistic generative models: Bernoulli Restricted Boltzmann Machines, Gaussian Restricted Boltzmann Machines, and Hopfield models, using advanced mean field and Markov Chain Monte Carlo methods.
Parametric t-SNE: https://github.com/search?utf8=%E2%9C%93&q=Parametric+t-SNE&type=
Largevis: https://github.com/ml4me/largevis
Feature Extraction: https://github.com/search?l=Python&q="feature+extraction"&type=Repositories&utf8=?
Unsupervised Learning by Predicting Noise - https://arxiv.org/abs/1704.05310 Code: https://github.com/search?l=Python&q=%22Unsupervised+Learning+by+Predicting+Noise%22&type=Repositories&utf8=%E2%9C%93
Clustering Algorithms - https://github.com/search?l=Python&q=%22clustering+algorithms%22&type=Repositories&utf8=%E2%9C%93
Gap Statistic: https://github.com/milesgranger/gap_statistic
Neural Clustering: Concatenating Layers for Better Projections - https://openreview.net/forum?id=r1PyAP4Yl
Unsupervised Learning on Neural Network Outputs - https://github.com/yaolubrain/ULNNO
K-medoids clustering algorithm with NEAT: http://blog.otoro.net/2015/08/23/k-medoids-clustering-algorithm/ Code: https://github.com/search?l=Python&q=k-medoids&type=Repositories&utf8=%E2%9C%93
Autoencoder Trees: Paper: https://www.cmpe.boun.edu.tr/~ethem/files/papers/Ozan_Neurocomp.pdf Code: https://github.com/gionuno/autoencoder_trees
Denoising Autoencoder: https://github.com/AdilBaaj/unsupervised-image-retrieval
Stacked Denoising Autoencoder: https://www.researchgate.net/publication/285392625_Stacked_Denoise_Autoencoder_Based_Feature_Extraction_and_Classification_for_Hyperspectral_Images Keras Code: https://github.com/madhumita-git/SDAE
Non-negative Matrix Factorization: https://github.com/search?l=Python&q=%22non-negative+matrix+factorization%22&type=Repositories&utf8=%E2%9C%93
Group Factor Analysis: https://github.com/mladv15/gfa-python
SVDD: https://github.com/sdvermillion/pySVDD and https://github.com/search?l=Python&q=SVDD&type=Repositories&utf8=%E2%9C%93
Kinetic PCA: https://github.com/alexandrudaia/NumeraiExperiments/blob/master/kineticPcaNumerai.ipynb
TFeat - https://github.com/vbalnt/tfeat (based on http://www.bmva.org/bmvc/2016/papers/paper119/paper119.pdf )
UMap: https://github.com/lmcinnes/umap
- More Feature Extraction Methods:
Interaction Features: http://www.ultravioletanalytics.com/blog/kaggle-titantic-competition-part-v-interaction-variables - Adding/multiplying/dividing/subtracting each of the existing features with each other. Scikit-learn has this ability with "PolynomialFeatures", see https://chrisalbon.com/machine_learning/linear_regression/create_interaction_features/ .
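A quick sketch of the PolynomialFeatures approach:

```python
# Sketch: pairwise interaction features with scikit-learn's PolynomialFeatures.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.random.rand(100, 4)                      # placeholder feature matrix
interactions = PolynomialFeatures(degree=2, interaction_only=True,
                                  include_bias=False).fit_transform(X)
print(X.shape, "->", interactions.shape)        # 4 cols -> 4 + C(4,2) = 10 cols
```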
Derived Variables: http://www.ultravioletanalytics.com/blog/kaggle-titanic-competition-part-iv-derived-variables - Based on Name, Cabin, Ticket #.
Variable Transformations - http://www.ultravioletanalytics.com/blog/kaggle-titanic-competition-part-iii-variable-transformations - Dummy Variables, Factorizing, Scaling, Binning
NN Feature Extraction - https://github.com/tomrunia/TF_FeatureExtraction - Gets features from VGG, ResNet, Inception.
Feature Engineering - Use all the methods in Scikit-Learn's preprocessing module at http://scikit-learn.org/stable/modules/preprocessing.html (and also at https://www.slideshare.net/gabrielspmoreira/feature-engineering-getting-most-out-of-data-for-predictive-models) - Binarizing, Rounding, Binning, Quantiles, Log Transform, Scaling (Min-Max, Standard Z), Normalization, Polynomial Features, Feature Hashing, Bin-counting, LabelCount Encoding, Category Embedding, etc. A good explanation of it all is also at https://github.com/bobbbbbi/Machine-learning-Feature-engineering-techniques/blob/master/python%20feature%20engineering.pdf .
XAM: https://github.com/MaxHalford/xam - Binning, Combining features, Groupby transformer, Likelihood encoding, Resampling, etc.
Exponential Moving Average of the Weights (in Keras): https://gist.github.com/soheilb/c5bf0ba7197caa095acfcb69744df756
Categorical Interaction Features - http://blog.kaggle.com/2017/02/27/allstate-claims-severity-competition-2nd-place-winners-interview-alexey-noskov/ - "...the last trick I used was forming categorical interaction features, applying lexical encoding to them. These combinations may be easily extracted from XGBoost models by just trying the most important categorical features, or better, analysing the model dump with the excellent Xgbfi tool."
FeatureFu (https://github.com/linkedin/FeatureFu) does things like feature normalization, feature combination, nonlinear featurization, cascading modeling, and model combination.
CopperSmith: https://github.com/CommBank/coppersmith
Feng: https://github.com/mewwts/feng
Feagen: https://github.com/ianlini/feagen
Ratios of one feature to another, like at https://www.kaggle.com/sudalairajkumar/feature-engineering-validation-strategy
Locally weighted bagging: https://maxhalford.github.io/blog/locally-weighted-bagging/
Prince: https://github.com/MaxHalford/prince - Does PCA, Correspondence Analysis (CA), and Multiple Correspondence Analysis (MCA)
Self-Normalizing Neural Networks: https://github.com/atulshanbhag/Self-Normalizing-Neural-Networks-SNN-
Create new features like at https://www.datacamp.com/community/tutorials/feature-engineering-kaggle
Pairwise Interactions: https://medium.com/jim-fleming/notes-on-the-numerai-ml-competition-14e3d42c19f3 (code at https://github.com/jimfleming/numerai ) - 'given features from two samples predict which of the two had a greater probability of being classified as '1'.'
t-SNE Multiple Runs - https://medium.com/jim-fleming/notes-on-the-numerai-ml-competition-14e3d42c19f3 (code at https://github.com/jimfleming/numerai ) - 'Since t-SNE is stochastic, multiple runs will produce different embeddings. To exploit this I will run t-SNE 5 or 6 times at different perplexities and dimensions (2D and 3D) then incorporate these extra features. Now the validation loss is 0.68839 (-0.65% from baseline).'
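A sketch of that multiple-runs trick with scikit-learn's TSNE (placeholder data):

```python
# Sketch: concatenate embeddings from several t-SNE runs at different
# perplexities and dimensions as extra features.
import numpy as np
from sklearn.manifold import TSNE

X = np.random.rand(300, 20)                     # placeholder feature matrix
embeddings = [TSNE(n_components=d, perplexity=p, random_state=i).fit_transform(X)
              for i, (d, p) in enumerate([(2, 15), (2, 30), (3, 30)])]
X_aug = np.hstack([X] + embeddings)
print(X_aug.shape)                              # 20 + 2 + 2 + 3 = 27 columns
```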
- More Autoencoders to try:
See https://towardsdatascience.com/autoencoders-for-the-compression-of-stock-market-data-28e8c1a2da3e
Also try WaveNet:
https://github.com/PhilippeNguyen/keras_wavenet
https://github.com/PyWavelets/pywt
https://github.com/kykosic/WaveNet-BTC
- Things To Try In Keras:
1) LearningRateScheduler with step decay schedule: https://gist.github.com/jeremyjordan/86398d7c05c02396c24661baa4c88165 (see the sketch after this list)
2) Cyclical Learning Rate - https://github.com/leaprovenzano/cyclical_lr_keras
3) AdamW - http://34.230.249.160:8888/notebooks/new/automl/gsketch-resnet50-128x128-AdamW.ipynb (using the code from AdamW: https://github.com/GLambard/AdamW_Keras)
4) SGDR - http://34.230.249.160:8888/notebooks/new/automl/gsketch-kerasnn-sgdr-128x128.ipynb or http://34.230.249.160:8888/notebooks/deepneat/reptile-test-SGDR.ipynb
5) One Cycle Learning Rate Policy for Keras - https://github.com/titu1994/keras-one-cycle
6) Optimal Learning Rate Finder: https://gist.github.com/jeremyjordan/ac0229abd4b2b7000aca1643e88e0f02 and https://github.com/metachi/fastaiv2keras (from https://towardsdatascience.com/estimating-optimal-learning-rate-for-a-deep-neural-network-ce32f2556ce0).
7) Add Gradient Noise - https://github.com/cpury/keras_gradient_noise
8) Stochastic Weight Averaging - https://github.com/kristpapadopoulos/keras_callbacks
9) Variants of RMSProp and Adagrad - https://github.com/mmahesh/variants_of_rmsprop_and_adagrad . Also Nadamax, Radamax, AdamDelta: https://github.co
10) Pyramid Pooling Layers: https://github.com/yhenon/keras-spp
11) One Hundred Layers Tiramisu - https://github.com/0bserver07/One-Hundred-Layers-Tiramisu and https://github.com/junjungoal/Tiramisu-keras
12) Neural Tensor Layer - https://github.com/dapurv5/keras-neural-tensor-layer
13) Coordconv - https://github.com/titu1994/keras-coordconv
14) RBF Layer - https://github.com/PetraVidnerova/rbf_keras
15) Mixture Density Network Layer: https://github.com/cpmpercussion/keras-mdn-layer
16) Position Embedding Layers: https://github.com/CyberZHG/keras-pos-embd
17) Mixture of Experts: https://github.com/eminorhan/mixture-of-experts
18) Multi Head - https://github.com/CyberZHG/keras-multi-head
19) Spectral Normalization: https://github.com/IShengFang/SpectralNormalizationKeras
20) Gradient Reversal: https://github.com/michetonu/gradient_reversal_keras_tf
21) Keras learning rate callbacks: https://github.com/Drxan/DNN_Learning_Rate/blob/master/lr_callbacks.py
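For item 1 above, a minimal sketch of a step-decay schedule (assuming TensorFlow's bundled Keras):

```python
# Sketch: halve the learning rate every 10 epochs with LearningRateScheduler.
import math
from tensorflow.keras.callbacks import LearningRateScheduler

def step_decay(epoch):
    initial_lr, drop, epochs_per_drop = 0.001, 0.5, 10
    return initial_lr * math.pow(drop, math.floor(epoch / epochs_per_drop))

lr_callback = LearningRateScheduler(step_decay, verbose=1)
# model.fit(X_train, y_train, epochs=50, callbacks=[lr_callback])
```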
- Dynamic Ensemble Selection (DES)
I tried it once on stock data and it did not give good results, but maybe somebody else will have better luck with it. I have it working at http://34.230.249.160:8888/notebooks/DESlib/examples/Notebooks_examples/Example_Eric-Copy1.ipynb and also using multiple base classifiers at http://34.230.249.160:8888/notebooks/DESlib/examples/Notebooks_examples/Example_Eric-Copy2.ipynb but there are other ways of doing it at http://34.230.249.160:8888/tree/DESlib/examples/Notebooks_examples
DES Code: https://github.com/Menelau/DESlib
DES Papers:
https://arxiv.org/abs/1802.04967
https://arxiv.org/pdf/1509.00825.pdf
https://arxiv.org/abs/1804.07882
DES Manual: https://deslib.readthedocs.io/en/latest/
How they deal with splitting the data is explained well at https://arxiv.org/pdf/1509.00825.pdf
DES does not use diverse ensembles the way voting and stacking do, because the goal is for each classifier to be an expert on a different part of the dataset; an ensemble of diverse classifiers would defeat that purpose. It does use bagging, as explained at https://arxiv.org/abs/1804.07882
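A sketch of that setup with DESlib (placeholder data; KNORA-E is one of the several DES methods it provides):

```python
# Sketch: a bagged pool of trees, a DSEL set for competence estimation,
# and KNORA-E picking per-sample "expert" classifiers.
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from deslib.des import KNORAE

X, y = np.random.rand(1000, 10), np.random.randint(0, 2, 1000)  # placeholders
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.5, shuffle=False)
X_dsel, X_test, y_dsel, y_test = train_test_split(X_rest, y_rest, test_size=0.5, shuffle=False)

pool = BaggingClassifier(DecisionTreeClassifier(), n_estimators=10).fit(X_train, y_train)
des = KNORAE(pool).fit(X_dsel, y_dsel)   # competence regions come from the DSEL set
print("DES test accuracy:", des.score(X_test, y_test))
```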
- More Genetic Programming (GP) programs to try:
1. Finish Vita - https://github.com/morinim/vita/wiki/features (http://34.230.249.160:8888/tree/new/automl/earthquakes2019/testvita)
Also, see the info about running Vita in parallel: morinim/vita#21
C++14 supports multicore natively via std::async (see https://www.bfilipek.com/2014/01/tasks-with-stdfuture-and-stdasync.html), or you could use easyLambda: https://github.com/haptork/easyLambda
Plus Vita has around 15 algorithms, so those (and multiple trials) could be run multicore instead of parallelizing Vita directly.
2. Glyph: Also see make_unique_version(obj) in https://github.com/Ambrosys/glyph/blob/master/glyph/gp/algorithms.py, which takes an algorithm class and creates a subclass with a modified evolve method that ensures uniqueness of individuals.
If it gives good results, later we can parallelize it:
Dask example: https://github.com/Ambrosys/glyph/blob/master/examples/control/dask_app.py
and also at https://github.com/Ambrosys/glyph/blob/master/examples/control/joblib_app.py
3. Cartesian Genetic Programming: https://github.com/shinjikato/cartesian_genetic_programming
4. Multiple Regression GP: Code: https://github.com/prasanna-rn/multiRegresssionGP Explanation: https://flexgp.github.io/gp-learners/mrgp.html
In a paper I read comparing various GP methods (not the one that goes with this code), this method won. "Improves the program evaluation process by performing multiple regression on subexpressions of the solution functions. Instead of evaluating the fitness of each individual solution as a whole, MRGP decouples its mathematical expression tree into subtrees. The fitness of the solution is evaluated based on the best linear combination of these subtree structures."
5. A Greedy Search Tree Heuristic for Symbolic Regression - Python Code: https://github.com/folivetti/ITSR Paper: https://arxiv.org/abs/1801.01807
6. Geometric Semantic Genetic Programming - Paper: https://arxiv.org/abs/1804.06808 Code: https://github.com/laic-ufmg/GSGP-Red (Java) plus also https://github.com/search?q=Geometric+Semantic+Genetic+Programming
The paper says: "Advances in Geometric Semantic Genetic Programming (GSGP) have shown that this variant of Genetic Programming (GP) reaches better results than its predecessor for supervised machine learning problems, particularly in the task of symbolic regression. However, by construction, the geometric semantic crossover operator generates individuals that grow exponentially with the number of generations, resulting in solutions with limited use. This paper presents a new method for individual simplification named GSGP with Reduced trees (GSGP-Red)."
7. Differential Evolution - Vita and XGP already have this
XGP is at https://maxhalford.github.io/xgp/ and is powered by https://github.com/MaxHalford/eaopt . I already have XGP working at http://34.230.249.160:8888/notebooks/new/automl/earthquakes2019/earthquakes-eric.ipynb . For info about DE see https://nathanrooy.github.io/posts/2017-08-27/simple-differential-evolution-with-python/
Code: https://github.com/nathanrooy/differential-evolution-optimization-with-python and https://github.com/search?l=Python&q=Differential++evolution&type=Repositories
More DE Info: http://107.167.189.191/~piak/teaching/ec/ec2012/das-de-sota-2011.pdf
Also see this related NAS paper: A Hybrid Differential Evolution Approach to Designing Deep Convolutional Neural Networks for Image Classification - https://arxiv.org/pdf/1808.06661.pdf
8. Self-adaptation of Genetic Operators Through Genetic Programming Techniques
Paper: https://arxiv.org/abs/1712.06070
Code: https://github.com/afcruzs/AOEA
9. Grammar Variational Autoencoder: https://github.com/search?q=Grammar+Variational+Autoencoder
10. Mixed-Integer Non-Linear Programming - Code: https://github.com/minotaur-solver/minotaur Paper: https://arxiv.org/pdf/1710.10720.pdf
11. P-Tree Programming - Code: https://github.com/coesch/ptree Paper: https://arxiv.org/pdf/1707.03744.pdf
12. Strongly Typed GP: https://deap.readthedocs.io/en/master/examples/gp_spambase.html (see explanation at https://deap.readthedocs.io/en/master/tutorials/advanced/gp.html#strongly-typed-gp).
13. https://github.com/Decadz/Genetic-Programming-for-Symbolic-Regression - This work uses the Rademacher complexity and incorporates it into the fitness function of GP, utilizing it as a means of controlling the functional complexity of GP individuals.
14. Take a quick look at these GP programs, in case they do something that the ones we already have do not:
https://github.com/marcovirgolin/GP-GOMEA
https://github.com/Ambrosys/gpcxx (see the symbolic regression example at http://gpcxx.com/doc/gpcxx/tutorial.html#gpcxx.tutorial.symbolic_regression )
https://github.com/LLNL/SoRa
https://github.com/degski/CGPPP
https://github.com/gchoinka/gpm
https://github.com/kpot/cartgp
https://github.com/AndrewJamesTurner/CGP-Library
https://github.com/kstaats/karoo_gp
https://github.com/ViktorWase/Cgpy
https://github.com/Jarino/cgp-wrapper
- Stacking/Ensemble Programs To Try:
StackNet - https://github.com/h2oai/pystacknet
Random Rotation Ensembles: https://github.com/tmadl/sklearn-random-rotation-ensembles
Simple and Scalable Predictive Uncertainty estimation using Deep Ensembles: https://github.com/vvanirudh/deep-ensembles-uncertainty
Coupled Ensembles of Neural Networks: https://github.com/vabh/coupled_ensembles
Deep Incremental Boosting: https://arxiv.org/pdf/1708.03704.pdf Code is part of https://github.com/nitbix/toupee (a Keras ensembler)
Stack Keras on top of scikit learn model: https://hergott.github.io/deep-learning-model-augmentation/
- New types of neural networks and algorithms to try:
List of Keras Classification Models: https://github.com/titu1994/Keras-Classification-Models
Capsule Networks: https://github.com/XifengGuo/CapsNet-Fashion-MNIST and https://github.com/shinseung428/CapsNet_Tensorflow and https://github.com/naturomics/CapsNet-Tensorflow and https://github.com/Sarasra/models/tree/master/research/capsules (https://arxiv.org/abs/1710.09829)
DenseNet: https://github.com/titu1994/DenseNet (https://arxiv.org/pdf/1608.06993v3.pdf)
Highway Networks: https://github.com/trangptm/HighwayNetwork (https://arxiv.org/abs/1505.00387)
CliqueNet: https://github.com/iboing/CliqueNet (https://arxiv.org/abs/1802.10419)
Equilibrium Propagation: https://github.com/StephanGrzelkowski/EquilibriumPropagation (https://www.frontiersin.org/articles/10.3389/fncom.2017.00024/full)
Session-based Recommendations With Recurrent Neural Networks (GRU4Rec): https://github.com/hidasib/GRU4Rec or this improved PyTorch version: https://github.com/yhs-968/pyGRU4REC (https://arxiv.org/pdf/1511.06939.pdf)
Siamese Networks: https://github.com/sorenbouma/keras-oneshot (https://www.cs.cmu.edu/%7Ersalakhu/papers/oneshot1.pdf)
Nested LSTMs: https://github.com/titu1994/Nested-LSTM (https://arxiv.org/abs/1801.10308)
E-swish activation function: https://github.com/EricAlcaide/E-swish (https://arxiv.org/abs/1801.07145v1)
Energy Preserving Neural Networks: https://github.com/akanimax/energy-preserving-neural-network
Training Neural Networks Without Gradients: A Scalable ADMM Approach - https://github.com/dongzhuoyao/admm_nn (https://arxiv.org/abs/1605.02026)
Inverse Compositional Spatial Transformer Networks: https://github.com/chenhsuanlin/inverse-compositional-STN
Ladder Networks: http://bair.berkeley.edu/blog/2018/01/23/kernels/ Code: https://github.com/search?l=Python&q=ladder+network&type=Repositories&utf8=%E2%9C%93
RWA: https://gist.github.com/shamatar/55b804cf62b8ee0fa23efdb3ea5a4701 - Machine Learning on Sequential Data Using a Recurrent Weighted Average - https://arxiv.org/abs/1703.01253
Neural Turing Machines - Now officially part of Tensorflow, see https://www.scss.tcd.ie/joeran.beel/blog/2019/05/25/google-integrates-our-neural-turing-machine-implementation-in-tensorflow/
ELM: https://github.com/dclambert/Python-ELM
PyDLM Bayesian dynamic linear model: https://github.com/wwrechard/pydlm - A python library for Bayesian dynamic linear models for time series data.
Symbolic Aggregate Approximation - https://github.com/nphoff/saxpy - See stock trading paper at at https://www.researchgate.net/publication/275235449_A_Stock_Trading_Recommender_System_Based_on_Temporal_Association_Rule_Mining
Hidden Markov Models: https://github.com/hmmlearn/hmmlearn (see stock example at http://hmmlearn.readthedocs.io/en/latest/auto_examples/plot_hmm_stock_analysis.html#sphx-glr-auto-examples-plot-hmm-stock-analysis-py ). Also see https://github.com/larsmans/seqlearn
Neural Decision Forests: https://github.com/jingxil/Neural-Decision-Forests
Deep Generative Models: https://github.com/AtreyaSh/deep-generative-models
PyGAM: https://github.com/dswah/pyGAM - Generalized Additive Models in Python.
ANFIS - https://github.com/twmeggs/anfis - Adaptive neuro fuzzy inference system
Particle Swarm Optimization: https://github.com/ljvmiranda921/pyswarms
Meta learning - https://github.com/seba-1511/pytorch-hacks - MAML, FOMAML, MetaSGD
Contextual Bandits - https://devpost.com/software/unsupervised-representation-learning-for-contextual-bandits
TorchBrain - https://github.com/shoaibahmed/torchbrain - Spiking Neural Networks.
Continual Learning: https://github.com/GMvandeVen/continual-learning
- Try one or both of these 2 programs for dealing with outliers/anomalies:
https://github.com/yzhao062/pyod
https://github.com/hendrycks/outlier-exposure
- Order Book Imbalances:
There is good potential to make money from order book imbalances, in both Bitcoin and stocks/FX (see the sketch at the end of this section). See this GitHub program, which is old, but it uses no neural networks, only scikit-learn, so its age should not matter much:
https://github.com/rorysroes/SGX-Full-OrderBook-Tick-Data-Trading-Strategy
Also, Alpaca (https://alpaca.markets/), the commission-free stock trading brokerage/API, has demo code for something similar for stocks at https://github.com/alpacahq/example-hftish , using only Numpy and Pandas. Their link to the paper they based it on is broken, but you can read that paper at https://www.palmislandtraders.com/econ136/hftois.pdf. And there are other similar papers on this topic, such as:
https://arxiv.org/pdf/1809.01506.pdf
https://core.ac.uk/download/pdf/146502703.pdf
http://www.smallake.kr/wp-content/uploads/2015/11/SSRN-id2668277.pdf
There are also these programs for it:
https://github.com/durdenclub/Algorithmic-Trading
https://github.com/timothyyu/gdax-orderbook-ml
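For reference, the basic imbalance feature these papers build on can be sketched in a few lines (the level count and the +1/-1 interpretation here are simplifications):

```python
# Sketch: normalized difference between resting bid and ask volume
# near the top of the book.
def book_imbalance(bid_sizes, ask_sizes, levels=5):
    """+1 = all resting size on the bid, -1 = all on the ask."""
    bid_vol = sum(bid_sizes[:levels])
    ask_vol = sum(ask_sizes[:levels])
    return (bid_vol - ask_vol) / (bid_vol + ask_vol)

# Example: a heavy bid side suggests short-term upward pressure.
print(book_imbalance([50, 40, 30], [10, 10, 5]))   # ~0.66
```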
- News Trading - Sentiment Analysis:
Kaggle Two Sigma: https://github.com/silvernine209/stock_price_prediction
Sentiment Analysis for Event-Driven Stock Prediction - https://github.com/WayneDW/Sentiment-Analysis-in-Event-Driven-Stock-Price-Movement-Prediction
Stocksight - https://github.com/shirosaidev/stocksight
https://github.com/Avhirup/Stock-Market-Prediction-Challenge/blob/master/Predicting%20Stock%20Prices%20Challenge.ipynb
https://www.dlology.com/blog/simple-stock-sentiment-analysis-with-news-data-in-keras/
https://github.com/jasonyip184/StockSentimentTrading/blob/master/Stock%20Sentiment%20Algo%20Trading.ipynb
https://github.com/Sadden/PredictStock/blob/master/PredictBasedOnNews.ipynb
Bert for Sentiment analysis: https://github.com/search?q=bert+sentiment+analysis&type=Repositories
ERNIE: https://github.com/PaddlePaddle/ERNIE
XLNet: https://github.com/zihangdai/xlnet
Sources of news and sentiment:
https://api.tiingo.com/about/pricing
https://newsapi.org
https://rapidapi.com/Nuzzel/api/news-search/pricing
https://cloud.benzinga.com/cloud-product/bz-newswires/
https://www.decisionforest.com
http://sentdex.com
https://psychsignal.com/
- Reinforcement Learning (RL):
TensorTrade: https://github.com/notadamking/tensortrade
RL Trading - https://www.reddit.com/r/algotrading/comments/cjuzh7/how_to_create_a_concurrent_stochastic/ and more comments at https://www.reddit.com/r/reinforcementlearning/comments/cjutsr/how_to_create_a_concurrent_stochastic/
RL Trading with Ray: https://github.com/Draichi/cryptocurrency_algotrading
Additional Frameworks To Add (we are already using Stable Baselines at https://github.com/hill-a/stable-baselines):
Coach: https://github.com/NervanaSystems/coach
Ray: https://ray.readthedocs.io/en/latest/rllib-algorithms.html
RL Kit: https://github.com/vitchyr/rlkit
MediPixel: https://github.com/medipixel/rl_algorithms - For TD3 and Rainbow IQN
PARL: https://github.com/PaddlePaddle/PARL - Parallel versions of algos
Meta Learning for RL:
https://github.com/tristandeleu/pytorch-maml-rl
https://github.com/navneet-nmk/Hierarchical-Meta-Reinforcement-Learning
https://github.com/quantumiracle/Benchmark-Efficient-Reinforcement-Learning-with-Demonstrations/tree/master/MAML
Reptile for RL (maybe?): https://github.com/alok/rl_implementations/tree/master/reptile
Evolution-Guided Policy Gradient in Reinforcement Learning: Code: https://github.com/ShawK91/erl_paper_nips18 Paper: https://arxiv.org/abs/1805.07917
Tsallis: Code: https://github.com/kyungjaelee/tsallis_actor_critic_mujoco Paper: https://arxiv.org/abs/1902.00137
Some unusual RL Algos:
https://github.com/WilsonWangTHU/mbbl
and
https://github.com/dennybritz/reinforcement-learning
Has good Tensorflow versions of RL agents: https://github.com/tensorflow/agents
Uber Research Programs: https://github.com/uber-research
Selective memory: https://github.com/FitMachineLearning/FitML/tree/master/SelectiveMemory
Random Network Distillation: https://github.com/search?q=Random+Network+Distillation
RL Graph (uses Ray): https://github.com/rlgraph/rlgraph
Huskarl: https://github.com/danaugrs/huskarl
Also see these RL trading frameworks:
https://github.com/notadamking/tensortrade
https://medium.com/@kevinhill_96608/how-to-create-a-concurrent-and-parallel-stochastic-reinforcement-learning-environment-for-crypto-3756d78b7a8e
https://github.com/Kismuz/btgym
- Misc.:
AdaBound (PyTorch) optimizer - Like Adam
Daily Forex Patterns
Neuro-Evolution Trading Bot (in Keras)
Gluon Time Series
NeuroEvolution Trader (Tensorflow)
Somebody in the RL Trader discussion group suggested using https://github.com/vaexio/vaex . Also, there is PyStore - Fast Data Store for Pandas Time-Series Data - https://medium.com/@aroussi/fast-data-store-for-pandas-time-series-data-using-pystore-89d9caeef4e2
EA Trader: https://github.com/EA31337/EA31337 - Open source FX trading robot for MT4/MT5
If you have any questions, email eric@impulsecorp