The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by QuantConnect. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. QuantConnect makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances. All investments involve risk, including loss of principal. You should consult with an investment professional before making any investment decisions.
This all sounds awesome. Is there an update on the status?
I have a working genetic optimization system here:
https://github.com/jameschch/LeanOptimization/tree/geneticSharp
This is based on the old LeanOptimization project, but I have made it much more configurable, ported it to a different genetic framework, and enabled parallel backtests. This is not an official release, as it requires some minor and subtle changes to Lean that may not be compatible with the master branch. As you might imagine, work is ongoing and this is not the officially approved Lean optimization solution; it's just a tool some might find useful.
Great work, James. I've been playing with your LeanOptimization fork for a couple weeks now. A quick pointer for anyone else looking at toying with it: set a few conditions in your algo that Quit() early if things aren't looking like you want.
Quitting at a certain drawdown is a fairly obvious one. I'm doing a fair amount of signal filtering in the project I'm working on, which can lead to little-to-no trades being executed if the filter parameters aren't tuned properly, so I've added a scheduled check to ensure a minimum number of trades has been opened after a month or so of testing.
This can add quite a speed-up, especially early in the optimization process before it begins homing in. It's probably not a bad idea to add those checks even for non-optimizing backtests, but it never really crossed my mind until I was looking at running 10k tests in one go. I've seen up to a 50% reduction in the time required to get past the first few generations when I hadn't yet figured out sane parameter ranges.
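To make the early-exit idea concrete, here's a minimal sketch of the two checks described above. The function name, thresholds, and flag-returning shape are all illustrative; in a real Lean algorithm you would call Quit() directly when a check fails.

```python
def should_quit(equity_curve, trade_count, bar_index,
                max_drawdown=0.30, min_trades=10, check_after_bars=21):
    """Return True when the backtest looks unsalvageable and should stop."""
    # Check 1: bail out once drawdown from the equity peak exceeds a threshold.
    peak = max(equity_curve)
    drawdown = (peak - equity_curve[-1]) / peak
    if drawdown > max_drawdown:
        return True
    # Check 2: after roughly a month of daily bars, require a minimum trade
    # count, so over-filtered parameter sets don't waste a full backtest.
    if bar_index >= check_after_bars and trade_count < min_trades:
        return True
    return False
```

The thresholds here are arbitrary; in practice you'd set them just loose enough that no plausible winner gets culled.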
One other thing I found helpful was tweaking the output to optimizer.txt so that it's already CSV-formatted -- it makes it a bit less tedious to dump results into R for further examination.
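For anyone making a similar tweak, a small parser can also convert existing optimizer.txt lines into CSV rows after the fact. The regex and columns below are assumptions based on the sample log lines posted later in this thread:

```python
import re

# Matches lines like:
# 2017-03-30 23:26:00Z Algorithm: ParameterizedAlgorithm, Generation: 1, Fitness: 0, : -10
LINE_RE = re.compile(
    r"^(?P<ts>\S+ \S+) Algorithm: (?P<algo>\S+), "
    r"Generation: (?P<gen>\d+), Fitness: (?P<fit>-?[\d.]+)"
)

def to_csv_row(line):
    """Turn one optimizer.txt line into a CSV row, or None if it doesn't match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return ",".join([m["ts"], m["algo"], m["gen"], m["fit"]])
```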
Unfortunately, the results I'm getting tend to be fairly brittle -- I think overfitting is a real concern on two fronts. First, you're only optimizing against a single backtest period. Being able to easily run the top n results against an out-of-sample window would be nice. I think there may be a way to prepopulate genes that would take care of that, but I haven't looked into it much. Second, some sort of fuzzing on parameters could be helpful, so you could determine whether you've found something in a solid range or just happened to stumble on the perfect combination (I've had things go from a Sharpe around 2 to -1 when I changed a single int parameter by 1).
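As a sketch of what that fuzzing might look like, the helper below (entirely hypothetical, not part of the optimizer) enumerates the +/-1 neighbours of a winning integer parameter set so each can be re-backtested; a robust optimum should keep a similar Sharpe across most neighbours:

```python
from itertools import product

def neighbours(params, step=1):
    """Yield every integer parameter set within +/-step of the given one."""
    keys = sorted(params)
    # Cartesian product of (-step, 0, +step) offsets across all parameters.
    for combo in product(*[(-step, 0, step)] * len(keys)):
        if all(d == 0 for d in combo):
            continue  # skip the original point itself
        yield {k: params[k] + d for k, d in zip(keys, combo)}
```

For two parameters this yields the 8 surrounding points; re-running each and comparing Sharpe ratios gives a cheap fragility test.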
I'll think some more on all that, and send a PR your way if I come up with anything.
Really pleased you took the time to get this going. This is fairly rough code as I hadn't given much thought to general consumption.
I'd agree that having the algorithm quit early speeds up discovery, especially where I'm using wide parameter ranges to discover a new alpha by brute force. Improperly tuned parameters are a general problem though, so I'm looking for different solutions.
I think overfitting is endemic to genetic approaches. The method I have been using to counteract this is to run optimisations separately on multiple periods. I then test these candidates against out-of-sample data and select the best. Another method is to run multiple optimisations but, on the second run, change the in-sample period and predefine the genes with the results from the previous run. You can preload the first generation with fixed values by setting the "actual" for a gene in the file "optimization.json".
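As an illustration, preloading the first generation with the winners of a previous run might look something like the fragment below. The field names are a guess at the shape; see optimization.json and OptimizerConfiguration.cs in the project for the actual schema:

```json
{
  "genes": [
    { "key": "fast", "min": 5,  "max": 50,  "actual": 12 },
    { "key": "slow", "min": 50, "max": 200, "actual": 130 }
  ]
}
```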
It would also be possible to create a chromosome that comprises two gene sets targeting two different sample periods. The fitness is then a function of the Sharpe of these two periods; however, averaging a high Sharpe with a low one hides the overfitting behind the high one. I have also thought about random selection of the in-sample period, but have not progressed this.
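The point about averaging can be shown with a toy example (plain Python, not the optimizer's fitness code): a mean rates an overfit candidate with Sharpes of 3 and -1 the same as a steady candidate with 1 and 1, while taking the minimum does not:

```python
def mean_fitness(sharpe_a, sharpe_b):
    """Average the two period Sharpes -- hides a bad period behind a good one."""
    return (sharpe_a + sharpe_b) / 2

def min_fitness(sharpe_a, sharpe_b):
    """Score by the worst period, so both must perform for a high fitness."""
    return min(sharpe_a, sharpe_b)
```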
I've tried a few different selections, crossovers and mutations from the GeneticSharp library. The build includes the options I've found most suitable, but I'm considering making these configurable.
Go ahead and suggest anything else that comes to mind.
I've been thinking further about your remarks on fuzzing. It would be possible to create an optimization.json with a small value range and then run the optimizer for a few generations. If you see a large deviation in the Sharpe, you will know the main proposition is flawed.
To automate this kind of thing we're getting into the domain of multiple optimization batches and a recursive/introspective execution model.
That is pretty much what I've been doing. I'll take the results, select a handful of the top ones that are sufficiently different from one another, and rerun each of them with a small range. When I find one that continues to be successful, I change the backtesting period and repeat. That seems to do a pretty good job of weeding out good and bad ideas (and implementations -- I have found a few bugs in my code that I don't think I would have spotted without results from thousands of runs readily at hand). I've also had a few cases where all but one parameter is pretty robust to changes, which has helped narrow my focus on areas that need improvement -- it was worth the cost of admission just for those things alone.
Yes, running 24000 backtests automatically is occasionally useful.
I've now settled on a stable build of the LEAN genetic optimizer and have added all the features that I had planned:
https://github.com/jameschch/LeanOptimization
Please feel free to provide any feedback if you get it running.
Any update on when this will be added?
Currently looking forward to this as well, because it's going to help me do what I'm already doing more effectively.
If you're able to get LEAN up and running on a local desktop or server(s), you have the option of using the LeanOptimization genetic optimizer that's available here:
https://github.com/jameschch/LeanOptimization
In order to run parallel backtests, a small change is required to the QC master build. I have made available a compatible branch of LEAN here:
https://github.com/jameschch/Lean/tree/optimizer
The initial setup and configuration should not be very time consuming, but I'm in the process of simplifying things.
Anyone interested in algorithm optimization would be welcome to make suggestions. I'm currently considering several new feature areas:
- different ways to calculate algorithm fitness scores (apart from Sharpe ratio)
- improving processing times with machine learning classifiers
- a zero-code-configuration installer
I am new to Lean and C#. Can anyone show me how to use the Lean optimization? I have downloaded LeanOptimization but I don't know how to merge it with the Lean source code to compile the solution.
Hi KC chu, nice that you've taken an interest in this. I think the first step is getting familiar with Lean: https://github.com/QuantConnect/Lean
The main developers of the project will respond here and on github and they're all very helpful. You might ease yourself in by developing some algorithms through the QC console before moving onto a local environment.
In terms of parameter optimization, I've recently been working on this and have provided a means to automate parallel execution of backtests with a genetic optimizer. This is available here:
https://github.com/jameschch/LeanOptimization
If you wish to enable parallel logging, you will need to modify your lean build as per this commit:
https://github.com/jameschch/Lean/commit/0115dabd16b0d6e88794b20f7415d76702349c6
If you're familiar with running algorithms in Lean, the optimizer should be fairly familiar. I would suggest you run the example algorithm first and then change the configuration to suit your needs:
https://github.com/jameschch/LeanOptimization/blob/master/Optimization/optimization.json
The supported config settings are documented here:
https://github.com/jameschch/LeanOptimization/blob/master/Optimization/OptimizerConfiguration.cs
This project is still being actively developed, so if you have a feature idea or encounter an issue, please submit your feedback through GitHub. I think there is currently a fairly steep learning curve for a new user getting started with the optimizer, which is something I hope to address soon.
James Smith - I am also interested in contributing, but I'm a bit lost as to how this solution may be used as a robo-trader and/or exchange.
Is there documentation as to data flow and/or sequence diagrams including primary system interfaces?
I found one PDF..
I have tried the examples in LeanOptimization but without success. I get an OutOfMemoryException above the 7th generation and no results are output.
The following is recorded in optimizer.txt:
2017-03-30 23:26:00Z Algorithm: ParameterizedAlgorithm, Generation: 1, Fitness: 0, : -10
2017-03-30 23:26:09Z Algorithm: ParameterizedAlgorithm, Generation: 2, Fitness: 0, : -10
2017-03-30 23:26:21Z Algorithm: ParameterizedAlgorithm, Generation: 3, Fitness: 0, : -10
2017-03-30 23:26:35Z Algorithm: ParameterizedAlgorithm, Generation: 4, Fitness: 0, : -10
2017-03-30 23:26:51Z Algorithm: ParameterizedAlgorithm, Generation: 5, Fitness: 0, : -10
2017-03-30 23:27:04Z Algorithm: ParameterizedAlgorithm, Generation: 6, Fitness: 0, : -10
2017-03-30 23:27:21Z Algorithm: ParameterizedAlgorithm, Generation: 7, Fitness: 0, : -10
Can James provide a more precise step-by-step guide/screenshots showing how to configure and compile the example?
@Kc chu
It's good you were able to execute: this means the referenced assemblies were built and found. It may be that you have not configured the dataFolder or configPath in "optimization.json". These should be paths from the optimization executable to the relevant locations in your local Lean clone. The configPath refers to the "config.json", which should be in the Launcher bin folder. The dataFolder is the folder named Data that holds the market hours, symbol properties and historical price data. This is a relative path in the Lean clone, but needs to be changed so that the relative path is from the optimization.exe bin folder.
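For illustration only: a fragment with the two settings described above might look like this, where the relative paths are hypothetical and depend entirely on where your clones sit next to the optimizer's bin folder:

```json
{
  "configPath": "../../../Lean/Launcher/bin/Debug/config.json",
  "dataFolder": "../../../Lean/Data"
}
```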
If you get stuck you can run in debug mode and step from the optimizer into the lean classes and on into the algorithm. A simplified initial setup process and docs are one of the areas being worked on currently.
I guess you're talking about Lean in general: Jared would be the best person to help. The thing I'm referring to is a tool that allows you to optimize algorithm parameters with a genetic learning pattern.
Edit: last comment should have been @Larry Smith
Hi there,
Based on the awesome work done by James Smith and this tutorial, I implemented a project called LeanSTP, which is able to run multiple Lean instances in parallel.
The only significant difference with respect to the work done by James is that the parallelization is outside the genetic algorithm environment, the output is saved as JSON, and a log file is saved for each run. The default folder is Public Documents.
The core of the parameter configuration lies in this method. In this example, the algorithm runs some combinations of EMA periods and launches multiple instances of the ParameterizedAlgorithm.
But IMHO, optimizing backtests is a bad idea. It is relatively easy to find an optimal strategy, even for a pure random walk, and the higher the number of configurations tried, the greater the probability that the backtest is overfit. I guess you can calibrate some parameters without hurting out-of-sample performance too much.
Anyway, I hope someone finds this helpful.
The papers you've highlighted are very interesting, and cross over into some of the research areas I've been tackling. I think the GeneticSharp library that my optimizer references uses a similar approach to the smart thread pool in order to achieve parallel backtests. It's useful to have a trimmed down execution model for just this task.
In terms of backtest overfitting in general, this is certainly a problem. I would not recommend trusting an optimal result from in-sample data alone: rather, it's advisable to take multiple in-sample periods and/or validate against an out-of-sample set. It's for this purpose that I've recently completed work on an optimization Batcher that will automate the process of walk-forward testing across multiple periods.
On a more general note, I think that while there is no (proven) efficient alternative, backtesting against historical data produces the most useful results. The main area of concern for me is that the Sharpe ratio alone does not sufficiently account for overfitting. It may be that a balanced compromise can be found by moderating the Sharpe ratio by (for instance) the volatility of returns. This is one of the reasons I'm trying optimizations against some of the other Lean statistics using the ConfiguredFitness class that's now been provided in the optimizer project. It may be that customized fitness measures are the arena for genuine innovation in this field.
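As a sketch of the walk-forward idea (plain Python, not the Batcher's actual code), a schedule generator might roll paired in-sample and out-of-sample windows across the full period like this:

```python
from datetime import date, timedelta

def walk_forward(start, end, in_days, oos_days):
    """Yield (in_start, in_end, oos_start, oos_end) window tuples.

    Each in-sample window is immediately followed by an out-of-sample
    window, and the pair rolls forward by one in-sample length.
    """
    cursor = start
    while cursor + timedelta(days=in_days + oos_days) <= end:
        in_end = cursor + timedelta(days=in_days)
        oos_end = in_end + timedelta(days=oos_days)
        yield (cursor, in_end, in_end, oos_end)
        cursor = in_end  # roll forward by one in-sample window
```

An optimization batch would then run the optimizer on each in-sample window and validate the winners on the matching out-of-sample window.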
I like the use of "multiple out-of-sample" periods. Do you have any idea how to implement it? Thinking out loud, I imagine the following scenario: there are many periods of out-of-sample data (OOSD), say, a random month for each year in the optimization period. Then every time you run an OOS test, a month (or maybe several) is selected randomly. Maybe in that way we can avoid (in part) the problem of systematic reuse of the OOSD, which ends up causing more overfitting by incorporating the OOSD as in-sample data, just in a second stage.
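That random selection could be sketched like this (a hypothetical helper, seeded so a given validation run stays reproducible):

```python
import random

def random_oos_months(years, seed=None):
    """Pick one out-of-sample month per year, chosen fresh for each run."""
    rng = random.Random(seed)
    return [(year, rng.randint(1, 12)) for year in years]
```

Drawing a fresh selection per validation run is the point: the out-of-sample months aren't fixed, so they can't quietly become a second in-sample set.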
I know backtests are useful, the point is not to fool yourself.
With respect to the statistics and the fitness formula, I can't agree more: it is truly the art of the whole process... and the secret sauce ;)
It's been quite clearly established that the singular Sharpe ratio and naive in-sample backtesting are probably a pestilence at the level of an epidemic.
My personal position is that the accepted terms in use already preclude the possibility of a "holy grail". We're drawn to such illusions by the bewitching flaws of language and the beguiling form of mathematical notation.
Either you can build firmer foundations on this unsteady ground, or you can dig deeper and discard the entire existing edifice.
Is % profit not a good measure? I understand drawdown etc., but with the forward-testing you've implemented, this would be my first choice. Money is the point, right?
Return can easily be misleading when an optimization fixates on a single rare event. This is less an issue when drawdown is taken into account.
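One common compromise is to moderate return by the worst drawdown, i.e. a Calmar-style ratio. A minimal sketch (not the optimizer's fitness code) of how a single lucky spike gets penalised:

```python
def max_drawdown(equity):
    """Worst peak-to-trough decline, as a fraction of the peak."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst

def return_over_drawdown(equity):
    """Total return divided by max drawdown -- a Calmar-style fitness."""
    total_return = equity[-1] / equity[0] - 1
    dd = max_drawdown(equity)
    return total_return / dd if dd > 0 else float("inf")
```

A curve that doubles via one spike and then gives half of it back scores far worse here than a curve that earns the same return smoothly.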
Got it, thanks guys :-)
Any progress on running parallel backtest in the cloud?
I'm very new here and to trading in general, so please excuse my ignorance. Is this basically equivalent to MetaTrader4's ability to run a backtest with inputs set to a range (e.g. start, step, end) and run through each combination of that range to identify the settings with the best outcome? I was really hoping this was part of QC/Lean out of the box. Is running it on a local machine really that intensive that it should be done on a cloud instance instead? MT4's optimization didn't seem to be too slow as a desktop client. I know MT4 has a "genetic" optimization option that I haven't looked into yet. Does LeanOptimization not have a non-genetic optimization option? What am I missing?
James Candan, I'm kind of new to QuantConnect, but as far as I can tell, it's not too hard to run LEAN locally. The problem is that you'll have to constantly download new data if you want to be backtesting on current data.
Everyone, please correct me if I'm wrong.
^That's accurate. I've had to resort to doing some things locally (e.g. intraday charting) in cases where data is available for download.
Jared Broad you said `we're just finishing interactive brokers live trading and then will launch it :) ` back in March of 2015. Did this ever get pushed out? I'd love to be able to run a backtest with a range of variables and see which works the best.
of course, check the main website and scroll down
Michael Manus I meant the optimization features, not the Interactive Brokers features... I don't see anything about that on the home page.
oh sorry I misunderstood you
I think it got delayed because of the many problems with overfitting strategies, and more urgent things.
But it seems James is still working on it when I check the commits on the optimization project.
It's a work in progress; we recently changed our cloud architecture to make this possible, so we'll be adding it in 2018 -- please feel free to follow along with the dedicated project here. Our general motto is bugs before features, so it has been delayed slightly.
Noted, Jared. For what it's worth, my opinion would be to give this feature top priority from a business perspective.
Otherwise, I have a question concerning the following statement on the page https://www.quantconnect.com/lean/docs#topic14.html
"Optimization / Parameterized Algorithms
Variables can be denoted as parameters and set via configuration files on launch. This opens up the possibility for optimization of algorithms and searching parameter sets."
Is this true? Can we actually run parameter optimizations locally? If yes, could you update this documentation page to explain the process in more detail?
Thank you. Cordially,
Benjamin.
"Can we actually run parameter optimizations in local"
Ben, I've provided some information regarding this previously in this thread.
Thanks Jared. I am being a bit lost between the different docs (QC Online / LEAN), what the open source and staff developers do, and the forum threads that can extend on very long timelines ;).
To end on a piece of wisdom: as simple backtesting in itself cannot show any statistical significance, there are currently no tools available to properly test an algorithmic strategy in the crypto asset universe (such as the Monte Carlo analysis provided by TradeStation: http://help.tradestation.com/09_01/tswfo/topics/monte_carlo_wfo.htm). I am looking forward to QC and other players moving towards such functionality, and particularly QC, since you already support realistic brokerage models.
I meant James, not Jared. Not sure how we can edit a comment when it is submitted.
Any progress on this? I've downloaded Visual Studio, the Lean master, and the LeanOptimization master. It would be nice if there were a clear walkthrough on how to get this installed and running. Once I have one of the folders open in VS, I have no idea what to do. I have an algo written in Python on QuantConnect that I would like to convert to use with this optimizer, to automatically backtest about 3 variables so I don't have to do it manually once every week or month.
Thanks for any info you can help with
So I was able to open the Lean master and LeanOptimization master. Where can I download pricing data to use for offline backtesting?