I left basic out of the title this time...check out the last post on getting started with TensorFlow and deep learning on QC!

What if we wanted to code trading systems without ANY code, and evolve populations of algorithms from which the best are selected to trade for us? No problem, we can do that in ~500 lines of code...

“DEAP is a novel evolutionary computation framework for rapid prototyping and testing of ideas. It seeks to make algorithms explicit and data structures transparent.” -DEAP docs

**DEAP Source**: https://github.com/DEAP/deap

**DEAP Docs**: https://deap.readthedocs.io/en/master/

**DEAP Notebooks**, not to be confused with the examples, has some GREAT learning resources as notebooks: https://github.com/DEAP/notebooks

Note: As with the prior TF example post, this is a more academic algo attempting to apply an intro ML example on QC in backtest form. Some may want to try using Lean+DEAP locally for now if they hope to take such work live with persisted evolutionary models/populations.

This messy pair of scripts may be a bit heavy for people brand-new to ML/Genetic Programming/Python. I highly recommend reading over the DEAP docs above and the simple examples/tutorials before taking a look at how I smashed those things into a QCAlgorithm with classification in mind. If you want to jump right to the topic of Genetic Programming and symbolic regression, you can start here.

**Genetic Programming**:

Our goal is to evolve small snippets of logic which generate trading signals, like a binary classifier, using symbolic regression, a great introductory use of genetic programming.

Inputs: Trivial samples of feature inputs are used to power the symbolic regression. In this case, the open-to-close percent returns of the past 10 days are fed in as inputs for each next day's inference. The system buys in the morning and sells before the close, attempting to classify and profit from each intraday change in the SPY. A fixed length of history is stored, so we use a rolling training set. Not ideal, but I went with that choice for the sake of performance and to limit overfitting (it keeps the target moving despite nasty overlapping samples).
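To make the input layout concrete, here is a minimal sketch of how such samples could be built. It is not the code from the attached algo: the function names, the `(open, close)` bar tuples, and the `max_samples` cap are all assumptions made for illustration.

```python
from collections import deque

N_FEATURES = 10  # ten trailing open-to-close returns, as described above


def intraday_return(open_price, close_price):
    """Percent change from the open to the close of one session."""
    return (close_price - open_price) / open_price


def build_samples(bars, n_features=N_FEATURES, max_samples=None):
    """Turn a list of (open, close) daily bars into (features, label) pairs.

    features: the n_features intraday returns preceding the target day
    label:    1 if the target day closed above its open, else 0
    max_samples: hypothetical cap that emulates a fixed-length rolling window
    """
    returns = [intraday_return(o, c) for o, c in bars]
    samples = deque(maxlen=max_samples)  # oldest samples fall off the left
    for i in range(n_features, len(returns)):
        features = returns[i - n_features:i]
        label = 1 if returns[i] > 0 else 0
        samples.append((features, label))
    return list(samples)


def label_distribution(samples):
    """Fraction of 'up' labels, useful for spotting a biased training set."""
    if not samples:
        return 0.0
    return sum(label for _, label in samples) / len(samples)
```

Note that consecutive samples share nine of their ten features, which is the overlap complained about above.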

Primitives are kept basic here, like in the DEAP symbolic regression example, opting for just logic and math operations. You can use pretty much anything for Primitives and Terminals though, which is the real fun part of building such genetic programming systems.
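As a rough illustration of what a primitive set amounts to, the sketch below maps a few primitive names to plain Python functions and turns an expression string into a callable. In DEAP itself this job belongs to `gp.PrimitiveSet` and `gp.compile`; the `PRIMITIVES` dict and `compile_individual` helper here are stand-ins I made up so the example runs with only the standard library. The zero-denominator convention in `protected_div` follows the DEAP symbolic regression example.

```python
import math


def protected_div(left, right):
    """Division that returns 1.0 instead of raising on a zero denominator."""
    try:
        return left / right
    except ZeroDivisionError:
        return 1.0


def ite_float(condition, if_true, if_false):
    """If-then-else primitive returning one of two floats."""
    return if_true if condition else if_false


# Hypothetical primitive table: name used in the expression -> implementation.
PRIMITIVES = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "protectedDiv": protected_div,
    "fmin": min,
    "fmax": max,
    "sin": math.sin,
    "cos": math.cos,
    "ite_float": ite_float,
}


def compile_individual(expression, n_args):
    """Build a callable taking ARG0..ARG{n_args-1} from an expression string.

    Only use this on expressions you generated yourself: eval is not safe
    on untrusted input.
    """
    arg_names = ", ".join(f"ARG{i}" for i in range(n_args))
    source = f"lambda {arg_names}: {expression}"
    # The primitives are passed as globals so the lambda body can see them.
    return eval(source, {"__builtins__": {}, **PRIMITIVES})
```

For example, `compile_individual("fmin(1.0, 0.5)", 10)` gives a function that ignores its ten inputs and always returns 0.5. Adding a new primitive is just one more entry in the table.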

Multi Objective optimization is used in this example to evaluate individuals. Here is a great learning resource on that! I have found it to be very useful when working to constrain the problem and drive the evolution toward a constructive solution. Overfitting can be a nightmare in such systems, like with reinforcement systems, where the optimization objective is law so what you ask for is what you get. This is where the evaluation function has to shine.

The biggest change when working with a multi objective function is selecting a 'best' individual for use in trading. Instead of an easily ranked list, we are given a Pareto frontier which can include any number (1+) of non-dominated individuals per generation. I didn't supply a way to pick from this frontier. Instead, I use the whole thing as an ensemble model to generate a single Long or Short signal each morning.
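A small sketch of the idea, under stated assumptions: each individual is a `(model, fitness_tuple)` pair, every fitness value is minimized, an output above 0.5 counts as a long vote, and the ensemble is combined by simple majority with ties going long. The post does not spell out the combination rule, so the voting scheme is my guess, and the function names are hypothetical. DEAP ships its own Pareto tooling that you would use in practice.

```python
def dominates(a, b):
    """True if fitness a is no worse than b everywhere and better somewhere.

    All objectives are assumed to be minimized.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(population):
    """Keep the individuals whose fitness no other individual dominates.

    population: list of (model, fitness_tuple) pairs
    """
    return [
        (model, fit)
        for model, fit in population
        if not any(dominates(other, fit) for _, other in population)
    ]


def ensemble_signal(front, features, threshold=0.5):
    """Majority vote over the front: +1 for long, -1 for short.

    Ties resolve to long, which is an arbitrary choice for this sketch.
    """
    votes = sum(1 if model(*features) > threshold else -1 for model, _ in front)
    return 1 if votes >= 0 else -1
```

With three objectives the front often holds several individuals that trade off mean loss against worst-batch loss, so voting avoids having to rank them.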

Fitness/Optimization:

We take the number of samples we want to evaluate and split that into batches in order to calculate a fitness tuple for each individual each generation. The fitnesses to be minimized in this example are:

- average of the classification log loss across batches

- max log loss across batches (what is our worst-performing batch?)

- negative of the consistency score, where consistency is the fraction of batches with a log loss lower than -ln(0.5) ≈ 0.693 (the loss of always predicting 0.5), to prevent crazy overfitting of some samples. Since I am using a sliding historical window of data, this seems pretty important to prevent overfitting of older data and skewing of the average log loss. Due to the low number of samples this gives little input and could be stricter (it goes to 100% easily).
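The three objectives above can be sketched as follows. This is an illustrative version, not the evaluation function from the attached algo: it assumes an individual's raw output is treated as the probability of an up day (clipped away from 0 and 1), that samples are `(features, label)` pairs, and that any leftover samples that do not fill a whole batch are ignored.

```python
import math

EPS = 1e-15                 # keeps log() away from 0 and 1
THRESHOLD = -math.log(0.5)  # ~0.693, the log loss of always predicting 0.5


def log_loss(labels, probs):
    """Mean binary cross-entropy of predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, EPS), 1.0 - EPS)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


def evaluate(predict, samples, n_batches):
    """Return (mean loss, max loss, -consistency), all to be minimized.

    predict: callable taking the feature values and returning a probability
    samples: list of (features, label) pairs, label in {0, 1}
    """
    batch_size = len(samples) // n_batches
    losses = []
    for b in range(n_batches):
        batch = samples[b * batch_size:(b + 1) * batch_size]
        labels = [label for _, label in batch]
        probs = [predict(*features) for features, _ in batch]
        losses.append(log_loss(labels, probs))
    mean_loss = sum(losses) / len(losses)
    max_loss = max(losses)
    consistency = sum(loss < THRESHOLD for loss in losses) / len(losses)
    return mean_loss, max_loss, -consistency
```

An individual that always outputs 0.5 scores exactly the threshold on every batch, so its consistency is 0; it has to beat a coin flip batch by batch to earn anything on the third objective.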

This is just one fitness tuple that I tried and is based on my current use of a similar system on Numer.ai which easily places in the top 100-200 with few generations; not great, but it sorta works. That said you can optimize anything!

Results:

The algo needs time to evolve a not-terrible solution, so we expect 'poor performance' at the start as well as little variance in the signal. This is because long-only strategies are easy to figure out (an individual just needs to return 0.5-1.0) for an appreciating asset such as the SPY during the backtest period. With time more complex behavior can emerge.

**QCAlgorithm:**

The algo buys in the morning and sells before the close. Minute mode on a single asset (SPY). Optional relative stop risk management is available if you uncomment it in OnData.

For performance's sake, history data is downloaded once in bulk and then appended to daily. This is done as simply as possible and could benefit from some beefing up/error checking.

One reason the algo is slow is the lack of numpy, tensorflow, or some other form of vectorized primitives, which would allow faster individuals and/or batch-wide evaluations and vastly increase performance (see the DEAP numpy tutorial).

DEAP works great with many CPUs in case you go local. There we can perform the evolution as a separate process fired from OnData, which allows for the use of checkpointing to persist the current generation, evolve across all CPUs or a cluster, and then save the result for use in the Lean algo as it continues to run.
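For the checkpointing part, something along these lines would do as a starting point. It is a standard-library sketch in the spirit of the pickle-based checkpoint shown in the DEAP docs, not code from this algo; the function names, the file layout, and the idea of storing the population as a plain list are my assumptions.

```python
import os
import pickle
import random


def save_checkpoint(path, population, generation):
    """Persist the current generation so another process can pick it up."""
    state = {
        "population": population,
        "generation": generation,
        "rndstate": random.getstate(),  # lets a resumed run stay reproducible
    }
    # Write to a temp file first so a crash never leaves a half-written file.
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as fh:
        pickle.dump(state, fh)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Restore the population, generation counter, and RNG state."""
    with open(path, "rb") as fh:
        state = pickle.load(fh)
    random.setstate(state["rndstate"])
    return state["population"], state["generation"]
```

The evolving process would call `save_checkpoint` after each generation, and the trading process would call `load_checkpoint` each morning before generating a signal. Only load pickle files you wrote yourself.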

**Charts explained**:

Charts are used to monitor the evolution process. Here are the charts explained:

Signal – 1 or -1 to signify direction (Long or Short)

Mean Loss – Generation's fitness #1, mean of the per-batch log losses

Max Loss – Generation's fitness #2, max of the per-batch log losses (worst batch)

Consistency – Generation's fitness #3, fraction of batches with a log loss under -ln(0.5). Note we return a negative consistency score in evaluation so that every objective is minimized (mentioned above under multi objective), but we chart a positive score.

Size – Length of individuals(min, max, mean) to track bloating and variety among the population

Label-Dist – Since this is essentially a binary classifier, what percent of each class do we have? We track this to ensure diverse training samples. It's important as we do not want to bias the evaluation (aka just buy the SPY and woohoo, profit!). *Note **again** how at the start only long-only individuals survive and get selected as the best, probably due to the higher average returns for SPY. It takes some effort (generations) to learn how to short!*

Leverage – Current leverage of portfolio

Ram – Memory use (for performance monitoring)

Evo Time – Time per generation (for performance monitoring)

**Here is what our starting and ending classification programs looked like:**

First Generation Selected Individual:

# a modest start

1 : fmin(1.0, 0.5)

Final Generation Selected Individual:

# I think we all guessed this would be the solution for generation #540...

540 : fmin(mul(absolute(absolute(sign(mul(ARG0, fmax(ARG1, mul(absolute(sign(mul(ARG0, fmax(ARG1, ARG7)))), 0.6063372498263256)))))), 0.6063372498263256), fmax(ARG4, fmax(sub(sin(fmax(fmax(fmax(cos(protectedDiv(sin(ARG2), protectedDiv(ARG0, -0.5))), sub(fmax(ARG1, protectedDiv(fmax(fmax(ARG1, -0.5), protectedDiv(mul(ARG0, fmax(ARG1, ARG7)), -0.5)), ARG0)), sin(ARG2))), protectedDiv(fmax(fmax(protectedDiv(abs(sign(ARG4)), -0.5), ARG7), protectedDiv(ite_float(or_(True, False), square(ARG6), protectedDiv(0.0, ARG3)), -0.5)), ARG6)), ARG2)), ARG2), ARG4)))