## Abstract

In recent years, factor investing has gained significant popularity among global institutional investors. In this tutorial, we first develop a factor selection model to test whether a factor can differentiate potential winners from losers in the stock market. We then use the preselected factors to implement a factor-ranking stock selection algorithm based on *Factor Based Stock Selection Model for Turkish Equities*, 2015, Ayhan Yüksel.

## Factor Selection

QuantConnect provides Morningstar fundamental data for US Equities. Valuation ratios are available as daily data. Other data, such as operation ratios and financial statements, is available for multiple periods depending on the property. To view the fundamental factors that are available, see Data Point Attributes.

The algorithm is designed to test the significance of one factor at a time.

```
def Initialize(self):
    self.SetStartDate(2005, 1, 1)   # Set Start Date (no leading zeros in Python 3 integer literals)
    self.SetEndDate(2012, 3, 1)     # Set End Date
    self.SetCash(50000)             # Set Strategy Cash
    self.UniverseSettings.Resolution = Resolution.Daily
    self.AddUniverse(self.CoarseSelectionFunction, self.FineSelectionFunction)
    self.AddEquity("SPY")           # add benchmark
    self.numOfCourseSymbols = 200
    self.numOfPortfolio = 5
    self._changes = None
    self.flag1 = 1  # variable to control the monthly rebalance of coarse and fine selection function
    self.flag2 = 0  # variable to control the monthly rebalance of OnData function
    self.flag3 = 0  # variable to record the number of rebalancing times
    # store the monthly returns of different portfolios in a dataframe
    self.df_return = pd.DataFrame(index=range(self.numOfPortfolio + 1))
    # schedule an event to fire on the first trading day of each month for SPY
    self.Schedule.On(self.DateRules.MonthStart("SPY"), self.TimeRules.AfterMarketOpen("SPY"), Action(self.Rebalancing))
```

#### Step 1: Ranking the stocks by factor values

First, we sort the stocks by daily dollar volume and take the stocks with the highest dollar volumes as our candidates. The universe selection API provides a convenient way to do this. Universes are refreshed every day by default, so we use the Scheduled Events API to trigger code on the first trading day of each month, and three flag variables to control the rebalancing of the `CoarseSelection`, `FineSelection` and `OnData` functions.
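The flag mechanism can be illustrated outside of the QuantConnect engine. The following is a minimal sketch in plain Python, assuming the scheduled event simply resets `flag1` to 1 at each month start. `MonthlyGate`, `on_month_start` and `on_daily_selection` are hypothetical names that stand in for the scheduled event and the daily selection callback. It only shows that work gated by a flag runs once per month even though the callback fires every day.

```python
class MonthlyGate:
    """Toy stand-in for the flag logic; not part of the QuantConnect API."""

    def __init__(self):
        self.flag1 = 1           # 1 -> the gated work should run on the next daily call
        self.selection_runs = 0  # counts how often the gated work actually ran

    def on_month_start(self):
        # plays the role of the scheduled monthly event (assumed to reset the flag)
        self.flag1 = 1

    def on_daily_selection(self):
        # plays the role of the daily universe selection callback
        if self.flag1:
            self.flag1 = 0
            self.selection_runs += 1
            return True
        return False


gate = MonthlyGate()
ran = []
for day in range(1, 61):         # 60 simulated trading days
    if day in (21, 41):          # assume new months start on days 21 and 41
        gate.on_month_start()
    ran.append(gate.on_daily_selection())

print(gate.selection_runs)
```

The gated branch runs on the first simulated day and again after each simulated month start; on every other day the callback returns immediately.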

Coarse universe selection is built-in universe data provided by QuantConnect that lets you filter a universe of over 16,000 symbols, so that rough filtering happens before your algorithm runs. Because the coarse selection function receives all Equities, including ETFs, which have no fundamental data, we use the property `x.HasFundamentalData` to exclude them from our pool of candidate stocks.

```
# sort the data by daily dollar volume and take the top entries
def CoarseSelectionFunction(self, coarse):
    if self.flag1:
        CoarseWithFundamental = [x for x in coarse if x.HasFundamentalData]
        sortedByVolume = sorted(CoarseWithFundamental, key=lambda x: x.DollarVolume, reverse=True)
        top = sortedByVolume[:self.numOfCourseSymbols]
        return [i.Symbol for i in top]
    else:
        return []
```
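To see the filter-then-sort logic in isolation, here is a small sketch that runs without the QuantConnect engine. `CoarseRow` is a hypothetical stand-in for the coarse objects, carrying only the three attributes used above, and the tickers and volumes are made up.

```python
from collections import namedtuple

# Hypothetical stand-in for the coarse objects passed in by the engine
CoarseRow = namedtuple("CoarseRow", ["Symbol", "DollarVolume", "HasFundamentalData"])


def select_top_by_dollar_volume(coarse, n):
    """Keep entries with fundamental data, then take the n largest by dollar volume."""
    with_fundamentals = [x for x in coarse if x.HasFundamentalData]
    ranked = sorted(with_fundamentals, key=lambda x: x.DollarVolume, reverse=True)
    return [x.Symbol for x in ranked[:n]]


coarse = [
    CoarseRow("AAA", 5.0e8, True),
    CoarseRow("ETF1", 9.0e8, False),  # large volume but no fundamentals, e.g. an ETF
    CoarseRow("BBB", 7.0e8, True),
    CoarseRow("CCC", 1.0e8, True),
]

top_two = select_top_by_dollar_volume(coarse, 2)
print(top_two)
```

The entry without fundamental data is dropped even though it has the largest dollar volume.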

We extract the factor values of the candidate stocks at the beginning of each month and sort the stocks by factor value; the `FineSelectionFunction` below sorts them in descending order (`reverse=True`). Here we use the 12-month total risk-based capital, `x.FinancialStatements.TotalRiskBasedCapital.TwelveMonths`, as an example. It is the sum of Tier 1 and Tier 2 Capital. `x.Symbol.Value` gives the string symbol of the selected stock `x`. We then save the sorted symbols as `self.symbol`.

```
def FineSelectionFunction(self, fine):
    if self.flag1:
        self.flag1 = 0
        self.flag2 = 1
        # filter the fine by deleting equities with zero factor value
        filtered_fine = [x for x in fine if x.FinancialStatements.TotalRiskBasedCapital.TwelveMonths != 0]
        # sort the fine by reverse order of factor value
        sorted_fine = sorted(filtered_fine, key=lambda x: x.FinancialStatements.TotalRiskBasedCapital.TwelveMonths, reverse=True)
        self.symbol = [str(x.Symbol.Value) for x in sorted_fine]
        # factor_value = [x.ValuationRatios.PERatio for x in sorted_fine]
        self.flag3 = self.flag3 + 1
        return []
    else:
        return []
```
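Since the algorithm tests one factor at a time, it can help to pass the factor in as a function rather than hard-coding the property. The sketch below is one possible way to do that in plain Python; `FineRow`, `rank_by_factor` and the field names are hypothetical and are not QuantConnect types.

```python
from collections import namedtuple

# Hypothetical record type; real fine-universe objects expose many more fields
FineRow = namedtuple("FineRow", ["ticker", "risk_capital", "pe_ratio"])


def rank_by_factor(fine, get_factor, descending=True):
    """Drop zero-valued entries, then return tickers ordered by the chosen factor."""
    kept = [x for x in fine if get_factor(x) != 0]
    ranked = sorted(kept, key=get_factor, reverse=descending)
    return [x.ticker for x in ranked]


fine = [
    FineRow("AAA", 120.0, 15.0),
    FineRow("BBB", 0.0, 22.0),   # zero factor value: excluded when ranking by risk_capital
    FineRow("CCC", 300.0, 9.0),
]

by_capital = rank_by_factor(fine, lambda x: x.risk_capital)
by_pe = rank_by_factor(fine, lambda x: x.pe_ratio)
print(by_capital, by_pe)
```

Switching the factor under test then only requires changing the function passed as `get_factor`.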

#### Step 2: Compute the monthly return of portfolios

At the end of each month, we extract the one-month history of close prices for each stock and compute the monthly returns.

```
sorted_symbol = self.symbol
self.AddEquity("SPY")  # add benchmark
for x in sorted_symbol:
    self.AddEquity(x)
history = self.History(20, Resolution.Daily)
monthly_return = []
new_symbol_list = []
for j in range(len(sorted_symbol)):
    try:
        daily_price = []
        for slice in history:
            bar = slice[sorted_symbol[j]]
            daily_price.append(float(bar.Close))
        new_symbol_list.append(sorted_symbol[j])
        # return over the window: last close relative to first close
        monthly_return.append(daily_price[-1] / daily_price[0] - 1)
    except Exception:
        self.Log("No history data for " + str(sorted_symbol[j]))
        del daily_price
# the length of monthly_return list should be divisible by the number of portfolios
usable = len(monthly_return) // self.numOfPortfolio * self.numOfPortfolio
monthly_return = monthly_return[:usable]
```

We divide the stocks into 5 portfolios and compute the average monthly return of each portfolio. Then we append the monthly return of the benchmark "SPY" as the last row of the data frame `df_return`.

```
# integer division (//) is required: np.reshape does not accept a float dimension
reshape_return = np.reshape(monthly_return, (self.numOfPortfolio, len(monthly_return) // self.numOfPortfolio))
# calculate the average return of different portfolios
port_avg_return = np.mean(reshape_return, axis=1).tolist()
# add return of "SPY" as the benchmark to the end of the return list
benchmark_syl = self.AddEquity("SPY").Symbol
history_benchmark = self.History(20, Resolution.Daily)
benchmark_daily_price = [float(slice[benchmark_syl].Close) for slice in history_benchmark]
benchmark_monthly_return = (benchmark_daily_price[-1] / benchmark_daily_price[0]) - 1
port_avg_return.append(benchmark_monthly_return)
self.df_return[str(self.flag3)] = port_avg_return
```
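The return calculation and the split into portfolios can be checked on toy numbers. This is a minimal sketch using NumPy only; the prices are made up, and eleven stocks are used on purpose so that one has to be dropped to make the count divisible by five.

```python
import numpy as np

# Hypothetical first and last closing prices for 11 stocks, already sorted by factor value
first_close = np.array([10.0, 20.0, 40.0, 50.0, 8.0, 25.0, 100.0, 4.0, 16.0, 30.0, 12.0])
last_close = np.array([11.0, 22.0, 42.0, 50.0, 8.4, 25.0, 95.0, 4.2, 16.0, 27.0, 12.6])

num_portfolios = 5
monthly_return = last_close / first_close - 1

# Drop trailing stocks so the count is divisible by the number of portfolios
usable = (len(monthly_return) // num_portfolios) * num_portfolios
monthly_return = monthly_return[:usable]

# One row per portfolio, in factor order
grouped = monthly_return.reshape(num_portfolios, usable // num_portfolios)
port_avg_return = grouped.mean(axis=1)
print(port_avg_return.round(4))
```

Each row of `grouped` holds the returns of one portfolio, so averaging along `axis=1` gives one average return per portfolio.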

#### Step 3: Generate the metrics to test the factor significance

After getting the monthly returns of portfolios and the benchmark, we compute the average annual return and excess return over benchmark of each portfolio across the whole backtesting period. Then we generate three metrics to judge the significance of each factor.
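Written out, and reading the definitions off the `calculate_criteria` code in this section, the quantities are as follows. Here \(r_{i,t}\) is the return of portfolio \(i\) in month \(t\), \(T\) is the number of months, and \(Y\) is the number of years used for annualization (the code uses \(Y = 6\)). The symbols themselves are notation introduced here for illustration.

\[
R_i^{\text{total}} = \prod_{t=1}^{T} \left(1 + r_{i,t}\right) - 1, \qquad
R_i^{\text{annual}} = \left(1 + R_i^{\text{total}}\right)^{1/Y} - 1, \qquad
R_i^{\text{excess}} = R_i^{\text{annual}} - R_{\text{benchmark}}^{\text{annual}}
\]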

- The first metric is the correlation between the portfolios' returns and their ranks. The absolute value of the correlation coefficient should be larger than 0.8.
- If the return of the first-ranked portfolio is larger than that of the portfolio at the bottom of the ranking, we call the former the win portfolio and the latter the loss portfolio, and vice versa. The win probability is the probability that the win portfolio's return outperforms the benchmark return. The loss probability is the probability that the loss portfolio's return underperforms the benchmark. If the factor is significant, both the win and loss probabilities should be greater than 0.4.
- The excess return of the win portfolio should be greater than 0.25, while the excess return of the loss portfolio should be lower than 0.05.

```
def calculate_criteria(self, df_port_return):
    total_return = (df_port_return + 1).T.cumprod().iloc[-1, :] - 1
    annual_return = (total_return + 1) ** (1. / 6) - 1
    excess_return = annual_return - np.array(annual_return)[-1]
    correlation = annual_return[0:5].corr(pd.Series([5, 4, 3, 2, 1], index=annual_return[0:5].index))
    # higher factor with higher return
    if np.array(total_return)[0] > np.array(total_return)[-2]:
        loss_excess = df_port_return.iloc[-2, :] - df_port_return.iloc[-1, :]
        win_excess = df_port_return.iloc[0, :] - df_port_return.iloc[-1, :]
        loss_prob = loss_excess[loss_excess < 0].count() / float(len(loss_excess))
        win_prob = win_excess[win_excess > 0].count() / float(len(win_excess))
        win_port_excess_return = np.array(excess_return)[0]
        loss_port_excess_return = np.array(excess_return)[-2]
    # higher factor with lower return
    else:
        loss_excess = df_port_return.iloc[0, :] - df_port_return.iloc[-1, :]
        win_excess = df_port_return.iloc[-2, :] - df_port_return.iloc[-1, :]
        loss_prob = loss_excess[loss_excess < 0].count() / float(len(loss_excess))
        win_prob = win_excess[win_excess > 0].count() / float(len(win_excess))
        win_port_excess_return = np.array(excess_return)[-2]
        loss_port_excess_return = np.array(excess_return)[0]
    test_result = {}
    test_result["correlation"] = correlation
    test_result["win probability"] = win_prob
    test_result["loss probability"] = loss_prob
    test_result["win portfolio excess return"] = win_port_excess_return
    test_result["loss portfolio excess return"] = loss_port_excess_return
    return test_result
```
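Once the metrics are available, the three screening rules listed earlier in this step can be applied mechanically. The helper below is a sketch of that check; `factor_is_significant` is a hypothetical name, and the metric values passed to it are invented for illustration rather than taken from any backtest.

```python
def factor_is_significant(correlation, win_prob, loss_prob, win_excess, loss_excess):
    """Apply the three screening rules from this step to one factor's metrics."""
    checks = {
        "correlation": abs(correlation) > 0.8,
        "probabilities": win_prob > 0.4 and loss_prob > 0.4,
        "excess returns": win_excess > 0.25 and loss_excess < 0.05,
    }
    return all(checks.values()), checks


# Made-up metric values for two hypothetical factors
passed_a, detail_a = factor_is_significant(-0.95, 0.65, 0.45, 0.30, 0.02)
passed_b, detail_b = factor_is_significant(0.60, 0.70, 0.50, 0.30, 0.02)
print(passed_a, passed_b)
```

Returning the per-rule results alongside the overall verdict makes it easy to see which rule a factor fails.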

The following table shows the factor significance testing results:

Factor | FCFYield | BuyBackYield | PriceChange1M | TrailingDividendYield | EVToEBITDA | RevenueGrowth | BookValuePerShare |
---|---|---|---|---|---|---|---|
Correlation | -0.936 | -0.987 | 0.918 | -0.981 | 0.939 | 0.89 | -0.92 |
Win Probability | 0.630 | 0.639 | 1 | 0.667 | 0.722 | 0.69 | 0.69 |
Loss Probability | 0.426 | 0.472 | 1 | 0.518 | 0.472 | 0.42 | 0.40 |
Excess Return (Win) | 0.324 | 0.212 | 0.303 | 0.225 | 0.414 | 0.23 | 0.27 |
Excess Return (Loss) | 0.060 | 0.037 | -1.67 | 0.043 | 0.042 | 0.07 | 0.06 |

We choose 4 factors: `FCFYield`, `PriceChange1M`, `BookValuePerShare` and `RevenueGrowth`.

## Stock Selection

Next we will select the stocks.

#### Step 1: Rank the stocks by factor values

First, we remove the stocks that lack fundamental data or have a zero factor value. For each preselected factor, we rank the stocks by their factor values. The order is descending if the factor correlation is negative and ascending if the factor correlation is positive.

#### Step 2: Calculate equally weighted composite factor scores

The second step uses the selected factors to calculate an equally weighted composite factor score for each stock.

- First, according to the factor order, we place our universe of stocks into 5 distinct quintile portfolios, named P1, P2, P3, P4 and P5. The ranking of the portfolios reflects the preference of the factor model: the first portfolio (P1) contains the “most preferred” stocks, while the fifth (P5) contains the “least preferred” stocks. Suppose there are \(n\) stocks in total. The stocks that fall into the first-ranked portfolio receive a score of \(p\), the stocks that fall into the second-ranked portfolio receive \(p-1\), and so on (with five portfolios, a natural choice is \(p = 5\)). This gives a score for every stock. We repeat the same calculation for each factor.
- Second, we calculate a “Composite Factor Score” for each stock by combining its scores on the selected factors with an equal weighting scheme.
- Third, we rank the stocks in our universe by their Composite Factor Scores and choose the 20 highest-ranked stocks to construct our portfolio at the beginning of each month.
- At the end of each month, we repeat the above steps to construct the new portfolio and adjust the holdings.
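The scoring steps above can be sketched with pandas. This is an illustration under stated assumptions, not the tutorial's production algorithm: the factor names, factor values and ranking directions are made up, the top portfolio is assumed to score 5 and the bottom one 1, and only the top 2 of 10 toy stocks are selected instead of the top 20.

```python
import numpy as np
import pandas as pd


def quintile_scores(values, ascending, n_groups=5):
    """Score each stock from n_groups down to 1 by the group its factor rank falls into.

    ascending=True means the smallest factor value is the most preferred.
    """
    order = values.sort_values(ascending=ascending).index  # most preferred first
    position = pd.Series(np.arange(len(order)), index=order)
    group = position * n_groups // len(order)               # 0 for P1 ... n_groups - 1 for P5
    return (n_groups - group).reindex(values.index)


# Hypothetical factor values for ten stocks
factors = pd.DataFrame(
    {"factor_a": [9, 8, 7, 6, 5, 4, 3, 2, 1, 10],
     "factor_b": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]},
    index=list("ABCDEFGHIJ"),
)
# Assumed directions: larger factor_a is preferred, smaller factor_b is preferred
directions = {"factor_a": False, "factor_b": True}

scores = pd.DataFrame({name: quintile_scores(factors[name], asc)
                       for name, asc in directions.items()})
composite = scores.mean(axis=1)  # equal weighting across factors
selected = composite.sort_values(ascending=False, kind="mergesort").head(2).index.tolist()
print(selected)
```

Because every factor is mapped to the same 1 to 5 scale before averaging, factors with very different units contribute equally to the composite score.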

## Reference

- *Factor Based Stock Selection Model for Turkish Equities*, 2015, Ayhan Yüksel

Jing Wu

