
Brain Language Metrics on Company Filings

Introduction

The Brain Language Metrics on Company Filings dataset provides the results of an NLP system that monitors several language metrics on 10-K and 10-Q company reports for US Equities. The data covers 5,000 US Equities, starts in January 2010, and is delivered on a daily frequency. The dataset is made of two parts; the first one includes the language metrics of the most recent 10-K or 10-Q report for each firm, namely:

  1. Financial sentiment
  2. Percentage of words belonging to financial domain classified by language types (e.g. “litigious” or “constraining” language)
  3. Readability score
  4. Lexical metrics such as lexical density and richness
  5. Text statistics such as the report length and the average sentence length
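
To make a few of the metrics in this list concrete, here is a minimal sketch that computes toy versions of them on a short text. It is not Brain's methodology: the tokenization, the type-token ratio used as a proxy for lexical richness, and the `LITIGIOUS_WORDS` list are all assumptions made only for illustration.

```python
import re

# Hypothetical mini word list; the actual financial-domain dictionaries are not given on this page.
LITIGIOUS_WORDS = {"litigation", "lawsuit", "plaintiff"}

def text_statistics(text: str) -> dict:
    """Toy versions of a few language metrics (assumed definitions, not Brain's)."""
    # Split on runs of sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Lowercased alphabetic tokens stand in for "words".
    words = re.findall(r"[a-z']+", text.lower())
    n_words = len(words)
    return {
        "report_length": n_words,
        "average_sentence_length": n_words / len(sentences) if sentences else 0.0,
        # Type-token ratio: distinct words divided by total words.
        "lexical_richness": len(set(words)) / n_words if n_words else 0.0,
        # Share of words that belong to the (hypothetical) litigious word list.
        "litigious_share": sum(w in LITIGIOUS_WORDS for w in words) / n_words if n_words else 0.0,
    }

sample = "Revenue increased. Costs increased as well. Litigation risk remains uncertain."
stats = text_statistics(sample)
print(stats)
```

The sample has 10 words in 3 sentences, 9 of them distinct, and 1 word from the litigious list.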

The second part includes the differences between the two most recent 10-K or 10-Q reports of the same period for each company, namely:

  1. Difference of the various language metrics (e.g. delta sentiment, delta readability score, delta percentage of a specific language type etc.)
  2. Similarity metrics between documents, also with respect to a specific language type (for example similarity with respect to “litigious” language or “uncertainty” language)
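
The two kinds of differences listed above can be sketched as follows. This is only an illustration under stated assumptions: the page does not say how Brain measures similarity, so cosine similarity over word counts is a stand-in, and the metric names and values in the example are made up.

```python
import math
from collections import Counter
from typing import Optional, Set

def metric_deltas(previous: dict, latest: dict) -> dict:
    """Difference (latest minus previous) for every metric present in both reports."""
    return {name: latest[name] - previous[name] for name in latest if name in previous}

def cosine_similarity(doc_a: str, doc_b: str, vocabulary: Optional[Set[str]] = None) -> float:
    """Cosine similarity of word-count vectors, optionally restricted to one language type."""
    counts_a = Counter(doc_a.lower().split())
    counts_b = Counter(doc_b.lower().split())
    if vocabulary is not None:
        # Keep only the words that belong to the chosen language type.
        counts_a = Counter({w: c for w, c in counts_a.items() if w in vocabulary})
        counts_b = Counter({w: c for w, c in counts_b.items() if w in vocabulary})
    dot = sum(count * counts_b[word] for word, count in counts_a.items())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Made-up metric values for two consecutive reports of the same period.
deltas = metric_deltas({"sentiment": 0.10, "readability": 12.0},
                       {"sentiment": 0.25, "readability": 11.5})

previous_text = "risk of litigation is low"
latest_text = "risk of litigation is high"
overall = cosine_similarity(previous_text, latest_text)
litigious_only = cosine_similarity(previous_text, latest_text, vocabulary={"litigation"})
print(deltas, overall, litigious_only)
```

Restricting the vocabulary shows why a similarity "with respect to a language type" can differ from the overall similarity: the two toy texts differ in one word overall but are identical in their litigious wording.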

The analysis is available for the whole report and for specific sections of the report (e.g. Risk Factors and MD&A).

For more information, refer to Brain's summary paper.

This dataset depends on the US Equity Security Master dataset, which contains information on splits, dividends, and symbol changes.

For more information about the Brain Language Metrics on Company Filings dataset, including CLI commands and pricing, see the dataset listing.

About the Provider

Brain is a Research Company that creates proprietary datasets and algorithms for investment strategies, combining experience in financial markets with strong competencies in Statistics, Machine Learning, and Natural Language Processing. The founders share a common academic background of research in Physics as well as extensive experience in Financial markets.

Getting Started

The following snippet demonstrates how to request data from the Brain Language Metrics on Company Filings dataset:

self.aapl = self.add_equity("AAPL", Resolution.DAILY).symbol
self.dataset_10k_symbol = self.add_data(BrainCompanyFilingLanguageMetrics10K, self.aapl).symbol
self.dataset_all_symbol = self.add_data(BrainCompanyFilingLanguageMetricsAll, self.aapl).symbol

self.universe_10k = self.add_universe(BrainCompanyFilingLanguageMetricsUniverse10K, self.universe_selection)
self.universe_all = self.add_universe(BrainCompanyFilingLanguageMetricsUniverseAll, self.universe_selection)

_symbol = AddEquity("AAPL", Resolution.Daily).Symbol;
_dataset10KSymbol = AddData<BrainCompanyFilingLanguageMetrics10K>(_symbol).Symbol;
_datasetAllSymbol = AddData<BrainCompanyFilingLanguageMetricsAll>(_symbol).Symbol;

_universe10k = AddUniverse<BrainCompanyFilingLanguageMetricsUniverse10K>(UniverseSelection);
_universeAll = AddUniverse<BrainCompanyFilingLanguageMetricsUniverseAll>(UniverseSelection);

Data Summary

The following table describes the dataset properties:

Property          Value
Start Date        January 2010
Asset Coverage*   5,000 US Equities
Data Density      Sparse
Resolution        Daily
Timezone          UTC

* The coverage includes all assets since the start date. It increases over time.

Requesting Data

To add Brain Language Metrics on Company Filings data to your algorithm, call the add_data method in Python or the AddData method in C#. Save a reference to the dataset Symbol so you can access the data later in your algorithm.

class BrainCompanyFilingNLPDataAlgorithm(QCAlgorithm):
    def initialize(self) -> None:
        self.set_start_date(2010, 1, 1)
        self.set_end_date(2021, 7, 8)
        self.set_cash(100000)
        
        self.aapl = self.add_equity("AAPL", Resolution.DAILY).symbol
        self.dataset_10k_symbol = self.add_data(BrainCompanyFilingLanguageMetrics10K, self.aapl).symbol
        self.dataset_all_symbol = self.add_data(BrainCompanyFilingLanguageMetricsAll, self.aapl).symbol

public class BrainCompanyFilingNLPDataAlgorithm : QCAlgorithm
{
    private Symbol _symbol, _dataset10KSymbol, _datasetAllSymbol;
    
    public override void Initialize()
    {
        SetStartDate(2010, 1, 1);
        SetEndDate(2021, 7, 8);
        SetCash(100000);
        
        _symbol = AddEquity("AAPL", Resolution.Daily).Symbol;
        _dataset10KSymbol= AddData<BrainCompanyFilingLanguageMetrics10K>(_symbol).Symbol;
        _datasetAllSymbol= AddData<BrainCompanyFilingLanguageMetricsAll>(_symbol).Symbol;
    }
}

Accessing Data

To get the current Brain Language Metrics on Company Filings data, index the current Slice with the dataset Symbol. Slice objects deliver unique events to your algorithm as they happen, but the Slice may not contain data for your dataset at every time step. To avoid issues, check if the Slice contains the data you want before you index it.

def on_data(self, slice: Slice) -> None:
    if slice.contains_key(self.dataset_10k_symbol):
        data_point = slice[self.dataset_10k_symbol]
        self.log(f"{self.dataset_10k_symbol} report sentiment at {slice.time}: {data_point.report_sentiment.sentiment}")

    if slice.contains_key(self.dataset_all_symbol):
        data_point = slice[self.dataset_all_symbol]
        self.log(f"{self.dataset_all_symbol} report sentiment at {slice.time}: {data_point.report_sentiment.sentiment}")

public override void OnData(Slice slice)
{
    if (slice.ContainsKey(_dataset10KSymbol))
    {
        var dataPoint = slice[_dataset10KSymbol];
        Log($"{_dataset10KSymbol} report sentiment at {slice.Time}: {dataPoint.ReportSentiment.Sentiment}");
    }

    if (slice.ContainsKey(_datasetAllSymbol))
    {
        var dataPoint = slice[_datasetAllSymbol];
        Log($"{_datasetAllSymbol} report sentiment at {slice.Time}: {dataPoint.ReportSentiment.Sentiment}");
    }
}

To iterate through all of the dataset objects in the current Slice, call the get method in Python or the Get method in C#.

def on_data(self, slice: Slice) -> None:
    for dataset_symbol, data_point in slice.get(BrainCompanyFilingLanguageMetrics10K).items():
        self.log(f"{dataset_symbol} report sentiment at {slice.time}: {data_point.report_sentiment.sentiment}")

    for dataset_symbol, data_point in slice.get(BrainCompanyFilingLanguageMetricsAll).items():
        self.log(f"{dataset_symbol} report sentiment at {slice.time}: {data_point.report_sentiment.sentiment}")

public override void OnData(Slice slice)
{
    foreach (var kvp in slice.Get<BrainCompanyFilingLanguageMetrics10K>())
    {
        var datasetSymbol = kvp.Key;
        var dataPoint = kvp.Value;
        Log($"{datasetSymbol} report sentiment at {slice.Time}: {dataPoint.ReportSentiment.Sentiment}");
    }

    foreach (var kvp in slice.Get<BrainCompanyFilingLanguageMetricsAll>())
    {
        var datasetSymbol = kvp.Key;
        var dataPoint = kvp.Value;
        Log($"{datasetSymbol} report sentiment at {slice.Time}: {dataPoint.ReportSentiment.Sentiment}");
    }
}

Historical Data

To get historical Brain Language Metrics on Company Filings data, call the history method in Python or the History method in C# with the dataset Symbol. If there is no data in the period you request, the history result is empty.

# DataFrames
ten_k_history_df = self.history(self.dataset_10k_symbol, 100, Resolution.DAILY)
all_history_df = self.history(self.dataset_all_symbol, 100, Resolution.DAILY)
history_df = self.history([self.dataset_10k_symbol, self.dataset_all_symbol], 100, Resolution.DAILY)

# Dataset objects
ten_k_history_bars = self.history[BrainCompanyFilingLanguageMetrics10K](self.dataset_10k_symbol, 100, Resolution.DAILY)
all_history_bars = self.history[BrainCompanyFilingLanguageMetricsAll](self.dataset_all_symbol, 100, Resolution.DAILY)

// Dataset objects
var tenKHistory = History<BrainCompanyFilingLanguageMetrics10K>(_dataset10KSymbol, 100, Resolution.Daily);
var allHistory = History<BrainCompanyFilingLanguageMetricsAll>(_datasetAllSymbol, 100, Resolution.Daily);

// Slice objects
var history = History(new[] {_dataset10KSymbol, _datasetAllSymbol}, 100, Resolution.Daily);

For more information about historical data, see History Requests.

Universe Selection

To select a dynamic universe of US Equities based on Brain Language Metrics on Company Filings data, call the add_universe method in Python or the AddUniverse method in C# with the BrainCompanyFilingLanguageMetricsUniverseAll class or the BrainCompanyFilingLanguageMetricsUniverse10K class and a selection function.

def initialize(self) -> None:
    self._universe = self.add_universe(BrainCompanyFilingLanguageMetricsUniverseAll, self.universe_selection)

def universe_selection(self, alt_coarse: List[BrainCompanyFilingLanguageMetricsUniverseAll]) -> List[Symbol]:
    return [d.symbol for d in alt_coarse \
                if d.report_sentiment.sentiment > 0 \
                and d.management_discussion_analyasis_of_financial_condition_and_results_of_operations.sentiment > 0]

private Universe _universe;
public override void Initialize()
{
    _universe = AddUniverse<BrainCompanyFilingLanguageMetricsUniverseAll>(altCoarse =>
    {
        return from d in altCoarse.OfType<BrainCompanyFilingLanguageMetricsUniverseAll>()
            where d.ReportSentiment.Sentiment > 0m && 
                       d.ManagementDiscussionAnalyasisOfFinancialConditionAndResultsOfOperations.Sentiment > 0m
            select d.Symbol;
    });
}

For more information about dynamic universes, see Universes.

Universe History

You can get historical universe data in an algorithm and in the Research Environment.

Historical Universe Data in Algorithms

To get historical universe data in an algorithm, call the history method in Python or the History method in C# with the Universe object and the lookback period. If there is no data in the period you request, the history result is empty.

var universeHistory = History(_universe, 30, Resolution.Daily);
foreach (var universeDay in universeHistory)
{
    foreach (BrainCompanyFilingLanguageMetricsUniverse10K languageMetrics in universeDay)
    {
        Log($"{languageMetrics.Symbol} sentiment at {languageMetrics.EndTime}: {languageMetrics.ReportSentiment.Sentiment}");
    }
}

# DataFrame example where the columns are the universe object attributes:
history_df = self.history(self._universe, 30, Resolution.DAILY, flatten=True)

# Series example where the values are lists of the universe objects: 
universe_history = self.history(self._universe, 30, Resolution.DAILY)
for (_, time), universeDay in universe_history.items():
    for language_metrics in universeDay:
        self.log(f"{language_metrics.symbol} sentiment at {language_metrics.end_time}: {language_metrics.report_sentiment.sentiment}")

Historical Universe Data in Research

To get historical universe data in research, call the universe_history method in Python or the UniverseHistory method in C# with the Universe object, a start date, and an end date. This method returns the filtered universe. If there is no data in the period you request, the history result is empty.

var universeHistory = qb.UniverseHistory(universe, qb.Time.AddDays(-30), qb.Time);
foreach (var universeDay in universeHistory)
{
    foreach (BrainCompanyFilingLanguageMetricsUniverse10K languageMetrics in universeDay)
    {
        Console.WriteLine($"{languageMetrics.Symbol} sentiment at {languageMetrics.EndTime}: {languageMetrics.ReportSentiment.Sentiment}");
    }
}

# DataFrame example where the columns are the universe object attributes:
history_df = qb.universe_history(universe, qb.time-timedelta(30), qb.time, flatten=True)

# Series example where the values are lists of the universe objects: 
universe_history = qb.universe_history(universe, qb.time-timedelta(30), qb.time)
for (_, time), universeDay in universe_history.items():
    for language_metrics in universeDay:
        print(f"{language_metrics.symbol} sentiment at {language_metrics.end_time}: {language_metrics.report_sentiment.sentiment}")

You can also call the history method in Python or the History method in C# in Research.

Remove Subscriptions

To remove a subscription, call the remove_security method in Python or the RemoveSecurity method in C#.

self.remove_security(self.dataset_10k_symbol)
self.remove_security(self.dataset_all_symbol)

RemoveSecurity(_dataset10KSymbol);
RemoveSecurity(_datasetAllSymbol);

If you subscribe to Brain Language Metrics on Company Filings data for assets in a dynamic universe, remove the dataset subscription when the asset leaves your universe. To view a common design pattern, see Track Security Changes.
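
The bookkeeping behind that pattern can be sketched in plain Python. The `DatasetSubscriptionTracker` class and its `subscribe`/`unsubscribe` callbacks are hypothetical names introduced here for illustration; they are not part of the LEAN API, and the example uses fake callbacks so that it runs outside an algorithm.

```python
from typing import Callable, Dict, Hashable, Iterable

class DatasetSubscriptionTracker:
    """Keeps one dataset subscription per asset that is currently in the universe."""

    def __init__(self,
                 subscribe: Callable[[Hashable], Hashable],
                 unsubscribe: Callable[[Hashable], None]) -> None:
        self._subscribe = subscribe
        self._unsubscribe = unsubscribe
        # Maps each asset to the dataset symbol created for it.
        self.dataset_symbol_by_asset: Dict[Hashable, Hashable] = {}

    def on_changes(self, added: Iterable[Hashable], removed: Iterable[Hashable]) -> None:
        for asset in added:
            # Subscribe only once per asset.
            if asset not in self.dataset_symbol_by_asset:
                self.dataset_symbol_by_asset[asset] = self._subscribe(asset)
        for asset in removed:
            # Drop the dataset subscription when the asset leaves the universe.
            dataset_symbol = self.dataset_symbol_by_asset.pop(asset, None)
            if dataset_symbol is not None:
                self._unsubscribe(dataset_symbol)

# Fake callbacks that only record which dataset subscriptions are active.
active = set()

def fake_subscribe(asset: str) -> str:
    dataset_symbol = f"{asset}.BrainCompanyFilingLanguageMetrics10K"
    active.add(dataset_symbol)
    return dataset_symbol

def fake_unsubscribe(dataset_symbol: str) -> None:
    active.discard(dataset_symbol)

tracker = DatasetSubscriptionTracker(fake_subscribe, fake_unsubscribe)
tracker.on_changes(added=["AAPL", "MSFT"], removed=[])
tracker.on_changes(added=[], removed=["AAPL"])
print(sorted(active))
```

In an algorithm, `subscribe` would presumably wrap the add_data call shown earlier and return the resulting dataset Symbol, `unsubscribe` would wrap remove_security, and `on_changes` would be called from the handler that receives universe additions and removals.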

Example Applications

The Brain Language Metrics on Company Filings dataset enables you to test strategies that use language metrics, and their differences, gathered from 10-K and 10-Q reports. The examples below show how to use the dataset.

Classic Algorithm Example

The following example algorithm buys Apple when the sentiment of its 10-K report is positive. Otherwise, it holds cash.

from AlgorithmImports import *
from QuantConnect.DataSource import *

class BrainCompanyFilingNLPDataAlgorithm(QCAlgorithm):
    def initialize(self):
        self.set_start_date(2010, 1, 1)
        self.set_end_date(2021, 7, 8)
        self.set_cash(100000) 
        
        # Request the 10-K language metrics to obtain a sentiment score from the company filings.
        # Filings reflect both fundamentals and sentiment, covering past performance and the future outlook.
        self.aapl = self.add_equity("AAPL", Resolution.DAILY).symbol
        self.dataset_symbol = self.add_data(BrainCompanyFilingLanguageMetrics10K, self.aapl).symbol
        
        # Historical data
        history = self.history(self.dataset_symbol, 365, Resolution.DAILY)
        self.debug(f"We got {len(history)} items from our history request for {self.dataset_symbol}")
        
        
    def on_data(self, data):
        # Trade based on the updated report sentiment
        if data.contains_key(self.dataset_symbol):
            sentiment = data[self.dataset_symbol].report_sentiment.sentiment
            # Buy when the sentiment score is positive, since it suggests positive expected returns; otherwise hold cash
            self.set_holdings(self.aapl, int(sentiment > 0))

public class BrainCompanyFilingNLPDataAlgorithm : QCAlgorithm
{
    private Symbol _symbol;
    private Symbol _datasetSymbol;
    
    public override void Initialize()
    {
        SetStartDate(2010, 1, 1);
        SetEndDate(2021, 7, 8);
        SetCash(100000);
        
        // Request the 10-K language metrics to obtain a sentiment score from the company filings.
        // Filings reflect both fundamentals and sentiment, covering past performance and the future outlook.
        _symbol = AddEquity("AAPL", Resolution.Daily).Symbol;
        _datasetSymbol = AddData<BrainCompanyFilingLanguageMetrics10K>(_symbol).Symbol;
        
        // Historical data
        var history = History<BrainCompanyFilingLanguageMetrics10K>(_datasetSymbol, 365, Resolution.Daily);
        Debug($"We got {history.Count()} items from our history request for {_datasetSymbol}");
    }

    public override void OnData(Slice slice)
    {
        // Trade based on the updated report sentiment
        if (slice.ContainsKey(_datasetSymbol))
        {
            // Buy when the sentiment score is positive, since it suggests positive expected returns; otherwise hold cash
            var sentiment = slice[_datasetSymbol].ReportSentiment.Sentiment;
            SetHoldings(_symbol, sentiment > 0 ? 1 : 0);
        }
    }
}

Framework Algorithm Example

The following example algorithm creates a dynamic universe of US Equities that have positive sentiment in their 10-K report and then forms an equal-weighted portfolio:

from AlgorithmImports import *
from QuantConnect.DataSource import *

class BrainCompanyFilingNLPDataAlgorithm(QCAlgorithm):
    def initialize(self) -> None:
        self.set_start_date(2010, 1, 1)
        self.set_end_date(2021, 7, 8)
        self.set_cash(100000) 
        
        self.universe_settings.resolution = Resolution.DAILY
        # Filter based on company filing data
        self.add_universe(BrainCompanyFilingLanguageMetricsUniverse10K, self.universe_selection)
        
        self.add_alpha(BrainCompanyFilingNLPAlphaModel())
        
        self.set_portfolio_construction(EqualWeightingPortfolioConstructionModel())
         
        self.add_risk_management(NullRiskManagementModel())
        
        self.set_execution(ImmediateExecutionModel())

    def universe_selection(self, alt_coarse: List[BrainCompanyFilingLanguageMetricsUniverse10K]) -> List[Symbol]:
        # Select the stocks with a positive sentiment score, which suggests positive expected returns.
        # Filings reflect both fundamentals and sentiment, covering past performance and the future outlook.
        return [d.symbol for d in alt_coarse \
                if d.report_sentiment.sentiment is not None and d.report_sentiment.sentiment > 0]
        
class BrainCompanyFilingNLPAlphaModel(AlphaModel):
    
    def update(self, algorithm: QCAlgorithm, slice: Slice):
        insights = []

        # Emit an up insight for each security in the selected universe, which is expected to have positive returns given its positive fundamentals and prospects
        for symbol in algorithm.active_securities.keys:
            if not slice.contains_key(symbol):
                continue
            insights.append(Insight.price(symbol, timedelta(days=1), InsightDirection.UP))
        
        return insights

public class BrainCompanyFilingNLPDataAlgorithm : QCAlgorithm
{
    public override void Initialize()
    {
        SetStartDate(2010, 1, 1);
        SetEndDate(2021, 7, 8);
        SetCash(100000);
        
        // Filter based on company filing data
        UniverseSettings.Resolution = Resolution.Daily;
        AddUniverse<BrainCompanyFilingLanguageMetricsUniverse10K>(altCoarse =>
        {
            // Select the stocks with a positive sentiment score, which suggests positive expected returns.
            // Filings reflect both fundamentals and sentiment, covering past performance and the future outlook.
            return from d in altCoarse.OfType<BrainCompanyFilingLanguageMetricsUniverse10K>() 
                where d.ReportSentiment.Sentiment > 0m
                select d.Symbol;
        });
        
        AddAlpha(new BrainCompanyFilingNLPAlphaModel());
        
        SetPortfolioConstruction(new EqualWeightingPortfolioConstructionModel());
        
        AddRiskManagement(new NullRiskManagementModel());
        
        SetExecution(new ImmediateExecutionModel());
    }
}
    
public class BrainCompanyFilingNLPAlphaModel : AlphaModel
{
    public override IEnumerable<Insight> Update(QCAlgorithm algorithm, Slice slice)
    {
        var insights = new List<Insight>();
        
        // Emit an up insight for each security in the selected universe, which is expected to have positive returns given its positive fundamentals and prospects
        foreach (var symbol in algorithm.ActiveSecurities.Keys)
        {
            if (!slice.ContainsKey(symbol))
            {
                continue;
            }
            insights.Add(Insight.Price(symbol, TimeSpan.FromDays(1), InsightDirection.Up));
        }
        return insights;
    }
}
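
The selection and weighting arithmetic behind this framework example can be sketched in plain Python. This is only the arithmetic under simplifying assumptions: the actual EqualWeightingPortfolioConstructionModel works from insights and handles rebalancing, and the `equal_weight_targets` helper and the sample sentiment values below are hypothetical.

```python
from typing import Dict, Optional

def equal_weight_targets(sentiment_by_symbol: Dict[str, Optional[float]]) -> Dict[str, float]:
    """Select symbols with a positive sentiment and give each the same portfolio weight."""
    # Skip missing scores and keep only strictly positive ones, as the universe filter does.
    selected = sorted(symbol for symbol, sentiment in sentiment_by_symbol.items()
                      if sentiment is not None and sentiment > 0)
    if not selected:
        return {}
    weight = 1.0 / len(selected)
    return {symbol: weight for symbol in selected}

# Made-up sentiment scores for illustration.
targets = equal_weight_targets({"AAA": 0.20, "BBB": -0.10, "CCC": None, "DDD": 0.05})
print(targets)
```

With two of the four symbols passing the filter, each receives half of the portfolio.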

Research Example

The following example requests historical 10-K language metrics for Apple and lists the US Equities with the highest 10-K report sentiment.

#r "../QuantConnect.DataSource.BrainSentiment.dll"
using QuantConnect.DataSource;

var qb = new QuantBook();

// Requesting data
var aapl = qb.AddEquity("AAPL", Resolution.Daily).Symbol;
var symbol = qb.AddData<BrainCompanyFilingLanguageMetrics10K>(aapl).Symbol;

// Historical data
var history = qb.History<BrainCompanyFilingLanguageMetrics10K>(symbol, 180, Resolution.Daily);
foreach (BrainCompanyFilingLanguageMetrics10K languageMetrics in history)
{
    Console.WriteLine($"{languageMetrics} at {languageMetrics.EndTime}");
}

// Add Universe Selection
IEnumerable<Symbol> UniverseSelection(IEnumerable<BaseData> altCoarse)
{
    var symbols = (from d in altCoarse.OfType<BrainCompanyFilingLanguageMetricsUniverse10K>()
        orderby d.ReportSentiment.Sentiment descending select d.Symbol).Take(10);
    return symbols;
}
var universe = qb.AddUniverse<BrainCompanyFilingLanguageMetricsUniverse10K>(UniverseSelection);

// Historical Universe data
var universeHistory = qb.UniverseHistory(universe, qb.Time.AddDays(-10), qb.Time);
foreach (var universeDay in universeHistory)
{
    foreach (BrainCompanyFilingLanguageMetricsUniverse10K languageMetrics in universeDay)
    {
        Console.WriteLine($"{languageMetrics.Symbol} sentiment at {languageMetrics.EndTime}: {languageMetrics.ReportSentiment.Sentiment}");
    }
}

qb = QuantBook()

# Requesting data
aapl = qb.add_equity("AAPL", Resolution.DAILY).symbol
symbol = qb.add_data(BrainCompanyFilingLanguageMetrics10K, aapl).symbol

# Historical data
history = qb.history(BrainCompanyFilingLanguageMetrics10K, symbol, 180, Resolution.DAILY)
for (dataset_symbol, time), row in history.iterrows():
    print(f"{dataset_symbol} sentiment at {time}: {row['reportsentiment'].sentiment}")

# Add universe selection
def universe_selection(alt_coarse: List[BrainCompanyFilingLanguageMetricsUniverse10K]) -> List[Symbol]:
    return [d.symbol for d in sorted([x for x in alt_coarse if x.report_sentiment.sentiment],
        key=lambda x: x.report_sentiment.sentiment, reverse=True)[:10]]

universe = qb.add_universe(BrainCompanyFilingLanguageMetricsUniverse10K, universe_selection)

# Historical universe data
universe_history = qb.universe_history(universe, qb.time-timedelta(10), qb.time)
for (_, time), universe_day in universe_history.items():
    for language_metrics in universe_day:
        print(f"{language_metrics.symbol} sentiment at {language_metrics.end_time}: {language_metrics.report_sentiment.sentiment}")
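
The ranking step in the selection function above can be checked on its own with plain Python. The `FilingMetrics` dataclass is a hypothetical stand-in for the universe objects, and the scores are made up. Note one deliberate difference: this sketch drops only missing scores, whereas the Python selection function above filters on truthiness, which also drops a sentiment of exactly zero.

```python
import heapq
from dataclasses import dataclass
from typing import Iterable, List, Optional

@dataclass
class FilingMetrics:
    """Hypothetical stand-in for a universe data point."""
    symbol: str
    sentiment: Optional[float]

def top_by_sentiment(rows: Iterable[FilingMetrics], n: int = 10) -> List[str]:
    """Return the symbols of the n rows with the highest sentiment, best first."""
    scored = [row for row in rows if row.sentiment is not None]
    return [row.symbol for row in heapq.nlargest(n, scored, key=lambda row: row.sentiment)]

rows = [
    FilingMetrics("AAA", 0.12),
    FilingMetrics("BBB", None),   # no score available
    FilingMetrics("CCC", -0.30),
    FilingMetrics("DDD", 0.45),
    FilingMetrics("EEE", 0.0),
]
top = top_by_sentiment(rows, n=3)
print(top)
```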

Data Point Attributes

The Brain Language Metrics on Company Filings dataset provides BrainCompanyFilingLanguageMetrics and BrainCompanyFilingLanguageMetricsUniverse objects.

BrainCompanyFilingLanguageMetrics Attributes

BrainCompanyFilingLanguageMetrics objects have the following attributes:

BrainCompanyFilingLanguageMetricsUniverse Attributes

BrainCompanyFilingLanguageMetricsUniverse objects have the following attributes:
