About Brain Language Metrics on Company Filings

The Brain Language Metrics on Company Filings dataset provides the results of an NLP system that monitors several language metrics in 10-K and 10-Q company reports for US Equities. The data covers 5,000 US Equities, starts in January 2010, and is delivered at a daily frequency. The dataset has two parts. The first part includes the language metrics of the most recent 10-K or 10-Q report for each firm, namely:

Financial sentiment
Percentage of financial-domain words, classified by language type (e.g. “litigious” or “constraining” language)
Readability score
Lexical metrics such as lexical density and richness
Text statistics such as the report length and the average sentence length

The second part includes the differences between the two most recent 10-K or 10-Q reports of the same period for each company, namely:

Differences in the various language metrics (e.g. delta sentiment, delta readability score, delta percentage of a specific language type, etc.)
Similarity metrics between documents, also with respect to a specific language type (for example similarity with respect to “litigious” language or “uncertainty” language)

The analysis is available for the whole report and for specific sections of the report (e.g. Risk Factors and MD&A).
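
To make the difference and similarity metrics above more concrete, the following is a minimal, illustrative sketch in plain Python. It is not Brain's methodology: the word list, the tokenizer, the two sample texts, and the choice of cosine similarity over word counts are all assumptions made for illustration only.

```python
import math
import re
from collections import Counter

# Hypothetical word list standing in for a "litigious" language type (assumption).
LITIGIOUS_WORDS = {"lawsuit", "litigation", "plaintiff", "court", "settlement"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def category_share(tokens, category):
    """Fraction of tokens that belong to the given word category."""
    if not tokens:
        return 0.0
    return sum(1 for token in tokens if token in category) / len(tokens)


def cosine_similarity(tokens_a, tokens_b, vocabulary=None):
    """Cosine similarity of word counts, optionally restricted to a vocabulary."""
    counts_a = Counter(t for t in tokens_a if vocabulary is None or t in vocabulary)
    counts_b = Counter(t for t in tokens_b if vocabulary is None or t in vocabulary)
    dot = sum(counts_a[word] * counts_b[word] for word in counts_a)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up snippets standing in for two consecutive reports of the same period.
previous_report = "The company faces no litigation. Revenue grew and margins improved."
latest_report = "The company faces a lawsuit and further litigation in court. Revenue grew."

prev_tokens = tokenize(previous_report)
last_tokens = tokenize(latest_report)

# Delta metric: change in the share of "litigious" words between the two reports.
delta_litigious = (category_share(last_tokens, LITIGIOUS_WORDS)
                   - category_share(prev_tokens, LITIGIOUS_WORDS))

# Similarity over all words, and restricted to the "litigious" word list.
similarity_all = cosine_similarity(prev_tokens, last_tokens)
similarity_litigious = cosine_similarity(prev_tokens, last_tokens, LITIGIOUS_WORDS)
```

In this toy setup, a positive delta means the latest report uses proportionally more words from the chosen list, and a similarity close to 1 means the two reports use a very similar mix of words.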

For more information, refer to Brain's summary paper.

This dataset depends on the US Equity Security Master dataset because the latter contains information on splits, dividends, and symbol changes.

About Brain

Brain is a research company that creates proprietary datasets and algorithms for investment strategies, combining experience in financial markets with strong competencies in statistics, machine learning, and natural language processing. The founders share a common academic background of research in physics as well as extensive experience in financial markets.

About QuantConnect

QuantConnect was founded in 2012 to serve quants everywhere with the best possible algorithmic trading technology. Seeking to disrupt a notoriously closed-source industry, QuantConnect takes a radically open-source approach to algorithmic trading. Through the QuantConnect web platform, more than 50,000 quants are served every month.

Algorithm Example

from AlgorithmImports import *
from QuantConnect.DataSource import *

class BrainCompanyFilingNLPDataAlgorithm(QCAlgorithm):
    def initialize(self):
        self.set_start_date(2010, 1, 1)
        self.set_end_date(2021, 7, 8)
        self.set_cash(100000)

        # Request data: subscribe to AAPL and to the Brain language metrics on its 10-K filings.
        # The filings carry both fundamental and sentiment information; the report sentiment is the signal used here
        self.aapl = self.add_equity("AAPL", Resolution.DAILY).symbol
        self.dataset_symbol = self.add_data(BrainCompanyFilingLanguageMetrics10K, self.aapl).symbol
        
        # Historical data
        history = self.history(self.dataset_symbol, 365, Resolution.DAILY)
        self.debug(f"We got {len(history)} items from our history request for {self.dataset_symbol}")
        
        
    def on_data(self, data):
        # Trade based on the sentiment of the latest report
        if data.contains_key(self.dataset_symbol):
            sentiment = data[self.dataset_symbol].report_sentiment.sentiment
            # Hold AAPL when the sentiment score is positive; otherwise stay flat
            self.set_holdings(self.aapl, int(sentiment > 0))

Example Applications

The Brain Language Metrics on Company Filings dataset enables you to test strategies using language metrics and their differences gathered from 10-K and 10-Q reports. Examples include the following strategies:

Using the similarity among reports to determine position sizing of securities. Some examples are discussed in Lazy Prices (Cohen et al., 2018) and The Positive Similarity of Company Filings and the Cross-section of Stock Returns (M. Padyšák, 2020).
Using the sentiment of the latest report to determine the portfolio allocation for each security in the universe.
Using levels of uncertainty, readability, or litigious language in the report to determine position sizing of securities.
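
As a rough illustration of the sentiment-based allocation idea above, the sketch below turns per-security sentiment scores into long-only portfolio weights. The function name, the per-security cap, and the sample scores are hypothetical and are not taken from the dataset. In a QuantConnect algorithm, weights like these could be passed to set_holdings, but that wiring is left out here.

```python
def sentiment_to_weights(sentiment_by_ticker, max_weight=0.5):
    """Long-only weights proportional to positive sentiment, capped per security.

    Securities with missing, zero, or negative sentiment get a zero weight.
    Weight removed by the cap is left unallocated (held as cash).
    """
    positive = {
        ticker: score
        for ticker, score in sentiment_by_ticker.items()
        if score is not None and score > 0
    }
    total = sum(positive.values())
    if total == 0:
        return {ticker: 0.0 for ticker in sentiment_by_ticker}
    return {
        ticker: min(positive.get(ticker, 0.0) / total, max_weight)
        for ticker in sentiment_by_ticker
    }


# Hypothetical sentiment scores, for illustration only.
sample_scores = {"AAPL": 0.30, "MSFT": 0.10, "XOM": -0.20}
weights = sentiment_to_weights(sample_scores)
```

Capping each weight is one simple way to keep a single strongly positive report from dominating the portfolio; other sizing rules (ranking, equal weighting of the top names) would fit the same idea.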

Pricing

Cloud Access

Harness Brain Company Filing NLP data in the QuantConnect Cloud for your backtesting and live trading purposes.

Curated, clean data
Natural language processed company filings
Updated nightly at 4am
Mapped to US Equity data with full US SIP feed

PRICE

$25/mo

Cloud Access Universe

Harness Brain Company Filing NLP Universe data in the QuantConnect Cloud for your backtesting and live trading purposes.

Curated, clean data
Natural language processed company filings
Updated nightly at 4am
Mapped to US Equity data with full US SIP feed

PRICE

$25/mo

On Premise Download

Brain NLP filing data archived in LEAN format for on-premise backtesting and research. One file per ticker.

Ownership of the data for internal use
Data in LEAN format
Local compute resources

PRICE

100 QCC/file

LEAN CLI
