Backtest a Profitable Trend-Following Strategy using Python

In this article, we provide a detailed walkthrough of the Python code used to reproduce the backtest of the trend-following strategy discussed in our paper, A Century of Profitable Industry Trends.

We recommend reading the paper first, which you can access here.

Using Kenneth French’s freely available database, we built an industry-based long-only trend-following portfolio. This portfolio is based on daily data from 48 industry portfolios, covering the period from 1926 to 2024. Our study compares the performance of this momentum-based strategy with a passive Buy & Hold approach over the past century.

Our findings demonstrate the superior performance of momentum-based portfolios. The strategy includes several parameters, none of which were optimized in-sample, meaning the performance statistics could be improved by fine-tuning these parameters.

To enhance the readability and make the code more intuitive for novice quant researchers, we intentionally avoided more complex coding procedures that may result in improved computational efficiency.

Step-by-Step Guide Through the Backtesting Process

Below, we give an overview of the main building blocks behind the backtesting procedure.

Step 1: Download and Process Industry Data

Step 2: Process CSV Data for Indicators and Signals

Step 3: Backtesting the Strategy

Step 4: Aggregate Portfolio Level Analysis

Step 5: Plotting the Portfolio AUM

Step 6: Statistical Analysis and Performance Metrics

Tools Needed

Python: Python is a versatile programming language widely used for data analysis and visualization. It’s free and open-source.
- Pandas: Pandas is a Python library for data manipulation and analysis, ideal for working with numerical tables and time series.
- Matplotlib: Matplotlib is a Python library for creating a wide variety of static, animated, and interactive visualizations.
- Statsmodels: Statsmodels provides tools for statistical modeling in Python, offering numerous methods for regression and time series analysis.

How to Read This Post

For each step, we will provide the full code first, then explain it step by step.

We recommend opening the following Google Colab Notebook and follow explanations in this article when needed.

Click on the Following Link:

Google Colab : Backtesting_A_Century_of_Profitable_Industry_Trends.ipynb

If you are only interested in changing the parameters, this abstract version allows you to adjust the parameters to see the results!

Google Colab : Backtesting_A_Century_of_Profitable_Industry_Trends_Easy_Parameter.ipynb

Needed Imports for All Steps:

Click to see the Python code for needed imports

Python

import os
import zipfile
import requests
from datetime import datetime, timedelta
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from matplotlib.ticker import ScalarFormatter, FuncFormatter,LogLocator
from statsmodels.api import OLS, add_constant

If you have missing libraries, just run this:

Click to see the command prompts

Python

pip install requests
pip install pandas
pip install numpy
pip install matplotlib
pip install statsmodels

Step 1: Download and Process Industry Data

Overview

This step involves downloading daily return data for 48 industry portfolios from Kenneth French‘s database, processing the data to prepare it for backtesting, and saving the necessary variables for later use. The process includes downloading, unzipping, reading CSV data, cleaning, and structuring the data.

Step 1 Code:

Click to see the Python code for Step 1

Python

# Define URLs for downloading data
INDUSTRY_PORTFOLIOS_URL = 'https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/48_Industry_Portfolios_daily_CSV.zip'
FRENCH_DATA_FACTORS_URL = 'https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Research_Data_Factors_daily_CSV.zip'
TEMP_FOLDER = 'temp_folder'

def download_and_extract_zip(url, extract_to=TEMP_FOLDER):
    """
    Downloads a ZIP file from a URL and extracts its contents.
    
    Args:
    url (str): The URL to download the ZIP file from.
    extract_to (str): The folder to extract the contents to.
    
    Returns:
    str: Path to the first CSV file found in the extracted contents.
    """
    zip_file = 'temp.zip'
    os.makedirs(extract_to, exist_ok=True)

    # Download the file
    with open(zip_file, 'wb') as f:
        f.write(requests.get(url).content)
    
    # Extract the ZIP file
    with zipfile.ZipFile(zip_file, 'r') as zip_ref:
        zip_ref.extractall(extract_to)
    
    os.remove(zip_file)

    # Get the path to the extracted CSV file
    extracted_files = os.listdir(extract_to)
    csv_files = [f for f in extracted_files if f.lower().endswith('.csv')]
    
    if not csv_files:
        raise FileNotFoundError("No CSV file found in the extracted folder.")
    
    return os.path.join(extract_to, csv_files[0])

def cleanup_temp_files(folder=TEMP_FOLDER):
    """
    Removes temporary files and folder.
    
    Args:
    folder (str): The folder to clean up.
    """
    for f in os.listdir(folder):
        os.remove(os.path.join(folder, f))
    os.rmdir(folder)

def process_csv_file(csv_file_path):
    """
    Reads and processes the CSV file.
    
    Args:
    csv_file_path (str): Path to the CSV file.
    
    Returns:
    DataFrame: Processed data.
    DatetimeIndex: Dates.
    """
    csv_data = pd.read_csv(csv_file_path, skiprows=9, low_memory=False)
    csv_data = csv_data.apply(pd.to_numeric, errors='coerce')
    
    industry_names = csv_data.columns[1:]
    rows_nan = csv_data[csv_data.iloc[:, 0].isna()].index
    from_row = 0
    until_row = rows_nan[0]
    data = csv_data.iloc[from_row:until_row, :].to_numpy()
    
    ret = data[:, 1:] / 100
    caldt = pd.to_datetime(data[:, 0].astype(int).astype(str), format='%Y%m%d')
    ret[ret <= -0.99] = np.nan
    
    return pd.DataFrame(data=ret, index=caldt, columns=industry_names), caldt

def market_french_reconciled(caldt):
    """
    Reconciles market returns using Fama-French data.
    
    Args:
    caldt (DatetimeIndex): Dates.
    
    Returns:
    Series: Market returns.
    """
    csv_file_path = download_and_extract_zip(FRENCH_DATA_FACTORS_URL)
    csv_data = pd.read_csv(csv_file_path, skiprows=3, header=None, low_memory=False)
    
    # Coerce non-numeric data to NaN and then drop rows with NaN in the date column
    csv_data = csv_data.apply(pd.to_numeric, errors='coerce')
    csv_data = csv_data.dropna(subset=[0])  # Drop rows where the date column is NaN
    
    # Convert the first column to datetime
    caldt_mkt = pd.to_datetime(csv_data.iloc[:, 0].astype(int).astype(str), format='%Y%m%d')
    
    # Calculate market returns by adding Mkt-RF and RF and dividing by 100 to convert to decimal form
    ret_mkt = (csv_data.iloc[:, 1] + csv_data.iloc[:, 4]) / 100
    
    # Create a series filled with NaN values for the length of 'caldt'
    y = np.full(len(caldt), np.nan)
    
    # Find indices where dates match and assign values accordingly
    idx = np.where(np.in1d(caldt_mkt, caldt))[0]
    y[np.in1d(caldt, caldt_mkt)] = ret_mkt.iloc[idx]
    
    cleanup_temp_files()
    
    return pd.Series(data=y, index=caldt)

def tbill_french_reconciled(caldt):
    """
    Reconciles T-Bill returns using Fama-French data.

    Args:
    caldt (DatetimeIndex): Dates.

    Returns:
    Series: T-Bill returns.
    """
    csv_file_path = download_and_extract_zip(FRENCH_DATA_FACTORS_URL)
    csv_data = pd.read_csv(csv_file_path, skiprows=3, header=None, low_memory=False)
    
    # Coerce non-numeric data to NaN and then drop rows with NaN in the date column
    csv_data = csv_data.apply(pd.to_numeric, errors='coerce')
    csv_data = csv_data.dropna(subset=[0])  # Drop rows where the date column is NaN
    
    # Convert the first column to datetime
    caldt_tbill = pd.to_datetime(csv_data.iloc[:, 0].astype(int).astype(str), format='%Y%m%d')
    
    # Extract T-Bill returns and divide by 100 to convert to decimal form
    ret_tbill = csv_data.iloc[:, 4] / 100
    
    # Create a series filled with NaN values for the length of 'caldt'
    y = np.full(len(caldt), np.nan)
    
    # Find indices where dates match and assign values accordingly
    idx = np.where(np.in1d(caldt_tbill, caldt))[0]
    y[np.in1d(caldt, caldt_tbill)] = ret_tbill.iloc[idx]
    
    cleanup_temp_files()
    
    return pd.Series(data=y, index=caldt)

# Main code
csv_file_path = download_and_extract_zip(INDUSTRY_PORTFOLIOS_URL)
data, caldt = process_csv_file(csv_file_path)
cleanup_temp_files()

data['mkt_ret']   = market_french_reconciled(caldt)
data['tbill_ret'] = tbill_french_reconciled(caldt)

new_columns = [f'ret_{i+1}' for i in range(len(data.columns)-2)] + ['mkt_ret', 'tbill_ret']
data.columns = new_columns
data['caldt'] = caldt
data = data[['caldt'] + new_columns]
data.to_csv('industry48.csv', index=False)

1.1 Download and Store Data

This code downloads a ZIP file from the specified URL, which contains CSV files for 48 industry portfolios. The requests library is used to download and save the file as ‘temp.zip’. The ZIP file is then extracted into a folder named ‘temp_folder‘.

Click to see the Python code

Python

def download_and_extract_zip(url, extract_to=TEMP_FOLDER):
    zip_file = 'temp.zip'
    os.makedirs(extract_to, exist_ok=True)

    # Download the file
    with open(zip_file, 'wb') as f:
        f.write(requests.get(url).content)
    
    # Extract the ZIP file
    with zipfile.ZipFile(zip_file, 'r') as zip_ref:
        zip_ref.extractall(extract_to)
    
    os.remove(zip_file)

    # Get the path to the extracted CSV file
    extracted_files = os.listdir(extract_to)
    csv_files = [f for f in extracted_files if f.lower().endswith('.csv')]
    
    if not csv_files:
        raise FileNotFoundError("No CSV file found in the extracted folder.")
    
    return os.path.join(extract_to, csv_files[0])

1.2 Clean Up Temporary Files

This function removes all files and the temporary folder created during the extraction process.

Click to see the Python code

Python

def cleanup_temp_files(folder=TEMP_FOLDER):
    for f in os.listdir(folder):
        os.remove(os.path.join(folder, f))
    os.rmdir(folder)

1.3 Process CSV File

This function reads and processes the extracted CSV file. It converts all columns to numeric values and handles missing data. It extracts date and return data, then formats and returns them.

Click to see the Python code

Python

def process_csv_file(csv_file_path):
    csv_data = pd.read_csv(csv_file_path, skiprows=9, low_memory=False)
    csv_data = csv_data.apply(pd.to_numeric, errors='coerce')
    
    industry_names = csv_data.columns[1:]
    rows_nan = csv_data[csv_data.iloc[:, 0].isna()].index
    from_row = 0
    until_row = rows_nan[0]
    data = csv_data.iloc[from_row:until_row, :].to_numpy()
    
    ret = data[:, 1:] / 100
    caldt = pd.to_datetime(data[:, 0].astype(int).astype(str), format='%Y%m%d')
    ret[ret <= -0.99] = np.nan
    
    return pd.DataFrame(data=ret, index=caldt, columns=industry_names), caldt

1.4 Reconcile Market Returns

This function downloads and processes market returns from Fama-French data. It matches and returns the market returns for the given dates.

Click to see the Python code

Python

def market_french_reconciled(caldt):
    csv_file_path = download_and_extract_zip(FRENCH_DATA_FACTORS_URL)
    csv_data = pd.read_csv(csv_file_path, skiprows=3, header=None, low_memory=False)
    csv_data = csv_data.apply(pd.to_numeric, errors='coerce')
    
    rows_nan = csv_data[csv_data.iloc[:, 0].isna()].index
    from_row = rows_nan[0]
    until_row = rows_nan[1] - 1
    data = csv_data.iloc[from_row:until_row, :].dropna().to_numpy()
    
    ret_mkt = data[:, 1] / 100 + data[:, 4] / 100
    caldt_mkt = pd.to_datetime(data[:, 0].astype(int).astype(str), format='%Y%m%d')
    
    y = np.full(len(caldt), np.nan)
    idx = np.where(np.in1d(caldt_mkt, caldt))[0]
    y[np.in1d(caldt, caldt_mkt)] = ret_mkt[idx]
    
    cleanup_temp_files()
    
    return pd.Series(data=y, index=caldt)

1.5 Reconcile T-Bill Returns

This function downloads and processes T-Bill returns from Fama-French data. It matches and returns the T-Bill returns for the given dates.

Click to see the Python code

Python

def tbill_french_reconciled(caldt):
    csv_file_path = download_and_extract_zip(FRENCH_DATA_FACTORS_URL)
    csv_data = pd.read_csv(csv_file_path, skiprows=3, header=None, low_memory=False)
    csv_data = csv_data.apply(pd.to_numeric, errors='coerce')
    
    rows_nan = csv_data[csv_data.iloc[:, 0].isna()].index
    from_row = rows_nan[0]
    until_row = rows_nan[1] - 1
    data = csv_data.iloc[from_row:until_row, :].dropna().to_numpy()
    
    ret_tbill = data[:, 4] / 100
    caldt_tbill = pd.to_datetime(data[:, 0].astype(int).astype(str), format='%Y%m%d')
    
    y = np.full(len(caldt), np.nan)
    idx = np.where(np.in1d(caldt_tbill, caldt))[0]
    y[np.in1d(caldt, caldt_tbill)] = ret_tbill[idx]
    
    cleanup_temp_files()
    
    return pd.Series(data=y, index=caldt)

1.6 Main Code

This part of the code coordinates the entire process:
- Downloads and processes industry portfolio data.
- Reconciles market and T-Bill returns.
- Renames columns and saves the final DataFrame to a CSV file.

Click to see the Python code

Python

# Main code
csv_file_path = download_and_extract_zip(INDUSTRY_PORTFOLIOS_URL)
data, caldt = process_csv_file(csv_file_path)
cleanup_temp_files()

data['mkt_ret'] = market_french_reconciled(caldt)
data['tbill_ret'] = tbill_french_reconciled(caldt)

new_columns = [f'ret_{i+1}' for i in range(len(data.columns)-2)] + ['mkt_ret', 'tbill_ret']
data.columns = new_columns
data['caldt'] = caldt
data = data[['caldt'] + new_columns]
data.to_csv('industry48.csv', index=False)

Step 2: Process CSV Data for Indicators and Signals

Overview

In this step, we load the previously saved CSV data containing daily returns for industry portfolios, convert dates to datetime format, and set them as the index. We then calculate various technical indicators including volatility, exponential moving averages (EMA), Donchian channels, and Keltner bands. These indicators are stored in a single DataFrame for further analysis.

Step 2 Code:

Click to see the Python code

Python

# Load the CSV data
data = pd.read_csv("industry48.csv")

# Convert `caldt` to datetime and set as index
data['caldt'] = pd.to_datetime(data['caldt'])
data.set_index('caldt', inplace=True)

# Determine the number of portfolios dynamically
num_portfolios = data.shape[1] - 2  # Subtracting 2 to account for 'mkt_ret' and 'tbill_ret'

# Indicator parameters
UP_DAY = 20
DOWN_DAY = 40
ADR_VOL_ADJ = 1.4  # ATR is usually 1.4x Vol(close2close)
KELT_MULT = 2 * ADR_VOL_ADJ

# Fill NaN values with 0 and calculate cumulative product
price = (1 + data.iloc[:, :num_portfolios].fillna(0)).cumprod()

# Define rolling functions
def rolling_vol(df, window):
    return df.rolling(window=window).std(ddof=0)

def rolling_ema(df, window):
    return df.ewm(span=window, adjust=False).mean()

def rolling_max(df, window):
    return df.rolling(window=window).max()

def rolling_min(df, window):
    return df.rolling(window=window).min()

def rolling_mean(df, window):
    return df.rolling(window=window, min_periods=window-1).mean()

# Calculate rolling volatility of daily returns
vol = rolling_vol(data.iloc[:, :num_portfolios], UP_DAY)

# Technical indicators
ema_down = rolling_ema(price, DOWN_DAY)
ema_up = rolling_ema(price, UP_DAY)

# Donchian channels
donc_up = rolling_max(price, UP_DAY)
donc_down = rolling_min(price, DOWN_DAY)

# Keltner bands
price_change = price.diff(periods=1).abs()
kelt_up = ema_up + KELT_MULT * rolling_mean(price_change, UP_DAY)
kelt_down = ema_down - KELT_MULT * rolling_mean(price_change, DOWN_DAY)

# Model bands
long_band = pd.DataFrame(np.minimum(donc_up.values, kelt_up.values), index=donc_up.index, columns=donc_up.columns)
short_band = pd.DataFrame(np.maximum(donc_down.values, kelt_down.values), index=donc_down.index, columns=donc_down.columns)

# Model long signal
long_band_shifted = long_band.shift(1)
short_band_shifted = short_band.shift(1)
long_signal = (price >= long_band_shifted) & (long_band_shifted > short_band_shifted)

# Create a dictionary of DataFrames for indicators
indicator_dfs = {
    f'ret_{i+1}': data.iloc[:, i] for i in range(num_portfolios)
}
indicator_dfs.update({
    f'price_{i+1}': price.iloc[:, i] for i in range(num_portfolios)
})
indicator_dfs.update({
    f'vol_{i+1}': vol.iloc[:, i] for i in range(num_portfolios)
})
indicator_dfs.update({
    f'ema_down_{i+1}': ema_down.iloc[:, i] for i in range(num_portfolios)
})
indicator_dfs.update({
    f'ema_up_{i+1}': ema_up.iloc[:, i] for i in range(num_portfolios)
})
indicator_dfs.update({
    f'donc_up_{i+1}': donc_up.iloc[:, i] for i in range(num_portfolios)
})
indicator_dfs.update({
    f'donc_down_{i+1}': donc_down.iloc[:, i] for i in range(num_portfolios)
})
indicator_dfs.update({
    f'kelt_up_{i+1}': kelt_up.iloc[:, i] for i in range(num_portfolios)
})
indicator_dfs.update({
    f'kelt_down_{i+1}': kelt_down.iloc[:, i] for i in range(num_portfolios)
})
indicator_dfs.update({
    f'long_band_{i+1}': long_band.iloc[:, i] for i in range(num_portfolios)
})
indicator_dfs.update({
    f'short_band_{i+1}': short_band.iloc[:, i] for i in range(num_portfolios)
})
indicator_dfs.update({
    f'long_signal_{i+1}': long_signal.iloc[:, i] for i in range(num_portfolios)
})

# Concatenate all indicator columns into a single DataFrame
indicators_df = pd.concat(indicator_dfs.values(), axis=1)
indicators_df.columns = indicator_dfs.keys()

# Add market and tbill returns
indicators_df['mkt_ret'] = data['mkt_ret']
indicators_df['tbill_ret'] = data['tbill_ret']

2.1 Load the CSV Data

This code loads the previously saved CSV file industry48.csv into a pandas DataFrame.

Click to see the Python code

Python

# Load the CSV data
data = pd.read_csv("industry48.csv")

2.2 Convert `caldt` to Datetime and Set as Index

Convert the caldt column to datetime format and set it as the index of the DataFrame.

Click to see the Python code

Python

# Convert `caldt` to datetime and set as index
data['caldt'] = pd.to_datetime(data['caldt'])
data.set_index('caldt', inplace=True)

2.3 Determine the Number of Portfolios Dynamically

Calculate the number of portfolios based on the shape of the DataFrame.

Click to see the Python code

Python

# Determine the number of portfolios dynamically
num_portfolios = data.shape[1] - 2  # Subtracting 2 to account for 'mkt_ret' and 'tbill_ret'

2.4 Indicator Parameters

Parameters for the technical indicators are defined here. UP_DAY and DOWN_DAY specify the periods for calculating moving averages and other metrics. The Keltner Channel multiplier (KELT_MULT) is multiplied by an adjustment factor (ADR_VOL_ADJ) since the database we are using does not include high and low prices (more information can be found on page 7 of our paper)

Click to see the Python code

Python

# Indicator parameters
UP_DAY = 20
DOWN_DAY = 40
ADR_VOL_ADJ = 1.4  # ATR is usually 1.4x Vol(close2close)
KELT_MULT = 2 * ADR_VOL_ADJ

2.5 Calculate Cumulative Product of Prices

Fill NaN values with 0 and calculate the cumulative product of the returns to get the price series.

Click to see the Python code

Python

# Fill NaN values with 0 and calculate cumulative product
price = (1 + data.iloc[:, :num_portfolios].fillna(0)).cumprod()

2.6 Define Rolling Functions

Define functions to calculate rolling volatility, EMA, max, min, and mean.

Click to see the Python code

Python

# Define rolling functions
def rolling_vol(df, window):
    return df.rolling(window=window).std(ddof=0)

def rolling_ema(df, window):
    return df.ewm(span=window, adjust=False).mean()

def rolling_max(df, window):
    return df.rolling(window=window).max()

def rolling_min(df, window):
    return df.rolling(window=window).min()

def rolling_mean(df, window):
    return df.rolling(window=window, min_periods=window-1).mean()

2.7 Calculate Rolling Volatility

Calculate the rolling volatility of daily returns using the defined rolling_vol function.

Click to see the Python code

Python

# Calculate rolling volatility of daily returns
vol = rolling_vol(data.iloc[:, :num_portfolios], UP_DAY)

2.8 Calculate Technical Indicators

Calculate the exponential moving averages (EMA), Donchian channels, and Keltner bands.

Click to see the Python code

Python

# Technical indicators
ema_down = rolling_ema(price, DOWN_DAY)
ema_up = rolling_ema(price, UP_DAY)

# Donchian channels
donc_up = rolling_max(price, UP_DAY)
donc_down = rolling_min(price, DOWN_DAY)

# Keltner bands
price_change = price.diff(periods=1).abs()
kelt_up = ema_up + KELT_MULT * rolling_mean(price_change, UP_DAY)
kelt_down = ema_down - KELT_MULT * rolling_mean(price_change, DOWN_DAY)

2.9 Calculate Model Bands and Signals

Calculate the model bands (long and short) and the long signal.

Click to see the Python code

Python

# Model bands
long_band = pd.DataFrame(np.minimum(donc_up.values, kelt_up.values), index=donc_up.index, columns=donc_up.columns)
short_band = pd.DataFrame(np.maximum(donc_down.values, kelt_down.values), index=donc_down.index, columns=donc_down.columns)

# Model long signal
long_band_shifted = long_band.shift(1)
short_band_shifted = short_band.shift(1)
long_signal = (price >= long_band_shifted) & (long_band_shifted > short_band_shifted)

2.10 Create Dictionary of DataFrames for Indicators

Create a dictionary of DataFrames to store the indicators.

Click to see the Python code

Python

# Create a dictionary of DataFrames for indicators
indicator_dfs = {
    f'ret_{i+1}': data.iloc[:, i] for i in range(num_portfolios)
}
indicator_dfs.update({
    f'price_{i+1}': price.iloc[:, i] for i in range(num_portfolios)
})
indicator_dfs.update({
    f'vol_{i+1}': vol.iloc[:, i] for i in range(num_portfolios)
})
indicator_dfs.update({
    f'ema_down_{i+1}': ema_down.iloc[:, i] for i in range(num_portfolios)
})
indicator_dfs.update({
    f'ema_up_{i+1}': ema_up.iloc[:, i] for i in range(num_portfolios)
})
indicator_dfs.update({
    f'donc_up_{i+1}': donc_up.iloc[:, i] for i in range(num_portfolios)
})
indicator_dfs.update({
    f'donc_down_{i+1}': donc_down.iloc[:, i] for i in range(num_portfolios)
})
indicator_dfs.update({
    f'kelt_up_{i+1}': kelt_up.iloc[:, i] for i in range(num_portfolios)
})
indicator_dfs.update({
    f'kelt_down_{i+1}': kelt_down.iloc[:, i] for i in range(num_portfolios)
})
indicator_dfs.update({
    f'long_band_{i+1}': long_band.iloc[:, i] for i in range(num_portfolios)
})
indicator_dfs.update({
    f'short_band_{i+1}': short_band.iloc[:, i] for i in range(num_portfolios)
})
indicator_dfs.update({
    f'long_signal_{i+1}': long_signal.iloc[:, i] for i in range(num_portfolios)
})

# Concatenate all indicator columns into a single DataFrame
indicators_df = pd.concat(indicator_dfs.values(), axis=1)
indicators_df.columns = indicator_dfs.keys()

# Add market and tbill returns
indicators_df['mkt_ret'] = data['mkt_ret']
indicators_df['tbill_ret'] = data['tbill_ret']

Step 3: Backtesting the Strategy

Overview

This step involves backtesting the trading strategy using the previously calculated indicators. The strategy allocates assets dynamically based on signal indicators, manages exposure, and applies trailing stops to manage risk. The goal is to simulate trading over the historical data period and observe the performance of the strategy.

Step 3 Code:

Click to see the Python code

Python

# Initial settings
AUM_0 = 1
invest_cash = "YES"
target_vol = 0.02
max_leverage = 2
max_not_trade = 0.20

N_ind = num_portfolios  # Number of industries in the database
T = len(indicators_df['price_1'])  # Length of the time series

# Pre-allocate arrays with more specific initial values
exposure = np.zeros((T, N_ind))
ind_weight = np.zeros((T, N_ind))
trail_stop_long = np.full((T, N_ind), np.nan)

# Vectorized indicator data
rets = indicators_df[[f'ret_{j+1}' for j in range(N_ind)]].values
long_signals = indicators_df[[f'long_signal_{j+1}' for j in range(N_ind)]].values
long_bands = indicators_df[[f'long_band_{j+1}' for j in range(N_ind)]].values
short_bands = indicators_df[[f'short_band_{j+1}' for j in range(N_ind)]].values
prices = indicators_df[[f'price_{j+1}' for j in range(N_ind)]].values
vols = indicators_df[[f'vol_{j+1}' for j in range(N_ind)]].values

for t in range(1, T):
    valid_entries = ~np.isnan(rets[t]) & ~np.isnan(long_bands[t])
    
    prev_exposure = exposure[t - 1]
    current_exposure = exposure[t]
    current_trail_stop = trail_stop_long[t]
    current_long_signals = long_signals[t]
    current_short_bands = short_bands[t]
    current_prices = prices[t]
    current_vols = vols[t]

    new_long_condition = (prev_exposure <= 0) & (current_long_signals == 1)
    confirm_long_condition = (prev_exposure == 1) & (current_prices > np.maximum(trail_stop_long[t - 1], current_short_bands))
    exit_long_condition = (prev_exposure == 1) & (current_prices <= np.maximum(trail_stop_long[t - 1], current_short_bands))

    # Process new long positions
    new_longs = valid_entries & new_long_condition
    current_exposure[new_longs] = 1
    current_trail_stop[new_longs] = current_short_bands[new_longs]

    # Process confirmed long positions
    confirm_longs = valid_entries & confirm_long_condition
    current_exposure[confirm_longs] = 1
    current_trail_stop[confirm_longs] = np.maximum(trail_stop_long[t - 1, confirm_longs], current_short_bands[confirm_longs])

    # Process exit long positions
    exit_longs = valid_entries & exit_long_condition
    current_exposure[exit_longs] = 0
    ind_weight[t, exit_longs] = 0

    # Update leverage and weights for active long positions
    active_longs = current_exposure == 1
    lev_vol = np.divide(target_vol, current_vols, out=np.zeros_like(current_vols), where=current_vols != 0)
    ind_weight[t, active_longs] = lev_vol[active_longs]

# Update the indicators dataframe
# Collect new columns in a dictionary
new_columns = {}
for j in range(N_ind):
    new_columns[f'exposure_{j+1}'] = exposure[:, j]
    new_columns[f'ind_weight_{j+1}'] = ind_weight[:, j]
    new_columns[f'trail_stop_long_{j+1}'] = trail_stop_long[:, j]

# Convert the dictionary to a DataFrame and concatenate with the original DataFrame
new_columns_df = pd.DataFrame(new_columns, index=indicators_df.index)
indicators_df = pd.concat([indicators_df, new_columns_df], axis=1)

3.1 Initial Settings

Set the initial parameters for the backtest, including initial assets under management (AUM), target volatility, and maximum leverage.

Click to see the Python code

Python

# Initial settings
AUM_0 = 1
invest_cash = "YES"
target_vol = 0.02
max_leverage = 2
max_not_trade = 0.20

N_ind = num_portfolios  # Number of industries in the database
T = len(indicators_df['price_1'])  # Length of the time series

3.2 Pre-allocate Arrays

Pre-allocate arrays for exposure, individual weights, and trailing stops with initial values.

Click to see the Python code

Python

# Pre-allocate arrays with more specific initial values
exposure = np.zeros((T, N_ind))
ind_weight = np.zeros((T, N_ind))
trail_stop_long = np.full((T, N_ind), np.nan)

3.3 Vectorized Indicator Data

Extract the necessary indicator data from the DataFrame into numpy arrays for vectorized operations.

Click to see the Python code

Python

# Vectorized indicator data
rets = indicators_df[[f'ret_{j+1}' for j in range(N_ind)]].values
long_signals = indicators_df[[f'long_signal_{j+1}' for j in range(N_ind)]].values
long_bands = indicators_df[[f'long_band_{j+1}' for j in range(N_ind)]].values
short_bands = indicators_df[[f'short_band_{j+1}' for j in range(N_ind)]].values
prices = indicators_df[[f'price_{j+1}' for j in range(N_ind)]].values
vols = indicators_df[[f'vol_{j+1}' for j in range(N_ind)]].values

3.4 Backtest Loop

In this loop, we iterate through each time step (each day in the dataset) and apply the trading logic. Here’s a detailed breakdown of the logic applied at each step:
- Identify Valid Entries:
  - Determine which entries (industries) have valid data for returns and long bands on the current day.
- Retrieve Previous and Current Data:
  - Get the previous day’s exposure and trailing stop.
  - Initialize the current day’s exposure and trailing stop to be updated within the loop.
- Define Trading Conditions:
  - New Long Condition: Indicates new long positions where no previous exposure existed and a long signal is present.
  - Confirm Long Condition: Validates long positions where there was exposure, and the current price is above the trailing stop or short band.
  - Exit Long Condition: Indicates conditions to exit long positions where the current price falls below the trailing stop or short band.
- Process Trading Conditions:
  - New Long Positions: Update exposure and trailing stops for new long entries.
  - Confirmed Long Positions: Maintain exposure and update trailing stops for existing long entries.
  - Exit Long Positions: Reset exposure and weights for exited long positions.
- Update Leverage and Weights:
  - Calculate leverage based on target volatility and current volatility for active long positions.
  - Assign weights accordingly.

Click to see the Python code

Python

for t in range(1, T):
    valid_entries = ~np.isnan(rets[t]) & ~np.isnan(long_bands[t])
    
    prev_exposure = exposure[t - 1]
    current_exposure = exposure[t]
    current_trail_stop = trail_stop_long[t]
    current_long_signals = long_signals[t]
    current_short_bands = short_bands[t]
    current_prices = prices[t]
    current_vols = vols[t]

    new_long_condition = (prev_exposure <= 0) & (current_long_signals == 1)
    confirm_long_condition = (prev_exposure == 1) & (current_prices > np.maximum(trail_stop_long[t - 1], current_short_bands))
    exit_long_condition = (prev_exposure == 1) & (current_prices <= np.maximum(trail_stop_long[t - 1], current_short_bands))

    # Process new long positions
    new_longs = valid_entries & new_long_condition
    current_exposure[new_longs] = 1
    current_trail_stop[new_longs] = current_short_bands[new_longs]

    # Process confirmed long positions
    confirm_longs = valid_entries & confirm_long_condition
    current_exposure[confirm_longs] = 1
    current_trail_stop[confirm_longs] = np.maximum(trail_stop_long[t - 1, confirm_longs], current_short_bands[confirm_longs])

    # Process exit long positions
    exit_longs = valid_entries & exit_long_condition
    current_exposure[exit_longs] = 0
    ind_weight[t, exit_longs] = 0

    # Update leverage and weights for active long positions
    active_longs = current_exposure == 1
    lev_vol = np.divide(target_vol, current_vols, out=np.zeros_like(current_vols), where=current_vols != 0)
    ind_weight[t, active_longs] = lev_vol[active_longs]

Detailed Explanation of Backtest Loop:

Identify Valid Entries:
- The variable valid_entries checks which entries have non-NaN values for returns and long bands for the current time step t.
Retrieve Previous and Current Data:
- prev_exposure: Previous day’s exposure to check if there were any positions.
- current_exposure: Initialized for the current day to be updated based on trading logic.
- current_trail_stop: Initialized for the current day to be updated based on price movements.
- current_long_signals: Long signals for the current day.
- current_short_bands: Short bands for the current day.
- current_prices: Prices for the current day.
- current_vols: Volatility values up to the current day.
Define Trading Conditions:
- new_long_condition: Checks if there was no previous exposure (prev_exposure <= 0) and if the current long signal is present (current_long_signals == 1).
- confirm_long_condition: Ensures continued exposure if the price is above the trailing stop or short band.
- exit_long_condition: Exits the long position if the price falls below the trailing stop or short band.
Process Trading Conditions:
- New Long Positions:
  - new_longs: Entries that meet the new long condition.
  - Update current_exposure to 1 (indicating a long position) for new long entries.
  - Set current_trail_stop to the corresponding short band value.
- Confirmed Long Positions:
  - confirm_longs: Entries that meet the confirm long condition.
  - Maintain current_exposure to 1.
  - Update current_trail_stop to the maximum of the previous trailing stop or the current short band.
- Exit Long Positions:
  - exit_longs: Entries that meet the exit long condition.
  - Reset current_exposure to 0 (indicating exit from the position).
  - Reset ind_weight to 0 for these positions.
Update Leverage and Weights:
- active_longs: Checks which positions are currently active.
- lev_vol: Calculates leverage based on target volatility and current volatility.
- Update ind_weight for active long positions with the calculated leverage.

3.5 Update the Indicators DataFrame

After processing the backtest loop, we need to update the indicators_df DataFrame to include the new columns for exposure, individual weights, and trailing stops.

Collect New Columns:
- Create a dictionary to hold the new columns for each industry.
Convert and Concatenate:
- Convert the dictionary to a DataFrame.
- Concatenate the new DataFrame with the original indicators_df.

Click to see the Python code

Python

# Update the indicators dataframe
# Collect new columns in a dictionary
new_columns = {}
for j in range(N_ind):
    new_columns[f'exposure_{j+1}'] = exposure[:, j]
    new_columns[f'ind_weight_{j+1}'] = ind_weight[:, j]
    new_columns[f'trail_stop_long_{j+1}'] = trail_stop_long[:, j]

# Convert the dictionary to a DataFrame and concatenate with the original DataFrame
new_columns_df = pd.DataFrame(new_columns, index=indicators_df.index)
indicators_df = pd.concat([indicators_df, new_columns_df], axis=1)

Step 4: Aggregate Portfolio Level Analysis

Overview

This step involves aggregating the individual industry-level results to the portfolio level. It calculates the overall portfolio exposure, weights, returns, and cumulative assets under management (AUM). The goal is to obtain the daily returns of the aggregate trend-following portfolio.

Step 4 Code:

Click to see the Python code

Python

# Initialize a DataFrame to store the results at the aggregate portfolio level
port = pd.DataFrame(index=indicators_df.index)
port['caldt'] = indicators_df.index
port['available'] = indicators_df.filter(like='ret_').notna().sum(axis=1)  # How many industries were available each day

ind_weight_df = indicators_df.filter(like='ind_weight_')
port_weights = ind_weight_df.div(port['available'], axis=0)

# Limit the exposure of each industry at "max_not_trade"
port_weights = port_weights.clip(upper=max_not_trade)

port['sum_exposure'] = port_weights.sum(axis=1)
idx_above_max_lev = port[port['sum_exposure'] > max_leverage].index

port_weights.loc[idx_above_max_lev] = port_weights.loc[idx_above_max_lev].div(
    port['sum_exposure'][idx_above_max_lev], axis=0
).mul(max_leverage)

port['sum_exposure'] = port_weights.sum(axis=1)

for i in range(N_ind):
    port[f'weight_{i+1}'] = port_weights.iloc[:, i]

ret_long_components = [
    port[f'weight_{i+1}'].shift(1).fillna(0) * indicators_df[f'ret_{i+1}'].fillna(0) for i in range(N_ind)
]
port['ret_long'] = sum(ret_long_components)

port['ret_tbill'] = (1 - port[[f'weight_{i+1}' for i in range(N_ind)]].shift(1).sum(axis=1)) * indicators_df['tbill_ret']

if invest_cash == "YES":
    port['ret_long'] += port['ret_tbill']

port['AUM'] = AUM_0 * (1 + port['ret_long']).cumprod()
port['AUM_SPX'] = AUM_0 * (1 + indicators_df['mkt_ret']).cumprod()

4.1 Initialize Portfolio DataFrame

Create a DataFrame port to store the aggregate portfolio-level results.

Click to see the Python code

Python

# Initialize a DataFrame to store the results at the aggregate portfolio level
port = pd.DataFrame(index=indicators_df.index)
port['caldt'] = indicators_df.index
port['available'] = indicators_df.filter(like='ret_').notna().sum(axis=1)  # How many industries were available each day

4.2 Calculate Portfolio Weights

Calculate the weights of each industry in the portfolio, adjusting for the number of available industries and limiting the exposure.

Click to see the Python code

Python

ind_weight_df = indicators_df.filter(like='ind_weight_')
port_weights = ind_weight_df.div(port['available'], axis=0)

# Limit the exposure of each industry at "max_not_trade"
port_weights = port_weights.clip(upper=max_not_trade)

port['sum_exposure'] = port_weights.sum(axis=1)
idx_above_max_lev = port[port['sum_exposure'] > max_leverage].index

port_weights.loc[idx_above_max_lev] = port_weights.loc[idx_above_max_lev].div(
    port['sum_exposure'][idx_above_max_lev], axis=0
).mul(max_leverage)

port['sum_exposure'] = port_weights.sum(axis=1)

Detailed Explanation of Portfolio Weights Calculation:

Initialize Weights:
- ind_weight_df: Extracts individual weights from indicators_df.
- port_weights: Divides individual weights by the number of available industries to get the initial weights.
Limit Exposure:
- Use clip to limit each industry’s weight to max_not_trade.
Sum Exposure:
- Calculate the total exposure for each day in port['sum_exposure'].
Adjust Weights for Leverage:
- Identify days where the total exposure exceeds the maximum leverage (max_leverage).
- Adjust the weights on those days to ensure the total exposure does not exceed max_leverage.

4.3 Assign Weights to Portfolio DataFrame

Assign the calculated weights to the port DataFrame.

Click to see the Python code

Python

for i in range(N_ind):
    port[f'weight_{i+1}'] = port_weights.iloc[:, i]

4.4 Calculate Portfolio Returns

Calculate the returns of the portfolio based on the calculated weights and individual industry returns.

Click to see the Python code

Python

ret_long_components = [
    port[f'weight_{i+1}'].shift(1).fillna(0) * indicators_df[f'ret_{i+1}'].fillna(0) for i in range(N_ind)
]
port['ret_long'] = sum(ret_long_components)

port['ret_tbill'] = (1 - port[[f'weight_{i+1}' for i in range(N_ind)]].shift(1).sum(axis=1)) * indicators_df['tbill_ret']

if invest_cash == "YES":
    port['ret_long'] += port['ret_tbill']

Detailed Explanation of Portfolio Returns Calculation:

Calculate Long Returns:
- ret_long_components: Calculate the weighted return for each industry by multiplying the lagged weights (shift(1)) with the industry returns.
- port['ret_long']: Sum the weighted returns for all industries to get the portfolio’s long returns.
Calculate T-Bill Returns:
- port['ret_tbill']: Calculate the returns from T-Bills for the cash portion of the portfolio. This is based on the proportion of the portfolio not invested in industries.
Include T-Bill Returns if Cash is Invested:
- If invest_cash is set to “YES”, add the T-Bill returns to the portfolio’s returns. When the strategy borrows capital to leverage the overall exposure, we also account for the cost of borrowing.

4.5 Calculate Assets Under Management (AUM)

Calculate the cumulative AUM of the portfolio and compare it with the S&P 500 index.

Click to see the Python code

Python

port['AUM'] = AUM_0 * (1 + port['ret_long']).cumprod()
port['AUM_SPX'] = AUM_0 * (1 + indicators_df['mkt_ret']).cumprod()

Step 5: Plotting the Portfolio AUM

Overview

This step involves plotting the cumulative assets under management (AUM) of the portfolio and comparing it with the AUM of the S&P 500 index. To better visualize changes over a wide range of values, we will use a logarithmic scale for the y-axis. The plot will include grid lines, formatted axis labels, and a legend for clarity.

Step 5 Code:

Click to see the Python code

Python

# Ensure 'caldt' is in datetime format
port['caldt'] = pd.to_datetime(port['caldt'])

# Set the frequency for plotting
freq = 21

# Create the plot
fig, ax = plt.subplots(figsize=(10, 6))

# Plot the AUM
ax.plot(
    pd.concat([port['caldt'].iloc[::freq], port['caldt'].iloc[[-1]]]), 
    pd.concat([port['AUM'].iloc[::freq], port['AUM'].iloc[[-1]]]), 
    linewidth=2, color='k', label='Timing Ind.'
)

# Plot the AUM_SPX
ax.plot(
    pd.concat([port['caldt'].iloc[::freq], port['caldt'].iloc[[-1]]]), 
    pd.concat([port['AUM_SPX'].iloc[::freq], port['AUM_SPX'].iloc[[-1]]]), 
    linewidth=1, color='r', label='Market'
)

# Set y-axis to logarithmic scale
ax.set_yscale('log')

# Set y-axis ticks to double each time
ax.yaxis.set_major_locator(LogLocator(base=10.0, numticks=10))
ax.yaxis.set_minor_locator(LogLocator(base=10.0, subs=[2, 3, 4, 5], numticks=10))
ax.yaxis.set_major_formatter(FuncFormatter(lambda x, pos: "${:,.0f}".format(x)))

# Ensure the y-axis labels are visible
ax.tick_params(axis='y', which='major', labelsize=10)
ax.tick_params(axis='y', which='minor', labelsize=8)

# Set x-axis date format
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y'))
ax.xaxis.set_major_locator(mdates.YearLocator(10))  # Show x-ticks every 10 years

# Add grid
ax.grid(True, which='both', linestyle='--', linewidth=0.5)

# Add legend
ax.legend(loc='upper left', fontsize=10)

# Set title
ax.set_title(f'Timing 48 Industries Strategy', fontweight='bold', fontsize=14)

# Set x and y limits
ax.set_xlim([port['caldt'].iloc[0], port['caldt'].iloc[-1]])

# Show the plot
plt.tight_layout()
plt.show()

Plot:

Step 6: Statistical Analysis and Performance Metrics

Overview

This step involves calculating various statistical and performance metrics for the portfolio and market. These metrics include the hit ratio, skewness, Sortino ratio, maximum drawdown, and regression analysis to compute alpha and beta. The results are summarized in a table for easy comparison.

Step 6 Code:

Click to see the Python code

Python

def compute_hit_ratio(returns, timeframe='daily'):
    if timeframe not in ['daily', 'monthly', 'yearly']:
        raise ValueError('Invalid timeframe. Choose "daily", "monthly", or "yearly".')

    if timeframe != 'daily':
        resample_rule = {'monthly': 'MS', 'yearly': 'YS'}[timeframe]
        returns = returns.resample(resample_rule).apply(lambda x: np.prod(1 + x) - 1)
    
    return round((returns > 0).mean() * 100, 0)

def compute_skewness(returns, timeframe='daily'):
    if timeframe not in ['daily', 'monthly', 'yearly']:
        raise ValueError('Invalid timeframe. Choose "daily", "monthly", or "yearly".')

    if timeframe != 'daily':
        resample_rule = {'monthly': 'MS', 'yearly': 'YS'}[timeframe]
        returns = returns.resample(resample_rule).apply(lambda x: np.prod(1 + x) - 1)
    
    return round(returns.skew(), 2)

def sortino_ratio(returns):
    downside_risk = returns[returns < 0].std()
    return returns.mean() / downside_risk

def max_drawdown(aum):
    cumulative = np.log1p(aum).cumsum()
    max_cumulative = cumulative.cummax()
    drawdown = np.expm1(cumulative - max_cumulative)
    return drawdown.min()

def table_of_stats(aum, n_assets, names, caldt, mkt_ret, tbill_ret):
    table_statistics = []

    for i in range(aum.shape[1]):
        ret = aum.iloc[:, i].pct_change(fill_method=None).fillna(0)  # Use fillna(0) to handle initial NA values
        strategy_stats = {
            'strategy': names[i],
            'assets': n_assets[i],
            'irr': round((np.prod(1 + ret) ** (252 / len(ret)) - 1) * 100, 1),
            'vol': round(ret.std() * np.sqrt(252) * 100, 1),
            'sr': round(ret.mean() / ret.std() * np.sqrt(252), 2),
            'sortino': round(sortino_ratio(ret) * np.sqrt(252), 2),
            'ir_d': compute_hit_ratio(ret, 'daily'),
            'ir_m': compute_hit_ratio(ret, 'monthly'),
            'ir_y': compute_hit_ratio(ret, 'yearly'),
            'skew_d': round(ret.skew(), 2),
            'skew_m': compute_skewness(ret, 'monthly'),
            'skew_y': compute_skewness(ret, 'yearly'),
            'mdd': round(max_drawdown(aum.iloc[:, i]) * 100, 0),
        }

        # Regression using statsmodels
        X = (mkt_ret - tbill_ret).values
        Y = (ret - tbill_ret).values
        valid_idx = ~np.isnan(Y)
        X = add_constant(X[valid_idx])
        Y = Y[valid_idx]
        model = OLS(Y, X).fit()
        intercept, coef = model.params

        strategy_stats['alpha'] = round(intercept * 100 * 252, 2)
        strategy_stats['beta'] = round(coef, 2)
        strategy_stats['worst_ret'] = round(ret.min() * 100, 2)
        strategy_stats['worst_day'] = caldt[ret.idxmin()].strftime('%Y-%m-%d')
        strategy_stats['best_ret'] = round(ret.max() * 100, 2)
        strategy_stats['best_day'] = caldt[ret.idxmax()].strftime('%Y-%m-%d')

        table_statistics.append(strategy_stats)

    return pd.DataFrame(table_statistics)

# Calculate statistics for the portfolio and market
table_statistics = table_of_stats(
    port[['AUM', 'AUM_SPX']], 
    [len(port.columns) - 2, 1], 
    ["Timing Ind.", "Market"], 
    port['caldt'], 
    indicators_df['mkt_ret'], 
    indicators_df['tbill_ret']
)


table_statistics.set_index('strategy', inplace=True)
table_statistics = table_statistics.transpose()
print(table_statistics)

Results:

Final Words

If you have questions or Ideas feel free to contact us on twitter/X or contact us directly here

Teaming up with well-known researcher and industry expert @GaryAntonacci , we are proud to announce the publication of our new research piece

A Century of Profitable Industry Trends

This study delves into the profitability of a simple, low-frequency trend-following model… pic.twitter.com/1j7etbXdHq
— Concretum Research (@ConcretumR) June 7, 2024

Backtest a Profitable Trend-Following Strategy using Python

Step-by-Step Guide Through the Backtesting Process

Tools Needed

How to Read This Post

Google Colab : Backtesting_A_Century_of_Profitable_Industry_Trends.ipynb

Needed Imports for All Steps:

Step 1: Download and Process Industry Data

Overview

Step 1 Code:

1.1 Download and Store Data

1.2 Clean Up Temporary Files

1.3 Process CSV File

1.4 Reconcile Market Returns

1.5 Reconcile T-Bill Returns

1.6 Main Code

Step 2: Process CSV Data for Indicators and Signals

Overview

Step 2 Code:

2.1 Load the CSV Data

2.2 Convert caldt to Datetime and Set as Index

2.3 Determine the Number of Portfolios Dynamically

2.4 Indicator Parameters

2.5 Calculate Cumulative Product of Prices

2.6 Define Rolling Functions

2.7 Calculate Rolling Volatility

2.8 Calculate Technical Indicators

2.9 Calculate Model Bands and Signals

2.10 Create Dictionary of DataFrames for Indicators

Step 3: Backtesting the Strategy

Overview

Step 3 Code:

3.1 Initial Settings

3.2 Pre-allocate Arrays

3.3 Vectorized Indicator Data

3.4 Backtest Loop

Detailed Explanation of Backtest Loop:

3.5 Update the Indicators DataFrame

Step 4: Aggregate Portfolio Level Analysis

Overview

Step 4 Code:

4.1 Initialize Portfolio DataFrame

4.2 Calculate Portfolio Weights

Detailed Explanation of Portfolio Weights Calculation:

4.3 Assign Weights to Portfolio DataFrame

4.4 Calculate Portfolio Returns

Detailed Explanation of Portfolio Returns Calculation:

4.5 Calculate Assets Under Management (AUM)

Step 5: Plotting the Portfolio AUM

Overview

Step 5 Code:

Plot:

Step 6: Statistical Analysis and Performance Metrics

Overview

Step 6 Code:

Results:

Final Words

2.2 Convert `caldt` to Datetime and Set as Index