Analyzing historical stock market data is essential for backtesting trading strategies, identifying tradable edges, and conducting in-depth financial research. However, gathering comprehensive datasets, especially those that include delisted stocks (i.e., stocks that no longer trade due to acquisitions, mergers, buyouts, or bankruptcy), can be challenging. This is where Norgate Data comes in, offering a powerful solution that provides not only data for active stocks but also for delisted ones. It also includes vital metadata like index membership and sector classifications, allowing a quant researcher to build a comprehensive database on which to run their backtests.

In this guide, we’ll walk you through how to use Python with the Norgate Data API to download U.S. stock market data—both listed and delisted. We’ll start with some well-known tickers to get familiar with how Norgate Data integrates with Python, and then expand to downloading the entire U.S. stock market data.

By the end of this tutorial, you will be able to:

  1. Set up a connection to Norgate Data and retrieve stock data using Python.
  2. Download historical price data for individual tickers.
  3. Enrich the data with company names and sector information.
  4. Check index membership to see whether a stock was part of a major index such as the S&P 500 or the Russell 3000.
  5. Retrieve both active and delisted stock symbols for broader historical analysis.
  6. Download complete market data for both listed and delisted stocks.

To make the process accessible to everyone, we’ve provided a Jupyter Notebook that you can run locally. This step-by-step approach ensures that even beginners can follow along and build a complete market-wide dataset.

Setting Up Norgate Data

Before we dive into the code, there are a few steps to follow to ensure you can access the data using Python:

  1. Create a Norgate Data account – Norgate offers a 21-day free trial with access to two years’ worth of historical data. You can sign up here.
  2. Install Norgate Data Updater – Download and install the Norgate Data Updater from here. This software connects to the Norgate servers and allows you to download financial data. You must have this application running in the background to use the Python API.
Note: Currently, Norgate Data only supports Python on Windows, as stated in their documentation. Make sure you’re using a Windows environment to follow along.

Ensure Database is Downloaded and Active

Once Norgate Data Updater is installed, you must ensure that the necessary databases are downloaded and active before proceeding with the Python integration.

After logging into the Norgate Data Updater, navigate to the Database section and ensure that the databases you need are marked as Active (for this guide, at minimum US Equities and US Equities Delisted):

If any databases are missing, select the checkbox next to them and click the Download button at the bottom of the window. Make sure these databases are fully active before proceeding with the Python integration.

Disclaimer

This tutorial is for educational purposes only. We are not affiliated with or sponsored by Norgate Data.

Let’s Dive In!

With the Norgate Data Updater set up and running, you’re now ready to start retrieving U.S. stock market data using Python. We will begin by downloading data for specific tickers (like AAPL, TSLA, and GOOG) to understand how the Norgate Data API works. Once that’s clear, we’ll move on to downloading the entire U.S. stock market data, including both active and delisted stocks.

Here are the steps we will cover in this guide:

Step 1: Import Required Libraries and Set Up Logging

We will start by importing the necessary Python libraries, including norgatedata, and set up logging to track the progress of data downloads and transformations.

Step 2: Download Historical Stock Data for Known Tickers (AAPL, TSLA, GOOG)

We will download daily stock market data for a few well-known companies to get comfortable with using the Norgate Data API.

Step 3: Add Company Names to the Downloaded Data

We’ll enrich the downloaded data by adding the full company names using Norgate’s company name information.

Step 4: Add Sector Information

We will add sector classifications (e.g., Technology, Consumer Discretionary) to the data to make the dataset more insightful.

Step 5: Check Index Constituency (S&P 500, Russell 3000)

We’ll check if the stocks were part of major indices such as the S&P 500 or the Russell 3000 at any given point in time and add that information to our dataset.

Step 6: Download Active and Delisted Market Symbols

We will retrieve the list of symbols for both active and delisted U.S. equities to expand our dataset to the entire market.

Step 7: Filter for Stocks Only

After downloading all symbols, we’ll filter out non-equity instruments (like ETFs and funds) to focus on stocks only.

Step 8: Download Complete Stock Data (Listed and Delisted)

Finally, we will download the complete dataset, which includes both active and delisted stocks, for the entire U.S. stock market.

Step 9: Add Company Names to the Entire Market Data

We will enrich the entire market dataset (both active and delisted stocks) by adding the full company names using Norgate’s company name information.

Step 10: Add Sector Information to the Entire Market Data

We will add sector classifications (e.g., Technology, Consumer Discretionary) to the entire market dataset to categorize stocks into industry groups.

Step 11: Check Index Constituency for the Entire Market (S&P 500, Russell 3000)

We will check if each stock in the entire market (both active and delisted) was part of major indices such as the S&P 500 or the Russell 3000 at any given point in time, and add that information to the dataset.

Step 12: Merge Active and Delisted Stocks into One Dataset

In this step, we merge the data for all active and delisted stocks into a single dataset. This ensures that our database contains a complete record of all stocks, whether they are still trading or have been delisted. The merged dataset is then sorted by date to maintain chronological order, and finally, it is saved as a .parquet file for efficient storage and retrieval.

Steps Overview and Explanation

Let’s go through each step in more detail, explaining the code that will be used and showing how to handle the data.

Step 1: Import Required Libraries and Set Up Logging

The first step is to import the necessary libraries. We’ll need norgatedata to interface with Norgate Data, pandas for handling and manipulating the data, and logging to monitor the process.

Python code for the needed imports:

import norgatedata
import logging
import pandas as pd

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

If norgatedata and pandas are not already installed, install them with the following commands:

Bash commands to install the required libraries:

pip install norgatedata
pip install pandas

Explanation: This imports the necessary libraries. Logging will help track the progress and debug any issues during data downloading and processing.

Step 2: Download Historical Stock Data for Known Tickers (AAPL, TSLA, GOOG)

Now, let’s download the daily stock price data for Apple (AAPL), Tesla (TSLA), and Google (GOOG) to understand how the Norgate Data API works.

Python code for downloading data:

def download_stock_data(symbol, start_date, end_date):
    logging.info(f"Downloading data for {symbol} from {start_date} to {end_date}...")
    priceadjust = norgatedata.StockPriceAdjustmentType.TOTALRETURN
    padding_setting = norgatedata.PaddingType.NONE
    timeseriesformat = 'pandas-dataframe'
    
    pricedata = norgatedata.price_timeseries(
        symbol,
        stock_price_adjustment_setting=priceadjust,
        padding_setting=padding_setting,
        start_date=start_date,
        end_date=end_date,
        timeseriesformat=timeseriesformat
    )
    
    return pricedata

start_date = pd.Timestamp('2023-01-01')
end_date = pd.Timestamp('2024-01-01')

aapl_data = download_stock_data('AAPL', start_date, end_date)
tsla_data = download_stock_data('TSLA', start_date, end_date)
goog_data = download_stock_data('GOOG', start_date, end_date)

aapl_data.head()

Explanation: This function downloads the historical price data for the given symbol (AAPL, TSLA, GOOG) over the specified date range. We use total return adjusted price data, which accounts for dividends and stock splits.

Output: An example of AAPL data.
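To build intuition for what "total return adjusted" means, here is a minimal, self-contained pandas sketch of a dividend back-adjustment on toy numbers. This is only an illustration of the general idea; Norgate applies its own adjustment methodology internally, and the dates and dividend below are made up.

```python
import pandas as pd

# Toy illustration (not Norgate's actual algorithm): total-return adjustment
# back-adjusts prices before an ex-dividend date so that holding the stock
# across that date shows the full return, dividend included.
closes = pd.Series([100.0, 101.0, 100.0, 102.0],
                   index=pd.to_datetime(['2023-01-02', '2023-01-03',
                                         '2023-01-04', '2023-01-05']))
dividend = 1.0                           # hypothetical dividend
ex_date = pd.Timestamp('2023-01-04')     # hypothetical ex-dividend date

# Adjustment factor applied to all prices BEFORE the ex-date
prev_close = closes[closes.index < ex_date].iloc[-1]
factor = (prev_close - dividend) / prev_close

adjusted = closes.copy()
adjusted[adjusted.index < ex_date] *= factor
print(adjusted.round(4))
```

Prices on and after the ex-date are unchanged, while earlier prices are scaled down, so the series' percentage change across the ex-date reflects the price move plus the dividend.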

Step 3: Add Company Names to the Downloaded Data

Next, we’ll add the full company name for each stock using Norgate’s security_name function.

Python code for adding security names:

def add_security_name(symbol, data):
    security_name = norgatedata.security_name(symbol)
    data['Security_Name'] = security_name
    logging.info(f"Added security name for {symbol}: {security_name}")
    return data

aapl_data = add_security_name('AAPL', aapl_data)
tsla_data = add_security_name('TSLA', tsla_data)
goog_data = add_security_name('GOOG', goog_data)

aapl_data.head()

Explanation: We fetch the full company name for each ticker (AAPL, TSLA, GOOG) and add it as a new column (Security_Name) in the data.

Output: An example of AAPL data after adding Security_Name

Step 4: Add Sector Information

We can further enrich our data by adding the GICS sector classification for each stock.

Python code for adding sector names:

def add_sector_info(symbol, data):
    sector = norgatedata.classification_at_level(symbol, 'GICS', 'Name', level=1)
    data['Sector'] = sector
    logging.info(f"Added sector information for {symbol}: {sector}")
    return data

aapl_data = add_sector_info('AAPL', aapl_data)
tsla_data = add_sector_info('TSLA', tsla_data)
goog_data = add_sector_info('GOOG', goog_data)

aapl_data.head()

Explanation: This step adds each stock’s sector using the GICS (Global Industry Classification Standard), which categorizes stocks by industry.

Output: An example of AAPL data after adding Sector

Step 5: Check Index Constituency (S&P 500, Russell 3000)

In this step, we check whether a stock (e.g., AAPL, TSLA, GOOG) was part of major indices such as the S&P 500 or Russell 3000 at any point in time. We use Norgate Data’s index_constituent_timeseries function to fetch the relevant data and append it to our existing DataFrame with clearly named columns.

Python code for adding index constituency:

def check_index_constituency(symbol, data):
    indices_to_check = ['S&P 500', 'Russell 3000']
    
    for indexname in indices_to_check:
        logging.info(f"Checking if {symbol} was part of {indexname}...")

        # Fetch the index membership timeseries directly using the data DataFrame
        index_data = norgatedata.index_constituent_timeseries(
            symbol,
            indexname,
            padding_setting=norgatedata.PaddingType.NONE,
            pandas_dataframe=data.copy(),  # Pass in a copy of the existing DataFrame
            timeseriesformat='pandas-dataframe'
        )
        
        
        column_name = f'In_{indexname.replace(" ", "_")}'
        
        if column_name in data.columns:
            logging.info(f"Column {column_name} already exists. Skipping...")
            continue 

        if 'Index Constituent' in index_data.columns:
            # Rename the 'Index Constituent' column to reflect index membership
            index_data.rename(columns={'Index Constituent': column_name}, inplace=True)
            
            # Assign the renamed column back to the original data DataFrame
            data[column_name] = index_data[column_name]
        else:
            logging.warning(f"'Index Constituent' column not found for {symbol} in {indexname}")
    
    return data

aapl_data = check_index_constituency('AAPL', aapl_data)
tsla_data = check_index_constituency('TSLA', tsla_data)
goog_data = check_index_constituency('GOOG', goog_data)

aapl_data.head(5)

Explanation: The function fetches a membership time series for each index. Passing a copy of the existing DataFrame via pandas_dataframe aligns the result with our price data's dates. The returned 'Index Constituent' column holds 1 on dates when the stock was a member of the index and 0 otherwise; we rename it to In_S&P_500 or In_Russell_3000 and attach it to our dataset.

Output: An example of TSLA data after adding In_S&P_500, In_Russell_3000
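The In_ columns are plain 1/0 flags, so you can analyze them with ordinary pandas. As a self-contained sketch on toy data (the column name matches the one created above), here is one way to recover contiguous membership periods from such a flag:

```python
import pandas as pd

# Toy flag series: 1 = in the index on that date, 0 = not (as in In_S&P_500)
idx = pd.date_range('2023-01-02', periods=6, freq='D')
flags = pd.Series([0, 1, 1, 0, 1, 1], index=idx, name='In_S&P_500')

# A new membership "run" starts wherever the flag changes value
run_id = (flags != flags.shift()).cumsum()
periods = (
    flags[flags == 1]
    .groupby(run_id)
    .apply(lambda s: (s.index.min(), s.index.max()))
    .tolist()
)
print(periods)
```

Each tuple gives the first and last date of one uninterrupted membership stretch, which is handy for checking when a stock entered or left an index.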

Step 6: Download Active and Delisted Market Symbols

Once we understand how to retrieve stock data for a few specific tickers, we can scale up by downloading all active and delisted symbols from Norgate Data. This is crucial for retrieving comprehensive stock market data that includes not just the current active stocks, but also companies that have been delisted.

Python code for retrieving symbols:

def get_all_market_symbols():
    logging.info("Retrieving all symbols from US Equities and US Equities Delisted...")
    
    us_equities = norgatedata.database_symbols('US Equities')
    us_delisted = norgatedata.database_symbols('US Equities Delisted')
    
    logging.info(f"Retrieved {len(us_equities)} active symbols and {len(us_delisted)} delisted symbols.")
    return us_equities, us_delisted

active_symbols, delisted_symbols = get_all_market_symbols()

print("Active symbols:", active_symbols[:5])
print("Delisted symbols:", delisted_symbols[:5])

Explanation: norgatedata.database_symbols returns every symbol in the named database. We query both the 'US Equities' database (currently listed securities) and the 'US Equities Delisted' database, giving us the full universe of symbols, past and present.

Output: Number of retrieved active and delisted symbols and an example of symbols.

Step 7: Filter for Stocks Only

Now that we have a complete list of symbols, we want to filter out non-equity instruments (such as ETFs or REITs) so that we’re left with only stocks.

Python code for filtering stocks:

def filter_stocks(symbols):
    stock_symbols = [symbol for symbol in symbols if norgatedata.subtype1(symbol) == 'Equity']
    logging.info(f"Filtered {len(stock_symbols)} stocks from {len(symbols)} total symbols.")
    return stock_symbols

active_stocks = filter_stocks(active_symbols)
delisted_stocks = filter_stocks(delisted_symbols)

print(f"Active stocks: {len(active_stocks)}, Delisted stocks: {len(delisted_stocks)}")

Explanation:

We use norgatedata.subtype1(symbol) to filter out anything that is not classified as an equity (e.g., ETFs, REITs). This ensures we’re only dealing with stock data.

Output: Filtering for Stocks only.

Step 8: Download Complete Stock Data (Listed and Delisted)

In this step, we move from working with individual tickers to downloading the complete U.S. stock market data, including both active and delisted stocks. This gives us a full dataset of the market, allowing for comprehensive analysis and backtesting across historical and current equities.

Python code for downloading listed and delisted data:

def download_all_stock_data(symbols, start_date, end_date):

    all_data = []
    
    for symbol in symbols:
        logging.info(f"Downloading data for {symbol}...")
        data = download_stock_data(symbol, start_date, end_date)
        
        if data is not None:
            data['Symbol'] = symbol  # Add the symbol as a column
            all_data.append(data)
    
    # Combine all data into a single DataFrame (keeping Date as index by default)
    return pd.concat(all_data, ignore_index=False) if all_data else pd.DataFrame()

# Download all active and delisted stock data
logging.info("Downloading active stock data...")
all_active_data = download_all_stock_data(active_stocks, start_date, end_date)

logging.info("Downloading delisted stock data...")
all_delisted_data = download_all_stock_data(delisted_stocks, start_date, end_date)

# Display the number of active and delisted stocks
if not all_active_data.empty:
    print(f"Downloaded data for {len(all_active_data['Symbol'].unique())} active stocks.")
else:
    print("No active stock data downloaded.")

if not all_delisted_data.empty:
    print(f"Downloaded data for {len(all_delisted_data['Symbol'].unique())} delisted stocks.")
else:
    print("No delisted stock data downloaded.")

Explanation: We loop over every symbol, reuse the download_stock_data function from Step 2, tag each resulting DataFrame with a Symbol column, and concatenate everything into one DataFrame while keeping the Date index. Note that the full market spans thousands of symbols, so this download can take a considerable amount of time.

Output: Data fetched for both active and delisted stocks.

If you print out all_active_data, you will see the following:

Output: Data fetched for active stocks.
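In a full-market run, a single problematic symbol should not abort a download that may take hours. A defensive wrapper sketch is shown below; the fake_downloader is a stand-in used purely for illustration, and in the notebook you would pass something like `lambda s: download_stock_data(s, start_date, end_date)` instead.

```python
import logging
import pandas as pd

logging.basicConfig(level=logging.INFO)

def safe_download(symbol, downloader):
    """Call the given downloader, returning None instead of raising,
    so one bad symbol cannot abort a long bulk download."""
    try:
        data = downloader(symbol)
        if data is None or data.empty:
            logging.warning(f"No data for {symbol}; skipping.")
            return None
        return data
    except Exception as exc:
        logging.error(f"Failed to download {symbol}: {exc}")
        return None

# Stand-in downloader for illustration only
def fake_downloader(symbol):
    if symbol == 'BAD':
        raise ValueError('simulated API failure')
    return pd.DataFrame({'Close': [1.0, 2.0]})

frames = [d for s in ['AAPL', 'BAD', 'TSLA']
          if (d := safe_download(s, fake_downloader)) is not None]
print(len(frames))  # 'BAD' is skipped, 2 frames remain
```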

Step 9: Adding Security Names to the Data

In this step, we will add the security names for each stock in the dataset.

Python code for adding security names:

def add_security_name(data):
    unique_symbols = data['Symbol'].unique() 
    
    security_names = {}
    
    for symbol in unique_symbols:
        logging.info(f"Fetching security name for {symbol}...")
        
        security_name = norgatedata.security_name(symbol)
        
        security_names[symbol] = security_name
    
    data['Security_Name'] = data['Symbol'].map(security_names)
    
    return data

all_active_data = add_security_name(all_active_data)
all_delisted_data = add_security_name(all_delisted_data)

all_active_data.head()

Explanation: Instead of calling the API once per row, we fetch each unique symbol's security name a single time, store it in a dictionary, and map it onto the new Security_Name column. This keeps the number of API calls equal to the number of symbols rather than the number of rows.

Output: Security_Name added to all downloaded data

Step 10: Adding Sector Information to the Data

In this step, we will add the sector classification for each stock.

Python code for adding sector names:

def add_sector_info(data):
    unique_symbols = data['Symbol'].unique() 
    
    sectors = {}
    
    for symbol in unique_symbols:
        logging.info(f"Fetching sector information for {symbol}...")
        
        sector = norgatedata.classification_at_level(symbol, 'GICS', 'Name', level=1)
        
        sectors[symbol] = sector
    
    data['Sector'] = data['Symbol'].map(sectors)
    
    return data

all_active_data = add_sector_info(all_active_data)
all_delisted_data = add_sector_info(all_delisted_data)

all_active_data.head()

Explanation: As in the previous step, we look up the GICS level-1 sector once per unique symbol and map the result onto a new Sector column.

Output: Sector added to all downloaded data

Step 11: Checking Index Constituency for Active and Delisted Stocks

In this step, we will check whether each stock was part of major indices like the S&P 500 or the Russell 3000.

Python code for adding index constituency:

def check_index_constituency(symbol, data):
    indices_to_check = ['S&P 500', 'Russell 3000']
    
    for indexname in indices_to_check:
        logging.info(f"Checking if {symbol} was part of {indexname}...")

        try:
            index_data = norgatedata.index_constituent_timeseries(
                symbol,
                indexname,
                pandas_dataframe=data.copy(),  # Pass in a copy of the existing DataFrame
                padding_setting=norgatedata.PaddingType.NONE,
                timeseriesformat='pandas-dataframe'
            )
        except Exception as e:
            logging.error(f"Error fetching index data for {symbol} in {indexname}: {e}")
            index_data = data.copy()
            index_data['Index Constituent'] = 0
        
        column_name = f"In_{indexname.replace(' ', '_')}"
        
        if column_name in data.columns:
            logging.info(f"Column {column_name} already exists for {symbol}. Skipping...")
            continue 

        if 'Index Constituent' in index_data.columns:
            index_data.rename(columns={'Index Constituent': column_name}, inplace=True)
            
            data[column_name] = index_data[column_name]
        else:
            logging.warning(f"'Index Constituent' column not found for {symbol} in {indexname}")
            data[column_name] = 0 
    
    return data

active_symbols = all_active_data['Symbol'].unique()
delisted_symbols = all_delisted_data['Symbol'].unique()

logging.info("Checking index constituency for active stocks...")
processed_active_data = []

for symbol in active_symbols:
    symbol_data = all_active_data[all_active_data['Symbol'] == symbol].copy()
    symbol_data = check_index_constituency(symbol, symbol_data)
    processed_active_data.append(symbol_data)

all_active_data = pd.concat(processed_active_data)

logging.info("Checking index constituency for delisted stocks...")
processed_delisted_data = []

for symbol in delisted_symbols:
    symbol_data = all_delisted_data[all_delisted_data['Symbol'] == symbol].copy()
    symbol_data = check_index_constituency(symbol, symbol_data)
    processed_delisted_data.append(symbol_data)

all_delisted_data = pd.concat(processed_delisted_data)

all_active_data.tail(5)

Explanation: Because index_constituent_timeseries works on one symbol at a time, we slice the combined DataFrame per symbol, run the membership check, and re-concatenate the results. The try/except block guards against symbols for which Norgate cannot return index data (not unusual among older delisted stocks); in that case the membership flag defaults to 0.

Output: The processed data now contains index membership information for both active and delisted stocks in the entire U.S. stock market.

Step 12: Merge Active and Delisted Stocks into One Dataset

In this step, we combine active and delisted stock data into one dataset and export it as a .parquet file.

Python code for merging data:

merged_data = pd.concat([all_active_data, all_delisted_data])
merged_data.sort_index(inplace=True)

output_path = './Survivorship_bias_free_Database.parquet'
merged_data.to_parquet(output_path, engine='pyarrow')


logging.info(f"Merged data has been saved as {output_path}")

Explanation: pd.concat stacks the active and delisted DataFrames, sort_index orders the combined data chronologically by its Date index, and to_parquet writes a compact, column-oriented file. The pyarrow engine must be installed (pip install pyarrow) for the export to work.

Output: The merged dataset of active and delisted stocks is saved as a .parquet file
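Once saved, the file loads back with pd.read_parquet(output_path). The sketch below runs the kind of point-in-time query this database enables, using a tiny in-memory stand-in for merged_data (the symbols, prices, and the hypothetical delisted name OLDCO are invented for illustration):

```python
import pandas as pd

# Small stand-in for merged_data; in practice: pd.read_parquet(output_path)
idx = pd.to_datetime(['2023-06-01', '2023-06-01', '2023-06-01'])
merged = pd.DataFrame({
    'Symbol': ['AAPL', 'OLDCO', 'TSLA'],
    'Close': [180.0, 12.5, 210.0],
    'Sector': ['Information Technology', 'Industrials',
               'Consumer Discretionary'],
    'In_S&P_500': [1, 0, 1],
}, index=idx)

# Point-in-time S&P 500 universe on a given date: non-members (including
# delisted names) remain in the database but drop out of this selection.
as_of = pd.Timestamp('2023-06-01')
universe = merged[(merged.index == as_of) & (merged['In_S&P_500'] == 1)]
print(sorted(universe['Symbol']))  # ['AAPL', 'TSLA']
```

Selecting the universe from membership flags as of each rebalance date, rather than from today's index list, is exactly what keeps a backtest free of look-ahead and survivorship bias.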

Conclusion

Now we have successfully constructed a comprehensive, survivorship bias-free database of both listed and delisted stocks for the specified period, from start_date to end_date. This database can be used for various types of financial analysis, backtesting trading strategies, and academic research while ensuring that delisted stocks are not excluded, providing a more accurate reflection of historical market performance.

With this database, you can avoid the common pitfall of survivorship bias, which often leads to overly optimistic results in research that focuses only on active companies. Including both listed and delisted stocks allows for a more balanced, real-world view of the stock market, making your analysis much more robust.
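A tiny numerical illustration of how large this bias can be, using invented one-year returns for a hypothetical universe in which DEADCO went bankrupt during the year:

```python
import pandas as pd

# Toy one-year total returns; DEADCO is a made-up name that went to zero
returns = pd.Series({'AAPL': 0.30, 'TSLA': 0.20, 'DEADCO': -1.00})
survivors = returns.drop('DEADCO')

print(f"Survivors-only average: {survivors.mean():.2%}")  # 25.00%
print(f"Bias-free average:      {returns.mean():.2%}")    # -16.67%
```

Dropping the failed name flips the universe's average return from sharply negative to comfortably positive, which is precisely the distortion a survivorship-bias-free database avoids.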

You can download the full Jupyter Notebook for this process by clicking here and easily adapt it for your own research or projects.

For further assistance, feel free to contact us.