FinFeedAPI Blog - Tutorial: Extracting Key Information from SEC Filings with FinFeedAPI's /v1/extractor Endpoint

SEC filings are packed with details about public companies, but finding one specific piece of information can feel like searching for a needle in a haystack. FinFeedAPI's /v1/extractor endpoint returns structured, itemized content directly from these filings, so you can go straight to the section you need.

This tutorial walks through using the endpoint. It is particularly useful for forms such as 8-K (which reports significant corporate events) and 10-K (the annual report), where information is organized into numbered items.

What You'll Learn:

How to make requests to the /v1/extractor endpoint using Python.
The structure of the JSON data returned by the API.
How to access the content of specific items within a filing (e.g., Item 5.02 from an 8-K or Item 1A from a 10-K).

Prerequisites:

Python 3.x installed.
The api-bricks-sec-api-rest Python client library for FinFeedAPI (this tutorial uses it rather than making raw HTTP calls with requests).
Your FinFeedAPI key.

Let's begin.

Step 1: Set Up Your Python Environment

First, ensure you have the api-bricks-sec-api-rest library installed. If not, you can install it using pip:

# Step 1: Set Up Your Python Environment (Installation)
pip install api-bricks-sec-api-rest

Now, let's import the necessary libraries and configure the API client with your key.

# Step 1: Set Up Your Python Environment (Imports and Configuration)
import api_bricks_sec_api_rest
import json # For pretty printing JSON responses
import pandas as pd # For potentially displaying lists of items

# --- API Configuration ---
# IMPORTANT: Replace "YOUR_API_KEY_HERE" with your actual FinFeedAPI key.
API_KEY = "YOUR_API_KEY_HERE"
api_client_config = api_bricks_sec_api_rest.Configuration()
api_client_config.api_key['Authorization'] = API_KEY

# Initialize the API client object
api_client = api_bricks_sec_api_rest.ApiClient(configuration=api_client_config)

print("API client configured.")

Explanation: This section handles the initial setup. We install the FinFeedAPI client library and then import it along with json for easier viewing of API responses and pandas for potentially structuring lists of items. The api_client is initialized with your personal API key.
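
If you would rather not hard-code the key in a script, one option is to read it from an environment variable. The following is a minimal sketch under that assumption; the variable name FINFEEDAPI_KEY and the helper load_api_key are hypothetical choices for this example, not part of the FinFeedAPI client library.

```python
import os


def load_api_key(env_var: str = "FINFEEDAPI_KEY") -> str:
    """Return the API key stored in the given environment variable.

    The default variable name is a hypothetical choice for this sketch,
    not an official FinFeedAPI convention.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your FinFeedAPI key."
        )
    return key
```

With this in place, API_KEY = load_api_key() could stand in for the hard-coded string in the configuration code.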

Step 2: Obtain an Accession Number

The /v1/extractor endpoint requires an accession_number to identify the specific SEC filing you want to process. An accession number is a unique identifier assigned by the SEC to each filing.

For this tutorial, we'll use a couple of example accession numbers:

An 8-K filing (e.g., for Microsoft, CIK: 789019): 0000950170-24-058043 (This 8-K reports Item 5.02 - Departure of Directors or Certain Officers; Election of Directors; Appointment of Certain Officers; Compensatory Arrangements of Certain Officers).
A 10-K filing (e.g., for Apple, CIK: 320193): 0000320193-23-000106 (This is Apple's annual report for the fiscal year ended September 30, 2023).

In a real application, you might get accession numbers by first querying the /v1/filings endpoint based on company CIK, form type, and date ranges.

# Step 2: Obtain an Accession Number
# Example Accession Numbers
accession_number_8k = "0000950170-24-058043" # Microsoft 8-K
accession_number_10k = "0000320193-23-000106" # Apple 10-K

print(f"Using 8-K Accession Number: {accession_number_8k}")
print(f"Using 10-K Accession Number: {accession_number_10k}")

Explanation: We've defined two example accession numbers. You can replace these with any valid accession number for a filing you're interested in.
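
Before sending an accession number to the API, it can help to check that it has the expected shape. In its dashed form, an accession number is a 10-digit ID of the submitter, a two-digit year, and a six-digit sequence number. The sketch below validates and splits that form; the names AccessionNumber and parse_accession_number are illustrative and not part of the client library.

```python
import re
from typing import NamedTuple

# Dashed form: 10 digits, 2 digits, 6 digits
ACCESSION_RE = re.compile(r"(\d{10})-(\d{2})-(\d{6})")


class AccessionNumber(NamedTuple):
    filer_id: str   # 10-digit ID of the entity that submitted the filing
    year: str       # two-digit filing year
    sequence: str   # six-digit sequence number within that year


def parse_accession_number(value: str) -> AccessionNumber:
    """Split a dashed accession number into its parts, or raise ValueError."""
    match = ACCESSION_RE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"Not a dashed accession number: {value!r}")
    return AccessionNumber(*match.groups())


parts = parse_accession_number("0000320193-23-000106")
print(parts.filer_id, parts.year, parts.sequence)
```

Note that the leading ID identifies whoever submitted the filing, which may be a filing agent rather than the company itself, so it should not be treated as the company's CIK.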

Step 3: Using the /v1/extractor Endpoint

The /v1/extractor endpoint accepts GET requests and takes the accession_number as a query parameter. It retrieves the HTML content of the filing and classifies it into the relevant item categories.

Let's make a call to this endpoint for our example 8-K filing.

# Step 3: Using the /v1/extractor Endpoint (for 8-K)
# Initialize the ContentExtractionApi
extraction_api = api_bricks_sec_api_rest.ContentExtractionApi(api_client)

filing_extract_result_8k = None
print(f"\nAttempting to extract full structure for 8-K filing: {accession_number_8k}")

try:
    # Call the API
    api_response_8k = extraction_api.v1_extractor_get(accession_number=accession_number_8k)

    if api_response_8k:
        filing_extract_result_8k = api_response_8k
        print(f"Successfully extracted data for {filing_extract_result_8k.accession_number}, Form Type: {filing_extract_result_8k.form_type}")
        if filing_extract_result_8k.items:
            print(f"Found {len(filing_extract_result_8k.items)} items in this filing.")
        else:
            print("No items found in the extracted data for this filing.")
    else:
        print(f"No data returned from extractor for {accession_number_8k}.")

except api_bricks_sec_api_rest.ApiException as e:
    print(f"API Exception when calling /v1/extractor for 8-K: {e}")

Explanation: We instantiate ContentExtractionApi and then call the v1_extractor_get method with the accession_number of the 8-K filing. The response, if successful, will be a DTO.FilingExtractResultDto object.
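
The client library hides the HTTP details, but it may help to see what the request looks like on the wire. The sketch below only builds the URL for a GET to /v1/extractor with accession_number as a query parameter, as described in this step. The host in BASE_URL is a placeholder, not the real FinFeedAPI host, and build_extractor_url is a hypothetical helper; take the actual base URL and authentication details from the FinFeedAPI documentation.

```python
from urllib.parse import urlencode, urljoin

# Placeholder host for illustration only (an assumption, not the real API host).
BASE_URL = "https://api.example.com"


def build_extractor_url(accession_number: str, base_url: str = BASE_URL) -> str:
    """Build the GET URL for /v1/extractor with accession_number as a query parameter."""
    query = urlencode({"accession_number": accession_number})
    return urljoin(base_url, "/v1/extractor") + "?" + query


print(build_extractor_url("0000950170-24-058043"))
```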

Step 4: Understanding the Extracted Data Structure

The API returns a DTO.FilingExtractResultDto object (represented as filing_extract_result_8k in our code). This object has the following main attributes:

accession_number: The accession number of the processed filing.
form_type: The type of the filing (e.g., "8-K", "10-K").
items: This is a list where each element is a DTO.FilingItemDto object. Each FilingItemDto represents a classified section or item from the filing and contains:
- item_number: The identifier of the item (e.g., "1.01", "5.02" for an 8-K; "1A", "7" for a 10-K).
- item_title: A descriptive title for the item.
- content: The HTML content of that specific item.

Let's inspect the items found in our 8-K example:

# Step 4: Understanding the Extracted Data Structure (for 8-K)
if filing_extract_result_8k and filing_extract_result_8k.items:
    print("\nItems found in the 8-K filing:")
    for item in filing_extract_result_8k.items:
        print("----")
        print(f"  Item Number: {item.item_number}")
        print(f"  Item Title: {item.item_title}")
        # Print a small preview of the content
        content_preview = item.content[:200] + "..." if item.content and len(item.content) > 200 else item.content
        print(f"  Content Preview (HTML): {content_preview}")
    print("----")
else:
    print("\nNo items to display from the 8-K filing extract.")

Explanation: This code iterates through the items list from the API response and prints the item_number, item_title, and a short preview of the HTML content for each item.
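
For a more compact overview, the items can be flattened into a list of plain dictionaries. The sketch below assumes only the three attributes described in this step (item_number, item_title, content); items_to_rows is a hypothetical helper, and the SimpleNamespace objects are stand-ins for real API results so the example runs without a network call.

```python
from types import SimpleNamespace


def items_to_rows(items):
    """Summarize each item as a dict with its number, title, and content length."""
    rows = []
    for item in items or []:
        content = getattr(item, "content", None) or ""
        rows.append({
            "item_number": getattr(item, "item_number", None),
            "item_title": getattr(item, "item_title", None),
            "content_length": len(content),
        })
    return rows


# Stand-in objects with the same attribute names as the extracted items
sample_items = [
    SimpleNamespace(item_number="5.02", item_title="Example title", content="<p>Example</p>"),
    SimpleNamespace(item_number="9.01", item_title="Another title", content=None),
]
rows = items_to_rows(sample_items)
for row in rows:
    print(row)
```

Because the result is a list of dictionaries, it can be passed directly to pd.DataFrame(rows) if you want a tabular view with pandas.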

Step 5: Practical Example - Accessing Specific Item Content

With the structured data, you can now easily access the content of a specific item you are interested in.

Example 1: Get content of Item 5.02 from the 8-K

Item 5.02 in an 8-K covers the departure, election, or appointment of directors and certain officers, along with related compensatory arrangements.

# Step 5: Practical Example - Accessing Specific Item Content (Item 5.02 from 8-K)
if filing_extract_result_8k and filing_extract_result_8k.items:
    target_item_number_8k = "5.02"
    item_5_02_content = None
    for item in filing_extract_result_8k.items:
        if item.item_number == target_item_number_8k:
            item_5_02_content = item.content
            print(f"\nSuccessfully found Item {target_item_number_8k}: {item.item_title}")
            break

    if item_5_02_content:
        print(f"\nHTML Content of Item {target_item_number_8k} (first 1000 characters):")
        print(item_5_02_content[:1000] + "..." if len(item_5_02_content) > 1000 else item_5_02_content)
    else:
        print(f"\nItem {target_item_number_8k} not found in the 8-K filing {accession_number_8k}.")
else:
    print("\nNo extracted items from 8-K to search within.")

Example 2: Get content of Item 1A (Risk Factors) from the 10-K

Let's repeat the extraction for our Apple 10-K example and get "Item 1A".

# Step 5: Practical Example - Accessing Specific Item Content (Item 1A from 10-K)
# (Assumes extraction_api is already initialized from Step 3)
filing_extract_result_10k = None
print(f"\nAttempting to extract full structure for 10-K filing: {accession_number_10k}")
try:
    api_response_10k = extraction_api.v1_extractor_get(accession_number=accession_number_10k)
    if api_response_10k:
        filing_extract_result_10k = api_response_10k
        print(f"Successfully extracted data for {filing_extract_result_10k.accession_number}, Form Type: {filing_extract_result_10k.form_type}")
    else:
        print(f"No data returned from extractor for {accession_number_10k}.")
except api_bricks_sec_api_rest.ApiException as e:
    print(f"API Exception when calling /v1/extractor for 10-K: {e}")

if filing_extract_result_10k and filing_extract_result_10k.items:
    target_item_number_10k = "1A" # Item 1A for Risk Factors in 10-K
    item_1a_content = None
    item_1a_title = ""

    for item in filing_extract_result_10k.items:
        # Check for "1A" or "Item 1A" to be more flexible
        if item.item_number and (item.item_number.strip().upper() == "1A" or "ITEM 1A" in item.item_number.strip().upper()):
            item_1a_content = item.content
            item_1a_title = item.item_title
            print(f"\nSuccessfully found Item {item.item_number}: {item_1a_title}")
            break

    if item_1a_content:
        print(f"\nHTML Content of {item_1a_title} (first 1000 characters):")
        print(item_1a_content[:1000] + "..." if len(item_1a_content) > 1000 else item_1a_content)
    else:
        print(f"\nItem 1A (Risk Factors) not found in the 10-K filing {accession_number_10k}.")
else:
    print("\nNo extracted items from 10-K to search within.")

Explanation: These examples show how to loop through the extracted items and select one based on its item_number. The content attribute gives you the HTML for that section. For the 10-K, we search for "1A" or "Item 1A" to be a bit more robust.
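
Both examples repeat the same search loop, so it could be factored into one helper that also normalizes the item number. This is a minimal sketch, assuming item numbers may arrive as "1A", "Item 1A", or "ITEM 1A."; normalize_item_number and find_item are hypothetical names, and the SimpleNamespace objects stand in for real extracted items.

```python
import re
from types import SimpleNamespace


def normalize_item_number(raw):
    """Reduce variants such as 'Item 1A', 'ITEM 1A.' or ' 1a ' to '1A'."""
    if not raw:
        return ""
    text = raw.strip().upper()
    text = re.sub(r"^ITEM\s*", "", text)   # drop a leading "ITEM" label
    return text.rstrip(".").strip()        # drop a trailing period


def find_item(items, target):
    """Return the first item whose normalized number equals the target, else None."""
    wanted = normalize_item_number(target)
    for item in items or []:
        if normalize_item_number(getattr(item, "item_number", None)) == wanted:
            return item
    return None


# Stand-in items for illustration
sample_items = [
    SimpleNamespace(item_number="Item 1", item_title="Business", content="<p>...</p>"),
    SimpleNamespace(item_number="ITEM 1A.", item_title="Risk Factors", content="<p>...</p>"),
]
match = find_item(sample_items, "1a")
print(match.item_title if match else "not found")
```

Unlike the substring check in the 10-K example, this comparison is exact after normalization, so a lookup for "1" does not accidentally match "1A".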

Step 6: Tips for Working with Extracted Content

HTML Parsing: The content field for each item is typically HTML. If you need to extract plain text, specific figures from tables, or further process the structure within an item, you'll need to use an HTML parsing library like BeautifulSoup in Python. This tutorial focuses on retrieving the itemized HTML; further parsing is a subsequent step.
Item Number Variations: While the API attempts to classify items, be aware that there can sometimes be slight variations in how item numbers are reported in filings (e.g., "1A" vs. "Item 1A"). Your code might need to handle such cases if you are looking for very specific items across many different filings. The /v1/extractor/item endpoint (covered in other FinFeedAPI documentation/tutorials) is designed to fetch a single item and might have more normalization for these variations.
Large Content: Some items, like MD&A or full financial statements within an exhibit, can be very large. Be mindful of this when processing or storing the content.
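
As a rough illustration of the HTML parsing tip, the sketch below strips tags using only Python's standard-library html.parser module rather than BeautifulSoup. It is a simplification: html_to_text is a hypothetical helper, it joins text nodes with spaces (which can split a word that is broken up by inline tags), and it makes no attempt to preserve table structure.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes, skipping the bodies of <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Join chunks with spaces, then collapse runs of whitespace
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html):
    """Return the visible text of an HTML fragment as a single line."""
    parser = _TextExtractor()
    parser.feed(html or "")
    parser.close()
    return parser.text()


print(html_to_text("<p>Hello &amp; <b>world</b></p><script>x=1</script>"))
```

For real filings, a dedicated parser such as BeautifulSoup is likely the better choice, especially when you need to pull figures out of tables.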

Conclusion

FinFeedAPI's /v1/extractor endpoint is a valuable tool for developers who need programmatic access to specific, categorized sections of SEC filings. By returning structured JSON with itemized content, it greatly simplifies the task of pinpointing relevant information within these often lengthy and complex documents.

From here, you could:

Automate the extraction of specific items from a list of filings (obtained via the /v1/filings endpoint).
Feed the extracted HTML content into NLP pipelines for textual analysis.
Build applications that alert users to specific events based on the content of 8-K items.
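
As a starting point for the first idea, here is a minimal sketch of a batch loop. It takes the fetch function as a parameter so that one failing filing does not stop the rest; extract_many and fake_fetch are hypothetical names used only for this example.

```python
def extract_many(accession_numbers, fetch):
    """Call fetch for each accession number, collecting results and errors separately."""
    results, errors = {}, {}
    for number in accession_numbers:
        try:
            results[number] = fetch(number)
        except Exception as exc:  # broad on purpose: keep the batch going
            errors[number] = str(exc)
    return results, errors


# Stand-in fetch function so the sketch runs without a network call
def fake_fetch(number):
    if number == "bad":
        raise ValueError("unknown accession number")
    return {"accession_number": number, "items": []}


results, errors = extract_many(["0000320193-23-000106", "bad"], fake_fetch)
print(sorted(results), errors)
```

With the client from this tutorial, the fetch argument could be lambda n: extraction_api.v1_extractor_get(accession_number=n).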

Happy coding with the FinFeedAPI!
