sec
This data source grabs information from quarterly SEC data archives.
DataSetCollector
Takes care of downloading all the data sets and aggregating them into a single structure.
Source code in src/stocktracer/collector/sec.py
get_data(sec_filter, ciks)
Collect data based on the provided filter.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sec_filter | Filter | SEC-specific filter describing how to filter the results | required |
ciks | frozenset[int] | CIK values to filter the datasets on | required |
Raises:
Type | Description |
---|---|
ImportError | when the quarterly report is missing |
LookupError | when the filter returned no matches |
Returns:
Name | Type | Description |
---|---|---|
Results | Results | filtered data results |
Source code in src/stocktracer/collector/sec.py
DataSetReader
dataclass
Reads the data from a zip file retrieved from the SEC website.
Source code in src/stocktracer/collector/sec.py
append(filtered_data, data)
classmethod
Append data to the filtered_data and return the updated filtered DataFrame.
>>> df1 = pd.DataFrame({"A": ["A0", "A1", "A2", "A3"]}, index=[0, 1, 2, 3])
>>> df2 = pd.DataFrame({"B": ["B0", "B1", "B2", "B3"]}, index=[4, 5, 6, 7])
>>> DataSetReader.append(df1, df2)
     A    B
0   A0  NaN
1   A1  NaN
2   A2  NaN
3   A3  NaN
4  NaN   B0
5  NaN   B1
6  NaN   B2
7  NaN   B3
>>> DataSetReader.append(None, df1)
    A
0  A0
1  A1
2  A2
3  A3
>>> DataSetReader.append(df1, df1)
    A
0  A0
1  A1
2  A2
3  A3
0  A0
1  A1
2  A2
3  A3
Parameters:
Name | Type | Description | Default |
---|---|---|---|
filtered_data | Optional[pd.DataFrame] | Existing data | required |
data | pd.DataFrame | New data | required |
Returns:
Type | Description |
---|---|
pd.DataFrame | Filtered data |
Source code in src/stocktracer/collector/sec.py
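The doctest above behaves like a row-wise concatenation that tolerates a missing accumulator. A minimal sketch of that behavior, assuming `pd.concat` is an acceptable stand-in; `append_frames` is a hypothetical name and is not the project's implementation:

```python
from typing import Optional

import pandas as pd


def append_frames(filtered_data: Optional[pd.DataFrame], data: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for DataSetReader.append (assumption, not project code)."""
    if filtered_data is None:
        # Nothing accumulated yet: the new data becomes the running result.
        return data
    # Stack rows; a column missing on either side is filled with NaN.
    return pd.concat([filtered_data, data])


df1 = pd.DataFrame({"A": ["A0", "A1"]}, index=[0, 1])
df2 = pd.DataFrame({"B": ["B0", "B1"]}, index=[2, 3])
combined = append_frames(df1, df2)
print(combined)
```

Note that the original index labels are kept, so appending a frame to itself yields repeated labels, as in the last doctest case.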
process_zip(sec_filter, ciks)
Process a zip archive with the provided filter.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sec_filter | Filter | results to filter out of the zip archive | required |
ciks | frozenset[int] | CIKs to filter data on | required |
Raises:
Type | Description |
---|---|
LookupError | if the cache is missing the binary zip file |
Returns:
Type | Description |
---|---|
Optional[pd.DataFrame] | filtered data |
Source code in src/stocktracer/collector/sec.py
DownloadManager
This class is responsible for downloading and caching downloaded data sets from the SEC.
Source code in src/stocktracer/collector/sec.py
ticker_reader: TickerReader
property
Get the CIK ticker mappings. This must be done before processing reports.
The SEC stores the mappings of CIK values to tickers in a JSON file. We can download and cache this information essentially for a year. We're not interested in companies that recently listed because they don't have a long regulated record of reported earnings. When we process the records, we can ignore CIK values that are not in this list.
Typical JSON for these looks like the following (without spaces or line breaks):
{"0":{"cik_str":320193,"ticker":"AAPL","title":"Apple Inc."}, "1":{"cik_str":789019,"ticker":"MSFT","title":"MICROSOFT CORP"},
Returns:
Name | Type | Description |
---|---|---|
TickerReader | TickerReader | maps cik to stock ticker |
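To make the mapping format concrete, here is a small sketch that loads JSON of the shape shown above into a DataFrame and performs a case-insensitive lookup. The two-entry sample is closed with a brace so that it parses, and `to_cik` is a hypothetical helper, not the project's `TickerReader` API:

```python
import json

import pandas as pd

# Two-entry sample in the documented shape (closing brace added so it parses).
raw = (
    '{"0":{"cik_str":320193,"ticker":"AAPL","title":"Apple Inc."},'
    '"1":{"cik_str":789019,"ticker":"MSFT","title":"MICROSOFT CORP"}}'
)

# One row per company, indexed by the upper-cased ticker for fast lookups.
mapping = pd.DataFrame.from_dict(json.loads(raw), orient="index")
mapping["ticker"] = mapping["ticker"].str.upper()
mapping = mapping.set_index("ticker")


def to_cik(ticker: str) -> int:
    """Hypothetical helper: case-insensitive ticker to CIK lookup."""
    try:
        return int(mapping.loc[ticker.upper(), "cik_str"])
    except KeyError as exc:
        raise LookupError(f"unknown ticker: {ticker}") from exc


print(to_cik("aapl"))  # 320193
```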
get_quarterly_report(report_date)
Retrieve archived quarterly data from the cache, or request it if it is not cached yet.
This allows us to download data independent of actually processing it, allowing us to prefetch information we need if we like.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
report_date | ReportDate | information specifying the quarterly dump to retrieve | required |
Returns:
Type | Description |
---|---|
Optional[DataSetReader] | this object helps process the data received more granularly |
Source code in src/stocktracer/collector/sec.py
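The cache-or-request behavior described for `get_quarterly_report` can be illustrated with a generic sketch. Everything here is an assumption for illustration: the `get_or_fetch` helper, the file-based cache, the injected `fetch` callable, and the `2023q1.zip` key are hypothetical and do not reflect how `DownloadManager` is actually written.

```python
import tempfile
from pathlib import Path
from typing import Callable, Optional


def get_or_fetch(
    cache_dir: Path, key: str, fetch: Callable[[str], Optional[bytes]]
) -> Optional[bytes]:
    """Return cached bytes for key, calling fetch only on a cache miss (hypothetical)."""
    path = cache_dir / key
    if path.exists():
        # Cache hit: no request is made.
        return path.read_bytes()
    payload = fetch(key)
    if payload is not None:
        # Store the payload so later calls can skip the request.
        path.write_bytes(payload)
    return payload


calls = []


def fake_fetch(key: str) -> bytes:
    # Stand-in for a network request; records each call.
    calls.append(key)
    return b"zip-bytes"


with tempfile.TemporaryDirectory() as tmp:
    first = get_or_fetch(Path(tmp), "2023q1.zip", fake_fetch)
    second = get_or_fetch(Path(tmp), "2023q1.zip", fake_fetch)

print(len(calls))  # 1
```

Separating retrieval from processing in this way is what makes prefetching possible: the archive can be pulled into the cache long before anything reads it.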
Filter
dataclass
Filter for SEC tools to scrape relevant information when processing records.
Source code in src/stocktracer/collector/sec.py
focus_period: frozenset[str]
property
Get the focus period for the report.
Companies file quarterly reports. The annual report replaces one of the quarterly reports, depending on when it is filed. Typically Q4 is replaced with FY for the annual reports.
Returns:
Type | Description |
---|---|
frozenset[str] | set of focus periods to use for the filter |
required_reports: list[ReportDate]
property
Get a list of required reports to download for all the quarters.
The list generated will include an extra quarter so that you will always be able to do analysis from the current quarter to the previous quarter.
Also note that it doesn't matter if you specify only_annual=True. Because companies don't have the same fiscal year, we have to check every quarterly report just to see if their annual report is in there.
Returns:
Type | Description |
---|---|
list[ReportDate] | list of report dates to retrieve |
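The "extra quarter" rule can be sketched as follows. This is an illustration under stated assumptions: `required_quarters` is a hypothetical function, it uses plain `(year, quarter)` tuples instead of `ReportDate`, and it assumes the span is expressed in whole years counted back from the most recent quarter.

```python
def required_quarters(last_year: int, last_quarter: int, years: int) -> list[tuple[int, int]]:
    """Hypothetical: (year, quarter) pairs, newest first, covering `years` plus one quarter."""
    count = years * 4 + 1  # the extra quarter allows a quarter-over-quarter comparison
    result = []
    year, quarter = last_year, last_quarter
    for _ in range(count):
        result.append((year, quarter))
        quarter -= 1
        if quarter == 0:
            # Roll back to Q4 of the previous year.
            year, quarter = year - 1, 4
    return result


print(required_quarters(2023, 1, 1))
# [(2023, 1), (2022, 4), (2022, 3), (2022, 2), (2022, 1)]
```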
ReportDate
dataclass
ReportDate is used to select and identify archives created by the SEC.
Source code in src/stocktracer/collector/sec.py
Results
dataclass
Filtered data looks like this (in CSV format):
Note that fp has the "Q" removed from the front so it can be stored as a simple number.
ticker,tag,fy,fp,ddate,uom,value,period,title
AAPL,EntityCommonStockSharesOutstanding,2022,Q1,2023-01-31,shares,2000.0,2022-12-31,Apple Inc.
AAPL,FakeAttributeTag,2022,Q1,2023-01-31,shares,200.0,2022-12-31,Apple Inc.
Source code in src/stocktracer/collector/sec.py
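As a rough illustration of how rows in this shape can be turned into a per-ticker, per-year table, the sketch below loads the sample CSV and pivots it with pandas. The choice of `pivot_table` with a mean aggregate is an assumption made for the example, not a statement about how `Results.select()` is implemented.

```python
import io

import pandas as pd

csv_text = """ticker,tag,fy,fp,ddate,uom,value,period,title
AAPL,EntityCommonStockSharesOutstanding,2022,Q1,2023-01-31,shares,2000.0,2022-12-31,Apple Inc.
AAPL,FakeAttributeTag,2022,Q1,2023-01-31,shares,200.0,2022-12-31,Apple Inc.
"""

filtered = pd.read_csv(io.StringIO(csv_text))

# One row per (ticker, fy), one column per tag, aggregated with the mean.
table = filtered.pivot_table(
    index=["ticker", "fy"], columns="tag", values="value", aggfunc="mean"
)

print(table.loc[("AAPL", 2022), "FakeAttributeTag"])  # 200.0
```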
ciks: set[np.int64]
property
Retrieves the set of CIK values corresponding to the tickers being looked up.
The SEC object will call populateCikList to generate this information. This helps with dependency injection by avoiding the Filter having to maintain references to these helper objects for temporary processing. It also lets us stub out the information provided without having to involve heavier utilities or network access.
Raises:
Type | Description |
---|---|
LookupError | description |
Returns:
Type | Description |
---|---|
set[np.int64] | Set containing all the CIKs that are being filtered out |
Table
dataclass
This is the result of a Filter.select() call.
The results table looks like the following:
tag            AccountsPayableCurrent  ...  WeightedAverageNumberOfSharesOutstandingBasic
ticker fy                              ...
AAPL   2021.0            4.852950e+10  ...                                   1.750824e+10
       2022.0            5.943900e+10  ...                                   1.675645e+10
MSFT   2021.0            1.384650e+10  ...                                   7.610000e+09
       2022.0            1.708150e+10  ...                                   7.551000e+09
TMO    2021.0            2.521000e+09  ...                                   3.966667e+08
       2022.0            3.124000e+09  ...                                   3.940000e+08
From here, you can call functions on this class like get_value() or normalize().
Note
To get a list of all the tags, run the annual_reports analysis module and search through the output for meaningful tags.
Source code in src/stocktracer/collector/sec.py
tags: np.ndarray
property
List of tags that can be used on this data set.
Returns:
Type | Description |
---|---|
np.ndarray | array with results |
calculate_current_ratio(column_name)
Calculate the current ratio.
The current ratio is a liquidity ratio that measures a company’s ability to pay short-term obligations or those due within one year.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
column_name | str | name to assign to the column | required |
Source code in src/stocktracer/collector/sec.py
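The current ratio is conventionally current assets divided by current liabilities. The sketch below applies that formula to a small made-up table; the tag names `AssetsCurrent` and `LiabilitiesCurrent`, the numbers, and the `add_current_ratio` helper are assumptions for illustration and may not match the tags this method actually reads.

```python
import pandas as pd

# Made-up numbers; the two tag names are assumptions.
table = pd.DataFrame(
    {"AssetsCurrent": [300.0, 150.0], "LiabilitiesCurrent": [100.0, 200.0]},
    index=pd.MultiIndex.from_tuples(
        [("AAA", 2022), ("BBB", 2022)], names=["ticker", "fy"]
    ),
)


def add_current_ratio(df: pd.DataFrame, column_name: str) -> None:
    """Hypothetical helper: store current assets / current liabilities in column_name."""
    df[column_name] = df["AssetsCurrent"] / df["LiabilitiesCurrent"]


add_current_ratio(table, "current_ratio")
print(table["current_ratio"].tolist())  # [3.0, 0.75]
```

A value above 1 means short-term assets cover short-term obligations; a value below 1, as for the second row, means they do not.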
calculate_debt_to_assets(column_name)
Calculates the current debt to assets ratio.
Having more debt than assets is a risk indicator that could indicate a potential for bankruptcy.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
column_name | str | name to assign to the column | required |
Source code in src/stocktracer/collector/sec.py
calculate_delta(column_name, delta_of)
Calculate the change between the latest row and the one before it within a ticker.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
column_name | str | name to give the calculated column | required |
delta_of | str | column name to calculate the delta of, such as ROI | required |
Source code in src/stocktracer/collector/sec.py
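One way to compute a per-ticker change between consecutive rows is a grouped `diff()`. The sketch below assumes a table indexed by `(ticker, fy)` like the one shown for `Table`; the `add_delta` helper, the column names, and the values are hypothetical.

```python
import pandas as pd

table = pd.DataFrame(
    {"ROI": [1.0, 1.5, 3.0, 2.0]},
    index=pd.MultiIndex.from_tuples(
        [("AAA", 2021), ("AAA", 2022), ("BBB", 2021), ("BBB", 2022)],
        names=["ticker", "fy"],
    ),
)


def add_delta(df: pd.DataFrame, column_name: str, delta_of: str) -> None:
    """Hypothetical helper: row-to-row change of delta_of within each ticker."""
    # diff() restarts for every ticker, so the first row of each ticker is NaN.
    df[column_name] = df.groupby(level="ticker")[delta_of].diff()


add_delta(table, "delta-ROI", "ROI")

# Keep only the change between the latest row and the one before it.
latest = table.groupby(level="ticker")["delta-ROI"].last()
print(latest.to_dict())  # {'AAA': 0.5, 'BBB': -1.0}
```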
calculate_net_income(column_name)
Calculates the net income of stocks as a series.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
column_name | str | name to assign to the column | required |
Source code in src/stocktracer/collector/sec.py
calculate_return_on_assets(column_name)
Returns the ROA of stocks as a series.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
column_name | str | name to assign to the column | required |
Source code in src/stocktracer/collector/sec.py
get_value(ticker, tag, year)
Retrieve the exact value of a table cell.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
ticker | str | ticker identifying the equity of interest. | required |
tag | str | attribute indicating the type of data to look at. | required |
year | int | The year this data applies to. | required |
Returns:
Type | Description |
---|---|
int \| float \| np.int64 | value of result |
Source code in src/stocktracer/collector/sec.py
normalize()
slice(ticker=None, year=None, tags=None)
Slice the results by the specified values.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
ticker | Optional[str \| list[str]] | description. Defaults to None. | None |
tags | Optional[str] | description. Defaults to None. | None |
year | Optional[int] | description. Defaults to None. | None |
Returns:
Type | Description |
---|---|
pd.DataFrame | pd.DataFrame: description |
Source code in src/stocktracer/collector/sec.py
select(aggregate_func='mean', tickers=None)
Select only a subset of the data matching the specified criteria.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
aggregate_func | Optional[Callable \| Literal['mean', 'std', 'var', 'sum', 'min', 'max', 'slope']] | Numpy function to use for aggregating the results. This should be a function like | 'mean' |
tickers | Optional[Sequence[str]] | ticker symbol for the company | None |
Returns:
Type | Description |
---|---|
Table | Results.Table: Object that represents a pivot table with the data requested |
Source code in src/stocktracer/collector/sec.py
TickerReader
This class provides translation services for CIK and Ticker values.
The SEC has a JSON file that provides mappings from CIK values to tickers.
The data providing this conversion is injected into this class, which then provides helper methods for performing the conversion on this data set.
This class is provided CSV data which is parsed upon initialization. So creating this object is the most expensive part.
Source code in src/stocktracer/collector/sec.py
map_of_cik_to_ticker: pd.DataFrame
property
Dataframe containing mapping of cik and ticker information.
Returns:
Type | Description |
---|---|
pd.DataFrame | Dataframe with mapping information |
contains(tickers)
Check that the tickers provided exist.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tickers | frozenset | tickers to check | required |
Returns:
Name | Type | Description |
---|---|---|
bool | bool | True if all the tickers are found |
Source code in src/stocktracer/collector/sec.py
convert_to_cik(ticker)
Get the CIK from the stock ticker.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
ticker | str | stock ticker. The case does not matter. | required |
Raises:
Type | Description |
---|---|
LookupError | If ticker is not found |
Returns:
Type | Description |
---|---|
np.int64 | cik |
Source code in src/stocktracer/collector/sec.py
convert_to_ticker(cik)
Get the stock ticker from the CIK number.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
cik | int | CIK number for the stock | required |
Returns:
Name | Type | Description |
---|---|---|
str | str | stock ticker |
Source code in src/stocktracer/collector/sec.py
get_ciks(tickers)
Populates the filter's CIK list to be used for filtering.
The Filter doesn't need the ticker symbols. If we expand to other data sources, we would have to repeat ticker symbols. For now, the only info we need in the report is the CIK values to find the corresponding stocks.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tickers | frozenset[str] | ticker symbols to search for | required |
Returns:
Type | Description |
---|---|
frozenset[int] | CIK values translated from the tickers specified |
Source code in src/stocktracer/collector/sec.py
filter_data(tickers, sec_filter)
Initiate the retrieval of ticker information based on the provided filters.
Filtered data is stored with the filter.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tickers | list[str] | ticker symbols you want information about | required |
sec_filter | Filter | SEC specific data to scrape from the reports | required |
Returns:
Name | Type | Description |
---|---|---|
Results | Results | results with filtered data |
Source code in src/stocktracer/collector/sec.py
filter_data_nocache(tickers, sec_filter)
Same as filter_data but no caching is applied.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tickers | frozenset[str] | ticker symbols you want information about | required |
sec_filter | Filter | SEC specific data to scrape from the reports | required |
Returns:
Name | Type | Description |
---|---|---|
Results | Results | results with filtered data |
Source code in src/stocktracer/collector/sec.py
slope(data, order=1)
Calculate the trend of a series.
>>> import math
>>> math.isclose(slope(pd.Series((1,2,3))), 1)
True
>>> math.isclose(slope(pd.Series((3,2,1))), -1)
True
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | pd.Series | description | required |
order | int | description. Defaults to 1. | 1 |
Returns:
Name | Type | Description |
---|---|---|
float | float | slope of the trend line |
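A trend slope of this kind can be obtained from a least-squares polynomial fit. The sketch below is one possible reading, assuming `numpy.polyfit` over the positions 0..n-1 and taking the linear coefficient; `trend_slope` is a hypothetical name, and the project's `slope()` may use different x values or a different definition for `order` greater than 1.

```python
import math

import numpy as np
import pandas as pd


def trend_slope(data: pd.Series, order: int = 1) -> float:
    """Hypothetical stand-in for slope(): linear coefficient of a polynomial fit."""
    # Assumption: positions 0..n-1 are used as x values, not the series index.
    x = np.arange(len(data))
    # np.polyfit returns coefficients from the highest power down to the constant.
    coefficients = np.polyfit(x, data.to_numpy(dtype=float), order)
    # The second-to-last coefficient multiplies x**1; order must be at least 1.
    return float(coefficients[-2])


print(math.isclose(trend_slope(pd.Series((1, 2, 3))), 1))
print(math.isclose(trend_slope(pd.Series((3, 2, 1))), -1))
```

With `order=1` this is the ordinary least-squares line, so a steadily rising series gives a positive value and a falling one a negative value, matching the doctest above.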