Methodology

This page explains how we transform raw Labor Condition Application (LCA) public disclosure files into a searchable dataset. The goal is a consistent, usable dataset that still preserves important context from the original records.

Key disclaimers

Processing steps (high level)

  1. Import the raw disclosure file and normalize column names.
  2. Trim whitespace and convert blank strings to NULLs.
  3. Parse dates (e.g., received/decision) using safe parsing rules.
  4. Standardize text for search (case-folding; light cleanup for common punctuation/spacing).
  5. Compute derived fields such as year and (when possible) annualized wage.
  6. De-duplicate by case number when duplicates exist.
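
As an illustration only, steps 1-4 and 6 could be sketched as follows. This is a minimal sketch, not the actual pipeline: the column names (case_number, employer_name, received_date, decision_date), the two date layouts, the punctuation stripped for search, and the choice to keep the first row seen for a duplicated case number are all assumptions made for the example.

```python
import re
from datetime import datetime

# Hypothetical date layouts; the real disclosure files may use others.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def normalize_column(name):
    """Step 1: lower-case a header and collapse other characters to underscores."""
    return re.sub(r"[^a-z0-9]+", "_", name.strip().lower()).strip("_")


def clean_value(value):
    """Step 2: trim whitespace and treat blank strings as missing (None)."""
    if value is None:
        return None
    value = value.strip()
    return value or None


def parse_date(value):
    """Step 3: return a date, or None when the value is missing or unparseable."""
    if not value:
        return None
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            continue
    return None


def search_key(value):
    """Step 4: case-fold, drop common punctuation, and collapse spacing."""
    if value is None:
        return None
    text = re.sub(r"[.,;:'\"]+", "", value.casefold())
    return re.sub(r"\s+", " ", text).strip()


def process(rows):
    """Apply the steps to dict rows; keep the first row per case number (assumption)."""
    seen = set()
    out = []
    for raw in rows:
        row = {normalize_column(k): clean_value(v) for k, v in raw.items()}
        for field in ("received_date", "decision_date"):
            if field in row:
                row[field] = parse_date(row[field])
        row["employer_search"] = search_key(row.get("employer_name"))
        case = row.get("case_number")
        if case is not None:
            if case in seen:
                continue  # Step 6: skip later duplicates of the same case number.
            seen.add(case)
        out.append(row)
    return out
```

Unparseable dates become None rather than raising an error, which is one reading of "safe parsing rules"; the original value could also be kept in a separate column if that context matters.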

Annualized wage (when possible)

LCAs can report wages in different units (for example, hourly or yearly). When both a wage rate and a unit are provided, we compute an approximate annual amount by multiplying the rate by a common conversion factor for that unit.

Note: Annualization is an approximation and may not match actual compensation, which can depend on bonuses, part-time schedules, unpaid leave, and the number of hours assumed for hourly wages.
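
To make the idea concrete, here is a minimal sketch of annualization. The factors and unit labels below are assumptions chosen for illustration (a full-time schedule of 40 hours per week and 52 weeks per year); they are not taken from this page and may differ from the factors actually used.

```python
# Assumed factors: full-time, 40 hours/week, 52 weeks/year (not from the source).
ANNUAL_FACTORS = {
    "hour": 2080,      # 40 hours * 52 weeks
    "week": 52,
    "bi-weekly": 26,
    "month": 12,
    "year": 1,
}


def annualize(rate, unit):
    """Return an approximate annual wage, or None when it cannot be computed."""
    if rate is None or unit is None:
        return None
    factor = ANNUAL_FACTORS.get(unit.strip().casefold())
    if factor is None:
        return None  # Unknown unit: leave the derived field empty.
    try:
        return round(float(rate) * factor, 2)
    except ValueError:
        return None  # Non-numeric rate text.
```

Under these assumed factors, an hourly rate of 50 annualizes to 104000.0, and a record with a missing or unrecognized unit yields None rather than a guess.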

How “year” is derived

We typically derive year from the decision date when it exists. If decision date is missing, we may fall back to another relevant date field (such as received date) depending on availability.
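
The fallback described above might look like the following sketch. The function name and the use of received date as the only fallback are assumptions for the example; other date fields could be added to the candidate list in the same way.

```python
from datetime import date


def derive_year(decision_date, received_date=None):
    """Prefer the decision date's year; fall back to the received date's year."""
    for candidate in (decision_date, received_date):
        if candidate is not None:
            return candidate.year
    return None  # No usable date on the record.
```

For example, derive_year(None, date(2022, 12, 1)) returns 2022, while a record with neither date gets no year.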

Known limitations