Methodology
This page explains how we transform raw Labor Condition Application (LCA) public disclosure files into a searchable dataset. The goal is consistency and usability, while preserving important context from the original records.
Key disclaimers
- Public disclosure data: Records originate from the U.S. Department of Labor (DOL) LCA public disclosure datasets.
- Not legal advice: Nothing on this site should be interpreted as legal guidance.
- LCA does not equal approval: An LCA record is not the same as an approved petition or visa.
- Data may contain errors: We improve consistency, but cannot guarantee completeness or accuracy.
Processing steps (high level)
- Import the raw disclosure file and normalize column names.
- Trim whitespace and convert blank strings to NULLs.
- Parse dates (e.g., received/decision) using safe parsing rules.
- Standardize text for search (case-folding; light cleanup for common punctuation/spacing).
- Compute derived fields such as year and (when possible) annualized wage.
- Remove duplicate records that share a case number.
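The steps above can be sketched roughly as follows. This is a minimal illustration, not our production code: the column names (case_number, decision_date, employer_name), the accepted date formats, the punctuation handled, and the choice to keep the first record seen for each case number are all assumptions made for the example.

```python
import re
from datetime import datetime

# Assumed date formats; the real disclosure files may use others.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def normalize_column(name):
    """Lower-case a header and turn runs of other characters into underscores."""
    return re.sub(r"[^a-z0-9]+", "_", name.strip().lower()).strip("_")


def clean_value(value):
    """Trim whitespace; treat blank strings as missing (None)."""
    if value is None:
        return None
    value = value.strip()
    return value or None


def parse_date_safe(value):
    """Return a date, or None when the value is missing or unparseable."""
    if not value:
        return None
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            continue
    return None


def fold_for_search(value):
    """Case-fold, drop commas and periods, and collapse repeated spaces."""
    if value is None:
        return None
    without_punct = re.sub(r"[.,]", " ", value)
    return re.sub(r"\s+", " ", without_punct).strip().casefold()


def clean_rows(raw_rows):
    """Apply the steps in order; keep the first row seen per case number."""
    seen = set()
    cleaned = []
    for raw in raw_rows:
        row = {normalize_column(k): clean_value(v) for k, v in raw.items()}
        row["decision_date"] = parse_date_safe(row.get("decision_date"))
        row["employer_search"] = fold_for_search(row.get("employer_name"))
        case = row.get("case_number")
        if case is not None:
            if case in seen:
                continue  # duplicate case number: skip (assumed policy)
            seen.add(case)
        cleaned.append(row)
    return cleaned
```

In this sketch the original employer_name is left untouched and the folded text goes into a separate employer_search field, so the source value stays available for verification.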
Annualized wage (when possible)
LCAs can report wages in different units. When a wage rate and unit are provided, we compute an approximate annual amount using common factors:
- Hourly: rate × 2,080 (40 hours/week × 52 weeks)
- Weekly: rate × 52
- Bi-Weekly: rate × 26
- Monthly: rate × 12
- Yearly: rate × 1
Note: Annualization is an approximation and may not match actual compensation. The factors assume a full-time, year-round schedule (for example, 40 hours per week for hourly rates) and do not account for bonuses, part-time schedules, or unpaid leave.
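As an illustration, the factors above could be applied like this. The function name, the unit spellings used as keys, and the handling of "$" and thousands separators are assumptions for this sketch; the labels in the raw files may differ from the ones listed here.

```python
from decimal import Decimal, InvalidOperation

# Multipliers from the list above, keyed by case-folded unit label (assumed spellings).
ANNUAL_FACTORS = {
    "hourly": 2080,   # 40 hours/week x 52 weeks
    "weekly": 52,
    "bi-weekly": 26,
    "monthly": 12,
    "yearly": 1,
}


def annualize_wage(rate, unit):
    """Return an approximate annual wage, or None when it cannot be computed."""
    if rate is None or unit is None:
        return None
    factor = ANNUAL_FACTORS.get(unit.strip().casefold())
    if factor is None:
        return None  # unknown unit: leave the derived field empty
    try:
        amount = Decimal(str(rate).replace("$", "").replace(",", "").strip())
    except InvalidOperation:
        return None  # non-numeric wage field
    if not amount.is_finite():
        return None  # reject NaN and Infinity
    return amount * factor
```

For example, annualize_wage("50", "Hourly") gives 104000, while a missing rate, a non-numeric rate, or an unrecognized unit gives None, so the annualized field is simply left empty for that record.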
How “year” is derived
We typically derive year from the decision date when one is present. If the decision date is missing, we may fall back to another relevant date field (such as the received date), depending on availability.
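A minimal sketch of this fallback, assuming dates have already been parsed and that the received date is the only fallback field. The function name, the fallback order, and the use of the calendar year are assumptions for illustration.

```python
from datetime import date


def derive_year(decision_date, received_date=None):
    """Return the year of the first available date, or None if both are missing."""
    for candidate in (decision_date, received_date):
        if candidate is not None:
            return candidate.year
    return None


# Decision date present: it takes precedence over the received date.
year_a = derive_year(date(2023, 10, 2), date(2023, 9, 1))
# Decision date missing: fall back to the received date.
year_b = derive_year(None, date(2022, 12, 30))
```

When neither date is available, the sketch returns None rather than guessing a year.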
Known limitations
- Worksite city/state formatting is not fully standardized in source files.
- Employer names may vary across filings for the same organization.
- Some records have missing or non-numeric wage fields.
- Search results are best-effort; check the raw fields and verify against official sources before relying on a record for any decision.