Help Build the Early Warning System
This project tracks the convergence of independent signals that a county is being pitched for ICE detention. The more data sources we ingest, the earlier communities can respond. Every new scraper, every new data source, and every FIPS-coded signal makes the system more effective.
The codebase is Python + Hugo + D3. No frameworks, no build tools beyond Hugo. If you can write a Python script that outputs JSON, you can contribute a data source.
How it works
ingest_*.py scripts → county_heat_score.py → heat_data.json → generate_content.py → Hugo site (4,500+ pages)
Each ingestion script pulls from one external data source, applies keyword/signal filtering, and outputs a JSON array of entries. Each entry has a type (one of 10 signal types), a FIPS code (5-digit county identifier), and a signal strength. The heat scoring engine counts converging signal types per county — diversity of signals matters more than volume.
Entry JSON format
{
"entry_type": "commission-activity",
"title": "Broward County FL — BOCC 2026-03-26: ICE detention budget transfer",
"body": "Meeting: County Commission\nDate: 2026-03-26\n...",
"county": "Broward",
"state": "FL",
"fips": "12011",
"date": "2026-03-26",
"source": "Legistar (broward), Event 12345",
"signal_strength": "moderate",
"tags": ["commission-activity", "fl"]
}
Required: entry_type, title, state. Strongly recommended: fips, county, signal_strength. The FIPS code is the join key that makes cross-referencing work.
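The field rules above can be checked before an entry is written. The following is a minimal sketch, not part of the repository: `check_entry` is a hypothetical helper, and treating a non-string or non-5-digit `fips` as an error is an assumption based on the example entry, where the code is a quoted string.

```python
import re

REQUIRED = ("entry_type", "title", "state")
RECOMMENDED = ("fips", "county", "signal_strength")
FIPS_RE = re.compile(r"^\d{5}$")


def check_entry(entry):
    """Return (errors, warnings) for one entry dict. Hypothetical helper."""
    errors = [f"missing required field: {k}" for k in REQUIRED if not entry.get(k)]
    warnings = [f"missing recommended field: {k}" for k in RECOMMENDED if not entry.get(k)]
    fips = entry.get("fips")
    # Assumption: FIPS is kept as a string so state codes such as "04" keep their leading zero.
    if fips and not (isinstance(fips, str) and FIPS_RE.match(fips)):
        errors.append(f"fips must be a 5-digit string, got {fips!r}")
    return errors, warnings
```

Running this over a script's output before submitting a PR would catch the most common join-key problem: a FIPS code stored as an integer loses its leading zero and no longer matches other sources.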
Signal types you can contribute to
| Type | Weight | Status | Entry type value |
|---|---|---|---|
| IGSA Facility | 10 | Automated | igsa |
| ANC Contract | 8 | Automated | anc-contract |
| 287(g) Agreement | 7 | Automated | 287g-agreement |
| Commission Activity | 7 | Semi-auto (Legistar) | commission-activity |
| Job Posting | 7 | Partial | job-posting |
| Sheriff Network | 6 | Manual only | sheriff-network |
| Comms Discipline | 6 | Manual only | comms-discipline |
| Budget Distress | 5 | Automated | budget-distress |
| Real Estate Trace | 2 | Manual only | real-estate-trace |
| Legislative Trace | 1 | Manual only | legislative-trace |
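To make "diversity of signals matters more than volume" concrete, here is a minimal sketch of one way the weights in the table could combine. It is not the logic in county_heat_score.py, which is not reproduced here. The assumption is that each signal type counts once per county, so repeated entries of the same type add nothing; `heat_by_county` is a hypothetical name.

```python
from collections import defaultdict

# Weights copied from the table above, keyed by entry type value.
WEIGHTS = {
    "igsa": 10, "anc-contract": 8, "287g-agreement": 7,
    "commission-activity": 7, "job-posting": 7, "sheriff-network": 6,
    "comms-discipline": 6, "budget-distress": 5,
    "real-estate-trace": 2, "legislative-trace": 1,
}


def heat_by_county(entries):
    """Group entries by FIPS and score each county on its distinct signal types."""
    types_seen = defaultdict(set)
    for entry in entries:
        fips, etype = entry.get("fips"), entry.get("entry_type")
        if fips and etype in WEIGHTS:
            types_seen[fips].add(etype)  # a set ignores repeats of the same type
    return {
        fips: {"signal_types": len(types), "score": sum(WEIGHTS[t] for t in types)}
        for fips, types in types_seen.items()
    }
```

Under this assumption, a county with three budget-distress entries and one IGSA entry scores 15 (5 + 10) across two signal types, the same as a county with one entry of each.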
Highest-impact things to build
The highest-heat counties (Palm Beach FL, Pinal AZ, Webb TX, Bradford FL, Charlton GA) don't use Legistar. Each county posts agendas on its own website. A scraper per county that pulls agendas, searches them for detention keywords, and outputs our JSON format would fill the biggest gap in our coverage.
Pattern: see ingest_legistar.py for keyword matching logic. Output: same JSON format, same commission-activity entry type.
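As a starting point, the matching step for a county agenda scraper might look like the sketch below. It is not the logic from ingest_legistar.py: the keyword list, the shape of the `item` dict (`title`, `text`, `date`, `url`), and both function names are assumptions for illustration. Fetching and parsing each county's agenda pages is site-specific and left out.

```python
import re

# Hypothetical keyword patterns. Short acronyms are matched case-sensitively
# with word boundaries so "ICE" does not fire on "police" or "notice".
PATTERNS = {
    "detention": re.compile(r"\bdetention\b", re.IGNORECASE),
    "IGSA": re.compile(r"\bIGSA\b"),
    "ICE": re.compile(r"\bICE\b"),
    "287(g)": re.compile(r"\b287\(g\)", re.IGNORECASE),
}


def match_keywords(text):
    """Return the names of all keyword patterns found in the text."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(text)]


def agenda_item_to_entry(item, county, state, fips):
    """Turn one scraped agenda item into a commission-activity entry, or None."""
    if not match_keywords(item["text"]):
        return None
    return {
        "entry_type": "commission-activity",
        "title": f"{county} County {state}: {item['title']}",
        "body": item["text"],
        "county": county,
        "state": state,
        "fips": fips,
        "date": item["date"],
        "source": item["url"],
        "tags": ["commission-activity", state.lower()],
    }
```

The sketch leaves out `signal_strength`, since how strength is assigned is a project decision; a real script should set it before output.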
We have only 1 legislative trace entry. OpenStates and LegiScan have APIs that could scan all 50 state legislatures for bills mentioning IGSA, immigration detention, or 287(g). States introducing bans are states where the pitch is active.
Output: legislative-trace entries with bill number, sponsor, status, state.
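A sketch of the mapping from a bill record to an entry follows. The input field names (`state`, `number`, `title`, `summary`, `sponsor`, `status`, `url`) are assumptions, not the OpenStates or LegiScan response schema, so an adapter for whichever API you use would need to produce them. Note that a state-level entry carries no `fips`, which the entry format allows.

```python
import re

# Terms taken from the paragraph above.
BILL_TERMS = re.compile(
    r"\bIGSA\b|\bimmigration detention\b|\b287\(g\)", re.IGNORECASE
)


def bill_to_entry(bill):
    """Return a legislative-trace entry for a matching bill, else None."""
    text = f"{bill['title']} {bill.get('summary', '')}"
    if not BILL_TERMS.search(text):
        return None
    state = bill["state"]
    return {
        "entry_type": "legislative-trace",
        "title": f"{state} {bill['number']}: {bill['title']}",
        "body": f"Sponsor: {bill['sponsor']}\nStatus: {bill['status']}",
        "state": state,
        "source": bill["url"],
        "tags": ["legislative-trace", state.lower()],
    }
```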
Detention consultant job postings are a strong forward indicator — a posting in a new state means a new pitch. The script exists (ingest_jobs.py) but needs a search API key. Contributing a SerpAPI key or an alternative scraping approach would let us monitor Sabot Consulting, GEO Group, CoreCivic, and Akima job listings.
Warehouse purchases are the physical infrastructure of detention expansion. A scraper for LoopNet, Crexi, or county assessor records that flags large industrial properties (>100,000 sq ft) near existing correctional facilities would detect the warehouse model before purchases close.
Output: real-estate-trace entries with address, sqft, owner, FIPS.
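The size-and-proximity filter could be sketched as follows. The 100,000 sq ft threshold comes from the text above; the 25 km radius, the listing and facility field names, and the function names are assumptions to adjust. Distance uses the standard haversine formula on latitude and longitude.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def flag_listings(listings, facilities, min_sqft=100_000, max_km=25.0):
    """Return real-estate-trace entries for large listings near any facility."""
    entries = []
    for item in listings:
        if item["sqft"] <= min_sqft:
            continue
        nearest = min(
            (haversine_km(item["lat"], item["lon"], f["lat"], f["lon"]) for f in facilities),
            default=None,
        )
        if nearest is None or nearest > max_km:
            continue
        entries.append({
            "entry_type": "real-estate-trace",
            "title": f"{item['sqft']:,} sq ft industrial listing at {item['address']}",
            "body": (f"Address: {item['address']}\nSqft: {item['sqft']}\n"
                     f"Owner: {item['owner']}\nNearest facility: {nearest:.1f} km"),
            "state": item["state"],
            "fips": item["fips"],
            "tags": ["real-estate-trace", item["state"].lower()],
        })
    return entries
```

A fixed radius is a blunt instrument: 25 km means something different in a dense metro than in a rural county, so the threshold may be worth tuning per region.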
Google News API, MediaCloud, or GDELT could monitor local newspapers for "detention facility," "ICE," "IGSA," and related terms. Local news coverage is often the first public sign that a county is being pitched. The challenge is FIPS resolution — matching news articles to counties.
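One simple approach to the FIPS resolution problem is to search article text for "<name> County" and restrict candidates to the state the outlet publishes in, since many county names repeat across states. The sketch below assumes the outlet's state is known in advance and uses a three-row lookup table for illustration; a real script would load the full Census reference file. `resolve_fips` is a hypothetical name.

```python
import re

# Tiny illustrative lookup keyed by (state, county name).
COUNTY_FIPS = {
    ("FL", "Broward"): "12011",
    ("FL", "Palm Beach"): "12099",
    ("AZ", "Pinal"): "04021",
}


def resolve_fips(text, state):
    """Return FIPS codes for counties of `state` mentioned as '<name> County'."""
    hits = []
    for (st, name), fips in COUNTY_FIPS.items():
        if st != state:
            continue  # skip same-named counties in other states
        if re.search(rf"\b{re.escape(name)} County\b", text, re.IGNORECASE):
            hits.append(fips)
    return hits
```

Articles that name a city or a sheriff but never the county would need a second pass, for example a city-to-county table, which this sketch does not attempt.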
Getting started
- Fork the repo: github.com/markramm/detention-pipeline
- Read an existing script: Start with kb/scripts/ingest_legistar.py — it shows the full pattern (API call → keyword match → JSON output).
- Write your script: Name it ingest_your_source.py in kb/scripts/, following the snake_case names of the existing scripts. Output a JSON array of entries matching the format above.
- FIPS codes: Use county_heat_score.py's FIPS lookup, or download the Census reference file. Every county-level entry should carry a 5-digit FIPS, stored as a string so leading zeros survive.
- Test with dry run: Add a --dry-run flag that previews matches without writing files.
- Submit a PR: Include the script, a sample of output, and a note about what data source it covers.
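The steps above can be sketched as a script skeleton with a `--dry-run` flag. This is an illustration under assumptions, not the structure of the existing scripts: `fetch_entries` is a placeholder for your source-specific code, and the `--out` flag and its default path are hypothetical.

```python
import argparse
import json
from pathlib import Path


def fetch_entries():
    """Placeholder: pull from your data source and return a list of entry dicts."""
    return [{
        "entry_type": "commission-activity",
        "title": "Example County XX: placeholder agenda item",
        "state": "XX",
    }]


def main(argv=None):
    parser = argparse.ArgumentParser(description="Ingest one data source into entry JSON.")
    parser.add_argument("--dry-run", action="store_true",
                        help="preview matches without writing files")
    parser.add_argument("--out", type=Path, default=Path("entries.json"),
                        help="output path (hypothetical flag)")
    args = parser.parse_args(argv)

    entries = fetch_entries()
    if args.dry_run:
        # Preview only: print one line per match and touch nothing on disk.
        for entry in entries:
            print(f"[dry-run] {entry['entry_type']}: {entry['title']}")
        return 0

    args.out.write_text(json.dumps(entries, indent=2), encoding="utf-8")
    print(f"wrote {len(entries)} entries to {args.out}")
    return 0
```

In a real script, call `main()` under the usual `if __name__ == "__main__":` guard. Accepting `argv` as a parameter keeps the entry point testable without touching `sys.argv`.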
Repository layout
detention-pipeline/
├── kb/ # Knowledge base (markdown entries)
│ ├── 287g/ # 1,311 287(g) agreement entries
│ ├── anc/ # 244 ANC contract entries
│ ├── budget/ # 1,086 budget distress entries
│ ├── commission/ # Commission activity entries
│ ├── industry/ # Contractors, people, fights, orgs
│ ├── facilities/ # IGSA facility entries
│ ├── scripts/ # Ingestion scripts live here
│ │ ├── ingest_287g.py
│ │ ├── ingest_budget_distress.py
│ │ ├── ingest_jobs.py
│ │ ├── ingest_legistar.py
│ │ ├── ingest_usaspending.py
│ │ └── county_heat_score.py # Scoring engine
│ └── kb.yaml # Schema for all entry types
├── hugo/ # Static site generator
│ ├── generate_content.py # KB → Hugo content
│ ├── layouts/ # Hugo templates
│ └── hugo.toml # Hugo config
├── run_ingest.sh # Central pipeline runner
└── build.sh # Heat score generator