Help Build the Early Warning System

This project tracks convergence of independent signals that a county is being pitched for ICE detention. The more data sources we ingest, the earlier communities can respond. Every new scraper, every new data source, every FIPS-coded signal makes the system more effective.

The codebase is Python + Hugo + D3. No frameworks, no build tools beyond Hugo. If you can write a Python script that outputs JSON, you can contribute a data source.

How it works

External API/Website → ingest_*.py → /tmp/*.json → kb import → KB entries (markdown + YAML)
→ county_heat_score.py → heat_data.json → generate_content.py → Hugo site (4,500+ pages)

Each ingestion script pulls from one external data source, applies keyword/signal filtering, and outputs a JSON array of entries. Each entry has a type (one of 10 signal types), a FIPS code (5-digit county identifier), and a signal strength. The heat scoring engine counts converging signal types per county — diversity of signals matters more than volume.

Entry JSON format

{
  "entry_type": "commission-activity",
  "title": "Broward County FL — BOCC 2026-03-26: ICE detention budget transfer",
  "body": "Meeting: County Commission\nDate: 2026-03-26\n...",
  "county": "Broward",
  "state": "FL",
  "fips": "12011",
  "date": "2026-03-26",
  "source": "Legistar (broward), Event 12345",
  "signal_strength": "moderate",
  "tags": ["commission-activity", "fl"]
}

Required: entry_type, title, state. Strongly recommended: fips, county, signal_strength. The FIPS code is the join key that makes cross-referencing work.
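As a rough illustration of the rules above, a script could check each entry before writing it out. This is a minimal sketch, not the project's actual validation: the function name check_entry and the error/warning split are assumptions, and the FIPS check only assumes a 5-digit string (kept as a string so leading zeros such as "01001" survive).

```python
import re

REQUIRED_FIELDS = ("entry_type", "title", "state")
RECOMMENDED_FIELDS = ("fips", "county", "signal_strength")


def check_entry(entry):
    """Return (errors, warnings) for one entry dict.

    Hypothetical helper: the field lists come from the text above,
    everything else is an illustrative assumption.
    """
    errors = [f"missing required field: {name}"
              for name in REQUIRED_FIELDS if not entry.get(name)]
    warnings = [f"missing recommended field: {name}"
                for name in RECOMMENDED_FIELDS if not entry.get(name)]

    fips = entry.get("fips")
    # An integer FIPS loses its leading zero, so anything that does not
    # render as exactly five digits is reported as an error.
    if fips and not re.fullmatch(r"\d{5}", str(fips)):
        errors.append(f"fips must be a 5-digit string, got {fips!r}")
    return errors, warnings
```

An entry with errors would be dropped or fixed; an entry with only warnings is still valid but will not join cleanly against other signals.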

Signal types you can contribute to

Type                 Weight  Status                 Entry type value
IGSA Facility        10      Automated              igsa
ANC Contract         8       Automated              anc-contract
287(g) Agreement     7       Automated              287g-agreement
Commission Activity  7       Semi-auto (Legistar)   commission-activity
Job Posting          7       Partial                job-posting
Sheriff Network      6       Manual only            sheriff-network
Comms Discipline     6       Manual only            comms-discipline
Budget Distress      5       Automated              budget-distress
Real Estate Trace    2       Manual only            real-estate-trace
Legislative Trace    1       Manual only            legislative-trace
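To make "diversity of signals matters more than volume" concrete, here is one way the weights in the table above could combine. It is a sketch under an assumption: each signal type counts once per county, however many entries of that type exist. The real formula in county_heat_score.py may differ, and heat_by_county is a hypothetical name.

```python
from collections import defaultdict

# Weights copied from the table above, keyed by entry type value.
SIGNAL_WEIGHTS = {
    "igsa": 10,
    "anc-contract": 8,
    "287g-agreement": 7,
    "commission-activity": 7,
    "job-posting": 7,
    "sheriff-network": 6,
    "comms-discipline": 6,
    "budget-distress": 5,
    "real-estate-trace": 2,
    "legislative-trace": 1,
}


def heat_by_county(entries):
    """Sum the weights of the distinct signal types seen per FIPS code.

    Assumption: repeated entries of one type add nothing, so only new
    kinds of signal raise a county's score.
    """
    types_by_fips = defaultdict(set)
    for entry in entries:
        fips = entry.get("fips")
        entry_type = entry.get("entry_type")
        if fips and entry_type in SIGNAL_WEIGHTS:
            types_by_fips[fips].add(entry_type)
    return {fips: sum(SIGNAL_WEIGHTS[t] for t in types)
            for fips, types in types_by_fips.items()}
```

Under this assumption, a county with one budget-distress entry and one igsa entry outscores a county with five budget-distress entries.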

Highest-impact things to build

County commission scrapers for non-Legistar counties

The highest-heat counties (Palm Beach FL, Pinal AZ, Webb TX, Bradford FL, Charlton GA) don't use Legistar. Each county posts agendas on its own website. One scraper per county that pulls agendas, searches for detention keywords, and outputs our JSON format would fill the biggest gap in our coverage.

Pattern: see ingest_legistar.py for keyword matching logic. Output: same JSON format, same commission-activity entry type.
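The matching step for a county scraper might look like the sketch below. The keyword list, the agenda item shape (date, title, text), and the function name agenda_item_to_entry are all assumptions made for illustration; the real keyword logic lives in ingest_legistar.py and should be reused where possible.

```python
import re

# Hypothetical keyword patterns. "ICE" and "IGSA" are matched
# case-sensitively so words like "police" or "notice" do not trigger.
KEYWORD_PATTERNS = {
    "detention": re.compile(r"\bdetention\b", re.IGNORECASE),
    "ICE": re.compile(r"\bICE\b"),
    "IGSA": re.compile(r"\bIGSA\b"),
    "287(g)": re.compile(r"287\(g\)", re.IGNORECASE),
}


def agenda_item_to_entry(item, county, state, fips, source,
                         signal_strength="moderate"):
    """Turn one scraped agenda item into a commission-activity entry.

    Returns None when no keyword matches. `item` is assumed to be a dict
    with "date", "title", and "text" keys produced by your scraper.
    """
    hits = [label for label, pattern in KEYWORD_PATTERNS.items()
            if pattern.search(item["text"])]
    if not hits:
        return None
    return {
        "entry_type": "commission-activity",
        "title": f"{county} County {state}, {item['date']}: {item['title']}",
        "body": (f"Date: {item['date']}\n"
                 f"Matched keywords: {', '.join(hits)}\n\n{item['text']}"),
        "county": county,
        "state": state,
        "fips": fips,
        "date": item["date"],
        "source": source,
        "signal_strength": signal_strength,
        "tags": ["commission-activity", state.lower()],
    }
```

Only the page fetching and parsing changes from county to county; the matching and entry-building step can stay the same.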

State legislature bill tracker (OpenStates/LegiScan)

We have only 1 legislative trace entry. OpenStates and LegiScan have APIs that could scan all 50 state legislatures for bills mentioning IGSA, immigration detention, or 287(g). States introducing bans are states where the pitch is active.

Output: legislative-trace entries with bill number, sponsor, status, state.
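A converter from a bill record to a legislative-trace entry could be as small as the sketch below. The bill dict shape (number, title, summary, sponsor, status, state) is hypothetical and does not mirror the actual OpenStates or LegiScan response format; map the real API fields onto it in your script. Bills are statewide, so this sketch leaves out the county FIPS.

```python
# Search terms taken from the paragraph above, lowercased for matching.
BILL_TERMS = ("igsa", "immigration detention", "287(g)")


def bill_to_entry(bill):
    """Build a legislative-trace entry, or return None if no term matches.

    `bill` is an assumed dict with "number", "title", "sponsor",
    "status", "state", and an optional "summary".
    """
    haystack = f"{bill['title']} {bill.get('summary', '')}".lower()
    if not any(term in haystack for term in BILL_TERMS):
        return None
    state = bill["state"]
    return {
        "entry_type": "legislative-trace",
        "title": f"{state} {bill['number']}: {bill['title']}",
        "body": (f"Bill: {bill['number']}\n"
                 f"Sponsor: {bill['sponsor']}\n"
                 f"Status: {bill['status']}"),
        "state": state,
        "tags": ["legislative-trace", state.lower()],
    }
```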

Job board scraper (LinkedIn/Indeed API or SerpAPI)

Detention consultant job postings are a strong forward indicator — a posting in a new state means a new pitch. The script exists (ingest_jobs.py) but needs a search API key. Contributing a SerpAPI key or an alternative scraping approach would let us monitor Sabot Consulting, GEO Group, CoreCivic, and Akima job listings.

Commercial real estate monitor

Warehouse purchases are the physical infrastructure of detention expansion. A scraper for LoopNet, Crexi, or county assessor records that flags large industrial properties (>100,000 sq ft) near existing correctional facilities would detect the warehouse model before purchases close.

Output: real-estate-trace entries with address, sqft, owner, FIPS.
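The flagging rule described above (large industrial property, close to an existing correctional facility) can be sketched as a size filter plus a distance check. The 10-mile radius is an assumption, since the text does not define "near", and the listing shape (address, sqft, owner, state, fips, lat, lon) is hypothetical.

```python
import math

MIN_SQFT = 100_000   # threshold from the text above
MAX_MILES = 10.0     # assumption: "near" is not defined in the source


def miles_between(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles (haversine formula)."""
    earth_radius_miles = 3958.8
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_miles * math.asin(math.sqrt(a))


def flag_listing(listing, facilities):
    """Return a real-estate-trace entry for a qualifying listing, else None.

    `facilities` is a list of (lat, lon) pairs for known correctional
    facilities; `listing` is an assumed dict from your scraper.
    """
    if listing["sqft"] <= MIN_SQFT:
        return None
    near = any(
        miles_between(listing["lat"], listing["lon"], lat, lon) <= MAX_MILES
        for lat, lon in facilities
    )
    if not near:
        return None
    return {
        "entry_type": "real-estate-trace",
        "title": f"{listing['address']}: {listing['sqft']:,} sq ft industrial listing",
        "body": (f"Address: {listing['address']}\n"
                 f"Sqft: {listing['sqft']}\n"
                 f"Owner: {listing['owner']}"),
        "state": listing["state"],
        "fips": listing["fips"],
    }
```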

Local news monitor

Google News API, MediaCloud, or GDELT could monitor local newspapers for "detention facility," "ICE," "IGSA," and related terms. Local news coverage is often the first public sign that a county is being pitched. The challenge is FIPS resolution — matching news articles to counties.
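One simple approach to the FIPS resolution problem is to look for "<Name> County" in the article text and resolve it against a lookup table, using the outlet's home state to disambiguate county names that repeat across states. The sketch below assumes that approach; the three-row table stands in for the full Census reference file, and resolve_fips is a hypothetical name.

```python
import re

# Tiny stand-in for the Census county reference file.
# Keys are (lowercase county name, state abbreviation).
COUNTY_FIPS = {
    ("broward", "FL"): "12011",
    ("palm beach", "FL"): "12099",
    ("pinal", "AZ"): "04021",
}


def resolve_fips(text, state):
    """Return FIPS codes for counties of `state` named in `text`.

    Only explicit "<name> County" mentions count, which avoids matching
    city names such as "Palm Beach" that are not about the county.
    """
    lowered = text.lower()
    hits = []
    for (county, county_state), fips in COUNTY_FIPS.items():
        if county_state != state:
            continue
        if re.search(rf"\b{re.escape(county)} county\b", lowered):
            hits.append(fips)
    return sorted(hits)
```

Articles that name no county return an empty list and would need manual review or a smarter resolver, for example one that maps city names to counties.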

Getting started

  1. Fork the repo: github.com/markramm/detention-pipeline
  2. Read an existing script: Start with kb/scripts/ingest_legistar.py — it shows the full pattern (API call → keyword match → JSON output).
  3. Write your script: Name it ingest_yourSource.py in kb/scripts/. Output a JSON array of entries matching the format above.
  4. FIPS codes: Use county_heat_score.py's FIPS lookup, or download the Census reference file. Every entry should carry a 5-digit FIPS; entries without one cannot be cross-referenced.
  5. Test with dry run: Add a --dry-run flag that previews matches without writing files.
  6. Submit a PR: Include the script, a sample of output, and a note about what data source it covers.
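Steps 3 and 5 can be combined in a skeleton like the one below. It is a sketch, not a copy of an existing script: fetch_entries is a placeholder for your own fetch and keyword filter, the sample entry is illustrative, and the --output flag and its default path are assumptions (the pipeline above only shows that ingestion scripts write JSON under /tmp).

```python
import argparse
import json


def fetch_entries():
    """Placeholder: replace with the real fetch + keyword filter."""
    return [{
        "entry_type": "commission-activity",
        "title": "Example County agenda item",
        "state": "FL",
        "fips": "12011",
    }]


def main(argv=None):
    parser = argparse.ArgumentParser(
        description="Ingest entries from one data source.")
    parser.add_argument("--dry-run", action="store_true",
                        help="preview matches without writing files")
    parser.add_argument("--output", default="/tmp/your_source.json",
                        help="where to write the JSON array (hypothetical flag)")
    args = parser.parse_args(argv)

    entries = fetch_entries()
    if args.dry_run:
        # Preview only: print one line per match and write nothing.
        for entry in entries:
            print(f"[dry-run] {entry.get('fips', '-----')} {entry['title']}")
        print(f"[dry-run] {len(entries)} entries, nothing written")
        return 0

    with open(args.output, "w", encoding="utf-8") as handle:
        json.dump(entries, handle, indent=2)
    print(f"wrote {len(entries)} entries to {args.output}")
    return 0
```

In a real script, call main() under the usual `if __name__ == "__main__":` guard so that `python ingest_yourSource.py --dry-run` works from the command line.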

Repository layout

detention-pipeline/
├── kb/                          # Knowledge base (markdown entries)
│   ├── 287g/                    # 1,311 287(g) agreement entries
│   ├── anc/                     # 244 ANC contract entries
│   ├── budget/                  # 1,086 budget distress entries
│   ├── commission/              # Commission activity entries
│   ├── industry/                # Contractors, people, fights, orgs
│   ├── facilities/              # IGSA facility entries
│   ├── scripts/                 # Ingestion scripts live here
│   │   ├── ingest_287g.py
│   │   ├── ingest_budget_distress.py
│   │   ├── ingest_jobs.py
│   │   ├── ingest_legistar.py
│   │   ├── ingest_usaspending.py
│   │   └── county_heat_score.py # Scoring engine
│   └── kb.yaml                  # Schema for all entry types
├── hugo/                        # Static site generator
│   ├── generate_content.py      # KB → Hugo content
│   ├── layouts/                 # Hugo templates
│   └── hugo.toml                # Hugo config
├── run_ingest.sh                # Central pipeline runner
└── build.sh                     # Heat score generator