Sources
Public data behind the explorer
The site is built from public NYC Open Data releases, enriched with a small amount of public Census geography. This page separates the sources that power the live app from the wider research inventory documented in the repo.
Current Web App Inputs
Sources actively surfaced in the live routes
These are the public inputs that directly feed the four Parquet files consumed by the web app.
| Source | Agency | Dataset | Coverage | Grain | Key | Role |
|---|---|---|---|---|---|---|
| NYC DOC Inmate Admissions | NYC Department of Correction | 6teu-xtgp | 2014-2026 | Jail admission event | INMATEID + ADMITTED_DT | Primary source for exact DOC episode histories and person rollups. |
| NYC DOC Inmate Discharges | NYC Department of Correction | 94ri-3ium | 2014-2026 | Jail discharge event | INMATEID + ADMITTED_DT | Supplies discharge dates and age-at-discharge for stay lengths and birth-year imputation. |
| NYPD Arrests Historic | New York City Police Department | 8h9b-rp9u | 2006-2024 | Arrest event | ARREST_KEY | Feeds the arrest-to-DOC bridge after penal-code parsing and demographic/date filtering. |
Current App Outputs
Derived datasets the site reads
The web app reads Parquet directly through DuckDB. Each file below has a specific role and confidence grade.
doc_recidivism_persons.parquet
Join DOC admissions to discharges on INMATEID + admit_date, sequence episodes, then aggregate to person-level metrics.
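The persons rollup can be sketched in plain Python. This is a minimal illustration, not the repo's pipeline: the dict keys `inmate_id`, `admit_date`, and `discharge_date` and the function name `join_and_rollup` are hypothetical stand-ins for the DOC fields `INMATEID`, `ADMITTED_DT`, and the discharge date, and the person metrics shown are only examples.

```python
from collections import defaultdict


def join_and_rollup(admissions, discharges):
    """Join DOC admissions to discharges on (inmate_id, admit_date), order each
    person's episodes, and aggregate simple person-level metrics.

    Hypothetical record shapes (not the actual DOC column names):
      admissions: {"inmate_id", "admit_date"}
      discharges: {"inmate_id", "admit_date", "discharge_date"}
    Dates may be datetime.date or ISO strings; anything that sorts chronologically works.
    """
    # Index discharges by the composite join key. If a key repeats, the last row wins.
    discharge_by_key = {
        (d["inmate_id"], d["admit_date"]): d["discharge_date"] for d in discharges
    }

    # Attach the matching discharge date (None when no discharge row matches).
    episodes_by_person = defaultdict(list)
    for a in admissions:
        key = (a["inmate_id"], a["admit_date"])
        episodes_by_person[a["inmate_id"]].append(
            {"admit_date": a["admit_date"], "discharge_date": discharge_by_key.get(key)}
        )

    persons = {}
    for inmate_id, episodes in episodes_by_person.items():
        # Sequence episodes chronologically and number them from 1.
        episodes.sort(key=lambda e: e["admit_date"])
        for order, episode in enumerate(episodes, start=1):
            episode["episode_order"] = order
        # Example person-level metrics; the real rollup may compute more.
        persons[inmate_id] = {
            "episode_count": len(episodes),
            "first_admit": episodes[0]["admit_date"],
            "last_admit": episodes[-1]["admit_date"],
        }
    return dict(episodes_by_person), persons
```

Because the join key is the composite of ID and admission date, an admission with no discharge row simply stays open (`None`) instead of being dropped.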
doc_recidivism_episodes.parquet
Exact DOC episode history with discharge date, stay_days, gap_days, episode order, and imputed birth year.
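The per-episode fields can be sketched as follows, assuming one person's episodes are already sorted by admission date. The key names (including `age_at_discharge`) and the birth-year rule are assumptions for illustration, not the documented imputation: here the birth year is discharge year minus age at discharge, and the most common value across episodes is kept because that subtraction can be off by one around a birthday.

```python
from collections import Counter
from datetime import date
from typing import Optional


def days_between(start: Optional[date], end: Optional[date]) -> Optional[int]:
    """Whole days from start to end, or None if either date is missing."""
    if start is None or end is None:
        return None
    return (end - start).days


def add_stay_and_gap(episodes):
    """Add stay_days and gap_days to one person's episodes (sorted by admit_date).

    stay_days: admission to discharge of the same episode.
    gap_days:  previous episode's discharge to this episode's admission.
    """
    prev_discharge = None
    for episode in episodes:
        discharge = episode.get("discharge_date")
        episode["stay_days"] = days_between(episode["admit_date"], discharge)
        episode["gap_days"] = days_between(prev_discharge, episode["admit_date"])
        prev_discharge = discharge
    return episodes


def impute_birth_year(episodes) -> Optional[int]:
    """Hypothetical rule: discharge year minus age at discharge, most common value wins."""
    candidates = [
        e["discharge_date"].year - e["age_at_discharge"]
        for e in episodes
        if e.get("discharge_date") is not None and e.get("age_at_discharge") is not None
    ]
    if not candidates:
        return None
    return Counter(candidates).most_common(1)[0][0]
```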
doc_cohort_recidivism.parquet
Build release cohorts from DOC episodes and mark whether each person returned within 1, 2, or 3 years when follow-up is observable.
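A release-cohort flag can be sketched like this. The following are assumptions, not taken from the repo: a person enters the cohort at their first discharge in the cohort year, a year window is approximated as 365 days, `data_end` is the last date covered by the feeds, and a window that runs past `data_end` is marked `None` (not observable) rather than counted as "did not return".

```python
from datetime import date, timedelta
from typing import Optional

WINDOW_YEARS = (1, 2, 3)


def cohort_flags(episodes, cohort_year: int, data_end: date) -> Optional[dict]:
    """Return-within-N-years flags for one person in one release cohort.

    episodes: one person's episodes with hypothetical keys "admit_date" and
    "discharge_date" (None if not yet discharged).
    Returns None if the person had no release in cohort_year; otherwise a dict
    mapping N -> True (returned), False (did not return), or None (not observable).
    """
    releases = [
        e["discharge_date"]
        for e in episodes
        if e.get("discharge_date") is not None and e["discharge_date"].year == cohort_year
    ]
    if not releases:
        return None
    release = min(releases)  # first release in the cohort year anchors the person

    later_admits = [e["admit_date"] for e in episodes if e["admit_date"] > release]
    next_admit = min(later_admits) if later_admits else None

    flags = {}
    for years in WINDOW_YEARS:
        window_end = release + timedelta(days=365 * years)
        if window_end > data_end:
            flags[years] = None  # follow-up window runs past the data: not observable
        elif next_admit is not None and next_admit <= window_end:
            flags[years] = True
        else:
            flags[years] = False
    return flags
```

Checking observability before checking for a return keeps every cohort member on the same footing: a 3-year rate is computed only over people whose full 3-year window is covered.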
arrest_doc_bridge.parquet
Match arrests to DOC admissions on same date, normalized sex, parsed penal code, and compatible age bucket, then keep only unique 1:1 pairs.
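The matching rule can be sketched with a blocking index plus a uniqueness filter. This assumes penal codes have already been parsed into comparable strings on both sides, that the arrest age bucket arrives as a `(low, high)` range, and that the DOC side carries an age derived from the imputed birth year; the sex normalization table, the one-year `slack`, and all key names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical normalization table; the actual code values in the feeds may differ.
SEX_MAP = {"M": "M", "MALE": "M", "F": "F", "FEMALE": "F"}


def normalize_sex(value):
    """Map raw sex codes to "M"/"F"; unknown values return None and are never matched."""
    return SEX_MAP.get(str(value).strip().upper())


def age_compatible(age, bucket, slack=1):
    """True if age falls in the (low, high) bucket, widened by `slack` years
    to absorb birth-year imputation error."""
    low, high = bucket
    return low - slack <= age <= high + slack


def unique_bridge(arrests, admissions):
    """Return (arrest_key, admission_id) pairs that match on date, normalized sex,
    parsed penal code, and compatible age, keeping only unique 1:1 pairs."""
    # Block admissions on the exact-match fields so each arrest only scans its block.
    index = defaultdict(list)
    for adm in admissions:
        sex = normalize_sex(adm["sex"])
        if sex is not None:
            index[(adm["date"], sex, adm["penal_code"])].append(adm)

    candidates = []
    for arr in arrests:
        sex = normalize_sex(arr["sex"])
        if sex is None:
            continue
        for adm in index.get((arr["date"], sex, arr["penal_code"]), []):
            if age_compatible(adm["age"], arr["age_bucket"]):
                candidates.append((arr["arrest_key"], adm["admission_id"]))

    # Keep a pair only if neither side appears in any other candidate pair.
    arrest_counts = Counter(a for a, _ in candidates)
    admission_counts = Counter(b for _, b in candidates)
    return [
        (a, b) for a, b in candidates
        if arrest_counts[a] == 1 and admission_counts[b] == 1
    ]
```

The final filter is what makes the bridge precision-first: any arrest or admission that appears in more than one candidate pair is dropped entirely rather than resolved by a tie-breaker.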
Repo Panel Inputs
Broader public event graph sources
These sources belong to the wider research workspace, even though not all of them are surfaced on the current routes.
| Source | Dataset | Coverage | Key | Role |
|---|---|---|---|---|
| NYC DOC Inmate Admissions | 6teu-xtgp | 2014-2026 | INMATEID + ADMITTED_DT | Primary source for exact DOC episode histories and person rollups. |
| NYC DOC Inmate Discharges | 94ri-3ium | 2014-2026 | INMATEID + ADMITTED_DT | Supplies discharge dates and age-at-discharge for stay lengths and birth-year imputation. |
| NYPD Arrests Historic | 8h9b-rp9u | 2006-2024 | ARREST_KEY | Feeds the arrest-to-DOC bridge after penal-code parsing and demographic/date filtering. |
| NYPD Complaints Historic | qgea-i56i | 2006-2024 | cmplnt_num | Used in the broader public event panel and arrest-to-complaint candidate matching. |
| NYPD Summonses Historic | sv2w-rv3k | 2006-2024 | SUMMONS_KEY | Included as a standalone event source in the broader public event panel. |
| Census Batch Geocoder | coordinatesbatch endpoint | Current geography benchmark | Longitude + Latitude | Adds tract and block-group geography to NYPD event rows with coordinates. |
Join Surfaces
How sources touch each other
A source being public does not mean it is joinable. The limiting factor is whether the released fields expose a defensible path from one stage to another.
| Join | Fields | Status | Supports | Caveat |
|---|---|---|---|---|
| DOC admissions ↔ DOC discharges | INMATEID + admit_date | Exact | Stay lengths, gap lengths, ordered jail episodes, and DOC person histories. | Exact within the public DOC feeds, but only for the jail stage. |
| DOC episodes ↔ DOC person summaries | Aggregation over INMATEID | Exact | Repeat-admission counts, tiers, charge-change counts, and person profiles. | Still a DOC-only identity, not a citywide criminal-justice person key. |
| NYPD arrests ↔ DOC admissions | date + sex + parsed penal code + imputed age bucket | Candidate | A narrow arrest-to-jail bridge subset for mapped/contextual arrest detail. | Not ground truth. Only unique 1:1 matches are kept to favor precision over coverage. |
| NYPD arrests ↔ NYPD complaints | date + precinct + offense code + borough + demographics | Candidate | Broader repo event-graph analysis outside the current web explorer. | Ambiguous and incomplete, especially in earlier years. |
| Anything ↔ public court bulk data | None | Unsupported | No public person-level court linkage in this app. | The court extracts documented in the repo are intentionally de-identified. |
| Anything ↔ state prison person records | None | Unsupported | No person-level prison linkage in this app. | Public DOCCS releases are aggregate only. |
Documented Inventory
Public sources tracked in the repo but not yet surfaced here
These sources are documented because they matter for the broader research path, but they are not currently powering the live explorer routes.
| Source | Agency | Dataset | Coverage | Key | Role |
|---|---|---|---|---|---|
| OCA-STAT Act extract | UCS / OCA | Bulk CSV | 2021-2025 arraignments | No public person key | Documented in repo inventory, but not wired into the current explorer. |
| OCA Pretrial Release extract | UCS / OCA + DCJS | Bulk CSV | 2020-2024 arraignments | arr_cycle_id within same arrest | Useful for court detail, but not person-linkable across arrests in public form. |
| DCJS Supplemental Pretrial | DCJS | Bulk ZIP / CSV | 2019-2024 arraignments | caseid | Documented in the repo, but not part of the current web app pipeline. |
| DOCCS aggregate tables | New York State DOCCS | Open Data NY | 2008+ | County + year | Validation/context only. No prison person identifier is public. |