Sources

Public data behind the explorer

The site is built from public NYC open-data releases plus a small amount of public census geography enrichment. This page separates sources that power the live app from the wider research inventory documented in the repo.

Current Web App Inputs

Sources actively surfaced in the live routes

These are the public inputs that directly feed the four Parquet files consumed by the web app.

| Source | Agency | Dataset | Coverage | Grain | Key | Role |
| --- | --- | --- | --- | --- | --- | --- |
| NYC DOC Inmate Admissions | NYC Department of Correction | 6teu-xtgp | 2014-2026 | Jail admission event | INMATEID + ADMITTED_DT | Primary source for exact DOC episode histories and person rollups. |
| NYC DOC Inmate Discharges | NYC Department of Correction | 94ri-3ium | 2014-2026 | Jail discharge event | INMATEID + ADMITTED_DT | Supplies discharge dates and age-at-discharge for stay lengths and birth-year imputation. |
| NYPD Arrests Historic | New York City Police Department | 8h9b-rp9u | 2006-2024 | Arrest event | ARREST_KEY | Feeds the arrest-to-DOC bridge after penal-code parsing and demographic/date filtering. |

Current App Outputs

Derived datasets the site reads

The web app reads Parquet directly through DuckDB. Each file below has a specific role and confidence grade.

doc_recidivism_persons.parquet

Join DOC admissions to discharges on INMATEID + admit_date, sequence episodes, then aggregate to person-level metrics.

Exact. Built by scripts/analyze_doc_recidivism.py.
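
The join-sequence-aggregate recipe above can be sketched in plain Python. This is a minimal illustration, not the contents of scripts/analyze_doc_recidivism.py: the sample rows, the `build_person_rollup` name, the output field names, and the choice of a left join (keeping admissions that have no discharge yet) are all assumptions.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample rows; only the join key (INMATEID + admit_date) comes from the page.
admissions = [
    {"INMATEID": "A1", "admit_date": date(2019, 3, 1)},
    {"INMATEID": "A1", "admit_date": date(2020, 6, 15)},
    {"INMATEID": "B2", "admit_date": date(2021, 1, 10)},
]
discharges = [
    {"INMATEID": "A1", "admit_date": date(2019, 3, 1), "discharge_date": date(2019, 3, 20)},
    {"INMATEID": "B2", "admit_date": date(2021, 1, 10), "discharge_date": date(2021, 2, 1)},
]


def build_person_rollup(admissions, discharges):
    """Join on (INMATEID, admit_date), order episodes, and aggregate per person."""
    # Index discharges by the composite key so each admission is a dict lookup.
    discharge_by_key = {
        (d["INMATEID"], d["admit_date"]): d["discharge_date"] for d in discharges
    }

    # Left join (an assumption): admissions without a discharge stay in as open episodes.
    episodes_by_person = defaultdict(list)
    for a in admissions:
        key = (a["INMATEID"], a["admit_date"])
        episodes_by_person[a["INMATEID"]].append(
            {"admit_date": a["admit_date"], "discharge_date": discharge_by_key.get(key)}
        )

    persons = {}
    for inmate_id, episodes in episodes_by_person.items():
        # Sequence episodes chronologically and number them from 1.
        episodes.sort(key=lambda e: e["admit_date"])
        for order, episode in enumerate(episodes, start=1):
            episode["episode_order"] = order
        persons[inmate_id] = {
            "episode_count": len(episodes),
            "first_admit": episodes[0]["admit_date"],
            "last_admit": episodes[-1]["admit_date"],
            "open_episodes": sum(1 for e in episodes if e["discharge_date"] is None),
        }
    return persons


persons = build_person_rollup(admissions, discharges)
```

Here `persons["A1"]` reports two episodes, one of them still open because the second admission has no matching discharge row.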

doc_recidivism_episodes.parquet

Exact DOC episode history with discharge date, stay_days, gap_days, episode order, and imputed birth year.

Exact. Built by scripts/analyze_doc_recidivism.py.
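
As a rough sketch of how the per-episode fields could be derived, the function below computes stay_days, gap_days, and episode order for one person's episodes. The `annotate_episodes` name, the `age_at_discharge` and `birth_year_est` field names, and the imputation rule (discharge year minus age at discharge) are assumptions for illustration; the actual script may impute birth year differently.

```python
from datetime import date


def annotate_episodes(episodes):
    """Add episode_order, stay_days, gap_days, and an estimated birth year.

    `episodes` holds dicts for ONE person with admit_date, discharge_date
    (or None while the stay is open), and age_at_discharge (or None).
    """
    ordered = sorted(episodes, key=lambda e: e["admit_date"])
    annotated = []
    prev_discharge = None
    for order, episode in enumerate(ordered, start=1):
        row = dict(episode)
        row["episode_order"] = order

        discharge = episode.get("discharge_date")
        # Stay length is only defined once a discharge date exists.
        row["stay_days"] = (discharge - episode["admit_date"]).days if discharge else None
        # Gap is measured from the previous discharge to this admission.
        row["gap_days"] = (
            (episode["admit_date"] - prev_discharge).days if prev_discharge else None
        )
        # Assumed imputation rule: discharge year minus age at discharge.
        age = episode.get("age_at_discharge")
        row["birth_year_est"] = (
            discharge.year - age if discharge and age is not None else None
        )

        prev_discharge = discharge
        annotated.append(row)
    return annotated


# Hypothetical episodes for a single person.
episodes = [
    {"admit_date": date(2020, 6, 15), "discharge_date": date(2020, 7, 1), "age_at_discharge": 31},
    {"admit_date": date(2019, 3, 1), "discharge_date": date(2019, 3, 20), "age_at_discharge": 30},
]
annotated = annotate_episodes(episodes)
```

The first episode has no gap_days because nothing precedes it; the second episode's gap runs from 2019-03-20 to 2020-06-15.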

doc_cohort_recidivism.parquet

Build release cohorts from DOC episodes and mark whether each person returned within 1, 2, or 3 years when follow-up is observable.

Exact. Built by scripts/analyze_doc_cohort_recidivism.py.
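
The "when follow-up is observable" condition is the subtle part: a release too close to the end of the data cannot be scored for the longer windows. A minimal sketch of that logic follows, assuming a 365-day year, a hypothetical `data_end` cutoff, and hypothetical flag names; none of these are taken from scripts/analyze_doc_cohort_recidivism.py.

```python
from datetime import date, timedelta


def cohort_flags(release_date, next_admit_date, data_end, windows=(1, 2, 3)):
    """Return {"returned_1y": True/False/None, ...} for one release.

    None means the window extends past data_end, so the outcome is not
    observable and the release should be left out of that window's rate.
    """
    flags = {}
    for years in windows:
        # Assumption: a "year" is a fixed 365-day span after release.
        window_end = release_date + timedelta(days=365 * years)
        if window_end > data_end:
            flags[f"returned_{years}y"] = None
        else:
            flags[f"returned_{years}y"] = (
                next_admit_date is not None and next_admit_date <= window_end
            )
    return flags


# Hypothetical release with a readmission five months later.
returned = cohort_flags(date(2022, 1, 1), date(2022, 6, 1), data_end=date(2024, 6, 30))
# Hypothetical release with no later admission in the data.
not_returned = cohort_flags(date(2022, 1, 1), None, data_end=date(2024, 6, 30))
```

Scoring a window only when it is fully observed keeps the denominator honest: counting early returns from partially observed windows would inflate the rate for recent cohorts.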

arrest_doc_bridge.parquet

Match arrests to DOC admissions on same date, normalized sex, parsed penal code, and compatible age bucket, then keep only unique 1:1 pairs.

Candidate. Built by scripts/build_arrest_doc_bridge.py.
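
The "unique 1:1 pairs" rule is what trades coverage for precision. The sketch below shows one way to express it: generate every candidate pair that agrees on date, sex, and penal code and has a compatible age, then keep a pair only if neither side appears in any other candidate. All field names, the `match_unique_pairs` helper, and the age-bucket string format ("<18", "25-44", "65+") are assumptions for illustration, not the implementation in scripts/build_arrest_doc_bridge.py.

```python
from collections import Counter, defaultdict
from datetime import date


def normalize_sex(value):
    """Map common spellings to 'M' or 'F'; anything else becomes None."""
    cleaned = (value or "").strip().upper()
    return {"M": "M", "MALE": "M", "F": "F", "FEMALE": "F"}.get(cleaned)


def age_in_bucket(age, bucket):
    """Check an integer age against a bucket such as '<18', '25-44', or '65+'."""
    if age is None or not bucket:
        return False
    bucket = bucket.strip()
    if bucket.startswith("<") and bucket[1:].isdigit():
        return age < int(bucket[1:])
    if bucket.endswith("+") and bucket[:-1].isdigit():
        return age >= int(bucket[:-1])
    low, _, high = bucket.partition("-")
    if low.isdigit() and high.isdigit():
        return int(low) <= age <= int(high)
    return False


def match_unique_pairs(arrests, admissions):
    """Return (arrest_key, admission_id) pairs that are unique on both sides."""
    admissions_by_key = defaultdict(list)
    for adm in admissions:
        sex = normalize_sex(adm["sex"])
        if sex is not None:  # rows with unknown sex cannot be matched
            admissions_by_key[(adm["date"], sex, adm["penal_code"])].append(adm)

    candidates = []
    for arr in arrests:
        sex = normalize_sex(arr["sex"])
        if sex is None:
            continue
        for adm in admissions_by_key.get((arr["date"], sex, arr["penal_code"]), []):
            if age_in_bucket(adm["age"], arr["age_group"]):
                candidates.append((arr["arrest_key"], adm["admission_id"]))

    # Keep a pair only when neither side shows up in any other candidate pair.
    arrest_counts = Counter(a for a, _ in candidates)
    admission_counts = Counter(d for _, d in candidates)
    return [
        (a, d) for a, d in candidates
        if arrest_counts[a] == 1 and admission_counts[d] == 1
    ]


# Hypothetical rows: K2 and K3 both fit admission D2, so that match is ambiguous.
day = date(2022, 5, 1)
arrests = [
    {"arrest_key": "K1", "date": day, "sex": "M", "penal_code": "120.05", "age_group": "25-44"},
    {"arrest_key": "K2", "date": day, "sex": "F", "penal_code": "155.25", "age_group": "18-24"},
    {"arrest_key": "K3", "date": day, "sex": "F", "penal_code": "155.25", "age_group": "18-24"},
]
admissions = [
    {"admission_id": "D1", "date": day, "sex": "Male", "penal_code": "120.05", "age": 30},
    {"admission_id": "D2", "date": day, "sex": "FEMALE", "penal_code": "155.25", "age": 20},
]
pairs = match_unique_pairs(arrests, admissions)
```

Only the K1/D1 pair survives. The two arrests competing for D2 are both dropped rather than guessed at, which is why the bridge is graded Candidate and covers only a subset of arrests.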

Repo Panel Inputs

Broader public event graph sources

These sources are part of the wider research workspace, even when they are not all surfaced on the current routes.

| Source | Dataset | Coverage | Key | Role |
| --- | --- | --- | --- | --- |
| NYC DOC Inmate Admissions | 6teu-xtgp | 2014-2026 | INMATEID + ADMITTED_DT | Primary source for exact DOC episode histories and person rollups. |
| NYC DOC Inmate Discharges | 94ri-3ium | 2014-2026 | INMATEID + ADMITTED_DT | Supplies discharge dates and age-at-discharge for stay lengths and birth-year imputation. |
| NYPD Arrests Historic | 8h9b-rp9u | 2006-2024 | ARREST_KEY | Feeds the arrest-to-DOC bridge after penal-code parsing and demographic/date filtering. |
| NYPD Complaints Historic | qgea-i56i | 2006-2024 | cmplnt_num | Used in the broader public event panel and arrest-to-complaint candidate matching. |
| NYPD Summonses Historic | sv2w-rv3k | 2006-2024 | SUMMONS_KEY | Included as a standalone event source in the broader public event panel. |
| Census Batch Geocoder | coordinatesbatch endpoint | Current geography benchmark | Longitude + Latitude | Adds tract and block-group geography to NYPD event rows with coordinates. |

Join Surfaces

How sources touch each other

A source being public does not mean it is joinable. The limiting factor is whether the released fields expose a defensible path from one stage to another.

| Join | Fields | Status | Supports | Caveat |
| --- | --- | --- | --- | --- |
| DOC admissions ↔ DOC discharges | INMATEID + admit_date | Exact | Stay lengths, gap lengths, ordered jail episodes, and DOC person histories. | Exact within the public DOC feeds, but only for the jail stage. |
| DOC episodes ↔ DOC person summaries | Aggregation over INMATEID | Exact | Repeat-admission counts, tiers, charge-change counts, and person profiles. | Still a DOC-only identity, not a citywide criminal-justice person key. |
| NYPD arrests ↔ DOC admissions | date + sex + parsed penal code + imputed age bucket | Candidate | A narrow arrest-to-jail bridge subset for mapped/contextual arrest detail. | Not ground truth. Only unique 1:1 matches are kept to favor precision over coverage. |
| NYPD arrests ↔ NYPD complaints | date + precinct + offense code + borough + demographics | Candidate | Broader repo event-graph analysis outside the current web explorer. | Ambiguous and incomplete, especially in earlier years. |
| Anything ↔ public court bulk data | None | Unsupported | No public person-level court linkage in this app. | The court extracts documented in the repo are intentionally de-identified. |
| Anything ↔ state prison person records | None | Unsupported | No person-level prison linkage in this app. | Public DOCCS releases are aggregate only. |

Documented Inventory

Public sources tracked in the repo but not yet surfaced here

These sources are documented because they matter for the broader research path, but they are not currently powering the live explorer routes.

| Source | Agency | Dataset | Coverage | Key | Role |
| --- | --- | --- | --- | --- | --- |
| OCA-STAT Act extract | UCS / OCA | Bulk CSV | 2021-2025 arraignments | No public person key | Documented in repo inventory, but not wired into the current explorer. |
| OCA Pretrial Release extract | UCS / OCA + DCJS | Bulk CSV | 2020-2024 arraignments | arr_cycle_id within same arrest | Useful for court detail, but not person-linkable across arrests in public form. |
| DCJS Supplemental Pretrial | DCJS | Bulk ZIP / CSV | 2019-2024 arraignments | caseid | Documented in the repo, but not part of the current web app pipeline. |
| DOCCS aggregate tables | New York State DOCCS | Open Data NY | 2008+ | County + year | Validation/context only. No prison person identifier is public. |