Legacy pay rate bands extraction — query record

Date: 2026-05-18 Window: 2025-05-18 → 2026-05-18 (last 12 months) Database: jodgig_2026_clean (legacy MySQL) Output CSV: /docs/30-49-domains/35-gig/statistics/2026-05-18-legacy-pay-rate-bands.csv Row count (deduped): 10,166 data rows Supersedes: the prior 7,286-row extract (no-role-column version), which was independently validated as CONFIRMED in /docs/30-49-domains/35-gig/scratchpad/2026-05-18-legacy-pay-rate-bands-validation.md. This re-extract adds three role columns (jod_job_title, job_type_id, job_type_title) and overwrites the CSV; the previous filters and joins are unchanged and are not re-validated here — only the added columns and the new dedupe count are.

Purpose of this revision

Add per-company role information so the same CSV can serve two purposes simultaneously:

Inform the team's mental model of how rich the per-company Org::JobRole taxonomy is (input to design discussion).
Stand in as the Stage-1 extract for the Org::JobRole UC-8 legacy bulk import (admin will Stage-2 review and consolidate near-duplicates and per-shift variants).

Schema additions verified

`jod_jobs` (relevant new columns used)

Column	Type	Notes
`job_title`	varchar(255) NOT NULL	Free-text role name typed by the company admin when creating the job. Closest legacy analog to `Org::JobRole.title`. Has high free-text noise (see findings).
`job_type_id`	bigint unsigned NULL	FK → `job_types.id`. The platform-wide taxonomy (Cashier, Waiter/Waitress, F&B, Retail, …). Nullable in the source table; LEFT JOINed so NULL is preserved as empty.

`job_types` (newly joined)

DESCRIBE job_types:

Column	Type	Notes
`id`	bigint PK	FK target of `jod_jobs.job_type_id`
`title`	varchar(255)	Human-readable taxonomy label
`description`	varchar(255)
`wage_range`	varchar(255)
`image`	varchar(255)
`hex_color`	varchar(7)
`status`	smallint	Not filtered in this extraction — we want the title regardless of whether the platform taxonomy row is currently active.
…		other standard audit/soft-delete columns

SELECT COUNT(*) FROM job_types = 57. Only 7 of the 57 IDs actually appear in the filtered data — see Summary below.

Filters applied (unchanged from prior extraction)

jod_jobs.created_at >= '2025-05-18'
jod_jobs.deleted_at IS NULL
jod_jobs.status = 3 (completed — JOB_JOD_STATUS_COMPLETED)
companies.status = 1 AND companies.deleted_at IS NULL
locations.status = 1 AND locations.deleted_at IS NULL
slots.deleted_at IS NULL
At least one slot_user (not soft-deleted) joined to a payment with payment_status = 2 AND deleted_at IS NULL (PAYMENT_STATUS_PROCESSED)

job_types is joined as a LEFT JOIN (not inner) so that rows with job_type_id IS NULL are still emitted with empty job_type_id/job_type_title rather than being silently dropped.

Final extraction query

SELECT DISTINCT
  jj.company_id,
  c.name AS company_name,
  jj.location_id,
  l.name AS location_name,
  jj.job_title AS jod_job_title,
  jj.job_type_id,
  jt.title AS job_type_title,
  TIME(s.slot_start_date) AS starts_at,
  TIME(s.slot_end_date)   AS ends_at,
  jj.hourly_rate
FROM jod_jobs jj
JOIN companies c     ON c.id  = jj.company_id  AND c.status = 1 AND c.deleted_at IS NULL
JOIN locations l     ON l.id  = jj.location_id AND l.status = 1 AND l.deleted_at IS NULL
LEFT JOIN job_types jt ON jt.id = jj.job_type_id
JOIN slots s         ON s.jod_job_id  = jj.id  AND s.deleted_at IS NULL
JOIN slot_user su    ON su.slot_id    = s.id   AND su.deleted_at IS NULL
JOIN payments p      ON p.id          = su.payment_id AND p.payment_status = 2 AND p.deleted_at IS NULL
WHERE jj.created_at >= '2025-05-18'
  AND jj.deleted_at IS NULL
  AND jj.status = 3
ORDER BY jj.company_id, jj.location_id, jod_job_title, jj.job_type_id, starts_at, ends_at;

Verification query (independent COUNT)

SELECT COUNT(*) FROM (
  SELECT DISTINCT
    jj.company_id, c.name AS company_name,
    jj.location_id, l.name AS location_name,
    jj.job_title AS jod_job_title,
    jj.job_type_id, jt.title AS job_type_title,
    TIME(s.slot_start_date) AS starts_at,
    TIME(s.slot_end_date)   AS ends_at,
    jj.hourly_rate
  FROM jod_jobs jj
  JOIN companies c ON c.id = jj.company_id AND c.status = 1 AND c.deleted_at IS NULL
  JOIN locations l ON l.id = jj.location_id AND l.status = 1 AND l.deleted_at IS NULL
  LEFT JOIN job_types jt ON jt.id = jj.job_type_id
  JOIN slots s     ON s.jod_job_id = jj.id  AND s.deleted_at IS NULL
  JOIN slot_user su ON su.slot_id  = s.id   AND su.deleted_at IS NULL
  JOIN payments p   ON p.id        = su.payment_id AND p.payment_status = 2 AND p.deleted_at IS NULL
  WHERE jj.created_at >= '2025-05-18' AND jj.deleted_at IS NULL AND jj.status = 3
) x;
-- Result: 10166

Row count and comparison with the prior extract

Extract	Distinct rows	Row-shape grain
Prior (validated, no role columns)	7,286	`(company_id, location_id, starts_at, ends_at, hourly_rate)`
This extract (with `jod_job_title`, `job_type_id`, `job_type_title`)	10,166	`(company_id, location_id, jod_job_title, job_type_id, job_type_title, starts_at, ends_at, hourly_rate)`
Delta	+2,880	Adding the role columns introduces extra distinct rows where multiple `jod_job_title` values share the same band+rate at the same outlet.

The +2,880 row delta is meaningful: it is the count of (outlet × band × rate) buckets where the same band+rate was used under more than one distinct role label. This is also the lower bound on Stage-2 admin consolidation work.

CSV shape

company_id, company_name,
location_id, location_name,
jod_job_title, job_type_id, job_type_title,
starts_at, ends_at, hourly_rate

jod_job_title — jod_jobs.job_title verbatim. Not trimmed, not case-folded, not whitespace-collapsed (per task spec — admin will handle in Stage-2).
job_type_id and job_type_title — empty string when jod_jobs.job_type_id is NULL.
starts_at, ends_at, hourly_rate — unchanged from prior extract.

Reproducing

mysql -uroot jodgig_2026_clean -B -e "<extraction SQL>" > /tmp/pay_rate_bands_with_roles_raw.tsv
python3 /tmp/build_pay_rate_csv_v2.py    # converts NULL → empty, writes RFC4180 CSV (QUOTE_MINIMAL)

Prior-extract validation referenced

The prior CSV (same query but without role columns) was peer-reviewed and CONFIRMED in /docs/30-49-domains/35-gig/scratchpad/2026-05-18-legacy-pay-rate-bands-validation.md. The validator independently re-derived every numeric headline, the active-filter convention, the enum values, and the per-bucket aggregate counts from the source DB. Because this revision only adds projected columns (no change to filters, joins, or row-grain inflation behaviour), the validated baseline carries over and the new extract is a strict superset projection of the prior one at the same slot grain.

The single behavioural change beyond the projection: slots.id is no longer in the projection (it was carried in the prior extract for dedup purposes only and was always intended to be dropped from the CSV). The dedup is now driven by the natural key (company, location, jod_job_title, job_type_id, job_type_title, starts_at, ends_at, hourly_rate).

Purpose of this revision​

Schema additions verified​

jod_jobs (relevant new columns used)​

job_types (newly joined)​

Filters applied (unchanged from prior extraction)​

Final extraction query​

Verification query (independent COUNT)​

Row count and comparison with the prior extract​

CSV shape​

Reproducing​

Prior-extract validation referenced​