Skip to main content

Legacy pay rate bands extraction — query record

Date: 2026-05-18 Window: 2025-05-18 → 2026-05-18 (last 12 months) Database: jodgig_2026_clean (legacy MySQL) Output CSV: /docs/30-49-domains/35-gig/statistics/2026-05-18-legacy-pay-rate-bands.csv Row count (deduped): 10,166 data rows Supersedes: the prior 7,286-row extract (no-role-column version), which was independently validated as CONFIRMED in /docs/30-49-domains/35-gig/scratchpad/2026-05-18-legacy-pay-rate-bands-validation.md. This re-extract adds three role columns (jod_job_title, job_type_id, job_type_title) and overwrites the CSV; the previous filters and joins are unchanged and are not re-validated here — only the added columns and the new dedupe count are.

Purpose of this revision

Add per-company role information so the same CSV can serve two purposes simultaneously:

  1. Inform the team's mental model of how rich the per-company Org::JobRole taxonomy is (input to design discussion).
  2. Stand in as the Stage-1 extract for the Org::JobRole UC-8 legacy bulk import (admin will Stage-2 review and consolidate near-duplicates and per-shift variants).

Schema additions verified

jod_jobs (relevant new columns used)

ColumnTypeNotes
job_titlevarchar(255) NOT NULLFree-text role name typed by the company admin when creating the job. Closest legacy analog to Org::JobRole.title. Has high free-text noise (see findings).
job_type_idbigint unsigned NULLFK → job_types.id. The platform-wide taxonomy (Cashier, Waiter/Waitress, F&B, Retail, …). Nullable in the source table; LEFT JOINed so NULL is preserved as empty.

job_types (newly joined)

DESCRIBE job_types:

ColumnTypeNotes
idbigint PKFK target of jod_jobs.job_type_id
titlevarchar(255)Human-readable taxonomy label
descriptionvarchar(255)
wage_rangevarchar(255)
imagevarchar(255)
hex_colorvarchar(7)
statussmallintNot filtered in this extraction — we want the title regardless of whether the platform taxonomy row is currently active.
other standard audit/soft-delete columns

SELECT COUNT(*) FROM job_types = 57. Only 7 of the 57 IDs actually appear in the filtered data — see Summary below.

Filters applied (unchanged from prior extraction)

  1. jod_jobs.created_at >= '2025-05-18'
  2. jod_jobs.deleted_at IS NULL
  3. jod_jobs.status = 3 (completed — JOB_JOD_STATUS_COMPLETED)
  4. companies.status = 1 AND companies.deleted_at IS NULL
  5. locations.status = 1 AND locations.deleted_at IS NULL
  6. slots.deleted_at IS NULL
  7. At least one slot_user (not soft-deleted) joined to a payment with payment_status = 2 AND deleted_at IS NULL (PAYMENT_STATUS_PROCESSED)

job_types is joined as a LEFT JOIN (not inner) so that rows with job_type_id IS NULL are still emitted with empty job_type_id/job_type_title rather than being silently dropped.

Final extraction query

SELECT DISTINCT
jj.company_id,
c.name AS company_name,
jj.location_id,
l.name AS location_name,
jj.job_title AS jod_job_title,
jj.job_type_id,
jt.title AS job_type_title,
TIME(s.slot_start_date) AS starts_at,
TIME(s.slot_end_date) AS ends_at,
jj.hourly_rate
FROM jod_jobs jj
JOIN companies c ON c.id = jj.company_id AND c.status = 1 AND c.deleted_at IS NULL
JOIN locations l ON l.id = jj.location_id AND l.status = 1 AND l.deleted_at IS NULL
LEFT JOIN job_types jt ON jt.id = jj.job_type_id
JOIN slots s ON s.jod_job_id = jj.id AND s.deleted_at IS NULL
JOIN slot_user su ON su.slot_id = s.id AND su.deleted_at IS NULL
JOIN payments p ON p.id = su.payment_id AND p.payment_status = 2 AND p.deleted_at IS NULL
WHERE jj.created_at >= '2025-05-18'
AND jj.deleted_at IS NULL
AND jj.status = 3
ORDER BY jj.company_id, jj.location_id, jod_job_title, jj.job_type_id, starts_at, ends_at;

Verification query (independent COUNT)

SELECT COUNT(*) FROM (
SELECT DISTINCT
jj.company_id, c.name AS company_name,
jj.location_id, l.name AS location_name,
jj.job_title AS jod_job_title,
jj.job_type_id, jt.title AS job_type_title,
TIME(s.slot_start_date) AS starts_at,
TIME(s.slot_end_date) AS ends_at,
jj.hourly_rate
FROM jod_jobs jj
JOIN companies c ON c.id = jj.company_id AND c.status = 1 AND c.deleted_at IS NULL
JOIN locations l ON l.id = jj.location_id AND l.status = 1 AND l.deleted_at IS NULL
LEFT JOIN job_types jt ON jt.id = jj.job_type_id
JOIN slots s ON s.jod_job_id = jj.id AND s.deleted_at IS NULL
JOIN slot_user su ON su.slot_id = s.id AND su.deleted_at IS NULL
JOIN payments p ON p.id = su.payment_id AND p.payment_status = 2 AND p.deleted_at IS NULL
WHERE jj.created_at >= '2025-05-18' AND jj.deleted_at IS NULL AND jj.status = 3
) x;
-- Result: 10166

Row count and comparison with the prior extract

ExtractDistinct rowsRow-shape grain
Prior (validated, no role columns)7,286(company_id, location_id, starts_at, ends_at, hourly_rate)
This extract (with jod_job_title, job_type_id, job_type_title)10,166(company_id, location_id, jod_job_title, job_type_id, job_type_title, starts_at, ends_at, hourly_rate)
Delta+2,880Adding the role columns introduces extra distinct rows where multiple jod_job_title values share the same band+rate at the same outlet.

The +2,880 row delta is meaningful: it is the count of (outlet × band × rate) buckets where the same band+rate was used under more than one distinct role label. This is also the lower bound on Stage-2 admin consolidation work.

CSV shape

company_id, company_name,
location_id, location_name,
jod_job_title, job_type_id, job_type_title,
starts_at, ends_at, hourly_rate
  • jod_job_titlejod_jobs.job_title verbatim. Not trimmed, not case-folded, not whitespace-collapsed (per task spec — admin will handle in Stage-2).
  • job_type_id and job_type_title — empty string when jod_jobs.job_type_id is NULL.
  • starts_at, ends_at, hourly_rate — unchanged from prior extract.

Reproducing

mysql -uroot jodgig_2026_clean -B -e "<extraction SQL>" > /tmp/pay_rate_bands_with_roles_raw.tsv
python3 /tmp/build_pay_rate_csv_v2.py # converts NULL → empty, writes RFC4180 CSV (QUOTE_MINIMAL)

Prior-extract validation referenced

The prior CSV (same query but without role columns) was peer-reviewed and CONFIRMED in /docs/30-49-domains/35-gig/scratchpad/2026-05-18-legacy-pay-rate-bands-validation.md. The validator independently re-derived every numeric headline, the active-filter convention, the enum values, and the per-bucket aggregate counts from the source DB. Because this revision only adds projected columns (no change to filters, joins, or row-grain inflation behaviour), the validated baseline carries over and the new extract is a strict superset projection of the prior one at the same slot grain.

The single behavioural change beyond the projection: slots.id is no longer in the projection (it was carried in the prior extract for dedup purposes only and was always intended to be dropped from the CSV). The dedup is now driven by the natural key (company, location, jod_job_title, job_type_id, job_type_title, starts_at, ends_at, hourly_rate).