Orchestration
- Master/Server: job intake (DB updates, email triggers, cron)
- RabbitMQ: event buffering, routing, backpressure
- State machine: job lifecycle (queued → running → retry → success/fail)
- Retries: transient failure handling with backoff
- Notifications: email service for permanent failures
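
A minimal sketch of the job lifecycle and retry handling listed above, assuming exponential backoff with a cap. The names `JobState`, `Job`, `MAX_ATTEMPTS`, `BASE_DELAY_S`, and `MAX_DELAY_S`, the transition table, and the specific limits are illustrative assumptions, not taken from the actual system.

```python
from enum import Enum
from typing import Optional


class JobState(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    RETRY = "retry"
    SUCCESS = "success"
    FAIL = "fail"


# Allowed transitions (assumed; the outline only gives the overall order).
TRANSITIONS = {
    JobState.QUEUED: {JobState.RUNNING},
    JobState.RUNNING: {JobState.SUCCESS, JobState.RETRY, JobState.FAIL},
    JobState.RETRY: {JobState.RUNNING, JobState.FAIL},
    JobState.SUCCESS: set(),
    JobState.FAIL: set(),
}

MAX_ATTEMPTS = 3      # hypothetical limit before a failure counts as permanent
BASE_DELAY_S = 5.0    # hypothetical first retry delay
MAX_DELAY_S = 300.0   # hypothetical cap on the delay


def backoff_delay(attempt: int) -> float:
    """Exponential backoff: base * 2^(attempt - 1), capped at MAX_DELAY_S."""
    return min(MAX_DELAY_S, BASE_DELAY_S * 2 ** (attempt - 1))


class Job:
    def __init__(self, job_id: str) -> None:
        self.job_id = job_id
        self.state = JobState.QUEUED
        self.attempts = 0

    def transition(self, new_state: JobState) -> None:
        # Reject any move that the transition table does not allow.
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"{self.state.value} -> {new_state.value} not allowed")
        self.state = new_state

    def start(self) -> None:
        self.transition(JobState.RUNNING)
        self.attempts += 1

    def on_success(self) -> None:
        self.transition(JobState.SUCCESS)

    def on_failure(self, transient: bool) -> Optional[float]:
        """Return the delay before the next attempt, or None on permanent failure."""
        if transient and self.attempts < MAX_ATTEMPTS:
            self.transition(JobState.RETRY)
            return backoff_delay(self.attempts)
        self.transition(JobState.FAIL)  # this is where the email notifier would fire
        return None
```

With these assumed limits, a job that keeps failing transiently is retried after 5 s and 10 s, then marked as failed on the third attempt.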
Execution
- Jenkins (master ↔ slaves): dispatch, logs, artifacts
- Slave PCs (~14): run jobs in scheduler context; report progress back
Testing projects
- Playwright/Appium: mobile functional tests (scoped test plans)
- Metrics capture: cold/warm start, frame pacing, API RTT
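
As one possible reading of the frame pacing metric above, the sketch below summarizes frame intervals from a list of frame timestamps. The function name `frame_pacing_summary`, the 60 fps budget, and the choice of summary fields are assumptions for illustration; the outline does not say how the metric is actually computed.

```python
import statistics
from typing import Dict, Sequence


def frame_pacing_summary(
    timestamps_ms: Sequence[float],
    budget_ms: float = 1000 / 60,  # assumed 60 fps target
) -> Dict[str, float]:
    """Summarize intervals between consecutive frame timestamps (milliseconds)."""
    intervals = [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]
    if not intervals:
        raise ValueError("need at least two timestamps")
    return {
        "mean_ms": statistics.fmean(intervals),
        "max_ms": max(intervals),
        # Frames that took longer than the per-frame budget.
        "slow_frames": sum(1 for i in intervals if i > budget_ms),
    }
```

Cold/warm start and API RTT could be recorded the same way, as plain durations attached to the job's result.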
Data & Config
- Database (e.g., Postgres/MySQL): job queue + status
- Config: per-plan scope (devices, envs, data) as code
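
A hedged sketch of what per-plan scope as code could look like. `PlanScope`, its field names, and the example device and environment values are hypothetical; the real config format is not specified in this outline.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class PlanScope:
    """Scope of one test plan: which devices, which environments, which data."""
    name: str
    devices: Tuple[str, ...]
    envs: Tuple[str, ...]
    data: Dict[str, str] = field(default_factory=dict)

    def matrix(self) -> List[Tuple[str, str]]:
        """Every (device, env) combination this plan should run on."""
        return [(d, e) for d in self.devices for e in self.envs]


# Example plan (all values are made up for illustration).
smoke = PlanScope(
    name="smoke",
    devices=("pixel-7", "iphone-14"),
    envs=("staging",),
    data={"account": "test-user-1"},
)
```

Keeping the scope in a typed structure like this lets it be reviewed and versioned alongside the tests.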
Observability
- Logs & traces: Jenkins + exporters
- Dashboards/alerts: Grafana/Prometheus (optional)
Infra
- Docker / Kubernetes for scaling servers and workers
Job Flow (end-to-end)
- DB/email/cron raises a job request on the server.
- Server pushes an event into RabbitMQ.
- Master pulls the event from the queue and picks a target slave (by capacity/affinity).
- Master triggers job on slave via Jenkins.
- Slave runs in scheduler context; reports progress / artifacts.
- Server state machine updates status; applies retries if needed.
- On permanent failure, the email notifier alerts the monitoring team.
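
The target selection step in the flow above (capacity/affinity) could be sketched as follows. This assumes capacity means free concurrent job slots and affinity means label matching; `Slave`, `pick_slave`, and the label names are hypothetical and do not reflect how the master actually decides.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Slave:
    name: str
    capacity: int                      # max concurrent jobs (assumed meaning)
    running: int = 0                   # jobs currently executing
    labels: Set[str] = field(default_factory=set)  # e.g. attached device types

    @property
    def free_slots(self) -> int:
        return self.capacity - self.running


def pick_slave(slaves: List[Slave], required_labels: Set[str]) -> Optional[Slave]:
    """Pick the slave with the most free slots among those matching all labels."""
    candidates = [
        s for s in slaves
        if s.free_slots > 0 and required_labels <= s.labels
    ]
    if not candidates:
        return None  # nothing available; the event would stay queued
    return max(candidates, key=lambda s: s.free_slots)


# Example pool (made-up values).
pool = [
    Slave("pc-01", capacity=2, running=2, labels={"android"}),
    Slave("pc-02", capacity=2, running=1, labels={"android"}),
    Slave("pc-03", capacity=4, running=0, labels={"ios"}),
]
```

Returning `None` when no slave fits leaves the decision to requeue or wait with the caller, which matches the role of RabbitMQ as the buffer in this design.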