Job Description
The Mission
We operate mission-critical market data and analytics platforms that ingest, enrich, calculate, and publish financial information for internal and external products. Our applications span end-to-end workflows: collecting real-time and end-of-day data feeds, preparing and consolidating reference data and corporate actions, running calculation services, and delivering datasets and reports to downstream tools and clients across multiple channels.
As a Senior SRE, you will engineer reliability across this heterogeneous ecosystem (web frontends, batch pipelines, real-time components, calculation services, and multi-database layers), ensuring resilient scaling, strict uptime, and secure, verifiable data delivery across QA and PROD environments.
What You Will Do
•Engineer Reliability: Define SLOs and Error Budgets for web services, batch pipelines, and real-time data flows. Measure, review, and iterate on reliability improvements with product teams.
•Eliminate Toil: Automate deployments, batch operations, and data-transfer workflows. Build self-service tooling for common run tasks and health checks across environments.
•Lead Response: Own incident response for web tiers, application services, data pipelines, and databases. Conduct blameless postmortems and implement preventive guardrails.
•Advocate for SRE: Coach teams on production ownership, SLIs/SLOs, runbooks, and readiness reviews. Standardize operational best practices across heterogeneous apps.
•Architect for Scale: Drive Production Readiness Reviews to ensure services are observable, capacity-aware, and disaster-ready across web, batch, and data layers.
•Modernize the Stack: Lead the transition from legacy components and manual processes to infrastructure-as-code, templated deployments, and centralized observability using Ansible, Helm, GitLab CI/CD, and Elasticsearch.
What You Bring
•10+ years in SRE/DevOps running business-critical systems, with experience across mixed environments (Linux/UNIX, some Windows) and multi-environment release practices.
•Strong Linux/UNIX fundamentals, plus solid coding/scripting for automation (e.g., Java, Python, shell). Experience running and tuning web/application tiers, batch workloads, and relational databases. Configuration-as-code mindset.
•Hands-on production experience with the following (required):
    o Ansible for infrastructure-as-code, configuration management, and repeatable deployments.
    o Ability to define and manage environment-specific release configurations.
    o GitLab CI/CD for build/test/deploy pipelines, gated releases, and automated checks.
    o Elasticsearch for centralized logging/observability, pipeline design, index lifecycle, and query/visualization for incident analysis.
•Proven ability to implement end-to-end observability: define SLIs, instrument services, build actionable dashboards/alerts, and trace issues across web tiers, applications, jobs, and databases using common monitoring stacks.
•Demonstrated experience teaching SRE practices (SLOs, error budgets, postmortems), improving on-call quality of life, and influencing teams to build operable, debuggable services.
•Familiarity with financial data workflows (reference data, prices, corporate actions, entitlements, calculations, reporting/publishing) and secure data exchange patterns.
•Excellent communication in English and cross-team collaboration skills; ability to write clear runbooks, lead incidents, and partner with product/engineering and operations.
Valued / Nice-to-Have
•Legacy/niche middleware and protocols: CORBA/Orbix, IIOP; UDP/TCP messaging patterns.
•Scheduler: Dollar Universe ($Universe).
•Datastores beyond common RDBMS: Sybase ASE, BerkeleyDB (real-time).
•App stack specifics: Apache Tomcat/HTTPD mod_proxy tuning, Nginx as proxy, Java frameworks (Spring/Struts/JSP/GWT).