Research Technology Engineer

The Team

The Research Technology team at XTX Markets takes care of all aspects of our research infrastructure. Tasks range from writing the software that distributes work fairly and efficiently across our compute cluster, to tuning operating systems, storage and networks, to evaluating leading-edge compute technology (often before it is formally released) and designing the datacentre environment for it to run in. We are a full-stack team that works side by side with our quantitative researchers to build the most performant, reliable and transparent system we can on one of the larger HPC clusters in existence.

The Role

The right candidate will:
- Contribute to all components of our HPC infrastructure and code, and help grow one of the biggest private compute clusters anywhere.
- Write software that runs on a compute cluster that grows continually and currently has ~1 PB of memory, ~100K CPU cores, thousands of compute offload devices and 60+ PB of high-performance storage, all connected by multi-Tb/s networks.
- Join an environment where improvements can usually be made very quickly, where the results of those changes are immediately visible, and where they can make a large impact on the quantitative research function at the heart of our business.

The Skills

- Strong Linux engineering skills including developing automated builds and patching cycles, rolling new kernels, and fixing open source tools.
- Excellent command-line skills; shell scripting should be second nature to you.
- Comfortable writing code in a language such as Python.
- Configuration management and deployment using Puppet, Ansible or similar
- Demonstrable ability to take on complicated projects and deliver good quality work with a minimum of oversight.
- A good working knowledge of network and storage technologies.
- Experience with container technologies such as Podman, Docker and Kubernetes.

We would not expect anyone to have all of the following skills or experience, but some subset would be preferred; the list also gives an idea of the wide range of technologies we work with:
- Experience in other large-scale high performance compute environments.
- Monitoring and alerting experience with Prometheus, Alertmanager and Grafana.
- System-administrator-level knowledge of hardware, networking and/or file storage systems, e.g. Dell, HP, Cisco, ZFS, NFS, BeeGFS, WekaFS.
- Database technologies such as MariaDB and ProxySQL.
- Compute offload using GPUs (and other more niche hardware).
- Machine learning frameworks, e.g. PyTorch.

The Candidate

- 4+ years of experience and a proven track record of delivering complex technical projects.
- A good STEM degree from a reputable tertiary educational establishment is preferred.
- Top-notch technical credentials and a drive to achieve are essential.