Senior Product Engineer - Full Stack
Earthmover
Earthmover is building a product that solves the biggest storage and compute challenges of our time for users in the deep sciences - think climate, earth observations, biotech, machine learning, and AI.
The Earthmover platform today has two components. Our first product, Arraylake, is a data lake platform for multi-dimensional scientific array data. It enables users to manage and process array data in open standard cloud-native formats, with features like versioning, cataloging, collaboration, and access control on top. Our second product, Flux, sits on top of Arraylake and provides an API gateway to multi-dimensional data through standards-compliant APIs. We are also heavy open source contributors in the scientific data space, leading the development of projects such as Zarr, Xarray, and our own open source multidimensional array store, Icechunk. We’re actively building powerful features and experiences on top of Icechunk, Arraylake, and Flux while also adding more products to our platform to advance the field of scientific computing and accelerate progress in many domains. Some specific problems we’re thinking about:
- Collaboration between scientists, teams, and organizations: getting discoverability, sharing, and access controls right.
- Transparent, high-performance array access along any dimension.
- Delivering groundbreaking open-access public datasets to the entire scientific community.
- Generating insights for organizations about how they can optimize datasets on our platform.
- Navigation, exploration, and visualization of nuanced hierarchical scientific data structures.
- Providing highly scalable, highly available, multi-region APIs for data discovery and delivery.
About the role
As a scientific data platform company, our product has two key objectives: provide a best-in-class array storage and processing system, and deliver an exceptional collaborative user experience on top of it. We’re at a stage in our product development lifecycle where we’ve built a killer foundation, and our focus is now on shipping new features and products that help our users solve challenges unique to scientific data.
Responsibilities
- Expose APIs on top of our data layer that help us build great user experiences and data products for customers.
- Design and implement features that allow our customers to quickly extract insights from their data and processes.
- Own our API control layer, and continue to expand access control features for organizations, users, and API keys.
- Own and improve our entity model, expanding our ability to monitor usage and integrate billing.
- Own our migration to a fully multi-region, multi-cloud platform.
- Work on core parts of our API performance and quality, bringing best practices around things like stability, monitoring, versioning, and client integrations.
- Improve overall query performance and consistency wherever it’s needed in our API or database layer.
You’re a great fit if you
- Are product minded, and excited to work directly with customers & scientists to shape our product to solve their problems.
- Have experience designing and building high quality APIs.
- Have experience building control layers, user permission systems, or access control tools.
- Have experience building highly available, auto-scaling compute services (e.g. ECS, K8s).
- Would enjoy collaborating with others across the whole stack to rapidly iterate on new product features, in particular our web app and client libraries. You should be able to understand the whole stack, from low-level cloud performance to usability and user experience concerns.
- Have at least 6 years of experience as a software engineer working on backend systems. Our stack is written in Python, TypeScript, and Rust; experience with any typed backend language is valuable.
- Have a genuine enthusiasm for this job description, even if you don’t yet have experience with all of the listed responsibilities. We have a top class team and are open to helping the right candidate grow into these responsibilities over time.
Our stack
- We’re building Arraylake to be a cloud-native data platform. We are deployed on AWS and have active Google Cloud, Azure, and on-prem environments.
- Our service and client library are both written in Python, leveraging asynchronous interfaces (e.g. FastAPI, HTTPX, Motor, Aiobotocore).
- Subsets of our stack and Icechunk, our open source array database, are written in Rust, and we’re interested in increasing this footprint over time.
- Our infrastructure-as-code stack is based on Pulumi and is written in TypeScript.
- Our front-end application is built on Next.js and is deployed on Vercel.
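To give a feel for the asynchronous Python style our service layer leans on, here is a minimal stdlib-only sketch of fetching several array chunks concurrently rather than sequentially. All names and the in-memory chunk store are illustrative stand-ins, not Earthmover's actual code; in the real stack the reads would go through async clients like HTTPX or Aiobotocore.

```python
import asyncio

# Stand-in for a remote chunk store (in practice: object storage or an HTTP API).
CHUNKS = {f"chunk/{i}": bytes([i] * 4) for i in range(8)}


async def fetch_chunk(key: str) -> bytes:
    # Simulate an I/O-bound read with a small sleep.
    await asyncio.sleep(0.01)
    return CHUNKS[key]


async def fetch_all(keys: list[str]) -> list[bytes]:
    # gather() overlaps the waits, so total latency stays close to a
    # single round trip instead of growing linearly with len(keys).
    return await asyncio.gather(*(fetch_chunk(k) for k in keys))
```

Running `asyncio.run(fetch_all([f"chunk/{i}" for i in range(8)]))` returns all eight chunks while their simulated I/O waits overlap, which is the core reason the stack favors async interfaces for data-heavy APIs.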