Data Engineer (Data Automation)
inDriver
Responsibilities
We create data solutions for Geo, Fintech, Safety, ESG, Government Relations, GDPR, and other areas. We are actively building a Data Warehouse, a key part of the product. We work with cutting-edge technologies (GCP, AWS, Airflow, Kafka, K8s) and make infrastructure and architectural decisions based on data.
One of the key areas is the development of a personalization system for our users. We are building a large-scale data infrastructure for analytics, machine learning, and real-time recommendations.
- Develop a data-driven culture within the company
- Develop processes for data processing, storage, cleaning, and enrichment
- Design and maintain data pipelines from collection to consumption (see the sketch after this list)
- Develop APIs (REST, gRPC) for high-load services
- Create infrastructure for storing and processing large datasets with Kubernetes and Terraform
- Automate testing, validation, and monitoring of data
- Participate in system design and architectural decision-making
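For illustration, here is a minimal sketch of the kind of pipeline described above, written with the Airflow 2 TaskFlow API (2.4+). The DAG name, sample rows, and task bodies are hypothetical placeholders, not a description of our actual pipelines.

```python
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def ride_events_pipeline():
    """Hypothetical collection-to-consumption pipeline with a validation step."""

    @task
    def extract():
        # In production this would read raw events from Kafka or GCS.
        return [{"ride_id": 1, "city": "almaty", "fare": 12.5},
                {"ride_id": None, "city": "astana", "fare": 7.0}]

    @task
    def validate(rows):
        # Automated data validation: drop rows missing required fields.
        return [r for r in rows
                if r.get("ride_id") is not None and r.get("fare") is not None]

    @task
    def load(rows):
        # In production this would write to BigQuery; here we only log.
        print(f"loading {len(rows)} valid rows")

    load(validate(extract()))


ride_events_pipeline()
```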
Qualifications
Who we are looking for:
- Expert in Python 3.7+, with PySpark experience (see the sketch after this list)
- Deep knowledge of SQL
- Extensive experience building ETL pipelines with Airflow 2
- Industrial experience with Kubernetes
- Understanding of data processing principles and algorithms
- Excellent knowledge of OOP, design patterns, clean architecture
- Proactivity, responsibility, and the ability to take ownership
Would be a plus:
- Experience with high-load services
- DevOps skills and CI/CD automation experience
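To give a flavor of the PySpark work involved, here is a minimal cleaning-and-enrichment sketch; the bucket paths and column names are hypothetical.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders-cleaning").getOrCreate()

# Hypothetical input: any Parquet dataset with order_id, amount, created_at.
orders = spark.read.parquet("s3a://raw/orders/")

cleaned = (
    orders
    .dropDuplicates(["order_id"])                       # de-duplicate on the key
    .filter(F.col("amount") > 0)                        # drop invalid rows
    .withColumn("order_date", F.to_date("created_at"))  # enrich with a date column
)

# Partitioned write so downstream SQL engines can prune by date.
cleaned.write.mode("overwrite").partitionBy("order_date").parquet("s3a://clean/orders/")
```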
Conditions & Benefits
- Stable salary, official employment
- Health insurance
- Hybrid work mode and flexible schedule
- Relocation package offered to candidates from other regions (Kazakhstan and Cyprus only)
- Access to professional counseling services including psychological, financial, and legal support
- Discount club membership
- Diverse internal training programs
- Partially or fully paid additional training courses
- All necessary work equipment
Our tech stack
Languages: Python, SQL, Scala, Go
Frameworks: Spark, Apache Beam
Storage and analytics: BigQuery, GCS, S3, Trino, other GCP and AWS stack components
Integration: Apache Kafka, Google Pub/Sub, Debezium, Zero ETL, Firehose
ETL: Airflow 2
Infrastructure: Kubernetes, Terraform
Development: GitHub, GitHub Actions, Jira
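To show how some of these pieces fit together, below is a minimal sketch of a Kafka-to-BigQuery streaming loader in Python, using the confluent-kafka and google-cloud-bigquery client libraries. The broker address, topic, and table names are hypothetical, and a production version would batch rows and handle retries.

```python
import json

from confluent_kafka import Consumer
from google.cloud import bigquery

# Hypothetical broker, topic, and table identifiers.
consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "events-loader",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["ride_events"])

bq = bigquery.Client()
TABLE = "my-project.analytics.ride_events"

try:
    while True:
        msg = consumer.poll(1.0)          # wait up to 1s for a message
        if msg is None or msg.error():
            continue
        row = json.loads(msg.value())
        # Streaming insert; returns a list of per-row errors, empty on success.
        errors = bq.insert_rows_json(TABLE, [row])
        if errors:
            print("insert failed:", errors)
finally:
    consumer.close()
```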