
Hello there! 👋 My name is Karim

Data Engineer with 6 years of experience and a background in Computer Science.

Data Expert ⚙️

Strong team player who develops and maintains large-scale infrastructure and implements complex data pipelines.

Personal Projects

Fraud Risk Assessment Streaming ETL

Kafka, Azure, Databricks, Snowflake, Python, Spark Streaming

Lane Line Recognition System


Historical Data Processing with PySpark

PySpark, Python, Data Processing, Docker, HDFS

Open Source Contributor Index (OSCI)

Open Source, Python, Azure, Azure Data Factory, Azure Functions, BigQuery, Google Data Studio, Databricks

Tiny URL Service


Car Model Classification with TensorFlow

Python, TensorFlow, FastAPI, Docker, Image Classification

Coming Soon 🚧



Programming Languages




Big Data Tools

Hadoop, HDFS, YARN, PySpark, Spark Streaming, Delta Lake, Qlik CDC

Cloud Platforms

Azure Synapse Analytics, Azure Event Hub, Azure Data Factory, Azure Databricks, Google BigQuery, Snowflake

DevOps & CI/CD Tools

Airflow, Jenkins, GitLab CI/CD, Docker, Kubernetes, Prometheus, Nginx

Work Experience

Data Engineer

New Yorker GmbH & Co. KG

01/2023 - Present

Braunschweig, Germany

  • Designed, enhanced, and maintained Data Lake infrastructure and resolved data quality issues.
  • Implemented and maintained Airflow DAGs for efficient data workflow orchestration.
  • Analyzed requirements for the new CDC (change data capture) solution; maintained legacy processes and integrated new pipelines with Qlik Replicate and Qlik Compose.
  • Documented technical processes for CDC, Data Lake, and other projects, including workflows and requirements.
  • Created and optimized CI/CD pipelines via Jenkins, Ansible, and GitHub Actions.
  • Collaborated with cross-functional teams in a dynamic business environment.
  • Utilized Bash scripts to automate data transfer and streamline processes.


  • Improved data accessibility and quality in the Data Lake, resulting in time and cost savings for multiple departments.
  • Streamlined the Rotation Call process by automating tasks with Bash scripts, reducing time and effort by 30%.
  • Migrated from an outdated CDC tool to Qlik, enhancing reliability and maintainability of data pipelines.

Backend Developer (HiWi)

Technische Universität Ilmenau

06/2022 - 12/2022

Ilmenau, Germany

  • Created RESTful API services via Django for observing endangered bees in Germany.
  • Containerized the bee-monitoring application with Docker and docker-compose.
  • Added PostGIS extension to work with German TK25 coordinates.
  • Configured Nginx and Gunicorn settings in the project.
  • Set up a GitLab CI/CD pipeline that runs unit and integration tests.
  • Enforced clean code guidelines.


  • Developed a backend application for monitoring endangered bees in Germany.

Data Engineer

EPAM Systems Inc.

12/2020 - 03/2022

Kazan, Russia

  • Implemented ETL processes in an open-source tracking project using Azure Cloud, Azure Data Factory, Databricks, PySpark, and Python.
  • Mentored students and juniors in Python; served as onboarding buddy, guiding new employees through processes and tools.


  • Expanded an open-source GitHub activity dashboard for major companies and an internal version to encourage employee engagement and contributions.

Backend Developer

Akvelon Inc.

05/2019 - 10/2020

Kazan, Russia

  • Implemented RESTful API services with Django, CI/CD, and Docker for an HR portal that tracks candidates through the hiring process.
  • Designed database architecture in PostgreSQL.

Data Scientist

Meanotek AI

05/2018 - 03/2019

Kazan, Russia

  • Assisted a client in converting paper medical documents into structured databases using LSTM neural networks and Python.



  • IELTS Academic: B2


  • English: C1
  • German: A2
  • Russian: Native Speaker