Data Engineer with over 1 year of experience in designing and optimizing cloud-native data pipelines on Azure and AWS. Proficient in Python, SQL, Spark, and Azure Data Factory, with expertise in big data systems, ETL workflows, and data warehousing. Demonstrated success in integrating diverse data sources and automating workflows to convert raw data into actionable insights. Strong collaborator in cross-functional teams, delivering reliable and performance-driven data solutions.
• Delivered a project migrating legacy on-premises processes to the cloud using big data technologies (EMR, Spark, Python, SQL, and S3), reducing processing time by 20%.
• Designed and implemented advanced scheduling capabilities using Airflow for data pipeline orchestration, reducing manual intervention time by 80% and streamlining workflows.
• Authored ETL scripts with AWS Glue, migrating data from AWS RDS to AWS Redshift.
• Designed and implemented stored procedures on Amazon Redshift for processing large volumes of data. Optimized SQL queries and improved performance by choosing appropriate distribution styles (DISTSTYLE) and sort keys.
Programming Languages: Python, SQL, Java
Big Data Technologies: Apache Spark, PySpark, Spark SQL, Hadoop, Hive, AWS EMR
Cloud Platforms & Services: AWS S3, AWS Glue, AWS Lambda, AWS Kinesis, AWS RDS, Azure SQL, Azure ADLS, Azure Data Factory, Azure Databricks
Databases (SQL & NoSQL): MySQL, Microsoft SQL Server, Oracle, MongoDB, Cassandra, HBase, Neo4j, Azure Cosmos DB
Data Engineering Tools: Kafka, Snowflake, Oracle Data Modeler, Apache NiFi, Sqoop, Flume, IBM DataStage
Workflow Orchestration: Apache Airflow, Azure Data Factory
DevOps & Collaboration: Git, Docker, Kubernetes, Jenkins, Terraform
Data Validation & Governance: Great Expectations, pandas-profiling, pyjanitor
• Studying research papers and journals on the latest projects
• Attending conferences of the Minnesota Acute Compute Machinery organization
• Helping others with projects and reviewing project documents