Venkata Siva Babburi

Lakeville

Summary

Data Engineer with over 1 year of experience in designing and optimizing cloud-native data pipelines on Azure and AWS. Proficient in Python, SQL, Spark, and Azure Data Factory, with expertise in big data systems, ETL workflows, and data warehousing. Demonstrated success in integrating diverse data sources and automating workflows to convert raw data into actionable insights. Strong collaborator in cross-functional teams, delivering reliable and performance-driven data solutions.

Overview

2 years of professional experience
1 Certification

Work History

Data Engineer

Optum
Lakeville
01.2024 - Current

• Delivered a project to migrate legacy on-premises processes to the cloud using big data technologies (EMR, Spark, Python, SQL, and S3), reducing processing time by 20%.

• Designed and implemented advanced scheduling capabilities using Airflow for data pipeline orchestration, reducing manual intervention time by 80% and streamlining workflows.

• Authored ETL scripts with AWS Glue, migrating data from AWS RDS to AWS Redshift.

• Designed and implemented stored procedures in Amazon Redshift for processing large volumes of data. Optimized SQL queries and improved performance by choosing appropriate distribution styles (DISTSTYLE) and sort keys.

Education

Master - I.T

Minnesota State University, Mankato
Mankato
12.2023

Bachelor of Science - Electronics & Communication

KL University
India
01.2020

Skills

Programming Languages: Python, SQL, Java

Big Data Technologies: Apache Spark, PySpark, Spark SQL, Hadoop, Hive, AWS EMR
Cloud Platforms & Services: AWS S3, AWS Glue, AWS Lambda, AWS Kinesis, AWS RDS, Azure SQL, Azure ADLS, Azure Data Factory, Azure Databricks
SQL Databases: MySQL, Microsoft SQL Server, Oracle
NoSQL Databases: MongoDB, Cassandra, HBase, Neo4j, Azure Cosmos DB

Data Engineering Tools: Kafka, Snowflake, Oracle Data Modeler, Apache NiFi, Sqoop, Flume, IBM DataStage
Workflow Orchestration: Apache Airflow, Azure Data Factory
DevOps & Collaboration: Git, Docker, Kubernetes, Jenkins, Terraform
Data Validation & Governance: Great Expectations, pandas-profiling, pyjanitor

Certification

  • GCP Associate Cloud Engineer
  • Databricks Associate Data Engineer

Extracurricular Activities

  • Studying research papers and journals on the latest projects
  • Attending conferences of the Minnesota Acute Compute Machinery organization
  • Helping others with projects and reviewing project documents

Projects

  • Interactive Data Visualization and Statistical Analysis of Urban Bike-Sharing Patterns Using Tableau
  • SQLite Automated Web Scraping and Machine Learning-Based Salary Prediction from U.S. LinkedIn Job Postings
  • Distributed Big Data Pipeline for Book Recommendation System Using Hadoop Ecosystem, Spark, Kafka, and Python
  • Design and Implementation of a Multi-Node Heterogeneous Distributed Database Architecture for Food Truck Operations
  • Regression Tree Modeling and Financial Market Trend Prediction Using Historical Commodity and Stock Data in SAS
  • Binary Logistic Regression Model for Predicting Gestational Diabetes Risk Using R and Statistical Inference Techniques
