Irfan Elahi in Level Up Coding · Efficiently Performing Data Deduplication in Streaming Workloads using Delta Live Tables · Substantiating that how simple it is to implement in Databricks Delta Live Tables. Also highlighting a few common pitfalls. · 7 min read · Jan 25, 2024
Irfan Elahi in Level Up Coding · Optimizing Merge Performance in Databricks — A Case Study · Explore Databricks features (e.g. DFP, Deletion Vectors) and data engineering principles to optimize performance of merge and joins in… · 8 min read · Jan 8, 2024
Irfan Elahi in Level Up Coding · Mayday to Eureka! Cataloguing tables in AWS Glue when Crawler Just Fail · While automating it via Python, boto3 and S3 Select · 6 min read · Jan 26, 2023
Irfan Elahi in Towards Data Science · Test Driving Delta Lake 2.0 on AWS EMR — 7 Key Learnings · What I learned after using Delta Lake 2.0 on AWS EMR along with installation steps and performance benchmarks · 8 min read · Oct 12, 2022
Irfan Elahi in Towards Data Science · Getting started with Delta Lake & Spark in AWS— The Easy Way! · A step-by-step tutorial to configure Apache Spark and Delta Lake on EC2 in AWS along with code examples in Python · 8 min read · Aug 31, 2022
Irfan Elahi in Level Up Coding · Versioning Thy Infra: Tagging AWS resources with Git Commit Hash in CICD pipelines · By using AWS CodePipeline, CloudFormation, Lambda and Python. · 6 min read · Mar 7, 2022
Irfan Elahi in Level Up Coding · Efficiently Transforming, Compressing (in-memory) and Ingesting CSV files to AWS S3 using Python · A guide to optimize your AWS S3 ingestion processes via in-memory processing and compression of CSV files using Python and AWS SDK · 6 min read · Jan 24, 2022
Irfan Elahi in Level Up Coding · WatchTower — The Missing Piece in Streamlining Amazon Cloudwatch and Application Logs · How to use WatchTower module in Python to integrate AWS Cloudwatch with Python’s logging module as an alternative of boto3 · 6 min read · Mar 2, 2021
Irfan Elahi in Towards Data Science · Machine Learning (kmeans clustering) in SparkML vs AWS SageMaker — My Two Cents · My experience of performing kmeans, an unsupervised machine learning algorithm, in SparkML and AWS SageMaker and their caveats · 8 min read · Nov 9, 2019
Irfan Elahi in Towards Data Science · AWS Elastic MapReduce (EMR) — 6 Caveats You Shouldn’t Ignore · A few gotchas about AWS EMR and AWS Glue that you, as a developer/architect, should know · 7 min read · Oct 28, 2019