Highlight

Excel files are one of the most commonly used file formats on the market. The popularity of the tool among business users, business analysts and data engineers is driven by its flexibility, ease of use, powerful integration features and low price.

Intro

This is why every data engineer should understand the advantages and disadvantages of this format, the variety of its internal formats such as XLS, XLSX, XLSB and XLSM, and which tools to use to process those files effectively in the cloud.
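To make the difference between those internal formats a bit more concrete, here is a minimal Python sketch that tells the two underlying containers apart by their leading bytes. It relies on two facts: legacy XLS workbooks are stored as OLE2 compound files, while XLSX, XLSM and XLSB are ZIP packages. The helper name `excel_container` and its return labels are my own, purely for illustration, and the check is container-level only. It cannot tell XLSX from XLSM or XLSB, and other files (a Word DOC, any ZIP archive) share the same signatures.

```python
import io
import zipfile

# Leading bytes of the two containers Excel workbooks are stored in.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE2 compound file (legacy XLS)
ZIP_MAGIC = b"PK\x03\x04"                       # ZIP package (XLSX, XLSM, XLSB)


def excel_container(data: bytes) -> str:
    """Classify raw file bytes by container type (hypothetical helper).

    Returns "ole2", "zip" or "unknown". This is only a hint about the
    workbook format, not a validation that the file is a real workbook.
    """
    if data.startswith(OLE2_MAGIC):
        return "ole2"
    # Confirm the ZIP structure as well, not just the first four bytes.
    if data.startswith(ZIP_MAGIC) and zipfile.is_zipfile(io.BytesIO(data)):
        return "zip"
    return "unknown"
```

A check like this can be a cheap first step in a pipeline, for example to catch a CSV that was saved with an .xlsx extension before handing the file to an Excel reader.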

Today I bring you a quick introduction to building ETL solutions with Excel files in Azure using the Data Factory and Databricks services.

Code samples: https://github.com/MarczakIO/azure4everyone-samples/tree/master/azure-excel-file-processing-with-data-factory-and-databricks

Agenda

  • 00:00 Introduction
  • 00:25 Excel Business Justification
  • 01:22 Excel Challenges
  • 02:20 Supported Services
  • 04:30 Data Factory Introduction
  • 05:35 Demo Setup
  • 07:13 Demo using Data Factory
  • 13:36 Databricks Introduction
  • 14:44 Databricks Setup
  • 18:14 Databricks Demo - Reading Excels
  • 20:55 Databricks Demo - Reading Excels using References
  • 25:56 Databricks Demo - Workbook Metadata
  • 28:05 Databricks Demo - Defining Schema
  • 30:03 Databricks Demo - Defining Schema
  • 32:53 Additional Options

Video

Next steps for you after watching the video

  1. Excel format in Data Factory
  2. Spark Excel by Crealytics documentation

Adam Marczak

Programmer, architect, trainer, blogger and evangelist are just a few of my many titles. What I really am is a passionate technology enthusiast. I take great pleasure in learning new technologies and finding ways in which they can aid people every day. My latest passion is running the Azure 4 Everyone YouTube channel, where I show that Azure really is for everyone!
