ETL with Azure Cookbook
上QQ阅读APP看书,第一时间看更新

Preface

Extract, Transform, Load (ETL) is one of the most common and tedious procedures for moving and processing data from one database to another. With the help of this book, you will be able to speed up the process by designing efficient ETL solutions using the Azure services available for handling and transforming any data to suit your requirements.

With this cookbook, you will become well versed in all the features of SQL Server Integration Services (SSIS) to perform data migration and ETL tasks that integrate with Azure. You will learn how to transform data in Azure and understand how legacy systems perform ETL on-premises using SSIS. Later chapters will get you up to speed with connecting and retrieving data from SQL Server 2019 Big Data Clusters, and even show you how to extend and customize the SSIS toolbox using custom-developed tasks and transformations. This ETL book also contains practical recipes for moving and transforming data with Azure services, such as Data Factory and Azure Databricks, and lets you explore various options for migrating SSIS packages to Azure. Toward the end, you will find out how to profile data in the cloud and automate package development with Business Intelligence Markup Language (BIML).

By the end of this book, you will have developed the skills you need to create and automate ETL solutions on-premises as well as in Azure.

Who this book is for

This book is for data warehouse architects, ETL developers, or anyone else who wants to build scalable ETL applications in Azure. Those looking to extend their existing on-premises ETL applications to use big data and a variety of Azure services, or others interested in migrating existing on-premises solutions to the Azure cloud platform, will also find the book useful. Familiarity with SQL Server services is necessary to get the most out of this book.

What this book covers

Chapter 1, Getting Started with Azure and SSIS 2019, describes, in step-by-step fashion, how to set up SQL Server 2019 to deploy the features that are used throughout the book. It also covers how to set up an Azure subscription.

Chapter 2, Introducing ETL, explains why ETL is needed, where it comes from, and the differences between ETL and ELT.

Chapter 3, Creating and Using SQL Server 2019 Big Data Clusters, discusses how to create, deploy, connect to, and retrieve data from SQL Server 2019 Big Data Clusters.

Chapter 4, Azure Data Integration, talks about the Azure Feature Pack, which allows SSIS to integrate Azure data from blob storage and HDInsight clusters.

Chapter 5, Extending SSIS with Custom Tasks and Transformations, talks about extending and customizing the built-in toolset with custom-developed control flow tasks and data flow transformations.

Chapter 6, Azure Data Factory, talks about using Azure Data Factory to move and transform data in the cloud.

Chapter 7, Azure Databricks, talks about using Azure Databricks to transform data in the cloud.

Chapter 8, SSIS cloud Migration Strategies, talks about scaling out SSIS packages to Azure.

Chapter 9, Profiling data in Azure, contains various recipes that show you how to perform data profiling in Azure Databricks.

Chapter 10, Manage SSIS and Azure Data Factory with BIML, contains various recipes that demonstrate how to manage SSIS and Azure Data Factory with BIML using metadata.

To get the most out of this book

Basic or working knowledge of SQL Server, SSIS, and Azure is needed, as is basic knowledge of ETL techniques and Azure SQL Server databases. You also need a personal computer running Microsoft Windows 10.

Some of the recipes in this cookbook contain partial, incomplete solutions that you will complete by following the instructions in the recipe. If, for some reason, you get stuck and are unable to complete the tasks successfully, the complete copies of the solutions are available in the Solution folder of the corresponding chapter. Please, use these files as a reference when working on the recipes; they are there to assist you, not to prevent you from learning by performing the work yourself.

If you are using the digital version of this book, we advise you to type the code yourself or access the code via the GitHub repository (link available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.

Download the example code files

You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

  1. Log in or register at www.packt.com.
  2. Select the Support tab.
  3. Click on Code Downloads.
  4. Enter the name of the book in the Search box and follow the onscreen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

  1. WinRAR/7-Zip for Windows
  2. Zipeg/iZip/UnRarX for Mac
  3. 7-Zip/PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/ETL-with-Azure-Cookbook In case there's an update to the code, it will be updated on the existing GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781800203310_ColorImages.pdf.

Conventions used

There are a number of text conventions used throughout this book.

Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "For the location, use hivesource as the folder path and sales.dat as the filename."

A block of code is set as follows:

INSERT OVERWRITE TABLE ETLInAzure.SalesAgg

SELECT storename , zipcode , SUM(unitcost) AS unitcost, AVG(unitprice) AS unitprice, SUM(salesamount) AS salesamount, SUM(salesquantity) AS salesquantity, CalendarMonth

FROM etlinazure.salesource

GROUP BY storename, zipcode, CalendarMonth;

Any command-line input or output is written as follows:

USE master;

GO

CREATE DATABASE Staging;

GO

Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "Click on the list beside the Enumerator property and select Foreach Azure Blob Enumerator."

Tips or important notes

Appear like this.

Sections

In this book, you will find several headings that appear frequently (Getting ready, How to do it..., How it works..., There's more..., and See also).

To give clear instructions on how to complete a recipe, use these sections as follows:

Getting ready

This section tells you what to expect in the recipe and describes how to set up any software or any preliminary settings required for the recipe.

How to do it…

This section contains the steps required to follow the recipe.

How it works…

This section usually consists of a detailed explanation of what happened in the previous section.

There's more…

This section consists of additional information about the recipe in order to make you more knowledgeable about the recipe.

See also

This section provides helpful links to other useful information for the recipe.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at customercare@packtpub.com.

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at copyright@packt.com with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packt.com.