PolyBase revealed: data virtualization with SQL Server, Hadoop, Apache Spark, and beyond

Title
PolyBase revealed [electronic resource] : data virtualization with SQL Server, Hadoop, Apache Spark, and beyond / Kevin Feasel.
ISBN
9781484254615
1484254619
9781484254608
Published
Berkeley, CA : Apress L. P., 2020.
Physical Description
1 online resource (320 p.)
Local Notes
Access is available to the Yale community.
Notes
Description based upon print version of record.
Predicate Pushdown Failure
Includes index.
Access and use
Access restricted by licensing agreement.
Summary
Harness the power of PolyBase data virtualization software to make data from a variety of sources easily accessible through SQL queries while using the T-SQL skills you already know and have mastered. PolyBase Revealed shows you how to use the PolyBase feature of SQL Server 2019 to integrate SQL Server with Azure Blob Storage, Apache Hadoop, other SQL Server instances, Oracle, Cosmos DB, Apache Spark, and more. You will learn how PolyBase can help you reduce storage and other costs by avoiding the need for ETL processes that duplicate data in order to make it accessible from one source. PolyBase makes SQL Server into that one source, and T-SQL is your golden ticket. The book also covers PolyBase scale-out clusters, allowing you to distribute PolyBase queries among several SQL Server instances, thus improving performance. With great flexibility comes great complexity, and this book shows you where to look when queries fail, complete with coverage of internals, troubleshooting techniques, and where to find more information on obscure cross-platform errors. Data virtualization is a key target for Microsoft with SQL Server 2019. This book will help you keep your skills current, remain relevant, and build new business and career opportunities around Microsoft's product direction.
You will:
Install and configure PolyBase as a stand-alone service, or unlock its capabilities with a scale-out cluster.
Understand how PolyBase interacts with outside data sources while presenting their data as regular SQL Server tables.
Write queries combining data from SQL Server, Apache Hadoop, Oracle, Cosmos DB, Apache Spark, and more.
Troubleshoot PolyBase queries using SQL Server Dynamic Management Views.
Tune PolyBase queries using statistics and execution plans.
Solve common business problems, including "cold storage" of infrequently accessed data and simplifying ETL jobs.
Variant and related titles
O'Reilly Safari. OCLC KB.
Other formats
Print version: Feasel, Kevin. PolyBase Revealed : Data Virtualization with SQL Server, Hadoop, Apache Spark, and Beyond. Berkeley, CA : Apress L. P., c2020
Format
Books / Online
Language
English
Added to Catalog
July 30, 2020
Contents
Intro
Table of Contents
About the Author
About the Technical Reviewer
Acknowledgments
Introduction
Chapter 1: Installing and Configuring PolyBase
Choose the Form of Your PolyBase
Installing PolyBase Standalone-Windows
Installing PolyBase Scale-Out Group
Building a Configuration File
Installing Without a GUI
Installing PolyBase Standalone-Linux
Configuring PolyBase
Configuring a Client
Enable PolyBase
Mandatory Configuration
Scale-Out Group Configuration
Troubleshooting Common Errors
Testing for Success
Conclusion
Chapter 2: Connecting to Azure Blob Storage
Making Preparations in Azure
Create a Storage Account
Upload Data
Building a Link
Credentials
External Data Sources
External File Formats
Delimited Files
Flat File Compression
Define an External File Format
External Tables
Querying External Data
Inserting into External Tables
PolyBase Data Insertion Considerations
PolyBase Is Insert-Only
Insert Only into Folders
Conclusion
Chapter 3: Connecting to Hadoop
Hadoop Prerequisites
Preparing Files in HDFS
Gather Configuration Settings
Configuring SQL Server
Update PolyBase Configuration Files
External PolyBase Objects for Hadoop
Credentials
External Data Sources
External File Formats
Delimited Files
RCFile
ORC
Parquet
External Tables
Querying Data in Hadoop
Row Counts with Police Incident Data
Newlines and Quotes with Fire Incident Data
Going Faster with Parking Violations Data
Inserting Data into Hadoop
Conclusion
Chapter 4: Using Predicate Pushdown to Enhance Query Performance
The Importance of Predicate Pushdown
Predicate Pushdown in PolyBase
Diving into Predicate Pushdown
Packet Capture Without Predicate Pushdown
Packet Capture with Predicate Pushdown
When Predicate Pushdown Makes Sense
Small Data: Raleigh Police Incidents
Bigger Data: New York City Parking Violations
Limitations in Pushdown-Eligible Predicates
Limitations on Pushdown with Complex Filters
MapReduce and Pushdown in Summary
Conclusion
Chapter 5: Common Hadoop and Blob Storage Integration Errors
Finding the Real Logger
PolyBase Log Files
DMS Errors
DMS Movement
DWEngine Errors
DWEngine Movement
DWEngine Server
DMS PolyBase
DWEngine PolyBase
Hadoop Log Files
Job Tracker
YARN Resource Manager
JobHistory UI
NameNode Logs
Log Files
Configuration Issues
SQL Server Configuration
Check External Resources
Check SQL Server Configuration Files
Hadoop-Side Configuration
Invalid User Permissions or Missing Account
Could Not Obtain Block
Host File Pointing to 127.0.0.1
Kerberos Should Be On or Off, Not Both
PolyBase and Dockerized Data Nodes
Data Issues
Structural Mismatch
Unsupported Characters or Formats
PolyBase Data Limitations
Curate Your Data