Data Lake Overview

🔸What is a data lake?

A data lake is a centralized repository designed to store, process, and secure large amounts of structured, semi-structured, and unstructured data. It can store data in its native format and process any variety of it, regardless of size.

🔸What is an example of a data lake?

A data lake can include structured data from relational databases (rows and columns), semi-structured data (CSV, logs, XML, JSON), unstructured data (emails, documents, PDFs) and binary data (images, audio, video).
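
To make the mix of formats concrete, here is a minimal sketch in Python. It assumes a hypothetical in-memory "lake" (a plain dictionary of object keys to raw bytes); the keys, sample contents, and the `read_object` helper are invented for illustration and do not correspond to any particular data lake product. The point is only that each object is kept as-is and interpreted when it is read.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical in-memory "lake": object key -> raw bytes, stored as-is.
lake = {
    "sales/orders.csv": b"order_id,amount\n1,19.99\n2,5.00\n",
    "app/events.json": b'{"user": "ana", "action": "login"}',
    "feeds/item.xml": b"<item><sku>A-1</sku></item>",
    "media/logo.png": b"\x89PNG\r\n\x1a\n",  # binary data is kept untouched
}


def read_object(key: str):
    """Interpret a raw object only at read time, based on its extension."""
    raw = lake[key]
    if key.endswith(".csv"):
        return list(csv.DictReader(io.StringIO(raw.decode("utf-8"))))
    if key.endswith(".json"):
        return json.loads(raw)
    if key.endswith(".xml"):
        return ET.fromstring(raw)
    return raw  # unknown or binary formats are returned as bytes


orders = read_object("sales/orders.csv")
event = read_object("app/events.json")
```

Nothing is converted when the objects are stored; the CSV rows, the JSON document, and the XML element are parsed only when a reader asks for them, and the image bytes are passed through unchanged.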

🔸What is the difference between a data lake and a database?

A database stores the current data required to power an application. A data lake stores current and historical data from one or more systems in its raw form so that the data can be analyzed.

🔸Is SQL a data lake?

No. SQL is a query language, not a storage system, but it is widely used to analyze and transform large volumes of data in data lakes. As data volumes grow, the push is toward newer technologies and paradigm changes, yet SQL has remained the mainstay.

🔸Is a data lake an ETL tool?

No. ETL (Extract, Transform, Load) is the pattern typically used with a data warehouse, while ELT (Extract, Load, Transform) is typical of a data lake. ETL is the most common method for transferring data from a source system to a data warehouse.
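
The difference in ordering between ETL and ELT can be sketched as follows. This is an illustration under stated assumptions, not how any specific warehouse or lake engine works: Python's built-in `sqlite3` stands in for both targets, and the sample rows, table names, and the `etl`, `elt`, and `is_number` helpers are all hypothetical. In the ELT path the transformation is written in SQL, which also shows the kind of role SQL plays on raw data.

```python
import sqlite3

# Hypothetical raw records from a source system (amounts arrive as text).
source_rows = [("1", " 19.99 "), ("2", "5.00"), ("3", "n/a")]


def is_number(text: str) -> bool:
    """Return True if the text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def etl(conn):
    """ETL: clean in application code first, then load only the clean rows."""
    clean = [(int(i), float(a)) for i, a in source_rows if is_number(a)]
    conn.execute("CREATE TABLE warehouse_orders (order_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO warehouse_orders VALUES (?, ?)", clean)


def elt(conn):
    """ELT: load everything raw, then transform afterwards with SQL."""
    conn.execute("CREATE TABLE raw_orders (order_id TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?)", source_rows)
    conn.execute(
        """
        CREATE TABLE lake_orders AS
        SELECT CAST(order_id AS INTEGER) AS order_id,
               CAST(TRIM(amount) AS REAL) AS amount
        FROM raw_orders
        WHERE TRIM(amount) GLOB '[0-9]*'
        """
    )


conn = sqlite3.connect(":memory:")
etl(conn)
elt(conn)

warehouse_total = conn.execute("SELECT SUM(amount) FROM warehouse_orders").fetchone()[0]
lake_total = conn.execute("SELECT SUM(amount) FROM lake_orders").fetchone()[0]
raw_count = conn.execute("SELECT COUNT(*) FROM raw_orders").fetchone()[0]
```

Both paths arrive at the same cleaned totals, but only the ELT path still holds all three original rows in `raw_orders`, including the malformed one, so the raw data remains available for later analysis.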
