DEFINITION:
The Data Lake ABB is an Artifact including raw copies of source system data, sensor data, social data, etc., and transformed data used for tasks such as reporting, visualization, advanced analytics and machine learning. A data lake can include structured data from relational databases (rows and columns), semi-structured data (CSV, logs, XML, JSON), unstructured data (emails, documents, PDFs) and binary data (images, audio, video). A data lake can be established "on premises" (within an organization's data centers) or "in the cloud" (using cloud services from vendors such as Amazon, Microsoft, or Google).
Source: MarkLogic
(https://www.marklogic.com/product/comparisons/data-hub-vs-data-lake/#:~:text=Data%20hubs%20are%20data%20stores%20that%20act%20as,data%20and%20store%20it%20in%20an%20underlying%20database.)
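The definition above distinguishes raw copies of source data from transformed data prepared for analytics. A minimal sketch of that two-zone layout is shown below; the directory names (`raw`, `curated`) and helper functions are illustrative assumptions, and a production lake would use object storage (e.g. Amazon S3, Google Cloud Storage, Azure Blob Storage) rather than a local filesystem:

```python
import csv
import json
import shutil
from pathlib import Path
from tempfile import mkdtemp

# Hypothetical lake rooted in a temporary directory for illustration only.
lake = Path(mkdtemp())
raw_zone = lake / "raw"          # source data kept verbatim, any format
curated_zone = lake / "curated"  # transformed data ready for analytics
raw_zone.mkdir()
curated_zone.mkdir()

def ingest_raw(source: Path) -> Path:
    """Copy a source file into the raw zone unchanged, whatever its format."""
    target = raw_zone / source.name
    shutil.copy(source, target)
    return target

def curate_csv(raw_file: Path) -> Path:
    """Transform a raw CSV into JSON records in the curated zone."""
    with raw_file.open(newline="") as f:
        records = list(csv.DictReader(f))
    target = curated_zone / (raw_file.stem + ".json")
    target.write_text(json.dumps(records))
    return target

# Demo: write a source CSV, ingest it raw, then curate it.
src = lake / "sensors.csv"
src.write_text("sensor,value\nA,1\nB,2\n")
raw = ingest_raw(src)
cur = curate_csv(raw)
print(json.loads(cur.read_text()))
```

The key design point the sketch illustrates is that the raw zone preserves source data byte-for-byte, so any later transformation (here, CSV to JSON) can be re-run or revised without going back to the source system.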
INTEROPERABILITY SALIENCY:
IoP Dimension: Structural IoP
The Data Lake ABB is salient for structural interoperability because it provides a single trustworthy location for storing and accessing data. It supports EIF Recommendation 37: "Make authoritative sources of information available to others while implementing access and control mechanisms to ensure security and privacy in accordance with the relevant legislation".
EXAMPLES:
The following implementation is an example of how this specific Architecture Building Block (ABB) can be instantiated as a Solution Building Block (SBB):
Google Cloud data lakes:
Google Cloud data lakes support any type of analysis on any type of data, allowing organizations to import, store and analyse large volumes of heterogeneous, high-fidelity data securely and cost-effectively.
(https://cloud.google.com/solutions/data-lake)
ID | ABB557 |
dct:type | eira:DataLake |