Oracle Data Lake
An Oracle Platinum Partner, Vertice Expert Services has one of the largest dedicated Oracle-certified Analytics practices in Europe. An Oracle Data Lake is a place to store your structured and unstructured data, as well as a method for organising large volumes of highly diverse data from diverse sources. Bringing data together in a single place can be useful, and depending on your platform, a data lake can make this process much easier. For this reason, data lakes are becoming increasingly important as business and technology professionals look to perform broad data exploration and discovery.
Why Data Lakes?
Data Lakes can handle many data structures, such as unstructured and multi-structured data, and they can help you get value out of your data. With a Data Lake, you want to load your data as quickly as possible. Companies with operational use cases, especially around operational reporting, analytics, and business monitoring, then have the newest data available, so that as they run their processes multiple times during a single business day, they can see the latest activity in their operations.
With a Data Lake you usually ingest data in its original form without altering it. One reason for doing this is that advanced analytics depends on detailed source data. This applies to analytics based on any kind of mining, whether it’s:
- Text mining
- Data mining
- Statistical analysis
- Anything involving clustering
- Graph analytics
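As a rough illustration of ingesting data in its original form, the sketch below writes an incoming payload to a raw zone byte for byte and keeps descriptive metadata in a separate sidecar file. The function name `ingest_raw`, the `raw/<source>` folder layout, and the metadata fields are assumptions made for this example; they are not part of any Oracle product or API.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def ingest_raw(payload: bytes, source: str, lake_root: Path) -> Path:
    """Store the payload unaltered and record metadata next to it.

    The layout (lake_root/raw/<source>/<sha256>.bin) is a hypothetical
    convention chosen for this sketch.
    """
    digest = hashlib.sha256(payload).hexdigest()
    target_dir = lake_root / "raw" / source
    target_dir.mkdir(parents=True, exist_ok=True)

    # Write the bytes exactly as received: no parsing, cleansing or schema.
    data_path = target_dir / f"{digest}.bin"
    data_path.write_bytes(payload)

    # Metadata lives in a sidecar file so the source data stays untouched.
    metadata = {
        "source": source,
        "sha256": digest,
        "size_bytes": len(payload),
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }
    meta_path = target_dir / f"{digest}.meta.json"
    meta_path.write_text(json.dumps(metadata, indent=2))
    return data_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        original = b'{"order_id": 17, "note": "free text, left untouched"}\n'
        stored = ingest_raw(original, source="orders_api", lake_root=Path(tmp))
        # The stored copy is identical to what arrived.
        assert stored.read_bytes() == original
        print("stored", stored.name)
```

Because nothing is transformed on the way in, a later text-mining or clustering job can still read the full source detail and apply whatever structure it needs at that point.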
Types of Data Lakes
Hadoop: Hadoop has proved to have linear scalability, and it scales at a low cost compared to, say, a relational database. But Hadoop is not just cheap storage. It’s also a powerful processing platform useful for algorithmic analytics. Hadoop hasn’t replaced anything; rather, it is mixed in with relational databases, and many of today’s modern warehouses have Hadoop in the mix.
We’re seeing Hadoop used to help data warehouses scale better. But there are also different ways for users to design their warehouses, and many people design the warehouse primarily as a data store for different forms of reporting, whether that’s traditional reports or newer approaches such as dashboards and scorecards. In those cases, your warehouse may or may not be the best environment for the detailed source data that a lot of analytics needs. That’s why Hadoop is brought in: to deal with large volumes of detailed source data.
Cloud: The benefit of clouds is elastic scalability; they can marshal server and other resources as workloads scale up. Compared to a lot of on-premises systems, cloud can also be low-cost because the cloud provider already has integration covered. You basically just buy a license and can start using the platform within hours instead of months. In addition, you can have a hybrid mix of platforms with a data lake.
Logical Data Lake: If you’re familiar with the logical data warehouse, you can have a similar logical data lake, where data is physically distributed across multiple platforms. There are some challenges to that. If you want to run far-reaching analytic queries, and many users do, then you need tools that handle federated queries or data virtualization well. That technology is available at the tool level, and many people are using it.
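To make the idea of a query spanning physically separate stores more concrete, here is a minimal sketch that uses two SQLite database files as stand-ins for two platforms and joins them through SQLite's `ATTACH DATABASE` statement. This is only an analogy: real federated query and data virtualization tools work across different engines, and the table names, sample rows, and helper functions below are invented for the example.

```python
import sqlite3
import tempfile
from pathlib import Path


def create_store(db_path: Path, ddl: str, insert_sql: str, rows: list[tuple]) -> None:
    """Create one standalone database file holding a single table."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(ddl)
        conn.executemany(insert_sql, rows)
        conn.commit()
    finally:
        conn.close()


def spend_and_clicks(sales_db: Path, clicks_db: Path) -> list[tuple]:
    """Run one query that reads from both database files."""
    conn = sqlite3.connect(sales_db)
    try:
        # Make the second store visible under the schema name "web".
        conn.execute("ATTACH DATABASE ? AS web", (str(clicks_db),))
        return conn.execute(
            """
            SELECT o.customer_id, o.total_spend, c.clicks
            FROM (SELECT customer_id, SUM(amount) AS total_spend
                  FROM orders GROUP BY customer_id) AS o
            JOIN (SELECT customer_id, COUNT(*) AS clicks
                  FROM web.page_views GROUP BY customer_id) AS c
              ON c.customer_id = o.customer_id
            ORDER BY o.customer_id
            """
        ).fetchall()
    finally:
        conn.close()


def demo() -> list[tuple]:
    """Build two separate stores in a temporary folder and query across them."""
    with tempfile.TemporaryDirectory() as tmp:
        sales_db = Path(tmp) / "sales.db"
        clicks_db = Path(tmp) / "clicks.db"
        create_store(
            sales_db,
            "CREATE TABLE orders (customer_id INTEGER, amount REAL)",
            "INSERT INTO orders VALUES (?, ?)",
            [(1, 120.0), (2, 75.5), (1, 30.0)],
        )
        create_store(
            clicks_db,
            "CREATE TABLE page_views (customer_id INTEGER, url TEXT)",
            "INSERT INTO page_views VALUES (?, ?)",
            [(1, "/home"), (1, "/pricing"), (2, "/home")],
        )
        return spend_and_clicks(sales_db, clicks_db)


if __name__ == "__main__":
    for row in demo():
        print(row)
```

Each side is aggregated before the join so that order rows and page-view rows are not multiplied against each other. The data itself stays in its own file; only the query brings the two together.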
Relational Database Management System (RDBMS): The relational database management system can also be a platform for the data lake, because some people have massive amounts of structured, relational data that they want to put into the lake. If your data is inherently relational, an RDBMS approach for the data lake makes perfect sense. The same is true if you have use cases that need relational functionality, such as SQL and complex table joins.