Over the years, we have received a lot of questions about Gluent Data Platform and how different functionality works within a production environment. We have put together an FAQ page, but that doesn’t cover everything you might want to know about the software. Here’s a question about orchestration and operational monitoring that came up recently, prompting a much longer response than we’d typically add to the frequently asked questions.
From an operational support point of view, will there be a [Gluent] module out of the box for us to configure alerts (in case of offload failures), including notifications?
Yes, we have options to configure alerts for both single table offload and bulk offload, depending on how you want to schedule these. Organizations can use their standard scheduler (for example cron, Control-M, Autosys) to schedule individual offload commands, or many commands at once using a configuration-driven bulk offload script. The scheduler of choice can raise the alerts automatically, because Gluent's offload commands return an exit code that indicates success or failure. We also have command-line tools for monitoring the offload status of tables, which can be used in external automation scripts.
Gluent have designed the offload commands to be reliable and re-runnable, so no manual intervention is necessary should some jobs fail due to infrastructure issues like network problems, hardware failures, or temporarily running out of space.
Overall, the offload commands are:
1) Reliable
If a scheduled offload succeeds, it returns a "success" exit code to the scheduler. If there is an error, it returns a non-zero exit code. This way your scheduler knows exactly which tasks completed and which should be retried. Because built-in data validation checks run at the end of the offload, a job that completes with a successful exit code guarantees that the data has been offloaded correctly up to the desired offload threshold.
2) Re-runnable
If an offload job fails for some reason, the scheduler can re-execute it as-is, with no manual clean-up tasks. This allows the scheduler to retry automatically when the failure was caused by a transient infrastructure issue. The alternative is to skip the rerun and simply let the next scheduled run offload more data.
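The exit-code and retry behavior described above is what a scheduler normally provides. For a scheduler without built-in retries, it might be approximated with a small wrapper like the following. This is a minimal sketch, not part of Gluent Data Platform: `run_with_retry`, its parameters, and the `alert` callback are hypothetical names, and the only thing it assumes about the offload command is that it exits with zero on success and non-zero on failure.

```python
import subprocess
import sys
import time


def run_with_retry(command, max_attempts=3, delay_seconds=0.0, alert=print):
    """Run `command` until it exits with 0 or the attempts run out.

    Returns the final exit code. `alert` (a hypothetical notification hook)
    is called once, and only if every attempt failed.
    """
    exit_code = None
    for attempt in range(1, max_attempts + 1):
        # The command is re-executed as-is; no clean-up happens between attempts.
        exit_code = subprocess.run(command).returncode
        if exit_code == 0:
            return 0
        if attempt < max_attempts:
            time.sleep(delay_seconds)
    alert(f"offload failed after {max_attempts} attempts (exit code {exit_code}): {command}")
    return exit_code


if __name__ == "__main__":
    # Stand-in for a real offload invocation: a child process that exits with 0.
    demo_command = [sys.executable, "-c", "raise SystemExit(0)"]
    sys.exit(run_with_retry(demo_command))
```

In practice, `command` would be whatever offload invocation the scheduler already runs, and `alert` would post to your paging or e-mail system instead of printing.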
3) Decoupled from purging
Some or all of a relational table's data can be offloaded first and dropped from the source database independently, much later. (Note: when the majority of the data is stored and processed in Hadoop and the rest remains in the relational database, this is called a 90/10 approach to offloading, although any percentage of data can be moved to modern storage and data processing platforms to meet your enterprise data needs.) Upon a successful offload, the offload process adjusts the "offload threshold" metadata in the Gluent 90/10 hybrid views. From that point on, hybrid queries read the historical data from Hadoop, even though it still exists in Oracle. The Oracle data can be dropped later with no effect on the hybrid queries, since they already read the historical data from Hadoop. Alternatively, the "offload threshold" can be moved back to an earlier point in time to logically "restore" some data and read more history from the relational database again.
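To make the role of the offload threshold concrete, here is a toy model of how a hybrid query might choose where each row comes from. It is only an illustration of the idea, not how Gluent hybrid views are implemented: `hybrid_query` and the in-memory row lists are hypothetical, and treating the threshold as "strictly before goes to Hadoop" is an assumption.

```python
from datetime import date


def hybrid_query(oracle_rows, hadoop_rows, threshold):
    """Return rows the way a hybrid view would in this toy model.

    Each row is a (partition_date, value) tuple. Rows dated before
    `threshold` are read from Hadoop only; rows dated on or after it
    are read from Oracle only.
    """
    historical = [row for row in hadoop_rows if row[0] < threshold]
    recent = [row for row in oracle_rows if row[0] >= threshold]
    return sorted(historical + recent)


if __name__ == "__main__":
    jan, feb, mar = date(2024, 1, 1), date(2024, 2, 1), date(2024, 3, 1)
    oracle = [(jan, "a"), (feb, "b"), (mar, "c")]
    hadoop = [(jan, "a"), (feb, "b")]  # January and February were offloaded

    before_purge = hybrid_query(oracle, hadoop, threshold=mar)
    # Dropping the offloaded history from Oracle does not change the result.
    after_purge = hybrid_query([(mar, "c")], hadoop, threshold=mar)
    print(before_purge == after_purge)
```

Moving `threshold` back to `feb` while February still exists in Oracle would make the same query read February from Oracle again, which is the logical "restore" described above.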
4) Tracked and resumable
Gluent automatically tracks how much data has been successfully offloaded to Hadoop (up to the "high water mark" in the time-partitioning column). The next time the scheduler kicks off an offload command, Gluent checks how much data has already been offloaded and skips the partitions that have already been completed. This works both after offloads that succeeded and after offload jobs that ultimately failed but had already offloaded some partitions.
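The skip-and-resume behavior can be pictured with a short sketch. Everything here is hypothetical and greatly simplified: `offload_pending`, the `state` dictionary holding the high water mark, and the `copy_partition` callback are illustrative names, not Gluent internals, and a transient failure is modeled as an `OSError`.

```python
def offload_pending(partitions, state, copy_partition):
    """Copy every partition above state["high_water_mark"], oldest first.

    `state` is updated after each partition, so progress survives a failure.
    Returns True if all pending partitions were copied, False otherwise.
    """
    for key in sorted(partitions):
        high_water_mark = state["high_water_mark"]
        if high_water_mark is not None and key <= high_water_mark:
            continue  # already offloaded by an earlier run
        try:
            copy_partition(key)
        except OSError:
            return False  # stop here; the next run resumes at this partition
        state["high_water_mark"] = key
    return True


if __name__ == "__main__":
    state = {"high_water_mark": "2024-01"}  # January was offloaded earlier
    copied = []
    offload_pending(["2024-01", "2024-02", "2024-03"], state, copied.append)
    print(copied, state)
```

Because the high water mark advances only after a partition has been copied, a run that fails partway leaves the state pointing at the last completed partition, and rerunning the same call continues from there.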
A typical configuration of the 90/10 offload for time-partitioned tables would use options such as --older-than-days=30. Every time the scheduler reruns the same command, it automatically calculates the new date range to be offloaded and executes that instead. If the previous run failed, the dynamic calculation simply picks up more data on the following execution. The approach is similar for tables that use other threshold options, such as fixed batch dates or numeric partition thresholds.
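The date arithmetic behind a rolling threshold such as --older-than-days=30 can be sketched as follows. This is an assumption-laden illustration rather than Gluent's actual calculation: `offload_window` is a hypothetical helper, the high water mark is taken to be the date up to which data has already been offloaded, and the exact boundary handling (inclusive or exclusive) is assumed.

```python
from datetime import date, timedelta


def offload_window(today, older_than_days, high_water_mark):
    """Return the (start, end) date range still to offload, end exclusive.

    `high_water_mark` is the date up to which data is already offloaded.
    Returns None when nothing new has aged past the threshold.
    """
    threshold = today - timedelta(days=older_than_days)
    if high_water_mark >= threshold:
        return None
    return (high_water_mark, threshold)


if __name__ == "__main__":
    hwm = date(2024, 2, 29)
    # Scheduled run on 31 March: one day of data has aged past 30 days.
    print(offload_window(date(2024, 3, 31), 30, hwm))
    # If that run failed, the 1 April run covers both days in one window.
    print(offload_window(date(2024, 4, 1), 30, hwm))
```

The second call shows why a failed run needs no special handling in this model: the high water mark has not moved, so the next day's window simply starts at the same place and ends one day later.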
As the response to the question describes, Gluent Data Platform orchestration can easily handle many different types of offload interruptions and failures, without any manual intervention. If you have questions about Gluent Data Platform that you’d like us to answer, please send us a note at email@example.com.