Introducing OpenEngine: Datazip's New Warehouse Technology with ClickHouse

Introduction

Datazip has unveiled OpenEngine, a solution to one of the most significant challenges in making ClickHouse flexible and scalable: detaching storage from compute.

Built on the robust ClickHouse database, OpenEngine allows users to spin up new ClickHouse instances on demand, providing unparalleled flexibility and scalability.

The Problem: Out-of-Memory (OOM) Issues

Organizations often face Out-of-Memory (OOM) issues when running heavy queries, even after optimizing those queries. When a query requires more RAM than is available, the only fix is a machine upgrade, which means downtime while the server goes offline, gets upgraded, and comes back online. This downtime is unacceptable for many companies, especially those relying on these instances for customer-facing analytics.

Moreover, once the machine is upgraded, it often needs to be downscaled again since the demand for a larger machine might be infrequent. This cycle of upgrading and downscaling is inefficient and disruptive.

Real-World Challenges Faced by Datazip Users

Here are some of the real-world challenges we have seen our users run into.


  1. Datazip users often struggle to upgrade or downgrade ClickHouse without causing downtime. For instance, Limechat, a prominent Datazip customer, needed a way to handle its complex analytics requirements without downtime.

  2. It is crucial that the main ClickHouse processes remain unaffected by poorly written or long-running, resource-intensive queries. We have observed that at least 70% of OOM occurrences are caused by badly written SQL queries, yet they affect every other user of the database.

  3. Multiple users must be able to use ClickHouse without impacting each other, which requires isolating resources and permissions for different teams. This also lets resources be allocated to users according to each team's budget.

  4. Long-running transformation jobs and user-facing dashboards must never be adversely affected by other workloads, since they can be customer-facing.

The Solution: Ad-hoc Instances with Storage and Compute Separation

These challenges call for the ability to spin up ad-hoc instances of ClickHouse with storage and compute separated. Unfortunately, ClickHouse has kept this feature proprietary and has not open-sourced it.

OpenEngine Architecture

Leveraging Streaming Data

To overcome this limitation, we can leverage the power of streaming data. In a typical SQL server interaction, the rows a SQL client library returns from a query arrive over a live, streaming connection to the database rather than as one fully materialized result. This concept of streaming can be extended to solve our problem.
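To make the streaming idea concrete, here is a minimal sketch of consuming a query result row by row instead of loading it all at once. It uses Python's built-in sqlite3 module as a stand-in database purely so the example runs anywhere; the table, the stream_rows helper, and the batch size are illustrative assumptions and are not part of OpenEngine.

```python
import sqlite3


def stream_rows(conn, query, batch_size=2):
    """Yield rows in small batches instead of materializing the whole result."""
    cursor = conn.execute(query)
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        yield from batch


# Stand-in database (assumption): an in-memory SQLite table with five rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(i, i * 10) for i in range(1, 6)],
)

# The consumer only ever holds one small batch in memory at a time.
total = 0
for _id, amount in stream_rows(conn, "SELECT id, amount FROM events ORDER BY id"):
    total += amount

print(total)
```

The consumer never needs the full result set in memory, which is the property that makes it possible to do the heavy processing somewhere other than where the data is stored.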

OpenEngine: A Revolutionary Approach

OpenEngine is built on the core OLAP capabilities of ClickHouse. It draws on in-memory processing and on technologies like chDB, which uses the ClickHouse OLAP SQL engine to enable virtual database capabilities.

OpenEngine introduces the capability to spin up larger ClickHouse servers on demand and stream data to them from the original ClickHouse instance. Processing happens on the larger ad-hoc ClickHouse server, while the original instance, the Mother ClickHouse, now acts as the storage layer in the OpenEngine ecosystem.
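As an illustration of how one ClickHouse server can read data held by another, the sketch below builds a query around the `remote` table function available in open-source ClickHouse. This is an assumption chosen for illustration: this article does not say which mechanism OpenEngine uses internally. The host name, database, table, and helper names are all hypothetical.

```python
import re

# Conservative patterns so the generated SQL cannot be broken by odd input.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
_HOST = re.compile(r"^[A-Za-z0-9.-]+$")


def remote_table(host, database, table, user="default", port=9000):
    """Return a ClickHouse remote(...) expression pointing at a table on another server."""
    if not _HOST.match(host):
        raise ValueError(f"unsafe host: {host!r}")
    for name in (database, table, user):
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    return f"remote('{host}:{port}', {database}.{table}, '{user}')"


def build_offloaded_query(columns, source, group_by=None):
    """Build the SELECT that an ad-hoc server would run against the remote source."""
    sql = f"SELECT {columns} FROM {source}"
    if group_by:
        sql += f" GROUP BY {group_by}"
    return sql


# Hypothetical names: the Mother instance is reachable as "mother.internal".
source = remote_table("mother.internal", "analytics", "events")
sql = build_offloaded_query("user_id, count() AS n", source, group_by="user_id")
print(sql)
```

Run on the ad-hoc server, a query of this shape would pull rows from the Mother instance while the aggregation itself consumes the ad-hoc server's memory.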

With this innovation, OpenEngine enables storage and compute separation without physically separating them: the data stays on the Mother instance, and only the processing moves.

Key Features of OpenEngine

  • Workload Separation: By separating storage and compute, OpenEngine allows you to isolate and manage workloads more efficiently. Heavy queries can be offloaded to ad-hoc instances, preventing resource contention and ensuring consistent performance for other operations.

  • Reduced Downtime: With the ability to spin up larger ClickHouse servers on demand within 30 seconds, OpenEngine minimizes the downtime associated with upgrading hardware. This is crucial for maintaining continuous availability for customer-facing analytics and other critical applications.

  • Optimized Resource Usage: OpenEngine dynamically adjusts resource allocation by scaling instances up or down based on real-time query requirements. This ensures that resources are used efficiently, reducing costs and avoiding over-provisioning.

  • Enhanced Scalability: Instantly respond to changes in workload demand by scaling compute resources independently of storage. This flexibility allows you to handle varying workloads without disrupting ongoing operations. With this, a Mother ClickHouse instance with 16 GB of RAM can handle ad-hoc queries that consume up to 150 GB of RAM.

  • Seamless Integration: OpenEngine integrates effortlessly with existing ClickHouse clients and tools, such as JDBC/ODBC clients, DBeaver, SQLAlchemy, and clickhouse-go. This ensures that you can continue using your preferred tools and workflows without modification.
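The workload-separation and scalability points above can be pictured as a simple routing decision. The sketch below is a hypothetical illustration, not OpenEngine's actual logic: the 16 GB and 150 GB figures come from this article, while the route_query function, the Route type, and the 50% headroom threshold are assumptions made for the example.

```python
from dataclasses import dataclass

MOTHER_RAM_GB = 16        # Mother instance size quoted in this article
MAX_ADHOC_QUERY_GB = 150  # largest ad-hoc query size quoted in this article
SAFE_FRACTION = 0.5       # assumption: keep half of the Mother's RAM free


@dataclass(frozen=True)
class Route:
    target: str  # "mother", "adhoc", or "reject"
    reason: str


def route_query(estimated_ram_gb):
    """Pick where a query should run based on its estimated memory need."""
    if estimated_ram_gb < 0:
        raise ValueError("estimated_ram_gb must be non-negative")
    if estimated_ram_gb <= MOTHER_RAM_GB * SAFE_FRACTION:
        return Route("mother", "fits comfortably on the Mother instance")
    if estimated_ram_gb <= MAX_ADHOC_QUERY_GB:
        return Route("adhoc", "offload to an on-demand ad-hoc instance")
    return Route("reject", "exceeds the largest supported ad-hoc query size")


for size in (4, 40, 200):
    print(size, route_query(size).target)
```

Light queries stay on the Mother instance, heavy ones go to an ad-hoc server, and dashboards and transformation jobs are shielded from either kind of surprise.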

Conclusion

OpenEngine by Datazip represents a significant leap forward in data warehousing technology. By enabling storage and compute separation with ClickHouse, it provides a flexible, scalable, and efficient solution to the common challenges faced by organizations. With OpenEngine, downtime becomes a thing of the past, and resource optimization is achieved effortlessly.

Stay tuned for more updates and detailed tutorials on how to get started with OpenEngine and unlock its full potential for your data warehousing needs.

Try https://datazip.io for a high-performance data platform that requires little engineering effort.


Author Bio

Pavan Kalyan Chiluka and Piyush Singariya are Founding Software Engineers at Datazip. With a passion for innovation and a deep understanding of ClickHouse and containerized deployments, they are dedicated to helping organizations overcome their data challenges.

Call to Action

Ready to revolutionize your data warehousing strategy? Contact us at [email protected] to learn more about OpenEngine and how it can benefit your organization.