How Datazip's OneStack Transforms Data into Strategic Business Value

October 10, 2024 · 9 min read

OLake Maintainer

Managing analytics across multiple sources can be a nightmare, especially if your team doesn’t have a dedicated data engineer.

Datazip solves this by offering a scalable and easy-to-use data platform that anyone with basic SQL knowledge can operate.

Whether you’re a product manager, analytics engineer, data engineer or a data analyst, you can oversee the entire data pipeline—from extraction to transformation, scaling, and monitoring—without relying on a full-fledged data engineering team.

Let’s see how Datazip’s OneStack Data (a Platform as a Service, PaaS, offered on a BYOC, Bring Your Own Cloud, model) helps businesses manage their data effectively:

TL;DR

Ingestion - OneStack can easily ingest up to 100-200 million rows a day.
Source Connectors - Supports 150+ source connectors that ingest data into our warehouse for fast analytics queries.
OAuth-based sign-ins - Available for popular platforms like Google Sheets, Salesforce, and Facebook Ads.
Scalability - Scale the warehouse with a few clicks, or scale up on demand for ad hoc workloads using OpenEngine, an on-demand virtual warehouse for fast querying at 50% of the cost. You can also do this via APIs.

Sync Frequency - Near-real-time data ingestion syncs every 30-60 seconds.
Pricing - One tool, one common pricing strategy, less headache. Start from as low as $200.
Save cost - Datazip is at least 60% more cost-efficient than a combination of popular tools like Fivetran, dbt, Snowflake/BigQuery, and Tableau.
Ad hoc service events - Ingest data directly from your backend services. Write to Kafka or S3/object storage, and we pick it up from there.
Scheduled transformation jobs - Run your models / materialized tables periodically so that data stays fresh as new data is ingested incrementally.

Sync types - We support full syncs and incremental syncs using a cursor field, Xmin replication (without a cursor), and CDC (WAL, oplog, etc.), depending on the connector’s support.
Storage modes - Support for 3 intelligent modes (Cost Saver, Balanced, and Performance) for smart cold and hot storage (gp2 on AWS and Premium SSD LRS on Azure).
Legacy systems - Migrate legacy SQL to Transformations in OneStack.
Query layer features - Jinja syntax support in the query layer.

Medallion architecture - Visualize the lineage of your data as it moves through the Bronze -> Silver -> Gold layers, while you clean and aggregate only the data you need for final business use cases (mostly analytics).

Tests - Support for tests in transformations (full dbt test functionality, both singular and generic tests) to validate the correctness of your transformed data. This catches common data quality issues like duplicate, null, or stale data without writing code.
External tool support - Built-in features that extend OneStack’s warehouse capabilities so you can connect externally with DBeaver, Metabase, Redash, Power BI, Zoho Analytics, Appsmith, MindsDB, SigNoz, OpenTelemetry, and more.
JSON flattening support - Vertical flattening, with and without the "explore JSON Arrays" option, for MongoDB, DynamoDB, and S3-based JSON ingestion.

Also read: How to Flatten Object Types and Query Arrays in Semi-Structured Nested JSON for Effective Data Extraction

Monitoring via Prometheus, Grafana and Loki Stack. Related docs.

Data archival - For Postgres, you can use Datazip as a data archival system. Archive large Postgres data that is no longer needed for transactional use cases but is still needed for analytics. Related Docs.
Finer RBAC (Role-Based Access Control) feature.

Click the edit button to assign a range of platform and BI permissions, along with ClickHouse table permissions that grant users read and write access.

0. OpenEngine - Storage and Compute Isolation technology by OneStack

Here’s a quick summary of what OpenEngine is and how it works.

Features:

Solves Out-of-Memory (OOM) issues with your data warehouse (here, ClickHouse).
Spin up new ClickHouse instances on demand, providing unparalleled flexibility and scalability.
Ad-Hoc Queries: Handle urgent requests without disrupting your main operations.
Zero Downtime Upgrades: Seamlessly upgrade your data warehouse without taking your system offline.
Cost Efficiency: Pay only for the resources you use, thanks to dynamic scaling.
Workload Isolation: Keep your operations stable by isolating heavy workloads.

You can enable or disable OpenEngine from the UI.

With OpenEngine enabled:

OneStack Data achieves storage and compute isolation: the warehouse acts purely as a storage unit.
Virtual data warehouses are spun up on an ad hoc basis, as and when needed.
OpenEngine parses SQL queries and spins up intelligent virtual warehouses (in under 30 seconds) to compute them, with a set IdleTimeout that kills the machine when it is idle.
The virtual warehouse then communicates with our primary data warehouse to fetch the data required for computation (not shown in the diagram below, for better readability).
Scale your virtual warehouse up and down as required from the built-in UI.
With this, a 16 GB "Mother" ClickHouse can handle ad hoc queries that consume up to 150 GB of RAM.
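To make the idle-timeout idea concrete, here is a minimal Python sketch. It is not OpenEngine's actual implementation or API: the VirtualWarehouse class, the idle_timeout_s field, and the reap_idle function are all hypothetical names used for illustration. It only shows the general pattern of recording the last query time and dropping any compute instance that has been quiet for longer than its timeout.

```python
from dataclasses import dataclass


@dataclass
class VirtualWarehouse:
    """Hypothetical stand-in for an on-demand compute instance."""
    name: str
    idle_timeout_s: float      # assumed setting, modeled on the IdleTimeout idea
    last_used_at: float = 0.0  # timestamp (seconds) of the most recent query

    def run_query(self, sql: str, now: float) -> str:
        # Record activity so the idle timer restarts on every query.
        self.last_used_at = now
        return f"{self.name} executed: {sql}"

    def is_idle(self, now: float) -> bool:
        # Idle once the quiet period reaches the configured timeout.
        return now - self.last_used_at >= self.idle_timeout_s


def reap_idle(warehouses: list[VirtualWarehouse], now: float) -> list[VirtualWarehouse]:
    """Return only the warehouses that should stay alive at time `now`."""
    return [w for w in warehouses if not w.is_idle(now)]


pool = [VirtualWarehouse("vw-1", idle_timeout_s=300)]
pool[0].run_query("SELECT 1", now=1000.0)
still_running = reap_idle(pool, now=1200.0)  # 200 s of quiet: kept
after_timeout = reap_idle(pool, now=1300.0)  # 300 s of quiet: removed
```

Passing the clock in as `now` keeps the sketch easy to test. A real scheduler would read the system time and also terminate the underlying machine.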

With OpenEngine disabled:

Our primary data warehouse acts as both the storage unit and the compute machine.

More information on OpenEngine.

1. Unified Data Warehouse for All Sources

If you have data coming from multiple sources—databases, SaaS applications, or analytics platforms—you know how tough it is to keep everything in sync. Datazip makes it simple:

Connect to 150+ Data Sources: From MySQL, MongoDB and Postgres to Salesforce, Google Analytics, and Facebook Ads, Datazip integrates with all major platforms. You can bring all your data into a single source of truth—your data warehouse.
Custom Data Transformations: Features like JSON flattening and array explosion let you tailor your data exactly how you need it, no matter how complex the structure.
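For readers new to these terms, the sketch below shows what JSON flattening and array explosion generally mean. It is a plain Python illustration under our own assumptions, not Datazip's connector code: the flatten and explode functions, the underscore column separator, and the sample order record are all hypothetical.

```python
from typing import Any


def flatten(record: dict[str, Any], parent: str = "", sep: str = "_") -> dict[str, Any]:
    """Turn nested objects into flat columns, e.g. {"a": {"b": 1}} -> {"a_b": 1}."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        column = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, column, sep))
        else:
            flat[column] = value  # lists are kept as-is at this stage
    return flat


def explode(row: dict[str, Any], column: str) -> list[dict[str, Any]]:
    """Emit one output row per element of the array stored in `column`."""
    values = row.get(column)
    if not isinstance(values, list) or not values:
        return [row]  # nothing to explode, so keep the row unchanged
    rows = []
    for item in values:
        new_row = {k: v for k, v in row.items() if k != column}
        if isinstance(item, dict):
            new_row.update(flatten(item, column))
        else:
            new_row[column] = item
        rows.append(new_row)
    return rows


order = {
    "id": 7,
    "customer": {"name": "Ada", "city": "Pune"},
    "items": [{"sku": "A1", "qty": 2}, {"sku": "B4", "qty": 1}],
}
flat = flatten(order)          # nested "customer" becomes customer_name, customer_city
rows = explode(flat, "items")  # one row per item, with items_sku and items_qty columns
```

Here the single order document becomes two warehouse-style rows, one per item, each carrying the order id and customer columns alongside it.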

2. Built for Teams Without Data Engineers

Datazip is designed for non-engineering teams. All you need is a basic understanding of SQL.

Accessible for Analysts and Product Managers: Anyone in your team can manage the entire data pipeline, including transformations, scaling, and monitoring.
Zero Data Engineers Required: Forget spending months hiring a data team—Datazip handles all the heavy lifting for your scattered data needs.

3. Scalable and Flexible Architecture

Scalability is critical when your data grows. With Datazip, you can scale up or down based on your needs without worrying about infrastructure.

Powered by ClickHouse: Datazip provides a scalable, high-performance managed warehouse where you can store and query your data. Whether you’re importing transactional or analytical data, we give you finer control over how much data you bring in—right down to individual columns.
Infrastructure Included: You don’t need to worry about managing servers or scaling your warehouse—Datazip handles it for you.
Continuous Sync for Real-Time Insights: Sync data from your databases in as little as 1 minute. No need to wait for overnight refreshes or endure stale dashboards.
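Frequent syncs stay cheap when each run moves only what changed. One of the sync types listed in the summary is incremental sync using a cursor field, and the Python sketch below illustrates that general technique. It is an assumption-laden illustration, not Datazip's connector logic: the incremental_sync function, the updated_at cursor column, and the in-memory source table are hypothetical.

```python
from typing import Any, Iterable


def incremental_sync(
    source_rows: Iterable[dict[str, Any]],
    cursor_field: str,
    last_cursor: Any = None,
) -> tuple[list[dict[str, Any]], Any]:
    """Return rows newer than `last_cursor` plus the cursor to store for next time."""
    new_rows = [
        row for row in source_rows
        if last_cursor is None or row[cursor_field] > last_cursor
    ]
    if new_rows:
        # Advance the cursor to the newest value seen in this batch.
        last_cursor = max(row[cursor_field] for row in new_rows)
    return new_rows, last_cursor


source = [{"id": 1, "updated_at": 10}, {"id": 2, "updated_at": 20}]

first_batch, cursor = incremental_sync(source, "updated_at")           # first run: everything
source.append({"id": 3, "updated_at": 30})
second_batch, cursor = incremental_sync(source, "updated_at", cursor)  # only the new row
third_batch, cursor = incremental_sync(source, "updated_at", cursor)   # nothing new
```

A cursor like this cannot see deleted rows, which is one reason log-based approaches such as CDC exist alongside it.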

Case Studies - How Datazip Scaled Topmate's Analytics That Drove Customer Engagement

4. Migration Support

If you already use an ETL tool, you can rewrite the same transformation logic in Datazip with our transformation framework, which uses SQL and Jinja.

Transformations support lineage (dependencies between models) and cron jobs, so you can move most of your transformation logic here without an overly complex, Airflow-style orchestration tool.
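To show why lineage removes much of the need for a separate orchestrator, here is a small Python sketch that derives a run order from model dependencies. It uses the standard library's graphlib module and is only an illustration of the idea, not how Datazip schedules jobs. The model names and the run_order function are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical models: each name maps to the upstream models it reads from.
lineage = {
    "bronze_orders": set(),
    "bronze_customers": set(),
    "silver_orders_clean": {"bronze_orders"},
    "gold_revenue_by_customer": {"silver_orders_clean", "bronze_customers"},
}


def run_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return models in an order where every upstream model runs first."""
    return list(TopologicalSorter(dependencies).static_order())


order = run_order(lineage)
```

Because the order falls out of the declared dependencies, a scheduled job only needs to say when to run, not in what sequence. TopologicalSorter also raises CycleError if two models depend on each other, which surfaces a broken lineage early.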

Plus, get discounts on OneStack spending while you migrate. We also provide backup and restore, along with multi-cloud support. Refer here for more.

5. API-Driven Integration

Want to build your own tools or connect Datazip with your existing systems? The platform offers APIs for everything.

Build Custom Data Tools: Use Datazip’s APIs to create bespoke workflows or integrate data into your existing backend.

6. Authentication Simplified

Connecting new data sources is a breeze with OAuth-based sign-ins for popular platforms like Google Sheets, Salesforce, and Facebook Ads. No more dealing with complex credentials—just log in, and you’re good to go.

7. No More Stitching Tools Together

Traditional setups involve using separate tools for data ingestion, transformation, and visualization. Think Airbyte for ingestion, Snowflake for storage, dbt for transformations.

Sounds like a dream team, right? Except you spend weeks (or months) integrating these tools, handling API errors, and making them talk to each other.

With Datazip’s OneStack Data, you get everything in one place. From ingestion to analytics, we’ve got you covered.

We provide an integrated frontend and backend, so you can forget about API issues or manual stitching. Datazip eliminates those inefficiencies, saving valuable dev time.

As one of our customers, NirogGyan, mentioned:

"Finding the right data engineer is a time taking process and over and above setting up a robust data infrastructure that can scale as requirements grow would easily take 3-4 months. With Datazip we can actually leapfrog our data capabilities."

8. Performance and Support: Scale with Confidence

High Sync Frequency: Sync data as frequently as every minute.
Intelligent Scaling: Each component of Datazip scales independently for optimal performance.
Best-in-Class SLA: Average response time of 4-5 hours for critical production issues.

9. Security: Your Data, Your Cloud - Complete Data Control

Datazip supports deployment inside your own AWS or Azure environment, allowing you to run the platform securely within your private cloud. Sensitive data never leaves your infrastructure; only system health metrics are shared with us for diagnostics.

What Does It Cost?

Datazip starts at $500–$1,000 per month, including infrastructure and support, while the Basic BYOC option starts at just $230. No engineering team is required. Compared to building a custom data stack (which can cost upwards of $12,000 and months of development time), this is a no-brainer for businesses looking to save time and money.

Why Datazip?

Datazip makes it ridiculously easy to manage and transform data, even for teams without technical expertise. By merging data from multiple sources into a scalable, high-performance columnar warehouse, offering custom transformations, and providing API access for further integration, it’s the ideal platform for companies that need a single source of truth.

If you’re tired of spending hours on data integration and want a solution that just works, give Datazip a try and book a quick meeting with us.

Architecture of Datazip’s OneStack Data AWS deployment

Architecture of Datazip’s OneStack Data Azure deployment

Link - Datazip’s OneStack Data Azure Marketplace Link

If you’re tired of spending hours on data integration and building data workflows and want a solution that just works, give Datazip a try and book a quick meeting with us.
