
sahil_singla

30 comments

12 days ago

Hey HN! We are Baselit (https://baselit.ai/), a tool that automatically optimizes Snowflake costs. Here’s a demo video: https://www.youtube.com/watch?v=Ls6VRzBQ-pQ.

Snowflake is one of the most widely used data warehouses today. It abstracts the underlying compute infrastructure into “warehouses” for the user - compute units with t-shirt sizes (X-Small, Small, Medium, etc.). In general, if you want to lower your data processing costs, the only thing you can do is process less data (i.e. query optimization). But Snowflake’s warehouse abstraction adds an extra dimension along which you can optimize: minimizing the compute you need to process that same data (i.e. warehouse optimization). Baselit automates Snowflake warehouse optimization for you.
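
To make the abstraction concrete, a warehouse boils down to a few parameters on a named compute cluster, independent of the data it processes. A minimal sketch (the warehouse name here is made up):

    -- Minimal illustration (hypothetical warehouse name). An X-Small warehouse
    -- bills 1 credit/hour; each size up roughly doubles the rate.
    CREATE WAREHOUSE IF NOT EXISTS reporting_wh
      WAREHOUSE_SIZE = 'XSMALL'
      AUTO_SUSPEND   = 60      -- suspend after 60 seconds of idle time
      AUTO_RESUME    = TRUE;   -- wake up automatically when a query arrives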

While we were working on another idea last year (AI for SQL generation), users frequently told us that Snowflake costs had become a top concern and that cost optimization was now a business priority. Every few months, they would manually look for opportunities to cut costs (removing workloads or optimizing queries) - a time-consuming process. We decided to build a solution that could automate cost optimization and complement the manual effort of data teams.

There are two key components of Baselit:

1. Automated agents that cut down on warehouse idle time. This happens in two ways: cache optimization (deciding when to suspend a warehouse vs. letting it run idle) and cluster optimization (spinning down clusters optimally). The Snowflake warehouse settings these agents work with are sketched below, after point 2.

You can easily find out how much these agents can save you. Here’s a SQL query you can run on your Snowflake account to calculate your savings - https://baselit.ai/docs/savings-estimate

2. An Autoscaler that lets you create custom scaling policies for multi-cluster warehouses based on your SLAs. Snowflake’s default policies (Economy and Standard) are not cost-optimal in most cases, and they don’t give you any control.

One use case for the Autoscaler is efficiently merging several warehouses into one multi-cluster warehouse, with a custom scaling policy that is optimal for a particular type of workload. In the Autoscaler, you can set a parameter called “Allowed Queuing Time” that controls how fast a new cluster should spin up. For example, if you want to merge transformation workloads, you might want to set a higher queuing time. Baselit will slow down cluster spin-up, keeping all clusters at high utilization, and you’ll see a reduction in costs.
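
For context, these are the standard Snowflake warehouse knobs that this kind of idle-time and autoscaling automation works with (warehouse name and values are illustrative, not a Baselit configuration):

    -- Illustrative values only. AUTO_SUSPEND governs idle time; the cluster
    -- counts and scaling policy govern how a multi-cluster warehouse scales out.
    ALTER WAREHOUSE transform_wh SET
      AUTO_SUSPEND      = 60          -- seconds of idle time before suspending
      MIN_CLUSTER_COUNT = 1
      MAX_CLUSTER_COUNT = 4
      SCALING_POLICY    = 'ECONOMY';  -- Snowflake's built-in options: STANDARD | ECONOMY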

We’ve built a bunch of other features that help optimize Snowflake costs: a dbt optimization feature that automatically picks the right warehouse size for dbt models through constant experimentation, a “cost lineage” view, spend breakdowns by team/role/user, and automatic recommendations from scanning Snowflake metadata.

Due to the nature of our product (access to Snowflake metadata required), we haven’t made Baselit self-serve yet. We invite you to run our savings query (https://baselit.ai/docs/savings-estimate) and find out your potential savings. And if you’d like to know more about any of our features and get a live demo, you can book one at this link - https://calendly.com/baselit-sahil/baselit-demo

We’d love to read your feedback and ideas on Snowflake optimization!

How does this differ from https://espresso.ai ?

ukd1

12 days ago

Chiming in - I'm one of the founders of Espresso AI. We do both query optimization and warehouse optimization, both of which are hands-off. In particular, we're beta-testing a fully automated solution for query optimization (it's taken a lot of engineering!).

Based on the responses here, I think we're a superset of where Baselit is today, but I could be wrong.

karamazov

12 days ago

Would love to see how you’re doing warehouse optimization. Is there a demo video I can look at?

sahil_singla

11 days ago

They are more focused on query optimization, whereas we do warehouse optimization. We lean toward warehouse optimization because it's completely hands-off.

sahil_singla

12 days ago

Cool - they're kinda complementary?

ukd1

12 days ago

Espresso does warehouse optimization too, so they’re competitors.

altdataseller

11 days ago

Yeah, kind of.

Though I'm not exactly sure how their product works - from the landing page, it seems broadly focused on query optimization.

We've done a lot of experimentation with query optimization, both with and without LLMs, and we don't think it's possible to build a fully automated solution. However, a workflow solution might be feasible.

sahil_singla

12 days ago

Not a Snowflake user, but I'm curious about your business model. What barriers are there to prevent Snowflake from reverse engineering your work and including it as part of their native experience? Is the play here an eventual acquisition?

mustansirm

12 days ago

It has been my experience working on similar projects to cut down on, e.g., AWS spend that the primary billers often have a really hard time accepting or incorporating bill-reducing features. All of their incentives are geared toward increased spend, regardless of the individual preferences of anyone at the company, so that inertia is really hard to overcome.

jaggederest

12 days ago

That resonates with what we have heard from our customers.

sahil_singla

12 days ago

Our belief is that building a good optimization tool is not aligned with Snowflake's interests. Instead, they seem more focused on enabling new use cases and workloads for their customers (their AI push, for example, with Cortex). Helping Snowflake users cut costs, on the other hand, is our singular focus.

sahil_singla

12 days ago

Or to phrase it differently: what kind of market is this, where big companies are herded into SaaS tarpits that apparently have exactly the same problems as running things the old way (namely, inefficient resource usage)? Only now you have to pay some symbiotic start-up instead of hiring a generic performance person.

fock

12 days ago

It's not really in their interests?

bluelightning2k

12 days ago

What happened to your other idea?

candiddevmike

12 days ago

Not OP, but for us, LLMs just aren't good enough yet to write analytical SQL queries (and they may never be good enough using pure SQL). Some more context here: https://news.ycombinator.com/item?id=40300171

mritchie712

12 days ago

+1. We came to a similar conclusion when we were working on this idea.

sahil_singla

12 days ago

Productizing cost-optimization experience! Great to see more options in this space, as so many companies are surprised by their cloud costs.

For the warehouse-size experimentation, how do you value processing time?

datadrivenangel

12 days ago

We optimize warehouse sizes for a dbt project as a whole. Users can set a maximum project runtime as one of the parameters for experimentation. The optimization honors this max runtime while tuning warehouse sizes for individual models.

sahil_singla

12 days ago

How does this differ from Keebo?

https://keebo.ai/

michaelmior

12 days ago

We are different from Keebo in the way we approach warehouse optimization. Keebo seems to dynamically change the size of a warehouse - we have found that to be somewhat risky, especially when it's downsizing. Performance can take a big hit in this case. So we've approached this problem in two ways:

1. Route queries to the right-sized warehouse instead of resizing the warehouse itself. This is part of our dbt optimizer module (a rough sketch of the mechanism is below), and it ensures performance stays within acceptable limits while optimizing for cost.

2. Baselit's Autoscaler optimally manages the scaling out of a multi-cluster warehouse depending on the load, which is more cost-effective than upsizing the warehouse.
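
To give a rough sense of the mechanism (this is standard dbt-snowflake configuration, not our actual implementation; model and warehouse names are made up), routing a dbt model to a specific warehouse looks like this:

    -- models/marts/fct_orders.sql (hypothetical dbt model)
    -- dbt's snowflake_warehouse config runs this model on a specific warehouse;
    -- a per-model optimizer is effectively tuning this value over time.
    {{ config(
        materialized = 'table',
        snowflake_warehouse = 'TRANSFORM_WH_M'
    ) }}

    select
        order_id,
        customer_id,
        sum(amount) as order_total
    from {{ ref('stg_orders') }}
    group by 1, 2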

sahil_singla

12 days ago

Does anyone support this sort of optimization for AWS Redshift?

I built some Lambdas that looked at queue length and turned off Redshift concurrency scaling for WLM queues to mitigate costs for less critical afternoon workloads, but it was always cruder than I wanted.

gregw2

11 days ago

Does it use AI?

iknownthing

12 days ago

No AI yet - all of the algorithms are deterministic under the hood. We are considering tinkering with LLMs for query optimization as part of our roadmap, though.

sahil_singla

12 days ago

We (https://www.definite.app/) were also working on AI for SQL generation. I can see why you pivoted - it doesn't really work! Or at least not well enough to displace existing BI solutions.

Edit: the context below is mostly irrelevant to Snowflake cost optimization, but relevant if you're interested in the AI-for-SQL idea...

I'm pretty hard-headed though, so we kept going with it, and the solution we've found is to run the entire data stack for our customers. We do ETL, spin up a warehouse (DuckDB), a semantic layer (cube.dev), and BI (dashboards/reports).

Since we run the ETL, we know exactly what all the data means (e.g. we know what each column coming from Stripe really means). All this metadata flows into our semantic layer.

LLMs aren't great at writing SQL, but they're really good at writing semantic-layer queries, for a few reasons:

1. better defined problem space (you're not feeding the LLM irrelevant context from a sea of tables)

2. the query format is JSON, so we can better control the LLM's output

3. the context is richer (e.g. instead of table and column names, we can provide rich, structured metadata)

This also solves the Snowflake cost issue from a different angle... we don't use it. DuckDB has the performance of Snowflake for a fraction of the cost. It may not scale as well, but 99% of companies don't need the sort of scale Snowflake pitches.

mritchie712

12 days ago

It may be worth having a look at Raia (https://raia.live).

I'm actively developing this product.

One of the things I added is something called "Assistant Profiles". Given that you know the DB structure, you can create a custom Assistant Profile and tune it to fit the underlying DB, which improves the results quite a lot.

You can then extend the connection to other external systems and automate a lot of the analysis processes your users may have.

I'm happy to work with you to make it work for your use case.

brunoa-ca

6 days ago

Kind of surprised to hear that given the number of companies I've seen pitching natural language to SQL queries.

iknownthing

12 days ago

I was also doing that last year with outfinder.co, but like the others said, it's really hard.

ericzakariasson

11 days ago

Can you clarify what you mean by "they're really good at writing semantic layer queries"?

Re JSON query format: you mean that's what you're using?

redwood

11 days ago

Yes, the queries for the semantic layer we're using are in JSON. Here's an example query:

    {
      "measures": ["stories.count"],
      "dimensions": ["stories.category"],
      "filters": [
        {
          "member": "stories.isDraft",
          "operator": "equals",
          "values": ["No"]
        }
      ],
      "timeDimensions": [
        {
          "dimension": "stories.time",
          "dateRange": ["2015-01-01", "2015-12-31"],
          "granularity": "month"
        }
      ],
      "limit": 100,
      "offset": 50,
      "order": {
        "stories.time": "asc",
        "stories.count": "desc"
      },
      "timezone": "America/Los_Angeles"
    }
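
For contrast, the JSON above compiles down to something roughly like the following SQL - an illustrative hand-translation with assumed table and column names, not Cube's actual generated output:

    -- Rough, hand-written equivalent of the JSON query above (names assumed).
    select
        stories.category,
        date_trunc('month', stories.time) as stories_time_month,
        count(*)                          as stories_count
    from stories
    where stories.is_draft = 'No'
      and stories.time >= '2015-01-01'
      and stories.time <  '2016-01-01'
    group by 1, 2
    order by stories_time_month asc, stories_count desc
    limit 100 offset 50;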

mritchie712

11 days ago

I was thinking about an AI that feeds you the proper Snowflake sales pitch each time a query runs up a big bill or fails a benchmark. At my previous org, it could have replaced several headcount.

kwillets

12 days ago