When helping companies set up their Modern Data Stack, we typically grab either Snowflake or BigQuery for data storage and compute. The truth is, while those are great for most of the challenges facing early data teams, they have limitations at scale - most notably around performance and cost.
Snowflake and Databricks are in an all-out war over who will become the only database a company will ever need - and they’re both grabbing land across the data stack. While they continue to focus on adding features in data storage, compute, sharing, and even data visualization, they have lost focus on the reason data teams selected their technology in the first place: supporting larger volumes of data with higher performance and concurrency for analytics use cases. Enter Firebolt.
The following is an interview with Philip Lima, Mashey’s Founder & CEO, and Tom Mante, an expert in data analytics technology and partnerships, who now serves as the Head of Consulting Alliances at Firebolt.
Philip: Sure. I’m the Founder and CEO of Mashey and have been in the data analytics space for almost 15 years now. I started Mashey about six years ago and we have been building on what people call the “Modern Data Stack” for about three years.
You and I had a great time working together while you were in similar roles at Looker and Google Cloud a few years back. That was when we really started aligning to this new approach to data analytics - moving from prioritizing “full stack” solutions to using specific tools for specific jobs - which is where the term “Modern Data Stack” comes from.
Tom: Thanks for that. Yes, we certainly do go a ways back, but for others, can you tell us a little more about what Mashey does, and where your focus is in terms of professional services and specifically in the data analytics market?
Philip: Sure thing. Mashey is a team of data analytics experts, and we’ve worked hard to be the best Data Team that our customers will ever work with. This is actually a formal commitment we’ve made to our customers. As a data analytics consultancy, we help companies either (a) set up a data stack for the first time, or (b) help them get more out of the data stack they already have.
So in cases where we are setting up a data stack for the first time, we want to be sure we’re putting the best technology in place for the use case and that it’s set up right the first time, which obviously requires us to know what to use where and how far each of those technologies will get you. And for customers who are optimizing an existing data stack, it really is about helping them understand what’s working well, what’s not working well, and what might be a candidate for improvement, either by changing configurations or by moving workloads to other technologies.
Tom: So what I’m hearing is expertise not only in identifying current state data challenges, but also what the future state could look like and how to get there from a data and infrastructure perspective, sharing best practices along the way. Having known you from our Looker days, I know your team is considered a leader in implementing modern data stacks and that you provide a Data Team as a Service that acts as an extension of existing customer teams to make them successful with data. So as it translates to Firebolt, we’re very excited about the opportunity to be considered as part of that roadmapping journey. Can you tell me why you decided to partner with Firebolt alongside your partnerships with Snowflake and BigQuery?
Philip: Well, first it was awareness. Some key people joined Firebolt that we knew from Looker, where they did an incredible job bringing meaningful technology to market on a global scale and in a really impressive way. There’s a reason Google was interested in and eventually bought Looker. We also know some of the founders from working with Sisense. When the Sisense team brought the Elasticube to market over 10 years ago, it was a really novel approach to the challenges of analytics back then - a hybrid disk-based/in-memory/CPU design that no one had done before. Combine that with a CTO and a VP of Product both from the BigQuery team at Google. This combination of folks is now working together to solve current challenges in data analytics. That put Firebolt onto our radar, so to speak.
Then we looked into the technology a bit more, and it didn’t take us long to realize it fills a different need than Snowflake and BigQuery. Those technologies are great platforms that solve a lot of use cases for our customers, but for those that need even faster query times on even larger amounts of data, especially for building modern data apps, Firebolt really fits in nicely for that particular need.
In-memory analytics was built to optimize a single machine, and that approach is history. Now, decoupled storage and compute are considered table stakes. Still, no one is tackling the optimization of this decoupling at scale. The technologies mentioned earlier have moved on to add additional feature sets to their platforms, like data sharing and visualization, while Firebolt is staying intensely focused on one thing - performance - and giving data the most utility by making it available at scale in a price-performant way.
So if you think about it, a past use case might be “I need a query to run within a few seconds on a few gigabytes of data, or a hundred gigabytes.” Now use cases are becoming more like “I need queries to run in less than a second on terabytes of data.” And that’s where Firebolt fits.
Tom: So can you think of an example in your past implementations where Firebolt would have been a good tool, if it had been available then?
Philip: Oh, definitely! I was working with a CTO on a project - one of four we’ve done together at various companies over the years - and I think this particular project was the hardest. At the time I recall him showing me an HP Vertica appliance sitting on a shelf and he asked me if we should use it or not. I said, “I’m really sorry your company spent money on that, but I don’t think we should use it.”
We went on to build a data application that processed $1 trillion of medical transactions a year and, of course, needed continuous integrations, could never go down, and needed to scale to a large number of concurrent users. It operated on patterns similar to Firebolt’s, but used a constellation of Microsoft SQL Servers. It was very tough to build, very expensive, and nothing close to what we could have done with Firebolt.
Tom: So it sounds like, for you, Firebolt is opening up a way to deliver next-generation data applications and analytics experiences. And while we’re talking about these new data experiences and what the Modern Data Stack provides, can you tell me a little about where you see Firebolt fitting into the Modern Data Stack for you and Mashey?
Philip: Sure, and first what do we mean by “Modern Data Stack”? Everyone has their own version of this, but the point is to use specialized tools in specific ways. To use a car analogy, it’d be like going from working on a car with a single adjustable wrench in your driveway to having a full mechanic’s workshop. In my opinion, we never should have just used an adjustable wrench in the first place, but that’s just how it was and what people expected.
Now you have a separation of data ingestion, with tools like Airbyte and Fivetran. Data is stored in both raw and modeled form in tools like BigQuery, Snowflake, and now Firebolt. For data modeling, dbt has become the gold standard, and for data analytics it’s largely Looker.
Firebolt fits in well for a few reasons. First, it’s 100% SQL, like everything else the modern data stack has sort of settled on, with storage in S3 like its predecessors. So it’s a familiar infrastructure that keeps our Data Engineers happy, since they’re easily able to jump in with Firebolt and be productive. Secondly, since it’s SQL, our Analytics Engineers can continue to use dbt and all their experience and best practices there, now at Firebolt scale. Thirdly, it integrates with Looker. That’s really cool because it proves Looker’s theory that if you separate where the data lives from how people interact with the data, both things can progress the fastest and it’ll be mutually beneficial. Now we see the data storage and compute space taking off with Firebolt, and the data interaction space continuing its momentum with tools like Looker that can just take advantage of the advancements in storage and compute layers. It’s a beautiful separation of concerns - it’s the Modern Data Stack.
Tom: And what other types of technologies are you partnering with?
Philip: Well, we pretty much use all the technologies I mentioned earlier. For data ingestion we’re using Fivetran, we’re heavily using Airbyte, which is a newer technology that’s only a couple of years old, and we’re using Meltano to some extent as well. On data transformation, after a lot of testing and failure, we exclusively use dbt now for managing that and have had a 100% success rate with it. For storage and compute, we have the two that I mentioned, BigQuery and Snowflake, and now Firebolt, depending on the use case. For analytics it’s obviously Looker, and we’ve just started working with Sigma. So those are the key ones we’re currently working with, at least.
Tom: Excellent. Those are all great tools, great companies and good coverage for a lot of data analytics needs. The last question is in this “Modern Data Stack” realm. Obviously you’re working with multiple companies on a day-in day-out basis. You’re helping them get set up for the first time, or you’re helping them do more with data. What are they starting to surface in terms of trends that you haven’t covered yet and that would help them become more data driven, use data better, and make it a more core day-to-day interaction for them?
Philip: Hah! I’m smiling here because it sort of sounds like a “what’s the next cool thing?” type of question. One thing that comes to mind in our approach is something Lloyd Tabb, one of the founders of Looker, said when someone asked him “What’s next?” after Looker was acquired by Google. He said, “I’m not interested in what’s next. I’m interested in what’s ready now. I think e-bikes are cool! And they’re ready. We can all ride one right now.” I really liked that and realized that instead of being speculative, we should focus on how we can help people with what they currently need. As the modern data stack has kind of settled in the last few years to be the stack we’re talking about here, we don’t need more technologies in those areas, but we need to help people get more out of what they already have in place. So how do we make that more accessible, more reliable, and better understood? In general, data platform management needs help. People don’t know what data is available, how reliable it is, or even what the data is. It’s all very compartmentalized within each tool in the stack.
So we’re looking into better ways to help people document data lineage and business rules across the stack in a single place. It’s a big deal and a total pain. There are products that have done this for a while now, but they’ve effectively priced themselves out of the market for most companies, or they only support legacy tools. It seems like Castor is onto something here. It’s got great coverage on key modern data stack components at a price point that keeps it accessible for anyone - a strategy that I think has helped dbt and Airbyte become so widely used so quickly.
Data Observability is a fancy term for - tell me how healthy my pipelines are, and tell me about any downstream issues before my users find out. Monte Carlo is one to look at there, for sure.
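Editor’s note: to make the “how healthy are my pipelines” idea concrete, here is a minimal Python sketch of a freshness check, one common kind of observability rule. It is an illustration only, not how Monte Carlo or any other product works. The table names, timestamps, two-hour threshold, and the `find_stale_tables` helper are all assumptions made up for this example.

```python
from datetime import datetime, timedelta, timezone


def find_stale_tables(last_loaded, max_age, now):
    """Return the names of tables whose most recent load is older than max_age."""
    return sorted(
        name for name, loaded_at in last_loaded.items() if now - loaded_at > max_age
    )


# Hypothetical load timestamps; a real check would read these from pipeline metadata.
now = datetime(2022, 1, 1, 12, 0, tzinfo=timezone.utc)
last_loaded = {
    "orders": now - timedelta(minutes=30),
    "customers": now - timedelta(hours=5),
}

stale = find_stale_tables(last_loaded, max_age=timedelta(hours=2), now=now)
print(stale)
```

A check like this would typically run on a schedule and alert the data team, so stale tables are caught before a dashboard user notices.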
Lastly, Reverse ETL. We can all have our opinions on whether it should be called “Reverse ETL” or should have a better name, but the need really is there and there are companies jumping onto that. We build these high-value data assets and should be able to use that information to enrich data back in our other systems, like going back to HubSpot or Salesforce records to update customer LTV or cohorts. So Hightouch and Census are ones we’re looking into on that front.
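Editor’s note: the Reverse ETL pattern described above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the way Hightouch or Census work. The order rows, the `lifetime_value` field name, and the update payload layout are hypothetical and do not reflect a real HubSpot or Salesforce schema.

```python
from collections import defaultdict


def lifetime_value_by_customer(orders):
    """Sum order amounts per customer id (a deliberately simple LTV definition)."""
    totals = defaultdict(float)
    for order in orders:
        totals[order["customer_id"]] += order["amount"]
    return dict(totals)


def build_crm_updates(ltv_by_customer, field_name="lifetime_value"):
    """Shape each customer's LTV as a record update destined for a CRM.

    The payload layout and field name are hypothetical.
    """
    return [
        {"customer_id": customer_id, "fields": {field_name: round(ltv, 2)}}
        for customer_id, ltv in sorted(ltv_by_customer.items())
    ]


# Hypothetical modeled rows; in practice these would come from a warehouse query.
orders = [
    {"customer_id": "c1", "amount": 120.0},
    {"customer_id": "c2", "amount": 40.0},
    {"customer_id": "c1", "amount": 80.0},
]

updates = build_crm_updates(lifetime_value_by_customer(orders))
for update in updates:
    print(update)
```

In a real pipeline the final step would send each update to the CRM’s API; that call is left out here because it depends on the destination system.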
Tom: Excellent. I’d say that those are definitely a few areas that are very high profile right now and will be at the forefront for a lot of companies using data moving forward.
We at Firebolt have a similar view of where the modern data stack is going and are constantly adding more support for that ecosystem, through both our existing integrations and the ones on our roadmap with the aforementioned technologies. This is part of the reason why you and I got on the phone and started talking about a partnership. So yeah, we’re very excited on the Firebolt side to see some of those areas come to fruition, and to see where Mashey goes with that.
So how can people get a hold of you if they’d like to learn more or discuss working with you?
Philip: The easiest way is through our website (mashey.com), where you can fill out a contact form that lands directly in my inbox, or book a call at the top of any page to get time directly on my calendar.
Tom: I definitely appreciate the time and am looking forward to what’s next.
Philip: Thank you. Our entire team is really excited about this partnership. It opens us up to deliver broader and higher-scale use cases, and that’s pretty cool. Thanks again.