Running for class president - Snowflake vs BigQuery based on adoption
Let's see who gets the most votes!
When I started off as a data engineer, I was tasked with creating a centralized data repository in BigQuery, our data warehouse at the time. The company was considering an open-source tool called Meltano, which consists mainly of a series of connectors that let you extract data from one location (taps) and load it into another (targets). Simple EL stuff.
While setting up the tool, I realised that the only loader supported by the repository owners was target-snowflake, which they used for internal reporting - there were other loaders available, but all of them were community-maintained, with different levels of support. I very clearly remember sending long and uninformed messages through the Meltano Slack to the team at Adswerve to get some key insights into their custom BigQuery loader. It felt like a constant uphill battle to get support from the company or other users for a feature that was already supported, just for a different tool.
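For readers new to taps and targets: Meltano's extractors and loaders are Singer taps and targets, where a tap writes JSON messages (SCHEMA, RECORD and STATE) to stdout and a target reads them from stdin and loads them into a destination. The sketch below is a toy illustration of that flow, not Meltano code: toy_tap and toy_target are hypothetical names, the sample rows are made up, and the pipe between the two is simulated in memory.

```python
import json
from collections.abc import Iterable, Iterator


def toy_tap() -> Iterator[dict]:
    """Hypothetical tap: yields Singer-style messages for one 'users' stream."""
    # SCHEMA describes the stream's columns (as JSON Schema) and its primary key
    yield {
        "type": "SCHEMA",
        "stream": "users",
        "schema": {
            "type": "object",
            "properties": {"id": {"type": "integer"}, "name": {"type": "string"}},
        },
        "key_properties": ["id"],
    }
    # RECORD messages carry the actual rows (made-up sample data)
    for row in ({"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}):
        yield {"type": "RECORD", "stream": "users", "record": row}
    # STATE lets the next run resume where this one stopped
    yield {"type": "STATE", "value": {"users": {"last_id": 2}}}


def toy_target(lines: Iterable[str]) -> dict[str, list[dict]]:
    """Hypothetical target: groups RECORD rows by stream instead of writing to a warehouse."""
    tables: dict[str, list[dict]] = {}
    for line in lines:
        message = json.loads(line)
        if message["type"] == "RECORD":
            tables.setdefault(message["stream"], []).append(message["record"])
    return tables


# Simulate `tap | target`: serialize each message as one JSON line, then "load" them
json_lines = [json.dumps(message) for message in toy_tap()]
print(toy_target(json_lines))
```

A real loader such as target-snowflake replaces the in-memory dictionary with writes to its warehouse.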
User adoption - A virtuous cycle
The more popular something is, the more it will be supported (by community members, by companies, etc.), which leads to even more adoption.
An easy example of this is the Python programming language, whose ease of use propelled its adoption and support, or even the English language, whose rise in popularity made it the de facto language of business, supported by governments all over the world as a second language.
In tech, user adoption can be great for both providers AND users.
For providers, it can lead to:
Better marketing return on investment (ROI)
Lower cost per acquisition (CPA)
Lower customer marketing and retention costs
For users, it offers:
Wider range of supported tools
Easier access to support, both from providers & other users
Better assurance of the viability of the service
My experience with the lack of official support for the BigQuery loader in Meltano got me curious about BigQuery’s adoption in general vis-à-vis Snowflake, one of its biggest competitors in the data warehouse space. So, I decided to investigate.
Back to basics - The scientific method
I’m a gigantic nerd, so I was thrilled to realise that this article was a chance to dust off my old high-school chemistry chops and use the scientific method to investigate BigQuery’s and Snowflake’s user adoption. For those who need a refresher, here are the steps:
Observation - The Snowflake target is supported by the Meltano team while the BigQuery one is not.
Question - Which data warehouse is more widely adopted by users, and what impact does that have on available support?
Hypothesis - Snowflake has higher user adoption than BigQuery, leading to a higher level of support.
Experiment - We’ll focus on user adoption, and infer that it leads to an increase in support.
Analysis - No spoilers.
Sharing - This.
As with an election for class president, where popularity is paramount, let’s evaluate who’s on top - BigQuery or Snowflake.
Note - We will not be comparing the tools based on the available features or performance. There are PLENTY of articles that already do this, and it can quickly become subjective or use-case specific. Let’s just agree that they’re both wonderful tools.
Experiment
Google Trends
We kicked off our analysis with everyone’s favorite first stop for trend review: Google Trends.
We can see that Snowflake has a steady lead over BigQuery in terms of popularity as a search term over the past 12 months.
Community activity
Next, we looked into community activity by checking the total number of members of r/bigquery and r/snowflake, which have 19k and 16k members respectively.
Reddit sadly stopped exposing granular statistics at the end of 2023, but we can still use sites such as subredditstats to get an understanding of growth trends.
Even if these numbers are not up to date, we can see that while BigQuery has historically had a larger user base, Snowflake’s engagement and user base were growing at a faster pace.
These figures also imply that Snowflake’s subreddit grew by 100% in the past 15 months (8,000 → 16,000 subscribers), while BigQuery’s grew by only about 12% (17,000 → 19,000 subscribers).
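These growth figures follow from the usual percentage-change formula, (after - before) / before × 100. A quick check using the rounded subscriber counts quoted in the text (growth_pct is just an illustrative helper name):

```python
def growth_pct(before: int, after: int) -> float:
    """Percentage change from `before` to `after`."""
    return (after - before) / before * 100


# Rounded subscriber counts quoted above (about 15 months apart)
print(f"r/snowflake: {growth_pct(8_000, 16_000):.1f}%")  # r/snowflake: 100.0%
print(f"r/bigquery: {growth_pct(17_000, 19_000):.1f}%")  # r/bigquery: 11.8%
```

Because both counts are rounded to the nearest thousand, the results are only indicative.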
DB-Engines
While both of these are interesting, they were nothing compared to the data we found on the DB-Engines website, which tracks the month-by-month popularity of database systems based on:
Number of mentions of the system on websites.
General interest in the system.
Frequency of technical discussions about the system.
Number of job offers in which the system is mentioned.
Number of profiles in professional networks in which the system is mentioned.
Relevance in social networks.
Note - Snowflake went public on September 16th 2020, which coincides with the spike.
In August 2021, Snowflake overtook BigQuery in popularity and has been steadily pulling ahead ever since.
It’s important to note, however, that this website rates overall popularity, which gives an edge to more well-established legacy systems.
As we can see, Oracle is still BY FAR the most popular database system, even if its popularity is decreasing year over year.
PyPI statistics
To get a fair comparison, I wanted to identify third-party Python packages that serve similar or identical purposes for both BigQuery and Snowflake. We were able to identify 5 such pairs, spanning direct connectors, dbt, SQLAlchemy, dask and Meltano, for example:
direct connectors: google-cloud-bigquery vs snowflake-connector-python
Additionally, we identified a couple of Snowflake vs. Google Cloud plugins (for example, for Prefect), but these were not relevant to our analysis, as they represented a feature that BigQuery itself does not offer (orchestration).
We used the PyPI stats API to extract package statistics for our analysis. Note that PyPI stats only go back 180 days.
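A minimal sketch of pulling these numbers, assuming the pypistats.org "overall" endpoint and its JSON shape ({"data": [{"category": ..., "date": ..., "downloads": ...}]}); this is not the code used for the analysis, and the function names and User-Agent string are hypothetical.

```python
import json
import urllib.request

# Assumed pypistats.org endpoint: daily downloads for roughly the last 180 days,
# excluding traffic from PyPI mirrors
OVERALL_URL = "https://pypistats.org/api/packages/{package}/overall?mirrors=false"


def fetch_daily_downloads(package: str) -> list[dict]:
    """Return the daily download rows pypistats.org holds for `package`."""
    request = urllib.request.Request(
        OVERALL_URL.format(package=package),
        headers={"User-Agent": "adoption-comparison-sketch"},  # hypothetical UA string
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["data"]


def total_downloads(rows: list[dict]) -> int:
    """Sum daily downloads, keeping only rows that exclude mirror traffic."""
    return sum(row["downloads"] for row in rows if row.get("category") == "without_mirrors")


# Usage (requires network access), e.g. for the direct-connector pair:
#   total_downloads(fetch_daily_downloads("google-cloud-bigquery"))
#   total_downloads(fetch_daily_downloads("snowflake-connector-python"))
```

Filtering on the category keeps the total correct even if the endpoint also returns rows that include mirror downloads.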
Here are our results:
Note - the Meltano and dask packages do not appear in the chart, as their download counts are dwarfed by those of the other packages.
We can see that Snowflake leads in dbt, SQLAlchemy and direct connector downloads over the past 180 days.
When we look at the percentages, we see that BigQuery is the preferred solution, in terms of share of total downloads, only for dask.
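The percentage view normalises each pair to 100%, so smaller integrations like dask are not drowned out by the heaviest packages. A minimal sketch of that normalisation, assuming each percentage is one package's share of its pair's combined 180-day downloads (the function name and input shape are hypothetical):

```python
def bigquery_share(downloads: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Map each category to BigQuery's share (%) of the pair's combined downloads.

    `downloads` maps a category name (e.g. "dbt") to a
    (bigquery_total, snowflake_total) tuple of download counts.
    """
    shares = {}
    for category, (bigquery, snowflake) in downloads.items():
        combined = bigquery + snowflake
        # Guard against an empty pair rather than dividing by zero
        shares[category] = 100 * bigquery / combined if combined else 0.0
    return shares


# Usage: bigquery_share({"dbt": (bq_total, sf_total), "dask": (bq_total, sf_total), ...})
```

A category where BigQuery's share is above 50% is one where its package is the more downloaded of the pair; Snowflake's share is simply 100 minus that value.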
You can find the code I used for this analysis here.
Discarded sources
What’s NOT shown in this experiment are all the sources we went through and subsequently discarded for various reasons, such as relevance or reliability. Here are just a few:
Meltano usage metrics, deemed untrustworthy because sending anonymous usage statistics is optional.
The Gartner report “Magic Quadrant for Cloud Database Management Systems”, which did not specifically address adoption.
The Airbyte article “Snowflake vs. BigQuery: Navigating Data Warehouse Landscape”, for similar reasons.
Fun side note - I also found an article by Ben Rogojan from before his Seattle Data Guy days, which made me laugh. Good article.
Analysis
From these 4 sources, we can see that while BigQuery leads in the number of Reddit members (and in dask downloads), it falls behind on the other measures. Based on these findings, it is therefore reasonable to conclude that Snowflake is currently the more popular product.
Now, again, this does not imply that it is the superior product, only that more people are drawn to it. There are a few reasons why this could be the case:
1. Cloud-agnostic
Never underestimate the power of flexibility.
One of BigQuery’s biggest drawbacks is that it is tied to a single cloud provider, Google. This excludes a large portion of the market that relies on AWS or Azure as its main tech stack, or forces companies to support a multi-cloud tech stack, which is not ideal and can incur additional costs (egress, training, etc.).
2. Marketing & Exposure
For them to like you, they have to know you.
It’s safe to say that Snowflake has a pretty incredible marketing budget and isn’t afraid to use it. From “Data for Breakfast” events to user groups, conferences and countless pieces of free swag, the company is doing everything it can to stay top of mind for its user base.
3. Focus, Support & user-friendliness
Knowing who you are, and what you’re good at.
One of BigQuery’s weaknesses is that its offering is overshadowed by everything else Google Cloud has to offer, making it harder to keep track of new feature releases or to read through the documentation comprehensively.
Additionally, as Snowflake has a more targeted user base, they can afford to provide top-quality customer service and professional support, which is not the case for larger entities such as Google Cloud.
Anecdotally, I’ve heard many stories of Snowflake representatives proactively contacting existing clients about unusual activity, such as compute cost spikes, to make sure it was expected. This is the kind of behavior that drives retention and adoption.
Based on these findings, we can say with confidence that Snowflake is indeed the more popular product, which in turn will drive more support.
However, as we saw with Oracle, popularity does not necessarily imply growth or a fit with your use case, and there are many other criteria by which to evaluate a warehouse.
As with all elections, it’s good to make an informed decision, so make sure to perform a thorough investigation to see which product is the best for you.
Thanks!