Redshift is a great cloud data warehouse solution and was, in a way, the first to use the trend of migration to a massive parallel processing cloud data warehouse.
However, Snowflake has a new approach to a cloud data warehouse because it has been fully developed from scratch for the cloud, which gives Snowflake the following advantages over Redshift:
Redshift has no physical separation from Compute and Storage. If you need more storage space, you need to add more clusters, which means that you are actually paying for more computing power instead of storage space. With Snowflake, Compute and Storage are completely separate, the storage costs in Snowflake are the same as storing data on Amazon S3. AWS tried to address this problem by introducing Redshift Spectrum. With Redshift Spectrum it has become possible to fire queries on data directly on S3, but that works less well than with Snowflake.
Because Redshift does not scale automatically, you need to take and maintain a peak-sized cluster where you can use Snowflake storage as needed, and scale up and down in real time based on your usage. Snowflake also supports “auto-suspend” and “auto-resume” so as not to incur any costs when no one uses the data warehouse.
All in all, a switch from Redshift to Snowflake can save a lot of costs.
Snowflake supports Semi-Structured data types – Variant, Object and Array, so you can load data without worrying about the schedule. For some usecases this is very useful. Redshift does not have such support.
Infinite Strings – Redshift Varchar data type is limited to 65535 characters.
In addition, you have to choose the column length in advance, so it is almost natural to use the maximum size, but this creates an overhead. With Snowflake, strings are limited to 16 MB and there is no overhead due to the use of the maximum size. The default value is the maximum string, so you don’t have to know it in advance.
Snowflake supports Data Sharing, the sharing of data between different accounts (for example, from customers, or from suppliers). Data providers can share data with data consumers, and this does not even require a copy of the data. Think of a kind of Dropbox for database tables. This is a real game changer for how people work with data from third parties, and unique to Snowflake, so it is unnecessary to say that this is not supported by Redshift.
Maintenance and DevOps
With Redshift it is required to regularly vacuum. Snowflake does this all out-of-the-box. With very large tables, this can mean a huge hassle with Redshift.
Upscaling / Downscaling – Redshift does not scale up and down easily, the resizing function in Redshift is extremely expensive and causes hours of downtime. With Snowflake, scaling the storage capacity in a database is a matter of seconds, and since compute and storage are separate, there is no need to copy / duplicate data to scale up and down.
With Redshift all users look at the same cluster and are therefore concurrent users on the same system. Workload Management queues are the “Redshift way” to deal with this, but it is a very cumbersome set of rules to manage and understand. With Snowflake you can easily start different warehouses of different sizes that view the same physical data in the storage without copying data and / or assigning them to different tasks / users.
Regardless of which data warehouse you choose, or if you wish to migrate from Redshift to Snowflake, DataBright can help you roll out the right modern infrastructure.