Wednesday, November 9, 2022

Sharding

Introduction

Sharding is a type of database partitioning that separates large databases into smaller, faster, more easily managed parts. These smaller parts are called data shards. The word shard means "a small part of a whole."

Sharding involves splitting and distributing one logical data set across multiple databases that share nothing and can be deployed across multiple servers. To achieve sharding, the rows or columns of a larger database table are split into multiple smaller tables.

Each piece of the split data set is a logical shard. Once a logical shard is stored on a database node, that node is referred to as a physical shard. One physical shard can hold multiple logical shards. Shards are autonomous: they don't share data or computing resources with one another.
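The relationship between logical and physical shards can be sketched as a simple placement map. This is only an illustration: the node names, the shard count and the round-robin placement are assumptions, not something a particular database prescribes.

```python
# Hypothetical layout: 8 logical shards packed onto 3 physical nodes.
NUM_LOGICAL_SHARDS = 8
NODES = ["node-a", "node-b", "node-c"]  # assumed node names


def build_placement(num_logical, nodes):
    """Assign each logical shard to a node, round-robin."""
    return {shard_id: nodes[shard_id % len(nodes)] for shard_id in range(num_logical)}


def logical_shards_on(node, placement):
    """List the logical shards that a given physical node holds."""
    return sorted(s for s, n in placement.items() if n == node)


placement = build_placement(NUM_LOGICAL_SHARDS, NODES)
print(logical_shards_on("node-a", placement))
```

Keeping more logical shards than nodes is one common way to make rebalancing easier, since a whole logical shard can be moved to a new node without re-splitting the data.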


Horizontal sharding

When each new table has the same schema but unique rows, it is known as horizontal sharding. In this type of sharding, more machines are added to an existing stack to spread out the load, increase processing speed and support more traffic. This method is most effective when queries return a subset of rows that are often grouped together.
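A minimal sketch of horizontal sharding, assuming rows are plain dictionaries and the shard is chosen by taking an integer key modulo the shard count. The `user_id` key, the sample rows and the shard count are made up for illustration; real systems often use a hash or range of the key instead.

```python
from collections import defaultdict

NUM_SHARDS = 3  # assumed shard count


def shard_for(user_id: int) -> int:
    """Pick a shard from the row's key (simple modulo scheme)."""
    return user_id % NUM_SHARDS


def shard_rows(rows):
    """Split rows into shards; every shard keeps the full set of columns."""
    shards = defaultdict(list)
    for row in rows:
        shards[shard_for(row["user_id"])].append(row)
    return dict(shards)


users = [
    {"user_id": 1, "name": "Ada", "country": "UK"},
    {"user_id": 2, "name": "Linus", "country": "FI"},
    {"user_id": 3, "name": "Grace", "country": "US"},
    {"user_id": 4, "name": "Ken", "country": "US"},
]
shards = shard_rows(users)
```

Every shard has the same columns as the original table, and each row lands in exactly one shard.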

Vertical sharding

When each new table has a schema that is a proper subset of the original table's schema, it is known as vertical sharding. It is effective when queries usually return only a subset of the columns.
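As a rough sketch, vertical sharding can be pictured as splitting each row by column group while repeating the key in every new table so the pieces can be joined again. The table names, column groups and sample row below are hypothetical.

```python
def vertical_split(rows, key, column_groups):
    """Split rows by column group; each new table keeps the key column."""
    tables = {}
    for table_name, columns in column_groups.items():
        tables[table_name] = [
            {key: row[key], **{col: row[col] for col in columns}} for row in rows
        ]
    return tables


users = [
    {"user_id": 1, "name": "Ada", "email": "ada@example.com", "bio": "Mathematician"},
    {"user_id": 2, "name": "Ken", "email": "ken@example.com", "bio": "Engineer"},
]

# Assumed grouping: frequently read columns apart from rarely read ones.
tables = vertical_split(
    users,
    key="user_id",
    column_groups={"user_core": ["name", "email"], "user_profile": ["bio"]},
)
```

A query that only needs names and emails now reads `user_core` and never touches the `bio` column.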


Benefits of sharding

Sharding is common in scalable database architectures. Since shards are smaller, faster and easier to manage, they help boost database scalability, performance and administration. Because each transaction works against a smaller data set, sharding can also reduce the cost of individual transactions.

Horizontal scaling, which is also known as scaling out, helps create a more flexible database design, which is especially useful for parallel processing. Capacity for intense workloads and big data requirements grows by adding machines rather than by upgrading a single one. With horizontal sharding, a query can draw on the compute resources of every machine in the cluster. This sharding method also speeds up query resolution, since each machine has to scan fewer rows when responding to a query.
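The parallel scan described above is often called scatter-gather: send the same filter to every shard, then merge the partial results. The sketch below uses in-memory lists to stand in for shards and a thread pool to stand in for separate machines; the order data is invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards: each list stands in for one machine's rows.
SHARDS = [
    [{"order_id": 3, "total": 40}, {"order_id": 6, "total": 15}],
    [{"order_id": 1, "total": 75}, {"order_id": 4, "total": 20}],
    [{"order_id": 2, "total": 90}, {"order_id": 5, "total": 55}],
]


def scan_shard(rows, min_total):
    """Scan one shard; it only sees its own subset of rows."""
    return [r for r in rows if r["total"] >= min_total]


def scatter_gather(shards, min_total):
    """Run the scan on every shard concurrently and merge the results."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = pool.map(lambda rows: scan_shard(rows, min_total), shards)
        merged = [row for part in partials for row in part]
    return sorted(merged, key=lambda r: r["order_id"])


big_orders = scatter_gather(SHARDS, min_total=50)
```

Each worker scans two rows instead of six, which is the effect the paragraph describes, at the cost of a merge step in the application.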

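Vertical scaling, also known as scaling up, is a different approach: it adds RAM, storage or central processing unit (CPU) capacity to increase the power of a single machine or server. Vertical sharding, by contrast, splits a table by columns, so it helps when queries read only some of the columns.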


Sharded databases also offer higher availability and mitigate the impact of outages because, during an outage, only those portions of an application that rely on the missing shards become unusable. A sharded database can also replicate shards to additional nodes as backups, which further limits the damage from an outage. In contrast, an application running on a single unsharded database may be completely unavailable following an outage.
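A toy model of this behavior, assuming three shards chosen by `user_id % 3` and one replica per shard that is written synchronously. The `Shard` class, the `put` and `get` helpers and the naming are hypothetical and only meant to show which reads fail when one shard is down.

```python
NUM_SHARDS = 3  # assumed shard count


class ShardUnavailable(Exception):
    """Raised when a read hits a shard that is down."""


class Shard:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}

    def get(self, key):
        if not self.up:
            raise ShardUnavailable(self.name)
        return self.data[key]


primaries = [Shard(f"primary-{i}") for i in range(NUM_SHARDS)]
replicas = [Shard(f"replica-{i}") for i in range(NUM_SHARDS)]


def put(user_id, value):
    i = user_id % NUM_SHARDS
    primaries[i].data[user_id] = value
    replicas[i].data[user_id] = value  # synchronous copy, kept simple on purpose


def get(user_id, use_replicas=True):
    i = user_id % NUM_SHARDS
    try:
        return primaries[i].get(user_id)
    except ShardUnavailable:
        if not use_replicas:
            raise
        return replicas[i].get(user_id)  # fall back to the backup copy


for uid in range(1, 7):
    put(uid, f"user-{uid}")

primaries[1].up = False  # simulate an outage of one shard
```

With `primaries[1]` down, reads for users 1 and 4 fail unless the replica is consulted, while reads for every other user are unaffected.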


Difference between sharding and partitioning

Although sharding and partitioning both break up a large data set into smaller pieces, there is a difference between the two methods.

After a database is sharded, the data in the new tables is spread across multiple systems, but with partitioning, that is not the case. Partitioning groups data subsets within a single database instance.
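The contrast can be sketched with Python's built-in `sqlite3` module, using separate in-memory databases to stand in for separate systems. The table names and the modulo split are assumptions for illustration; production databases usually provide their own partitioning features.

```python
import sqlite3

rows = [(1, "Ada"), (2, "Linus"), (3, "Grace"), (4, "Ken")]

# Partitioning: one database instance, rows grouped into two tables.
single = sqlite3.connect(":memory:")
for p in (0, 1):
    single.execute(f"CREATE TABLE users_p{p} (user_id INTEGER PRIMARY KEY, name TEXT)")
for user_id, name in rows:
    single.execute(f"INSERT INTO users_p{user_id % 2} VALUES (?, ?)", (user_id, name))

# Sharding: two independent instances that share nothing, same schema in each.
shards = [sqlite3.connect(":memory:") for _ in range(2)]
for db in shards:
    db.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT)")
for user_id, name in rows:
    shards[user_id % 2].execute("INSERT INTO users VALUES (?, ?)", (user_id, name))

# A single instance can combine its partitions in one statement.
partitioned_count = single.execute(
    "SELECT COUNT(*) FROM (SELECT * FROM users_p0 UNION ALL SELECT * FROM users_p1)"
).fetchone()[0]

# Across shards, the application has to query each instance and combine.
sharded_count = sum(
    db.execute("SELECT COUNT(*) FROM users").fetchone()[0] for db in shards
)
```

Both counts come out the same, but only the partitioned layout lets the database engine answer the question by itself in a single query.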