open-source, open source, opensource, open source software, open sources, open sourced, opensource software, open sourse, open source community, open source website, what is open source, open sorce, open source code, open source definition, open source development, open sourcing, open souce, copyleft, open source public, open source technology, software open source, open source coding, open source documentation, open source platform, open-source software, opensource projects, open source project, open source projects, open source solutions, source guide, open source license, open guide, guide source, open source site, opensource.com, open soruce, gpl license, gnu general public license, erp open source, open source faq, what is foss, what is an open source, outil de developpement, opensource web, open source web, open source technologies, open source softwares, open source erp, define open source, open source news, opensource project, project open source, solution open source, source project, source software, why open source, free open source, open source communities, open source alternatives, gpl source, free source, open software, open projects, open source initiative, open source meaning, open source ocr, open source projects for beginners, oss software How to design a stable storage system

How to design a stable storage system

Riak CS editor's selection

We used every link in evolutionary chain of storage management: filesystem -> network filesystem -> object storage. Let me quickly share what I learned.


Filesystems

XFS, Ext2/3/4 are all fast and mature filesystems. But when you add disks, in order to increase space of your LVM volume, you discover that risk of downtime increases as well. See article on scalability where I mention Dirichlet principle.

To minimize that risk, you would have to use a network filesystem with additional servers.


Network Filesystems

We tried most of network filesystems.

The most annoying issues with that type of filesystems are poor scalability and poor bad fault tolerance. Those issues are even more annoying if you store application assets on network filesystem and then links on them in RDBMS. In our case we had 40 physical application servers with data stored on NFS. Once in a while NFS lost some files, so the app UI looked broken.


Object storage is the next logical step in evolution of storage systems. It addresses issues of fault tolerancy and scalability, provides application interface and allows you to store metadata as attributes of objects.

Application can talk directly to the object storage, avoiding redundant logic, that would be used in case of RDBMS. If you use RDBMS with object storage, primary keys can be stored in object metadata now.


Object Storage

We have tested most of object storage systems.

CouchDB stores everything in JSON, so binary objects become very large.

Cassandra has unpredictable performance, as any other overbloated system with such amount of background tasks.

OpenStack Swift. While it has an acceptable quality, unlike other parts of OpenStack, it stops serving objects when one of three assumed nodes fail. That is a wrong implementation of eventual consistency from my point of view. But we have used OpenStack Swift for about two years, until the load became too big and maintenance became too difficult.

LeoFS has a poor implementation, as our studies have shown. It uses Erlang's "ets" for storing metadata, which makes resource comsumption unpredictable.

ZoDB with ZEO is too slow. Also it uses BerkeleyDB, as backend. It would be very difficult to restore BerkeleyDB in case of failure, as there'a no tools for maintenance.

MongoDB

. The first time I checked it was too unstable. It just segfaulted when we stored 9 million of small objects. Laster I had to use MongoDB anyway, and it turned out to have master-slave type of cluster. This means changes from master do not appear instantly on slaves. It is not good for highly loaded systems

Redis

is good for caching, but not on highly loaded systems. When you turn on persistance, it becomes too slow for high load.

Amazon S3 is expensive. Also see article Dangers of the Clouds.

Riak CS currently is the best object storage system on the market. It has a predictable memory and performance characteristics, as well as predictable latencies. Applications, written using Erlang Open Telecom Platform are known to work tens of years without human intervention. Riak is implemented using latest studies in distributed systems and is supposed to be used for installations that consist of 3 and more physical servers. Nevertheless it is stable and fast.


That knowledge should allow you to avoid mistakes and redundant parts in software design.

28 November, 2017