What does it mean when Twitter says their entire database is in RAM?

I am wondering where to begin with database scaling/optimization strategies. After reading articles like highscalability.com’s facebook architecture article, and this twitter architecture article, I am not sure if by RAM they mean only memcached, or something else.

My questions are:

  • Can entire SQL databases be stored in RAM? I’m thinking sharding and all that here… Just learning (not from a db background).
  • Are SQL database indexes stored in RAM?
  • Are NOSQL databases stored in RAM? Can they be?
  • Or is Memcached the standard for storing things in RAM? So 99% of requests read from RAM (Memcached), and the database (disk) is a backup?

Just looking for a point in the right direction.

Answer

The trouble with storing a database in RAM is that RAM has a nasty habit of forgetting everything when the power goes away i.e. it’s not persistent. That said, making proper use of memory for high-traffic sites is absolutely essential to getting decent performance, because you get extremely good IO rates from it and that’s a very useful thing if you have a high load.

From memory, MySQL had a MEMORY table type that stored data in RAM rather than on disk (as InnoDB and MyISAM would). Creative use of a RAMdisk would also allow any database to use RAM as a disk backing, but as above, this isn’t probably what you’d want to do. As you’ve hit upon, a more useful application would be the use of RAM as a high-performance cache, using something like Memcached. As I’m sure you know, this gives a fast key/value store, but requires the application to know to look there first and then fall back to the persistent database if nothing’s found. Sites which require a high IO rate across their entire relational DB have the option of dumping the entire DB onto something like a Fusion IO drive. This isn’t going to be as quick as RAM, but has the option of being persistent so can be a useful middle ground. I believe SO runs it’s database on a Fusion IO drive (see this blog post about their findings.

So, in summary, a high-volume site will have its data stored on persistent storage (spinning disk, SSD etc.) and then set up a series of layers of higher-performing caches in order to reduce (usually the reads) load on the database. Writes typically go straight into the db, but you can use a localized write-cache if you’ve got a lot of writes.

In answer to your specific questions:

  • Entire SQL databases can be stored in RAM, but this isn’t necessarily built-in or what you’re looking for. If you want a RAM-based database, there’s probably a better option.
  • SQL indexes will be managed by the SQL engine that you’re using. Different SQL servers (MSSQL, MySQL, Postgres etc.) might have different strategies and tuning options for determining when to dump indexes into RAM depending on a number of factors, such as how big they are, how often they’re hit, how much RAM you have.
  • I’m not a NOSQL expert, so would be making up any answer here. However, could you say that memcached is a memory-based NOSQL database? Maybe.
  • Memcached is fairly widely used and has a lot of support from various libraries and software stacks.

Attribution
Source : Link , Question Author : Lance , Answer Author : growse

Leave a Comment