Tuesday, March 7, 2017

Risky Business: memcached For Faster Websites Via Persistent In-Memory Entities

memcached is a great technology, but it does not come risk-free. Here's why.

I have been working with in-memory object caching for at least 20 years, going back to the days when you could set up a RAMDRIVE on MS-DOS 6.22. That technology helped me pull off quite a coup in my first job as a Database Analyst (in those days there was no such thing as a Data Scientist): I was able to reduce the runtime of a disk-intensive INFORMIX-4GL program from about a day to under an hour. My boss was amazed.

The principle behind using a technology like RAMDRIVE or memcached is simple: Random Access Memory (RAM) is much faster than a Hard Disk Drive (HDD). The speed difference between these two types of storage can sometimes make things run thousands of times faster. This principle applies to web servers today just as much as it did to high-end databases in 1995. So if you want the best responsiveness for your website, keeping objects that don't change often (images, .css files, static web pages) in memory makes a lot of sense. Things speed up, sometimes a lot.

On one web server where I recently installed a caching technology called memcached, performance is pretty good: a home page full of a mix of static and dynamic content is served in just under 4 seconds.

The numbers get even better when a page is largely composed of static content. Under those circumstances, pages can be served in less than 3 seconds.

So far, so good, but nothing comes for free. In-memory caching strategies carry a big risk: RAM is volatile, so a power fluctuation or outage can wipe out whatever state your system happened to be in at the time of the disruption, which can lead to the dreaded loss of data. In the old days, preserving state through a service disruption was a big deal. A lot of money, time and energy went into reducing the chance of losing data to the absolute minimum.

These days, most of the issues that used to plague caching strategies have been mitigated or eliminated to the point where they have become commodities, and therefore invisible. Most of the big challenges now lie in the application layer, not the physical layer. Infrastructure-oriented Data Scientists and Architects are now more focused on challenges related to data distribution: managing extremely large data sets (Big Data), preserving the integrity of data representation in highly distributed or parallelized processing environments, and perhaps data synchronization and/or convergence on a global scale.

These challenges are not new. I studied them in the early '90s and have been grappling with them ever since. What has changed is the sheer scale of things. Stuff is huge now: gigabit connectivity, petabytes of data, server farms of thousands of machines, and the need for even SMEs (small and medium-sized enterprises) to serve information to their customers 24x7x365, anywhere in the world, without ever being unavailable. Things like that predominate now.

The risk/return calculation of in-memory object caching is also being challenged, mostly because of advances in storage technology. We now have Solid State Disks (SSDs), which use semiconductor memory rather than spinning platters to deliver performance approaching that of RAM. To boost performance further, many HDDs and SSDs now use advanced caching strategies of their own. As a result, with SSDs the gain from caching in RAM is no longer measured in orders of magnitude; it tends toward the mid single digits, which is great, but no longer amazing.

Still, some performance gain is better than nothing. In these days of virtualized servers, you may not have the ability (or the desire) to set up an optimized web server platform with all of the fault-tolerance, reliability and availability features that owning a physical server brings, along with all of the effort, money and time that entails. Procuring, provisioning and maintaining a colocated server can very quickly become an expensive, complex and time-consuming exercise. Avoiding the cost and hassle of physical servers is a major value proposition of companies like ExpertVM, Rackspace and Amazon, to name a few; I have used them all at one point or another. Hardware maintenance issues usually vanish, freeing up time to do other things.

But without control over the hardware layer, the only place left to squeeze out additional performance is the software layer, either by modifying the operating system or by adding an application-layer program. One option is memcached, application-layer software that doesn't require any custom modification of the host operating system. It is free, actively maintained and works pretty well most of the time.

Installing memcached from the command line with yum
If, like me, you are using memcached within a Linux, Apache, MySQL, PHP environment (LAMP), memcached is typically installed at the command line.  For my Linux system (CentOS 6) I can use the yum package manager to install it.

# yum install memcached
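Before installing, it can be worth checking which version yum is actually going to give you. The sketch below assumes the usual "Version     : x.y.z" layout of yum info output; yum_pkg_version is a hypothetical helper name, not part of yum.

```shell
#!/bin/sh
# Hypothetical helper (not part of yum): read `yum info <package>` output on
# stdin and print the value of the first "Version" field.
# Assumes yum's usual "Version     : 1.4.4" layout.
yum_pkg_version() {
  awk -F': *' '/^Version/ { print $2; exit }'
}

# Example (on a CentOS/RHEL host):
#   yum info memcached | yum_pkg_version
# For a package that is already installed, rpm can report the version directly:
#   rpm -q --qf '%{VERSION}\n' memcached
```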

But there's a big problem with the version of memcached that yum installs on my Linux version (CentOS release 6.8). This is a really old version of CentOS, because I set up this server a long time ago. Unfortunately, the software packaged with CentOS is correspondingly old: the latest version of memcached available through yum for CentOS release 6.8 is 1.4.4, which dates back to 2012. That's over 30 versions behind the state of the art, which now stands at 1.4.35. yum won't be of any help here, so the only way to upgrade is to install memcached manually from source, which means visiting the memcached website, downloading the source code, compiling it and installing it.
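A quick way to decide whether a from-source upgrade is needed is to compare the installed version against the target. Plain string comparison gets dotted versions wrong (as text, 1.4.35 sorts before 1.4.4), so this sketch leans on GNU sort -V, which compares version numbers component by component. version_lt is a hypothetical helper name, and reading the version from the first line of memcached -h is an assumption that matches the 1.4 series.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) when version $1 is strictly older
# than version $2. Relies on GNU sort's -V (version sort) option.
version_lt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Example (assumes the first line of `memcached -h` reads "memcached X.Y.Z"):
#   installed=$(memcached -h | head -n 1 | cut -d' ' -f2)
#   if version_lt "$installed" 1.4.35; then echo "upgrade needed"; fi
```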

Obtaining memcached from the maintainers
A copy of memcached can be obtained from www.memcached.org. It arrives as a file with the extension .tar.gz, which means you need both the tar and gzip utilities installed on your system to expand it from its compressed, archived form. As mentioned, at the time of this writing (2017-03-08) the latest version of memcached was 1.4.35. It is located at http://www.memcached.org/files/memcached-1.4.35.tar.gz.
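Since the build needs more than just tar and gzip (wget for the download, and a compiler toolchain for the later steps), a small check up front can save a failed build halfway through. This is a sketch: missing_tools is a hypothetical helper, and the tool list in the example is an assumption based on the steps in this post. memcached's configure script also needs the libevent development headers, which on CentOS come from the libevent-devel package.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every command given as an argument
# that cannot be found on PATH. Prints nothing when all of them are present.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Example: the tools used by the download and build steps in this post.
#   missing_tools wget tar gzip gcc make
```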


If the version has moved on since this writing, the maintainers make it easy to get the latest version of memcached from the home page. It's a "point and shoot" experience.

Downloading memcached

# cd ~
# mkdir memcached
# cd memcached
# wget http://www.memcached.org/files/memcached-1.4.35.tar.gz
--2017-03-08 10:25:43--  http://www.memcached.org/files/memcached-1.4.35.tar.gz
Resolving www.memcached.org...
Connecting to www.memcached.org||:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 398312 (389K) [application/x-tar]
Saving to: “memcached-1.4.35.tar.gz”


398,312      385K/s   in 1.0s

2017-03-08 10:25:45 (385 KB/s) - “memcached-1.4.35.tar.gz” saved [398312/398312]
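The download can be scripted for any release. This sketch assumes newer releases keep the same URL pattern as the 1.4.35 link above (an assumption; check the home page if it fails), and adds a gzip -t integrity test so a truncated download is caught before unpacking. memcached_url and fetch_memcached are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helper: build the tarball URL for a release, following the
# pattern of the memcached-1.4.35.tar.gz URL used in this post.
memcached_url() {
  printf 'http://www.memcached.org/files/memcached-%s.tar.gz\n' "$1"
}

# Hypothetical helper: download a release into the current directory, then
# test the compressed stream; gzip -t exits non-zero if the file is corrupt.
fetch_memcached() {
  wget "$(memcached_url "$1")" &&
    gzip -t "memcached-$1.tar.gz"
}

# Example:
#   fetch_memcached 1.4.35
```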


Decompressing memcached
The next challenge is to decompress the file (it's called a "tarball") into something that can be compiled and installed:

# tar -zxvf memcached-1.4.35.tar.gz
...a whole bunch of output will result here
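For reference, the flags mean: -z filter through gzip, -x extract, -v list each file as it is extracted (that's the "whole bunch of output"), and -f read from the named file. Rather than guess the name of the directory the tarball unpacks into, you can ask tar to list the archive. tarball_topdir is a hypothetical helper, and it assumes the archive has a single top-level directory, as the 1.4.35 tarball does.

```shell
#!/bin/sh
# Hypothetical helper: print the top-level directory stored in a .tar.gz,
# taken from the first entry of the archive listing (tar -t lists, -z
# decompresses, -f names the file). Assumes a single top-level directory.
tarball_topdir() {
  tar -tzf "$1" | head -n 1 | cut -d/ -f1
}

# Example, after extracting:
#   cd "$(tarball_topdir memcached-1.4.35.tar.gz)"
```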

Configuring memcached 
Before you can compile memcached, it needs to be configured for the system it is being installed on.  The following accomplishes that:
# cd memcached-1.4.35
# ./configure
...a whole bunch of output will result here

Making memcached 
Once it has been configured, compiling memcached is easy. The following accomplishes that:

# make
...a whole bunch of output will result here

Installing memcached 
Once it has been compiled, installing memcached is easy. The following accomplishes that:

# make install
...a whole bunch of output will result here
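The unpack, configure, make and make install steps can be strung together into one script. This is a sketch, not the procedure used in this post: build_memcached and the DRY_RUN switch are hypothetical, and the optional prefix argument is passed to ./configure --prefix. Autoconf's default prefix is /usr/local, so make install normally puts the binary in /usr/local/bin. Building with a prefix of /usr would install it straight into /usr/bin instead, at the cost of overwriting the file that the yum package owns (so a later yum update of memcached could put the old version back).

```shell
#!/bin/sh
# Sketch: unpack, configure, compile and install a memcached release that has
# already been downloaded into the current directory.
# Assumptions: the tarball is named memcached-<version>.tar.gz, and setting
# DRY_RUN=1 prints each command instead of running it.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# The body runs in a subshell, so the cd does not leak into the caller.
build_memcached() (
  version=$1
  prefix=${2:-/usr/local}      # autoconf's default install prefix
  run tar -zxf "memcached-$version.tar.gz" &&
    run cd "memcached-$version" &&
    run ./configure --prefix="$prefix" &&
    run make &&
    run make install
)

# Examples:
#   DRY_RUN=1 build_memcached 1.4.35 /usr   # preview the commands
#   build_memcached 1.4.35                  # build and install under /usr/local
```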

Custom Installing memcached
If your memcached instance is still the same version it was before all of this, or is unavailable, you may need to perform a custom install. At this point, the newly built memcached program is sitting in your current (build) directory. By default, make install copies it to /usr/local/bin, but the init script that came with the yum package may still run the old copy in /usr/bin. For your system to run the new program, it needs to be located in your /usr/bin directory. Placing a copy of memcached there is easy; here's how I did it:

# cp memcached /usr/bin/memcached
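A slightly more careful variant of the copy step uses install, which sets the file mode as it copies, and then prints the version banner of the copy so you can see immediately which version now lives at the destination. Keeping a backup of the old binary first is cheap insurance, and if cp reports "Text file busy", the old daemon is still running from that file and needs to be stopped first. install_memcached_binary is a hypothetical helper, and reading the version from the first line of -h output is an assumption that holds for the 1.4 series.

```shell
#!/bin/sh
# Hypothetical helper: copy a freshly built memcached binary into place with
# mode 0755, then print the first line of its -h output (the version banner
# in the 1.4 series). Destination defaults to /usr/bin/memcached.
install_memcached_binary() {
  src=$1
  dest=${2:-/usr/bin/memcached}
  install -m 0755 "$src" "$dest" && "$dest" -h | head -n 1
}

# Example, run from the build directory after backing up the old binary:
#   cp -p /usr/bin/memcached /usr/bin/memcached.old
#   install_memcached_binary ./memcached
```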

At this point, memcached should be properly installed on your system and ready to go. You should be able to start it this way (if the old version is still running, use restart instead of start so the new binary is loaded):

# /etc/init.d/memcached start
Starting memcached:                                        [  OK  ]

memcached automatic start
If, like me, you are using memcached within a Linux, Apache, MySQL, PHP environment (LAMP), memcached is typically installed to run as a system-level process (daemon) whose control is fully integrated with the Linux runlevel environment. This means that when you turn on the computer, memcached is started by something called an init script. My server is configured to run at runlevel 3, so /etc/rc3.d contains a link to the init script called S55memcached that starts the daemon for me. You can check that it is there like this:

# ls /etc/rc3.d/*mem*
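The same check can be scripted. has_start_link is a hypothetical helper; it takes the runlevel directory as an argument (defaulting to /etc/rc3.d) so it can be pointed at other runlevels. If the link turns out to be missing on a CentOS 6 system, chkconfig memcached on should create it, and chkconfig --list memcached shows which runlevels it is enabled for.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a runlevel directory holds a start link
# (name beginning with "S") for memcached, such as S55memcached.
has_start_link() {
  dir=${1:-/etc/rc3.d}
  for f in "$dir"/S*memcached; do
    # If nothing matches, the pattern stays unexpanded and this test fails.
    if [ -e "$f" ] || [ -L "$f" ]; then
      return 0
    fi
  done
  return 1
}

# Example:
#   has_start_link /etc/rc3.d && echo "memcached starts at runlevel 3"
```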

It would also be smart at this point to make sure that your running memcached instance is the version you've just installed. You can check its version number this way:

# telnet localhost 11211
Connected to localhost.
Escape character is '^]'.
version
VERSION 1.4.35
quit
Connection closed by foreign host.
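If you'd rather not drive telnet by hand, the same exchange can be scripted with nc (netcat), which may need to be installed separately. The sketch sends version and quit, then strips the protocol's trailing carriage return from the reply. memcached_version and parse_version_reply are hypothetical helper names; 11211 is memcached's default port, as used above.

```shell
#!/bin/sh
# Hypothetical helper: turn a memcached "VERSION x.y.z" reply on stdin into
# just "x.y.z". Text-protocol lines end in \r\n, so the \r is stripped first.
parse_version_reply() {
  tr -d '\r' | sed -n 's/^VERSION //p'
}

# Hypothetical helper: ask a running memcached for its version. Sending
# "quit" makes the server close the connection, so nc exits on its own.
memcached_version() {
  printf 'version\r\nquit\r\n' | nc "${1:-localhost}" "${2:-11211}" |
    parse_version_reply
}

# Example:
#   memcached_version localhost 11211
```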

Now, using telnet to control memcached is one way to go, but I'm not particularly fond of it. Instead, I installed two different memcached administration tools, both of which are useful to me in their own way: phpMemcachedAdmin and a Memcache stats interface.
The phpMemcachedAdmin interface offers up a lot of information.

The Memcache stats interface offers up the same information in a different format, one I find friendlier to the way I like to absorb information, though with slightly less up-front detail.
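Much of what dashboards like these display comes from memcached's stats command, so a quick number can also be pulled from the command line. The sketch below computes the get hit ratio from the get_hits and get_misses counters; hit_ratio is a hypothetical helper, and fetching the stats with nc assumes netcat is installed. A ratio close to 1 means most lookups are being served from memory.

```shell
#!/bin/sh
# Hypothetical helper: read memcached "stats" output on stdin and print the
# fraction of get requests that were hits, or "n/a" if there were no gets.
hit_ratio() {
  tr -d '\r' | awk '
    $1 == "STAT" && $2 == "get_hits"   { hits = $3 }
    $1 == "STAT" && $2 == "get_misses" { misses = $3 }
    END {
      total = hits + misses
      if (total == 0) print "n/a"
      else printf "%.2f\n", hits / total
    }'
}

# Example:
#   printf 'stats\r\nquit\r\n' | nc localhost 11211 | hit_ratio
```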

Now we know why technology like memcached exists, where to get it, how to install it, and how to monitor it.
