In the good old days when building web sites was as easy as knocking up a few HTML pages, the delivery of a web page to a browser was a simple matter of having the web server fetch a file. A site's visitors would see its small, text-only pages almost immediately, unless they were using particularly slow modems. Once the page was downloaded, the browser would cache it somewhere on the local computer so that, should the page be requested again, after performing a quick check with the server to ensure the page hadn't been updated, the browser could display the locally cached version. Pages were served as quickly and efficiently as possible, and everyone was happy.
Then dynamic web pages came along and spoiled the party by introducing two problems:
- When a request for a dynamic web page is received by the server, some intermediate processing must be completed, such as the execution of scripts by the PHP engine. This processing introduces a delay before the web server begins to deliver the output to the browser. This may not be a significant delay where simple PHP scripts are concerned, but for a more complex application, the PHP engine may have a lot of work to do before the page is finally ready for delivery. This extra work results in a noticeable time lag between the user's requests and the actual display of pages in the browser.
- A typical web server, such as Apache, uses the time of file modification to inform a web browser of a requested page's age, allowing the browser to take appropriate caching action. With dynamic web pages, the actual PHP script may change only occasionally; meanwhile, the content it displays, which is often fetched from a database, will change frequently. The web server has no way of discerning updates to the database, so it doesn't send a last modified date. If the client (that is, the user's browser) has no indication of how long the data will remain valid, it will take a guess. This is problematic if the browser decides to use a locally cached version of the page which is now out of date, or if the browser decides to request from the server a fresh copy of the page, which actually has no new content, making the request redundant. The web server will always respond with a freshly constructed version of the page, regardless of whether or not the data in the database has actually changed.
To avoid the possibility of a web site visitor viewing out-of-date content, most web developers use a meta tag or HTTP headers to tell the browser never to use a cached version of the page. However, this negates the web browser's natural ability to cache web pages, and entails some serious disadvantages. For example, the content delivered by a dynamic page may only change once a day, so there's certainly a benefit to be gained by having the browser cache a page--even if only for 24 hours.
If you're working with a small PHP application, it's usually possible to live with both issues. But as your site increases in complexity--and attracts more traffic--you'll begin to run into performance problems. Both these issues can be solved, however: the first with server-side caching; the second, by taking control of client-side caching from within your application. The exact approach you use to solve these problems will depend on your application, but in this chapter, we'll consider both PHP and a number of class libraries from PEAR as possible panaceas for your web page woes.
Note that in this chapter's discussions of caching, we'll look at only those solutions that can be implemented in PHP. For a more general introduction, the definitive discussion of web caching is represented by Mark Nottingham's tutorial.
Furthermore, the solutions in this chapter should not be confused with some of the script caching solutions that work on the basis of optimizing and caching compiled PHP scripts, such as Zend Accelerator and ionCube PHP Accelerator.
This chapter is excerpted from The PHP Anthology: 101 Essential Tips, Tricks & Hacks, 2nd Edition.
How do I prevent web browsers from caching a page?
If timely information is crucial to your web site and you wish to prevent out-of-date content from ever being visible, you need to understand how to prevent web browsers--and proxy servers--from caching pages in the first place.
Solutions
There are two possible approaches we could take to solving this problem: using HTML meta tags, and using HTTP headers.
Using HTML Meta Tags
The most basic approach to the prevention of page caching is one that utilizes HTML meta tags:
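A minimal sketch of such tags, written as a small PHP helper so the markup can be echoed into any page's head. The function name noCacheMetaTags is hypothetical, and the Unix epoch is used as the already-passed date purely for convenience; any past date should do.

```php
<?php
// Hypothetical helper: builds the two cache-defeating meta tags as a string.
function noCacheMetaTags()
{
    // Any date in the past will do; timestamp 0 (the Unix epoch) is an arbitrary choice.
    $expired = gmdate('D, d M Y H:i:s', 0) . ' GMT';

    return '<meta http-equiv="Expires" content="' . $expired . '" />' . "\n"
         . '<meta http-equiv="Pragma" content="no-cache" />' . "\n";
}

// Typically echoed inside the page's <head> element.
echo noCacheMetaTags();
```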
The insertion of a date that's already passed into the Expires meta tag tells the browser that the cached copy of the page is always out of date. Upon encountering this tag, the browser usually won't cache the page. Although the Pragma: no-cache meta tag isn't guaranteed, it's a fairly well-supported convention that most web browsers follow. However, the two issues associated with this approach, which we'll discuss below, may prompt you to look at the alternative solution.
Using HTTP Headers
A better approach is to use the HTTP protocol itself, with the help of PHP's header function, to produce the equivalent of the two HTML meta tags above:
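A minimal sketch of the header-based equivalent, assuming nothing has been output yet. The helper name legacyNoCacheHeaders is hypothetical; it returns the header lines rather than sending them, so that they're easy to inspect, and the Unix epoch again stands in for "a date that's already passed".

```php
<?php
// Hypothetical helper: returns the header lines so they can be inspected or logged.
function legacyNoCacheHeaders()
{
    return array(
        // An expiry date in the past marks any stored copy as stale.
        'Expires: ' . gmdate('D, d M Y H:i:s', 0) . ' GMT',
        // HTTP 1.0-era instruction to skip the cache.
        'Pragma: no-cache',
    );
}

// Headers must be sent before any page output.
foreach (legacyNoCacheHeaders() as $line) {
    header($line);
}
```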
We can go one step further than this, using the Cache-Control header that's supported by HTTP 1.1-capable browsers:
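As a sketch, the Cache-Control header might be sent as follows. The helper name is hypothetical, and the particular set of directives (no-store, no-cache, must-revalidate) is one reasonable choice rather than the only one.

```php
<?php
// Hypothetical helper returning the Cache-Control directives as one header line.
function cacheControlNoCacheHeader()
{
    $directives = array('no-store', 'no-cache', 'must-revalidate');

    return 'Cache-Control: ' . implode(', ', $directives);
}

// Like every header, this must be sent before any page output.
header(cacheControlNoCacheHeader());
```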
For a precise description of HTTP 1.1 Cache-Control headers, have a look at the HTTP 1.1 specification, RFC 2616, a copy of which is hosted by the W3C. Another great source of information about HTTP headers, which can be applied readily to PHP, is mod_perl's documentation on issuing correct headers.
Discussion
Using the Expires meta tag sounds like a good approach, but two problems are associated with it:
- The browser first has to download the page in order to read the meta tags. If a tag wasn't present when the page was first requested by a browser, the browser will remain blissfully ignorant and keep its cached copy of the original.
- Proxy servers that cache web pages, such as those common to ISPs, generally won't read the HTML documents themselves. A web browser might know that it shouldn't cache the page, but the proxy server between the browser and the web server probably doesn't--it will continue to deliver the same out-of-date page to the client.
On the other hand, using the HTTP protocol to prevent page caching essentially guarantees that no web browser or intervening proxy server will cache the page, so visitors will always receive the latest content. In fact, the first header, Expires, should accomplish this on its own; this is the best way to ensure a page is not cached. The Cache-Control and Pragma headers are added for some degree of insurance. Although they don't work on all browsers or proxies, they will catch some cases in which the Expires header doesn't work as intended--if the client computer's date is set incorrectly, for example.
Of course, to disallow caching entirely introduces the problems we discussed at the start of this chapter: it negates the web browser's natural ability to cache pages, and can create unnecessary overhead, as new versions of pages are always requested, even though those pages may not have been updated since the browser's last request. We'll look at the solution to these issues in just a moment.
How do I control client-side caching?
We addressed the task of disabling client-side caching in "How do I prevent web browsers from caching a page?", but disabling the cache is rarely the only (or best) option.
Here we'll look at a mechanism that allows us to take advantage of client-side caches in a way that can be controlled from within a PHP script.
Apache Required!
This approach will only work if you're running PHP as an Apache web server module, because it requires use of the function getallheaders--which only works with Apache--to fetch the HTTP headers sent by a web browser.
Solutions
In controlling client-side caching you have two alternatives. You can set a date on which the page will expire, or respond to the browser's request headers. Let's see how each of these tactics is executed.
Setting a Page Expiry Header
The header that's easiest to implement is the Expires header--we use it to set a date on which the page will expire, and until that time, web browsers are allowed to use a cached version of the page. Here's an example of this header at work:
expires.php (excerpt)
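<?php
// Send an Expires header set $expires seconds into the future
function setExpires($expires)
{
    header('Expires: ' . gmdate('D, d M Y H:i:s', time() + $expires) . ' GMT');
}
setExpires(10);
echo ( 'The GMT is now '.gmdate('H:i:s').'<br />' );
echo ( '<a href="'.htmlspecialchars($_SERVER['PHP_SELF']).'">View Again</a><br />' );
?>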
In this example, we created a custom function called setExpires that sets the HTTP Expires header to a point in the future, defined in seconds. The output of the above example shows the current time in GMT, and provides a link that allows us to view the page again. If we follow this link, we'll notice the time updates only once every ten seconds. If you like, you can also experiment by using your browser's Refresh button to tell the browser to refresh the cache, and watching what happens to the displayed date.
Acting on the Browser's Request Headers
A more useful approach to client-side cache control is to make use of the Last-Modified and If-Modified-Since headers, both of which are available in HTTP 1.0. This action is known technically as performing a conditional GET request; whether your script returns any content depends on the value of the incoming If-Modified-Since request header.
If you use PHP version 4.3.0 and above on Apache, the HTTP headers are accessible with the functions apache_request_headers and apache_response_headers. Note that the function getallheaders has become an alias for the new apache_request_headers function.
This approach requires that you send a Last-Modified header every time your PHP script is accessed. The next time the browser requests the page, it sends an If-Modified-Since header containing a time; your script can then identify whether the page has been updated since that time. If it hasn't, your script sends an HTTP 304 status code to indicate that the page hasn't been modified, and exits before sending the body of the page.
Let's see these headers in action. The example below uses the modification date of a text file. To simulate updates, we first need to create a way to randomly write to the file:
ifmodified.php (excerpt)
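<?php
// Any writable text file will do; this file name is an assumption
$file = 'ifmodified.txt';
$random = array (0,1,1);
shuffle($random);
if ( $random[0] == 0 ) {
    $fp = fopen($file, 'w');
    fwrite($fp, 'x');
    fclose($fp);
}
$lastModified = filemtime($file);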
Our simple randomizer provides a one-in-three chance that the file will be updated each time the page is requested. We also use the filemtime function to obtain the last modified time of the file.
Next, we send a Last-Modified header that uses the modification time of the text file. We need to send this header for every page we render, to cause visiting browsers to send us the If-Modified-Since header upon every request:
ifmodified.php (excerpt)
header('Last-Modified: ' .
gmdate('D, d M Y H:i:s', $lastModified) . ' GMT');
Our use of the getallheaders function ensures that PHP gives us all the incoming request headers as an array. We then need to check that the If-Modified-Since header actually exists; if it does, we have to deal with a special case caused by older Mozilla browsers (earlier than version 6), which appended an illegal extra field to their If-Modified-Since headers. We use PHP's strtotime function to generate a timestamp from the date the browser sent us. If there's no such header, we set this timestamp to zero, which forces PHP to give the visitor an up-to-date copy of the page:
ifmodified.php (excerpt)
$request = getallheaders();
if (isset($request['If-Modified-Since']))
{
    $modifiedSince = explode(';', $request['If-Modified-Since']);
    $modifiedSince = strtotime($modifiedSince[0]);
}
else
{
    $modifiedSince = 0;
}
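Where getallheaders isn't available, PHP also exposes the same request header as $_SERVER['HTTP_IF_MODIFIED_SINCE']. The sketch below wraps the parsing step in a hypothetical helper, parseIfModifiedSince, and reads the value from $_SERVER; it's an illustration of the idea rather than a drop-in replacement for the listing above.

```php
<?php
// Hypothetical helper: converts an If-Modified-Since value to a Unix timestamp.
// Returns 0 when the header is absent or unparseable, which forces a full response.
function parseIfModifiedSince($value)
{
    if ($value === null || $value === '') {
        return 0;
    }
    // Drop anything after a semicolon (the non-standard extra field).
    $parts = explode(';', $value);
    $timestamp = strtotime(trim($parts[0]));

    return ($timestamp === false) ? 0 : $timestamp;
}

// Read the header from $_SERVER, which doesn't depend on the Apache-only functions.
$rawHeader = isset($_SERVER['HTTP_IF_MODIFIED_SINCE'])
    ? $_SERVER['HTTP_IF_MODIFIED_SINCE']
    : null;
$modifiedSince = parseIfModifiedSince($rawHeader);
```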
Finally, we check to see whether or not the cache has been modified since the last time the visitor received this page. If it hasn't, we simply send a 304 Not Modified response header and exit the script, saving bandwidth and processing time by prompting the browser to display its cached copy of the page:
ifmodified.php (excerpt)
if ($lastModified <= $modifiedSince)
{
    header('HTTP/1.1 304 Not Modified');
    exit();
}
echo ( 'The GMT is now '.gmdate('H:i:s').'<br />' );
echo ( '<a href="'.htmlspecialchars($_SERVER['PHP_SELF']).'">View Again</a><br />' );
?>
Remember to use the "View Again" link when you run this example (clicking the Refresh button usually clears your browser's cache). If you click on the link repeatedly, the cache will eventually be updated; your browser will throw out its cached version and fetch a new page from the server.
If you combine the Last-Modified header approach with time values that are already available in your application--for example, the time of the most recent news article--you should be able to take advantage of web browser caches, saving bandwidth and improving your application's perceived performance in the process.
Be very careful to test any caching performed in this manner, though; if you get it wrong, you may cause your visitors to consistently see out-of-date copies of your site.
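To make that kind of testing easier, the decision can be separated from the act of sending headers. The sketch below assumes your application can supply the timestamp of its newest content; conditionalGetResponse and $newestArticleTime are hypothetical names used only for illustration.

```php
<?php
// Hypothetical helper: decides how to answer a conditional GET.
// $lastModified    Unix timestamp of the newest content (e.g. the latest article).
// $ifModifiedSince Unix timestamp parsed from the request, or 0 if absent.
function conditionalGetResponse($lastModified, $ifModifiedSince)
{
    $headers = array(
        'Last-Modified: ' . gmdate('D, d M Y H:i:s', $lastModified) . ' GMT',
    );
    // Nothing newer than the browser's copy: answer 304 and send no body.
    $status = ($lastModified <= $ifModifiedSince) ? 304 : 200;

    return array('status' => $status, 'headers' => $headers);
}

// Assumed value: the newest article was published on 1 Jan 2000 00:00:00 GMT.
$newestArticleTime = 946684800;
$response = conditionalGetResponse($newestArticleTime, 0);
```

A script would then send each entry in $response['headers'] with header, and, when the status is 304, send the 304 Not Modified status line and exit before producing the page body.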
Discussion
HTTP dates are always calculated relative to Greenwich Mean Time (GMT). The PHP function gmdate is exactly the same as the date function, except that it automatically offsets the time to GMT based on your server's system clock and regional settings.
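A short sketch of the difference between date and gmdate, assuming for the sake of the example that the server is set to New York time. The helper httpDate is a hypothetical name for the formatting used in the listings above.

```php
<?php
// Hypothetical helper: formats a Unix timestamp as an HTTP date string.
function httpDate($timestamp)
{
    return gmdate('D, d M Y H:i:s', $timestamp) . ' GMT';
}

// Pretend the server is configured for New York time (an assumed setting).
date_default_timezone_set('America/New_York');

$local = date('H:i', 0);   // local time at the Unix epoch: 19:00
$gmt   = gmdate('H:i', 0); // the same instant in GMT: 00:00

echo httpDate(0), "\n";    // Thu, 01 Jan 1970 00:00:00 GMT
```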
When a browser encounters an Expires header, it caches the page. All further requests for the page that are made before the specified expiry time use the cached version of the page--no request is sent to the web server. Of course, client-side caching is only truly effective if the system time on the computer is accurate. If the computer's time is out of sync with that of the web server, you run the risk of pages either being cached improperly, or never being updated.
The Expires header has the advantage that it's easy to implement; in most cases, however, unless you're a highly organized person, you won't know exactly when a given page on your site will be updated. Since the browser will only contact the server after the page has expired, there's no way to tell browsers that the page they've cached is out of date. In addition, you also lose some knowledge of the traffic visiting your web site, since the browser will not make contact with the server when it requests a page that's been cached.
How do I examine HTTP headers in my browser?
How can you actually check that your application is running as expected, or debug your code, if you can't actually see the HTTP headers? It's worth knowing exactly which headers your script is sending, particularly when you're dealing with HTTP cache headers.
Solution
Several worthy tools are available to help you get a closer look at your HTTP headers:
LiveHTTPHeaders
This add-on to the Firefox browser is a simple but very handy tool for examining request and response headers while you're browsing.
Firebug
Another useful Firefox add-on, Firebug is a tool whose interface offers a dedicated tab for examining HTTP request information.
HTTPWatch
This add-on to Internet Explorer for HTTP viewing and debugging is similar to LiveHTTPHeaders above.
Charles Web Debugging Proxy
Available for Windows, Mac OS X, and Linux or Unix, the Charles Web Debugging Proxy is a proxy server that allows developers to see all the HTTP traffic between their browsers and the web servers to which they connect.
Any of these tools will allow you to inspect the communication between the server and browser.
How do I cache file downloads with Internet Explorer?
If you're developing file download scripts for Internet Explorer users, you might notice a few issues with the download process. In particular, when you're serving a file download through a PHP script that uses headers such as Content-Disposition: attachment; filename=myFile.pdf or Content-Disposition: inline; filename=myFile.pdf, and that tells the browser not to cache pages, Internet Explorer won't deliver that file to the user.
Solutions
Internet Explorer handles downloads in a rather unusual manner: it makes two requests to the web site. The first request downloads the file and stores it in the cache before making a second request, the response to which is not stored. The second request invokes the process of delivering the file to the end user in accordance with the file's type--for instance, it starts Acrobat Reader if the file is a PDF document. Therefore, if you send the cache headers that instruct the browser not to cache the page, Internet Explorer will delete the file between the first and second requests, with the unfortunate result that the end user receives nothing!
If the file you're serving through the PHP script won't change, one solution to this problem is simply to disable the "don't cache" headers, Pragma and Cache-Control, which we discussed in "How do I prevent web browsers from caching a page?", for the download script.
If the file download will change regularly, and you want the browser to download an up-to-date version of it, you'll need to use the Last-Modified header that we met in "How do I control client-side caching?", and ensure that the time of modification remains the same across the two consecutive requests. You should be able to achieve this goal without affecting users of browsers that handle downloads correctly.
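One way to keep the modification time identical across both requests is to take it from the file itself. The sketch below assumes the download is a PDF stored on disk; downloadHeaders and the file location in the usage comment are hypothetical.

```php
<?php
// Hypothetical helper: header lines for a download that changes from time to time.
// The Last-Modified value comes from the file itself, so it stays identical
// across consecutive requests for the same unchanged file.
function downloadHeaders($path, $downloadName)
{
    return array(
        'Content-Type: application/pdf',
        'Content-Disposition: attachment; filename="' . $downloadName . '"',
        'Content-Length: ' . filesize($path),
        'Last-Modified: ' . gmdate('D, d M Y H:i:s', filemtime($path)) . ' GMT',
    );
}

// Usage sketch with an assumed file location:
// foreach (downloadHeaders('./files/myFile.pdf', 'myFile.pdf') as $line) {
//     header($line);
// }
// readfile('./files/myFile.pdf');
```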
One final solution is to write the file to the file system of your web server and simply provide a link to it, leaving it to the web server to report the cache headers for you. Of course, this may not be a viable option if the file is supposed to be secured.
How do I use output buffering for server-side caching?
Server-side processing delay is one of the biggest bugbears of dynamic web pages. We can reduce server-side delay by caching output. The page is generated normally, performing database queries and so on with PHP; however, before sending it to the browser, we capture and store the finished page somewhere--in a file, for instance. The next time the page is requested, the PHP script first checks to see whether a cached version of the page exists. If it does, the script sends the cached version straight to the browser, avoiding the delay involved in rebuilding the page.
Solution
Here, we'll look at PHP's in-built caching mechanism, the output buffer, which can be used with whatever page rendering system you prefer (templates or no templates). Consider situations in which your script displays results using, for example, echo or print, rather than sending the data directly to the browser. In such cases, you can use PHP's output control functions to store the data in an in-memory buffer, which your PHP script has both access to and control over.
Here's a simple example that demonstrates how the output buffer works:
buffer.php (excerpt)
<?php
ob_start();
echo '1. Place this in the buffer<br />';
$buffer = ob_get_contents();
ob_end_clean();
echo '2. A normal echo<br />';
echo $buffer;
?>
The buffer itself stores the output as a string. So, in the above script, we commence buffering with the ob_start function, and use echo to display a piece of text which is stored in the output buffer automatically. We then use the ob_get_contents function to fetch the data the echo statement placed in the buffer, and store it in the $buffer variable. The ob_end_clean function stops the output buffer and empties the contents; the alternative approach is to use the ob_end_flush function, which displays the contents of the buffer.
The above script displays the following output:
2. A normal echo
1. Place this in the buffer
In other words, we captured the output of the first echo, then sent it to the browser after the second echo. As this simple example suggests, output buffering can be a very powerful tool when it comes to building your site; it provides a solution for caching, as we'll see in a moment, and is also an excellent way to hide errors from your site's visitors, as is discussed in Chapter 9. Output buffering even provides a possible alternative to browser redirection in situations such as user authentication.
In order to improve the performance of our site, we can store the output buffer contents in a file. We can then call on this file for the next request, rather than having to rebuild the output from scratch again. Let's look at a quick example of this technique. First, our example script checks for the presence of a cache file:
sscache.php (excerpt)
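A minimal sketch of that check, assuming the cache lives at ./cache/page.cache (the path this example writes to in its final step). The helper name serveFromCache is hypothetical.

```php
<?php
// Hypothetical helper: sends the cached page if one exists.
// Returns true when the cache was served, false when the page must be built.
function serveFromCache($cacheFile)
{
    if (!file_exists($cacheFile)) {
        return false;
    }
    readfile($cacheFile); // writes the file's contents straight to the output

    return true;
}

if (serveFromCache('./cache/page.cache')) {
    exit(); // nothing more to do; the cached copy has been delivered
}
```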
If the script finds the cache file, we simply output its contents and we're done! If the cache file is not found, we proceed to output the page using the output buffer:
sscache.php (excerpt)
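ob_start();
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<body>
This page was cached with PHP's
<a href="http://www.php.net/outcontrol"
>Output Control Functions</a>
</body>
</html>
<?php
$buffer = ob_get_contents();
ob_end_flush();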
Before we flush the output buffer to display our page, we make sure to store the buffer contents in the $buffer variable.
The final step is to store the saved buffer contents in a text file:
sscache.php (excerpt)
$fp = fopen('./cache/page.cache','w');
fwrite($fp,$buffer);
fclose($fp);
?>
The page.cache file contents are exactly the same as the HTML that was rendered by the script:
This page was cached with PHP's
Output Control Functions
Discussion
For an example that shows how to use PHP's output buffering capabilities to handle errors more elegantly, have a look at the PHP Freaks article Introduction to Output Buffering, by Derek Ford.
Template engines often include template caching features--Smarty is a case in point. Usually, these engines offer a built-in mechanism for storing a compiled version of a template (that is, the native PHP generated from the template), which prevents us developers from having to recompile the template every time a page is requested.
This process should not be confused with output--or content--caching, which refers to the caching of the rendered HTML (or other output) that PHP sends to the browser. In addition to the content cache mechanisms discussed in this chapter, Smarty can cache the contents of the HTML page. Whether you use Smarty's content cache or one of the alternatives discussed in this chapter, you can successfully use both template and content caching together on the same site.
HTTP Headers and Output Buffering
Output buffering can help solve the most common problem associated with the header function, not to mention the issues surrounding session_start and setcookie. Normally, if you call any of these functions after page output has begun, you'll get a nasty error message. When output buffering's turned on, the only output types that can escape the buffer are HTTP headers. If you use ob_start at the very beginning of your application's execution, you can send headers at whichever point you like, without encountering the usual errors. You can then write out the buffered page content all at once, when you're sure that no more HTTP headers are required.
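A brief sketch of the idea: because the echoed content is held in the buffer, a header can still be sent afterwards. The Content-Type header here is only an example of a header you might decide to send late.

```php
<?php
// Start buffering before anything is output.
ob_start();

echo '<p>Page content generated early on.</p>';

// Still allowed: the echo above sits in the buffer and has not been sent yet.
header('Content-Type: text/html; charset=utf-8');

// Hand over the buffered page once no more headers are needed.
$page = ob_get_contents();
ob_end_flush();
```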
Use Output Buffering Responsibly
While output buffering can helpfully solve all our header problems, it should not be used solely for that reason. By ensuring that all output is generated after all the headers are sent, you'll save the time and resource overheads involved in using output buffers.
How do I cache just the parts of a page that change infrequently?
Caching an entire page is a simplistic approach to output buffering. While it's easy to implement, that approach forgoes the real benefit of PHP's output control functions: the ability to improve your site's performance in a manner that reflects the varying lifetimes of your content.
No doubt, some parts of the page that you send to visitors will change very rarely, such as the page's header, menus, and footer. But other parts--for example, the list of comments on your blog posts--may change quite often. Fortunately, PHP allows you to cache sections of the page separately.
Solution
Output buffering can be used to cache sections of a page in separate files. The page can then be rebuilt for output from these files.
This technique eliminates the need to repeat database queries, while loops, and so on. You might consider assigning each block of the page an expiry date after which the cache file is recreated; alternatively, you may build into your application a mechanism that deletes the cache file every time the content it stores is changed.
Let's work through an example that demonstrates the principle. Firstly, we'll create two helper functions, writeCache and readCache. Here's the writeCache function:
smartcache.php (excerpt)
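A minimal sketch of such a function, assuming the cache directory is ./cache/. The optional third argument and the use of file_put_contents with LOCK_EX are choices made for this illustration; the exclusive lock is one simple way to reduce the risk of two requests writing the same file at once.

```php
<?php
// Sketch of writeCache as described in the text. The optional third
// argument is an addition for illustration; it defaults to ./cache/.
function writeCache($content, $filename, $cacheDir = './cache/')
{
    // LOCK_EX takes an exclusive lock for the duration of the write.
    return file_put_contents($cacheDir . $filename, $content, LOCK_EX) !== false;
}
```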
The writeCache function is quite simple; it just writes the content of the first argument to a file with the name specified in the second argument, and saves that file to a location in the cache directory. We'll use this function to write our HTML to the cache files.
The readCache function will return the contents of the cache file specified in the first argument if it has not expired--that is, the file's last modified time is not older than the current time minus the number of seconds specified in the second argument. If it has expired or the file does not exist, the function returns false:
smartcache.php (excerpt)
function readCache($filename, $expiry)
{
    if (file_exists('./cache/' . $filename))
    {
        if ((time() - $expiry) > filemtime('./cache/' . $filename))
        {
            return false;
        }
        $cache = file('./cache/' . $filename);
        return implode('', $cache);
    }
    return false;
}
For the purposes of demonstrating this concept, I've used a procedural approach. However, I wouldn't recommend doing this in practice, as it will result in very messy code and is likely to cause issues with file locking. For example, what happens when someone accesses the cache at the exact moment it's being updated? Better solutions will be explained later on in the chapter.
Let's continue this example. After the output buffer is started, processing begins. First, the script calls readCache to see whether the file header.cache exists; this contains the top of the page, from the opening <html> tag to the opening <body> tag. We've used PHP's date function to display the time at which the page was actually rendered, so you'll be able to see the different cache files at work when the page is displayed:
smartcache.php (excerpt)
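ob_start();
if (!$header = readCache('header.cache', 604800))
{
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<body>
The header time is now: <?php echo date('H:i:s'); ?><br />
<?php
    $header = ob_get_contents();
    ob_clean();
    writeCache($header, 'header.cache');
}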
Note what happens when a cache file isn't found: the header content is output and assigned to a variable, $header, with ob_get_contents, after which the ob_clean function is called to empty the buffer. This allows us to capture the output in "chunks" and assign them to individual cache files with the writeCache function. The header of the page is now stored as a file, which can be reused without our needing to rerender the page. Look back to the start of the if condition for a moment. When we called readCache, we gave it an expiry time of 604800 seconds (one week); readCache uses the file modification time of the cache file to determine whether the cache is still valid.
For the body of the page, we'll use the same process as before. However, this time, when we call readCache, we'll use an expiry time of five seconds; the cache file will be updated whenever it's more than five seconds old:
smartcache.php (excerpt)
if (!$body = readCache('body.cache', 5))
{
    echo 'The body time is now: ' . date('H:i:s') . '<br />';
    $body = ob_get_contents();
    ob_clean();
    writeCache($body, 'body.cache');
}
The page footer is effectively the same as the header. After the footer, the output buffering is stopped and the contents of the three variables that hold the page data are displayed:
smartcache.php (excerpt)
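if (!$footer = readCache('footer.cache', 604800)) {
?>
The footer time is now: <?php echo date('H:i:s'); ?><br />
</body>
</html>
<?php
    $footer = ob_get_contents();
    ob_clean();
    writeCache($footer, 'footer.cache');
}
ob_end_clean();
echo $header . $body . $footer;
?>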
The end result looks like this:
The header time is now: 17:10:42
The body time is now: 18:07:40
The footer time is now: 17:10:42
The header and footer are updated on a weekly basis, while the body is updated whenever it is more than five seconds old. If you keep refreshing the page, you'll see the body time updating.
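The read, render, and write steps are the same for every chunk, so they can be folded into a single function. The sketch below is one possible arrangement, not the chapter's own code: cachedChunk and renderBody are hypothetical names, and the system temporary directory stands in for ./cache/ so that the example runs without any setup.

```php
<?php
// Hypothetical helper combining the read/render/write steps for one chunk.
// $render is any callable that echoes the chunk's HTML.
function cachedChunk($file, $expiry, $render)
{
    clearstatcache();
    // Reuse the cache file while it is no older than $expiry seconds.
    if (file_exists($file) && (time() - filemtime($file)) <= $expiry) {
        return file_get_contents($file);
    }
    ob_start();
    call_user_func($render);
    $html = ob_get_clean();
    file_put_contents($file, $html, LOCK_EX);

    return $html;
}

function renderBody()
{
    echo 'The body time is now: ' . date('H:i:s') . '<br />';
}

$cacheDir = sys_get_temp_dir() . '/'; // assumed location; the text uses ./cache/
echo cachedChunk($cacheDir . 'body.cache', 5, 'renderBody');
```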
Discussion
Note that if you have a page that builds content dynamically, based on a number of variables, you'll need to make adjustments to the way you handle your cache files. For example, you might have an online shopping catalog whose listing pages are defined by a URL such as:
http://example.com/catalogue/view.php?category=1&page=2
This URL should show page two of all items in category one; let's say this is the category for socks. But if we were to use the caching code above, the results of the first page of the first category we looked at would be cached, and shown for any request for any other page or category, until the cache expiry time elapsed. This would certainly confuse the next visitor who wanted to browse the category for shoes--that person would see the cached content for socks!
To avoid this issue, you'll need to incorporate the category ID and page number into the cache file name like so:
$cache_filename = 'catalogue_' . $category_id . '_' .
$page . '.cache';
if (!$catalogue = readCache($cache_filename, 604800))
{
...display the category HTML...
}
This way, the correct cached content can be retrieved for every request.
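Since the category ID and page number arrive from the URL, it's worth making sure they can't smuggle unexpected characters into the file name. A minimal sketch, assuming both values are meant to be integers; catalogueCacheFilename is a hypothetical helper name.

```php
<?php
// Hypothetical helper: builds a cache file name from request parameters.
// Casting to int keeps unexpected characters (such as ../) out of the name.
function catalogueCacheFilename($categoryId, $page)
{
    return 'catalogue_' . (int) $categoryId . '_' . (int) $page . '.cache';
}

// Values would normally come from $_GET['category'] and $_GET['page'].
echo catalogueCacheFilename('1', '2'), "\n"; // catalogue_1_2.cache
```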
Nesting Buffers
You can nest one buffer within another practically ad infinitum simply by calling ob_start more than once. This can be useful if you have multiple operations that use the output buffer, such as one that catches the PHP error messages, and another that deals with caching. Care needs to be taken to make sure that ob_end_flush or ob_end_clean is called every time ob_start is used.
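A small sketch of two nested buffers. Here ob_get_clean is used to fetch and close each buffer in one step, so every ob_start is matched by exactly one closing call.

```php
<?php
ob_start();                 // outer buffer
echo 'outer-start ';

ob_start();                 // inner buffer
echo 'inner';
$inner = ob_get_clean();    // closes the inner buffer and returns its contents

echo strtoupper($inner);    // this output lands in the outer buffer
echo ' outer-end';
$outer = ob_get_clean();    // closes the outer buffer

echo $outer;                // outer-start INNER outer-end
```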
Problems
How do I use PEAR::Cache_Lite for server-side caching?
The previous solution explored the ideas behind output buffering using the PHP ob_* functions. As we mentioned at the time, that approach probably isn't the best way to meet the dual goals of keeping your code maintainable and having a reliable caching mechanism. It's time to see how we can put a caching system into action in a manner that will be reliable and easy to maintain.
Solution
In the interests of keeping your code maintainable and having a reliable caching mechanism, it's a good idea to delegate the responsibility of caching logic to classes you trust. In this case, we'll use a little help from PEAR::Cache_Lite (version 1.7.2 is used in the examples here). Cache_Lite provides a solid yet easy-to-use library for caching, and handles issues such as: file locking; creating, checking for, and deleting cache files; controlling the output buffer; and directly caching the results from function and class method calls. More to the point, Cache_Lite should be relatively easy to apply to an existing application, requiring only minor code modifications.
Cache_Lite has four main classes. First is the base class, Cache_Lite, which deals purely with creating and fetching cache files, but makes no use of output buffering. This class can be used alone for caching operations in which you have no need for output buffering, such as storing the contents of a template you've parsed with PHP.
The examples here will not use Cache_Lite directly, but will instead focus on the three subclasses. Cache_Lite_Function can be used to call a function or class method and cache the result, which might prove useful for storing a MySQL query result set, for example. The Cache_Lite_Output class uses PHP's output control functions to catch the output generated by your script and store it in cache files; it allows you to perform tasks such as those we completed in "How do I cache just the parts of a page that change infrequently?". The Cache_Lite_File class bases cache expiry on the timestamp of a master file, with any cache file being deemed to have expired if it is older than the timestamp.
Let's work through an example that shows how you might use Cache_Lite to create a simple caching solution. When we're instantiating any child classes of Cache_Lite, we must first provide an array of options that determine the behavior of Cache_Lite itself. We'll look at these options in detail in a moment. Note that the cacheDir directory we specify must be one to which the script has read and write access:
cachelite.php (excerpt)
<?php
require_once 'Cache/Lite/Output.php';
$options = array(
  'cacheDir' => './cache/',
  'writeControl' => true,
  'readControl' => true,
  'fileNameProtection' => false,
  'readControlType' => 'md5'
);
$cache = new Cache_Lite_Output($options);
For each chunk of content that we want to cache, we need to set a lifetime (in seconds) for which the cache should live before it's refreshed. Next, we use the start method, available only in the Cache_Lite_Output class, to turn on output buffering. The two arguments passed to the start method are an identifying value for this particular cache file, and a cache group. The group is an identifier that allows a collection of cache files to be acted upon; it's possible to delete all cache files in a given group, for example (more on this in a moment). The start method will check to see if a valid cache file is available and, if so, it will begin outputting the cache contents. If a cache file is not available, start will return false and begin caching the following output.
Once the output for this chunk has finished, we use the end method to stop buffering and store the content as a file:
cachelite.php (excerpt)
$cache->setLifeTime(604800);
if (!$cache->start('header', 'Static')) {
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>PEAR::Cache_Lite example</title>
</head>
<body>
<p>The header time is now: <?php echo date('H:i:s'); ?></p>
<?php
  $cache->end();
}
To cache the body and footer, we follow the same procedure we used for the header. Note that the body is given a lifetime of just five seconds, while the header and footer are cached for 604800 seconds (one week):
cachelite.php (excerpt)
$cache->setLifeTime(5);
if (!$cache->start('body', 'Dynamic')) {
  echo 'The body time is now: ' . date('H:i:s') . '<br />';
  $cache->end();
}
$cache->setLifeTime(604800);
if (!$cache->start('footer', 'Static')) {
?>
<p>The footer time is now: <?php echo date('H:i:s'); ?></p>
</body>
</html>
<?php
  $cache->end();
}
?>
On viewing the page, Cache_Lite creates cache files in the cache directory. Because we've set the fileNameProtection option to false, Cache_Lite creates the files with these names:
- ./cache/cache_Static_header
- ./cache/cache_Dynamic_body
- ./cache/cache_Static_footer
You can read about the fileNameProtection option--and many more--in "What configuration options does Cache_Lite support?". When the same page is requested later, the code above will use the cached file if it is valid and has not expired.
Protect your Cache Files
Make sure that the directory in which you place the cache files is not publicly available, or you may be offering your site's visitors access to more than you realize.
What configuration options does Cache_Lite support?
When instantiating Cache_Lite (or any of its subclasses, such as Cache_Lite_Output), you can use any of a number of options to control its behavior. These options should be placed in an array and passed to the constructor as shown below (and in the previous section):
$options = array(
'cacheDir' => './cache/',
'writeControl' => true,
'readControl' => true,
'fileNameProtection' => false,
'readControlType' => 'md5'
);
$cache = new Cache_Lite_Output($options);
Solution
The options available in the current version of Cache_Lite (1.7.2) are:
cacheDir
This is the directory in which the cache files will be placed. It defaults to /tmp/.
caching
This option switches on and off the caching behavior of Cache_Lite. If you have numerous Cache_Lite calls in your code and want to disable the cache for debugging, for example, this option will be important. The default value is true (caching enabled).
lifeTime
This option represents the default lifetime (in seconds) of cache files. It can be changed using the setLifeTime method. The default value is 3600 (one hour), and if it's set to null, the cache files will never expire.
fileNameProtection
With this option activated, Cache_Lite uses an MD5 hash to generate the filename for the cache file. This option protects you from errors when you try to use IDs or group names containing characters that aren't valid for filenames; fileNameProtection must be turned on when you use Cache_Lite_Function. The default is true (enabled).
fileLocking
This option is used to switch the file locking mechanisms on and off. The default is true (enabled).
writeControl
This option checks that a cache file has been written correctly immediately after it has been created, and raises a PEAR_Error if it finds a problem. This facility allows your code to attempt to rewrite a cache file that was created incorrectly, but it comes at a cost in terms of performance. The default value is true (enabled).
readControl
This option checks any cache files that are being read to ensure they're not corrupt. Cache_Lite is able to place inside the file a value, such as the string length of the file, which can be used to confirm that the cache file isn't corrupt. There are three alternative mechanisms for checking that a file is valid, and they're specified using the readControlType option. These mechanisms come at the cost of performance, but should help to guarantee that your visitors aren't seeing scrambled pages. The default value is true (enabled).
readControlType
This option lets you specify the type of read control mechanism you want to use. The available mechanisms are a cyclic redundancy check (crc32, the default value) using PHP's crc32 function, an MD5 hash using PHP's md5 function (md5), or a simple and fast string length check (strlen). Note that this mechanism is not intended to provide security from people tampering with your cache files; it's just a way to spot corrupt files.
pearErrorMode
This option tells Cache_Lite how it should return PEAR errors to the calling script. The default is CACHE_LITE_ERROR_RETURN, which means Cache_Lite will return a PEAR_Error object.
memoryCaching
With memory caching enabled, every time a file is written to the cache, it is stored in an array in Cache_Lite. The saveMemoryCachingState and getMemoryCachingState methods can be used to store and access the memory cache data between requests. The advantage of this facility is that the complete set of cache files can be stored in a single file, reducing the number of disk read/write operations by reconstructing the cache files straight into an array to which your code has access. The memoryCaching option may be worth further investigation if you run a large site. The default value is false (disabled).
onlyMemoryCaching
If this option is enabled, only the memory caching mechanism will be used. The default value is false (disabled).
memoryCachingLimit
This option places a limit on the number of cache files that will be stored in the memory caching array. The more cache files you have, the more memory will be used up by memory caching, so it may be a good idea to enforce a limit that prevents your server from having to work too hard. Of course, this option places no restriction on the size of each cache file, so just one or two massive files may cause a problem. The default value is 1000.
automaticSerialization
If enabled, this option will automatically serialize all data types. While this approach will slow down the caching system, it is useful for caching nonscalar data types such as objects and arrays. For higher performance, you might consider serializing nonscalar data types yourself. The default value is false (disabled).
automaticCleaningFactor
This option will automatically clean old cache entries--on average, one in x cache writes, where x is the value set for this option. Therefore, setting this value to 0 will indicate no automatic cleaning, and a value of 1 will cause cache cleaning on every cache write. A value of 20 to 200 is the recommended starting point if you wish to enable this facility; it causes cache cleaning to happen, on average, 0.5% to 5% of the time. The default value is 0 (disabled).
hashedDirectoryLevel
When set to a nonzero value, this option will enable a hashed directory structure. A hashed directory structure will improve the performance of sites that have thousands of cache files. If you choose to use hashed directories, start by setting this value to 1, and increase it as you test for performance improvements. The default value is 0 (disabled).
errorHandlingAPIBreak
This option was added to enable backwards compatibility with code that uses the old API. When the old API was run in CACHE_LITE_ERROR_RETURN mode (see the pearErrorMode option earlier in this list), some functions would return a Boolean value to indicate success, rather than returning a PEAR_Error object. By setting this value to true, the PEAR_Error object will be returned instead. The default value is false (disabled).
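As a rough illustration of how a few of these options might be combined, the sketch below builds an options array and wraps the usual get-or-rebuild pattern in a helper. The option values are illustrative only, and the helper name getCachedData, its parameters, and the $build callback are assumptions made for this example rather than part of Cache_Lite.

```php
<?php
// Illustrative values only; tune them for your own site.
$options = array(
    'cacheDir'                => './cache/',
    'lifeTime'                => 600,   // ten minutes
    'automaticSerialization'  => true,  // lets us save arrays and objects
    'automaticCleaningFactor' => 50,    // clean on roughly 1 in 50 writes (2%)
    'hashedDirectoryLevel'    => 1      // spread cache files across subdirectories
);

/**
 * Return cached data, rebuilding it when the cache misses.
 *
 * $cache is assumed to be a Cache_Lite instance created with the
 * options above; $build is a hypothetical callback that produces
 * the fresh data (for example, by running a database query).
 */
function getCachedData($cache, $id, $group, $build)
{
    $data = $cache->get($id, $group);
    if ($data === false) {
        // get() returns false when no valid cache file exists.
        $data = call_user_func($build);
        $cache->save($data, $id, $group);
    }
    return $data;
}
```

Assuming the base class has been loaded with require_once 'Cache/Lite.php', you would create the object with new Cache_Lite($options) and pass it to getCachedData along with an ID, a group name, and a callback of your own. Because automaticSerialization is enabled, the callback can return an array without any manual calls to serialize.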
How do I purge the Cache_Lite cache?
The built-in lifetime mechanism for Cache_Lite cache files provides a good foundation for keeping your cache files up to date, but there will be some circumstances in which you need the files to be updated immediately.
Solution
In cases in which you need immediate updates, the methods remove and clean come in handy. The remove method is designed to delete a specific cache file; it takes as arguments the cache ID and group name of the file. To delete the page body cache file we created in "How do I use PEAR::Cache_Lite for server-side caching?", we'd use this code:
$cache->remove('body', 'Dynamic');
If we use the clean method, we can delete all the files in our cache directory simply by calling the method with no arguments; alternatively, we can specify a group of cache files to delete. If we wanted to delete both the header and footer cache files we created in "How do I use PEAR::Cache_Lite for server-side caching?", we could do so like this:
$cache->clean('Static');
Discussion
The remove and clean methods should obviously be called in response to events that arise within an application. For example, if you have a discussion forum application, you probably want to remove the relevant cache files when a visitor posts a new message.
Although it may seem like this solution entails a lot of code modifications, with some care it can be applied to your application in a global manner. If you have a central script that's included in every page, your script can simply watch for incoming events--for example, a variable like $_GET['newPost']--and respond by deleting the required cache files. This keeps the cache file removal mechanism central and easier to maintain. You might also consider using the php.ini setting auto_prepend_file to include this code in every PHP script.
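The central event-watching script described above might look something like the following sketch. The function name purgeCacheOnEvents and the templateChanged event are hypothetical, invented for this illustration; only the remove and clean calls, and the 'body', 'Dynamic', and 'Static' names, come from the earlier examples.

```php
<?php
/**
 * Delete cache files in response to application events.
 *
 * $cache is assumed to be a Cache_Lite (or subclass) instance;
 * $request is an array of incoming parameters such as $_GET.
 * Returns a list of the actions taken, which is handy for logging.
 */
function purgeCacheOnEvents($cache, array $request)
{
    $actions = array();

    if (isset($request['newPost'])) {
        // A new post only invalidates the fast-changing body chunk.
        $cache->remove('body', 'Dynamic');
        $actions[] = 'removed body/Dynamic';
    }

    if (isset($request['templateChanged'])) {
        // A layout change invalidates every file in the Static group.
        $cache->clean('Static');
        $actions[] = 'cleaned Static';
    }

    return $actions;
}
```

You could call purgeCacheOnEvents($cache, $_GET) from the central include, or from the file named in auto_prepend_file, before any cached output is started. In a real application you'd probably want to trigger the purge from the code that actually saves the post or template, rather than trusting a request parameter that any visitor can send.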
How do I cache function calls?
Many web sites provide access to their data via web services such as SOAP and XML-RPC. (You can read all about web services in Chapter 12.) As web services are accessed over a network, it's often a very good idea to cache results so that they can be fetched locally, rather than repeating the same slow request to the server multiple times. A simple approach might be to use PHP sessions, but as that solution operates on a per-visitor basis, the opening requests for each visitor will still be slow.
Solution
Let's assume you wish to create a web page that lists all the SitePoint books available on Amazon. The actual list is not likely to change from moment to moment, so why would we make the request to the Amazon web service every time the web page is displayed? We won't! Instead, we can take advantage of Cache_Lite by caching the results of the web service request.
Requires PEAR::SOAP Version 0.11.0
The following solution uses the PEAR::SOAP library version 0.11.0 to access the Amazon web service. You can find this package on the PEAR web site.
Here's some hypothetical code that fetches the data from the remote Amazon server:
$results = $amazonClient->ManufacturerSearchRequest($params);
Using Cache_Lite_Function, we can cache the results so the data returned from the service can be reused; this will avoid unnecessary network calls and significantly improve performance.
The following example code focuses on the caching aspect to prevent us from getting bogged down in the details of using the Amazon web service. You can see the complete script if you download this book's code archive from the SitePoint web site.
The Cache_Lite_Function class requires the inclusion of the following file:
cachefunction.php (excerpt)
require_once 'Cache/Lite/Function.php';
We instantiate the Cache_Lite_Function class with some options:
cachefunction.php (excerpt)
$options = array(
'cacheDir' => './cache/',
'fileNameProtection' => true,
'writeControl' => true,
'readControl' => true,
'readControlType' => 'strlen',
'defaultGroup' => 'SOAP'
);
$cache = new Cache_Lite_Function($options);
It's important that the fileNameProtection option is set to true (this is in fact the default value, but in this case I've set it manually to emphasize the point). If it were set to false, the filename would be invalid, so the data would not be cached.
Here's how we make the calls to our SOAP client class:
cachefunction.php (excerpt)
$results = $cache->call('amazonClient->ManufacturerSearchRequest',
$params);
If the request is being made for the first time, Cache_Lite_Function will store the results as a serialized array or object in a cache file (not that you need to worry about this), and this file will be used for future requests until it expires. The setLifeTime method can again be used to specify how long the cache files should survive before they're refreshed; currently, the default value of 3600 seconds (one hour) is being used. You can then use the $results variable exactly as if you were calling the web service method directly. The output of our example script can be seen in Figure 11.1.
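The same call method also works with ordinary functions, which can be easier to experiment with than a live web service. The sketch below is only an illustration: fetchBookTitles is a made-up stand-in for a slow remote request, and getBookTitles is a hypothetical wrapper that raises the lifetime from the default hour to a full day.

```php
<?php
// Hypothetical stand-in for a slow web service request.
function fetchBookTitles($publisher)
{
    usleep(100000); // pretend the network is slow
    return array($publisher . ': Book One', $publisher . ': Book Two');
}

/**
 * Fetch the titles through the cache.
 *
 * $cache is assumed to be a Cache_Lite_Function instance,
 * configured as in the cachefunction.php excerpt above.
 */
function getBookTitles($cache, $publisher)
{
    // The list changes rarely, so keep it for a day (86400 seconds).
    $cache->setLifeTime(86400);

    // The first argument names the function to run on a cache miss;
    // any further arguments are passed through to that function.
    return $cache->call('fetchBookTitles', $publisher);
}
```

Calling getBookTitles($cache, 'SitePoint') twice should run fetchBookTitles only on the first call; the second call is answered from the cache file until the lifetime runs out.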
Summary
Caching is an important and often overlooked aspect of web site development. Many factors that affect the performance of today's web sites weren't a problem for their predecessors--from complex, dynamic page generation, to a reliance on third-party data over the network. In this chapter, we've examined HTML meta tags, HTTP headers, PHP output buffering and PEAR::Cache_Lite, and we've seen how you can use them to control the caching of your web site content and improve the site's reliability and performance.
Implementing a caching system for your site might be simple, but ultimately, it depends on your requirements. If you have a busy and predominantly static web site--such as a blog--that's managed through a content management system, it will likely require little alteration, yet may benefit from huge performance improvements resulting from a small investment of your time. Setting up caching for a more complex site that generates content on a per-user basis, such as a portal or shopping cart system, will prove a little more tricky and time consuming, but the benefits are still clear.
Regardless, I hope the information in this chapter has given you a good grasp of the options available, and will help you determine which techniques are most suitable for your application.
SOURCE: http://www.sitepoint.com/article/caching-php-performance/