Last month (November 2001) I concluded that in ASP.NET, caching is the key to performance if you want to exploit Web controls and maintain optimal server response times. Caching relates directly to applications that can work disconnected from the data source. Not all applications can afford this. Applications that work in a highly concurrent environment and need to detect incoming changes to data can't be adapted to work disconnected. However, there are scenarios where you have a large block of user-specific data that needs to be analyzed, sorted, aggregated, scrolled, and filtered. In these cases, your users need to extract numbers and trends from the data, but aren't interested in up-to-the-minute records. Here, server-side caching can be a key advantage.
Data caching can mean two things. You can temporarily park your frequently used data in in-memory data containers, or you can persist it to disk on the Web server or a machine downstream. But what is the ideal format for this data? And what is the most efficient way to load it back into a usable in-memory binary format? These are the questions I will answer this month.
ADO.NET and XML
ADO.NET and XML are the core technologies that help you design an effective caching subsystem. ADO.NET provides a namespace of data-oriented classes through which you can build a rough but functional in-memory DBMS. XML is the input and output language of this subsystem, but it's much more than the language used to serialize and deserialize living instances of ADO.NET objects. If you have XML documents formatted like data—hierarchical documents with equally sized subtrees—you can synchronize them to ADO.NET objects and use both XML-related technologies and relational approaches to walk through the collection of data rows. Although ADO.NET and XML are tightly integrated, only one ADO.NET object has the ability to publicly manipulate XML for reading and writing: the DataSet.
ASP.NET applications often end up handling DataSet objects. DataSet objects are returned by data adapter classes, one of the two types of ADO.NET command classes that interact with remote data sources. DataSets can also be created from local data: the contents of any valid stream object can be read into a DataSet object to populate it.
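For instance, here is a minimal sketch of how a data adapter fills a DataSet. The connection string and query are hypothetical; adjust them for your own environment:

// Hypothetical connection string and query
string connString = "server=localhost;database=northwind;integrated security=sspi";
string query = "SELECT employeeid, lastname, firstname FROM employees";

// The adapter runs the query and pours the resulting rows into the DataSet
SqlDataAdapter da = new SqlDataAdapter(query, connString);
DataSet ds = new DataSet();
da.Fill(ds, "Employees");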
The DataSet has a powerful, feature-rich programming interface and works as an in-memory cache of disconnected data. It is structured as a collection of tables and relationships, which makes it suitable when you have to work with related tables of data. Using DataSets, all of your tables are stored in a single container. This container knows how to serialize its content to XML and how to restore it to its original state. What more could you ask for from a data container?
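To give you an idea of the container at work, here is a minimal sketch that relates two in-memory tables; the table and column names are made up for illustration:

DataSet ds = new DataSet();
DataTable customers = ds.Tables.Add("Customers");
customers.Columns.Add("CustomerID", typeof(int));
DataTable orders = ds.Tables.Add("Orders");
orders.Columns.Add("OrderID", typeof(int));
orders.Columns.Add("CustomerID", typeof(int));

// Link the two tables through their CustomerID columns
ds.Relations.Add("CustOrders",
    customers.Columns["CustomerID"],
    orders.Columns["CustomerID"]);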
Devising an XML-Based Caching System
The majority of ASP.NET applications could take advantage of the Cache object for all of their caching needs. The Cache object is new to ASP.NET and provides unique and powerful features. It is a global, thread-safe object that does not store information on a per-session basis. In addition, the Cache is designed so that it does not tax the server's memory. If memory pressure becomes an issue, the Cache automatically purges less recently used items according to a priority defined by the developer.
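For example, when you insert an item you can attach an expiration policy and a scavenging priority to it. Here is a minimal sketch, assuming ds is the DataSet to cache and "MyData" is a hypothetical key name:

// Cache ds for five minutes; low-priority items are purged first
// when memory runs short
Cache.Insert("MyData", ds,
    null,                          // no cache dependency
    DateTime.Now.AddMinutes(5),    // absolute expiration
    Cache.NoSlidingExpiration,     // no sliding expiration
    CacheItemPriority.Low,         // purge this before higher-priority items
    null);                         // no removal callback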
Like Application, though, the Cache object does not share its state across the machines of a Web farm. I'll have more to say about the Cache object later. Aside from Web farms, there are a few tough scenarios in which you might want to consider alternatives to Cache. Even when you have large DataSets to store on a per-session basis, storing and reloading them from memory will be faster than any other approach. However, with many users connected at the same time, each storing large blocks of data, you might want to help the Cache object do its job better. An application-specific layered caching system built around the Cache object is one option. In this scheme, the most sensitive data goes into the Cache, efficiently managed by ASP.NET, while the rest is cached in slower but memory-free storage—for example, session-specific XML files. Let's look at writing and reading DataSets from disk.
Saving intermediate data to disk is a caching alternative that significantly reduces the demands on the Web server. To be effective, though, it should involve minimum overhead—just the time necessary to serialize and deserialize the data. Custom schemas and proprietary data formats are unfit for this technique because of the extra processing steps they require. In .NET, you can use the DataSet object both to fetch data and to persist it to disk. The DataSet object natively provides methods to save to XML and to load from it. These procedures, along with the internal representation of the DataSet, have been carefully optimized; they let you save and restore XML files in an amount of time that grows linearly (rather than geometrically) with the size of the data to process. So instead of storing persistent DataSets to Session, you can save them on the server on a per-user basis as temporary XML files.
To recognize the XML file of a given session, use the session ID—an ASCII sequence of letters and digits that uniquely identifies a connected user. To avoid the proliferation of such files, you delete them when the session ends. Saving DataSet objects to XML does not affect the structure of the application, which continues to work in terms of the DataSet object. The writing and reading are performed by a couple of ad hoc methods provided by the DataSet object, with a little help from .NET stream objects.
A Layered Caching System
If you want to use a cache mechanism to store data across multiple requests of the same page, your code will probably look like Figure 1. When the page first loads, you fetch all the data needed using the private member DataFromSourceToMemory. This function reads the rows from the data source and stores them into the cache, whatever that is. Subsequent requests for the page result in a call to DeserializeDataSource to fetch the data. This call tries to load the DataSet from the cache and resorts to physical access to the underlying DBMS if an exception is thrown, which can happen if the file has been deleted from its location for any reason. Figure 2 shows the application's Global.asax file. In the Session_OnEnd event, the code deletes the XML file whose name matches the current session ID.
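Figure 1 isn't reproduced here, but the pattern it describes boils down to something like the following sketch. DataFromSourceToMemory and DeserializeDataSource are the names used above; LoadDataFromDatabase and the grid control are hypothetical stand-ins:

void Page_Load(Object sender, EventArgs e)
{
    if (!IsPostBack)
    {
        // First request: read from the data source and cache the rows
        DataFromSourceToMemory();
    }

    // Every request: bind against the cached data
    DataSet ds = DeserializeDataSource();
    grid.DataSource = ds.Tables[0].DefaultView;
    grid.DataBind();
}

void DataFromSourceToMemory()
{
    // Hypothetical data-access helper that hits the DBMS
    DataSet ds = LoadDataFromDatabase();

    // Persist the rows to the session's XML file
    SerializeDataSource(ds);
}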
The Global.asax file resides in the root directory of an ASP.NET application. When you run an ASP.NET application, you must use a virtual directory; if you test an ASP.NET page outside a virtual directory, you won't capture any session or application event in your Global.asax file. Also, while Session_OnStart is always raised, the Session_OnEnd event is not guaranteed to fire in an out-of-process scenario.
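Figure 2 isn't shown here either, but its cleanup code amounts to something like the following sketch. Note that Server.MapPath may not work in Session_OnEnd because no request is being processed when a session times out, so the path is built from HttpRuntime.AppDomainAppPath instead:

<script language="C#" runat="server">
void Session_OnEnd(Object sender, EventArgs e)
{
    // Build the file name without Server.MapPath, which may not
    // be usable here since no request is in progress
    string strFile = System.IO.Path.Combine(
        HttpRuntime.AppDomainAppPath,
        Session.SessionID + ".xml");

    if (System.IO.File.Exists(strFile))
        System.IO.File.Delete(strFile);
}
</script>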
Each active ASP.NET session is tracked using a 120-bit string composed of URL-legal ASCII characters. Session ID values are generated in such a way that uniqueness and randomness are guaranteed. This avoids collisions and makes it harder to guess the session ID of an existing session.
The following code shows how to use the session ID to persist data to disk and reload it, serializing a DataSet to an XML file:
void SerializeDataSource(DataSet ds)
{
    // Name the file after the current session ID
    string strFile = Server.MapPath(Session.SessionID + ".xml");

    // Write the DataSet out as XML
    XmlTextWriter xtw = new XmlTextWriter(strFile, null);
    ds.WriteXml(xtw);
    xtw.Close();
}
That code is equivalent to storing the DataSet in a session slot:
session["mydataset"] = ds;
Of course, while the two approaches look interchangeable from the application's point of view, their behavior is radically different: the session slot holds the live object in the server's memory, whereas the XML file keeps only a serialized copy on disk.
To read back previously saved data, you can use this code:
DataSet DeserializeDataSource()
{
    string strFile = Server.MapPath(Session.SessionID + ".xml");

    // Read the content of the file back into a DataSet
    XmlTextReader xtr = new XmlTextReader(strFile);
    DataSet ds = new DataSet();
    ds.ReadXml(xtr);
    xtr.Close();
    return ds;
}
This function locates an XML file whose name matches the ID of the current session and loads it into a newly created DataSet object. If you have a caching system based on the Session object, you should use this routine to replace any code that looks like this:
DataSet ds = (DataSet) Session["MyDataSet"];
How many of you remember the IBM 360/370s? When I was a first-year university student, I learned about memory management on those machines, which introduced virtual memory as a way to increase performance. Their memory is structured like a pyramid of storage devices whose size decreases, and whose speed increases, as you move from the bottom of the pyramid up.
Why all this history? Because the same pyramid idea applies here: an application-specific layered caching system built around the Cache object can help the Cache object perform in a better and more effective way, even in the toughest scenarios with the most stringent scalability requirements.
Figure 3 shows some of the elements that could form the ASP.NET caching pyramid, but the design is not set in stone. The number and the type of the layers are completely up to you and are application-specific. In several Web applications, only one level is used: the DBMS tables level. If scalability is important and your data is mostly disconnected, though, a layered caching system is almost a must.
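To make the pyramid concrete, a page could walk the layers from the fastest to the slowest, refilling the faster layers on the way back. Here is a minimal sketch of that lookup; LoadDataFromDatabase is the same hypothetical data-access helper used earlier:

DataSet GetData()
{
    // Layer 1: the ASP.NET Cache (in-memory, fastest)
    DataSet ds = (DataSet) Cache[Session.SessionID];
    if (ds != null)
        return ds;

    try
    {
        // Layer 2: the per-session XML file on disk
        ds = DeserializeDataSource();
    }
    catch (Exception)
    {
        // Layer 3: the DBMS itself (slowest)
        ds = LoadDataFromDatabase();
        SerializeDataSource(ds);   // refill the disk layer
    }

    // Refill the memory layer on the way back up
    Cache.Insert(Session.SessionID, ds);
    return ds;
}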