mmap, file descriptors, STREAMS ?? WTF??
Roger Oberholtzer
roger
Thu Aug 4 03:36:52 PDT 2005
On Wed, 2005-08-03 at 08:52 -0500, Ben Duncan wrote:
> Ok, I have found what looks like it would be a good starting point for
> creating an embedded database for my SLAG project. This is intended to be
> a multi-user multi read/writable file manager with multi-value attribute
> extensions.
> It will be based upon tdb, which is the database manager SAMBA uses. It, in
> turn, is based upon GDBM, but has several extensions such as multi user and
> row locking.
>
> As with SO much of SLAG, I am having to learn a bunch of new stuff, and my nearly 50-year-old
> brain is about to explode.
>
> It deals with mmap versus STREAMS (FILE *) versus "user buffering".
> Here is what I can gather thus far:
>
> Since this is going to be a FOSS Appgen replacement, I have studied as much as I can
> on AppGen's headers and C API, to come up with how Appgen does it.
> Appgen uses key/data pairs with 512-byte blocks and an "internal buffering"
> mechanism. There are routines that deal with "virtual memory" file system and
> row locking is handled by using a semaphore locking byte in the file itself.
>
> Now looking at tdb, it uses file descriptors, but sets up its access with mmap.
> mmap is used throughout tdb.
>
> The glibc file documentation on linuxselfhelp suggests using STREAMS as
> opposed to file descriptors, saying this is more "efficient" than using file descriptors,
mmap is simply another way to access a file. It makes the most sense
when you are moving around in a file, and a database is a prime
candidate for being mmap()ed. mmap access to non-sequential places in a
file is usually much faster than fseek() plus fread() on a FILE, because
once a page is resident no system call or copy is needed. The syntax is
also much easier. Just map the file into some pointer (for example, a
pointer to the basic data structure in the file) and then forget that
there is a file behind it. The file is accessed as though it were an
array or a structure in memory. mmap is faster because it uses the
virtual memory system to hold parts of the file. No API calls or
functions are needed to access an mmap()ed file (after it is mapped,
that is). Just pointer math.
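To make the "just pointer math" point concrete, here is a minimal C sketch, assuming a POSIX system. The helper names map_file() and byte_at_stdio() are hypothetical, made up for illustration. Once the file is mapped, reading the byte at offset off is just base[off]; the stdio version needs an fseek() and an fgetc() for every random access.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map a whole file read-only and return a pointer
 * to its first byte. *len receives the file size. Returns NULL on error
 * or for an empty file (a zero-length mapping is not allowed). */
static const unsigned char *map_file(const char *path, size_t *len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd); /* the mapping remains valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;

    *len = (size_t)st.st_size;
    return p; /* base[off] is now the byte at file offset off */
}

/* The stdio equivalent of base[off]: one seek plus one read per access. */
static int byte_at_stdio(FILE *f, long off)
{
    if (fseek(f, off, SEEK_SET) != 0)
        return EOF;
    return fgetc(f);
}
```

Both base[7] and byte_at_stdio(f, 7) yield the same byte; only the first avoids a library call. Release the mapping with munmap() when you are done with it.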
> Other reading material on "how to write a database" suggests using file descriptors
> and doing your own "buffering".
As in building your own television. That raises the question: why, when
you can buy a damned good one? The fact that you can does not mean that
you should.
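For comparison, "do your own buffering" usually means something like the following minimal C sketch, assuming POSIX pread()/pwrite(): the program owns block-sized buffers and copies data in and out with an explicit system call on every access. The 512-byte BLOCK_SIZE is an assumption mirroring the 512-byte blocks the question attributes to AppGen; read_block() and write_block() are hypothetical names, not AppGen or tdb APIs.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 512 /* assumed block size, mirroring AppGen's 512-byte blocks */

/* Hypothetical "user buffering" helpers: the caller owns buf, and block n
 * is copied in or out with a system call each time. A short transfer
 * (e.g. reading past end of file) is treated as an error. */
static int read_block(int fd, long n, unsigned char buf[BLOCK_SIZE])
{
    ssize_t got = pread(fd, buf, BLOCK_SIZE, (off_t)n * BLOCK_SIZE);
    return got == BLOCK_SIZE ? 0 : -1;
}

static int write_block(int fd, long n, const unsigned char buf[BLOCK_SIZE])
{
    ssize_t done = pwrite(fd, buf, BLOCK_SIZE, (off_t)n * BLOCK_SIZE);
    return done == BLOCK_SIZE ? 0 : -1;
}
```

Every access here costs a system call plus a copy into buf, and any caching of frequently used blocks is code you would have to write yourself; with mmap, the kernel's page cache does that job.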
> HELP!!! I am now confused. Isn't mmap a way to do "user buffering"? Based
> upon the above info, isn't Appgen doing something along the lines of mmap? Is "user buffering"
> even the same as mmap? Are STREAMS more effective than mmap or "user buffering"?
mmap maintains pages of virtual memory, just like all the other memory
you use. Each time it needs a part of a file, it instructs the virtual
memory system (the same machinery that swaps ordinary memory out to a
disk swap space) to fetch a page from the file. If your memory pages
are, say, 4K, then you get buffering for free: the file is always read
in whole 4K pages, and each one lands in a VM page. The reason mmap is
faster here is that those 4K are available directly in your memory. You
do not need an fread() to get them. When you touch something past the
page you are in, the virtual memory system pages in what you need. If
you have adequate memory, the parts of the file that you are no longer
using may still be in the virtual memory system, and thus available via
a bit of fast pointer magic done by the kernel when you access them
again, simply through the pointer you mmap()ed in. From a user program
it cannot get easier.
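Because mmap works in whole pages, the file offset passed to mmap() must be a multiple of the page size, which sysconf(_SC_PAGESIZE) reports (often 4096, but not always). The following minimal C sketch, with hypothetical names struct window, map_window() and unmap_window(), maps only the pages covering a byte range by rounding the offset down to a page boundary. It assumes the requested range lies inside the file.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical descriptor for one mapped window of a file. */
struct window {
    void *map;           /* start of the mapping (page aligned) */
    size_t map_len;      /* bytes actually mapped */
    unsigned char *data; /* points at the requested offset inside map */
};

/* Map just the pages covering [off, off + len) of an open file.
 * mmap() requires a page-aligned file offset, so round off down to a
 * page boundary and skip the slack at the front of the mapping. */
static int map_window(int fd, off_t off, size_t len, struct window *w)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    off_t aligned = off - (off % page);
    size_t slack = (size_t)(off - aligned);

    void *p = mmap(NULL, len + slack, PROT_READ, MAP_SHARED, fd, aligned);
    if (p == MAP_FAILED)
        return -1;

    w->map = p;
    w->map_len = len + slack;
    w->data = (unsigned char *)p + slack;
    return 0;
}

static void unmap_window(struct window *w)
{
    munmap(w->map, w->map_len);
}
```

A database can use a window like this when the whole file is too large to map at once; the kernel still fetches whole pages behind the scenes.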
If you are making a database with fixed-size key/value pairs, access can be very fast.
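A minimal C sketch of the fixed-size case, assuming POSIX and a hypothetical layout of 16-byte keys and 48-byte values. The names struct record, open_table(), put() and close_table() are illustrative only, not tdb's API. With every slot the same size, record i is simply t[i], and storing a pair is a memory copy into the mapping.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical fixed-size slot: record i lives at byte offset
 * i * sizeof(struct record) in the file. */
struct record {
    char key[16];
    char value[48];
};

/* Create (or resize) a file holding n records and map it read/write.
 * The file is grown first: touching a mapped page that lies wholly
 * past end of file raises SIGBUS. */
static struct record *open_table(const char *path, size_t n)
{
    size_t len = n * sizeof(struct record);
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    return p == MAP_FAILED ? NULL : p;
}

/* Store a pair in slot i: plain pointer math, no write() call.
 * Over-long strings are truncated and always NUL-terminated. */
static void put(struct record *t, size_t i, const char *key, const char *value)
{
    strncpy(t[i].key, key, sizeof t[i].key - 1);
    t[i].key[sizeof t[i].key - 1] = '\0';
    strncpy(t[i].value, value, sizeof t[i].value - 1);
    t[i].value[sizeof t[i].value - 1] = '\0';
}

/* Flush dirty pages to the file, then drop the mapping. */
static int close_table(struct record *t, size_t n)
{
    size_t len = n * sizeof(struct record);
    int rc = msync(t, len, MS_SYNC);
    if (munmap(t, len) < 0)
        rc = -1;
    return rc;
}
```

A real multi-user store would also need locking around put(), which tdb provides and this sketch does not.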
Are you relying on buffering for something? Or are you just hoping that
it will result in more speed?
As you can guess, I like mmap. On most systems, mmap is even faster for
reading a file sequentially than FILE or even read()/write(). Oracle,
for example, gets great speed because all databases are mmap()ed (as
well as from a few other things, like async file systems).
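A minimal C sketch of sequential reading through a mapping, assuming POSIX: sum_file() is a hypothetical helper that adds up every byte of a file. posix_madvise() with POSIX_MADV_SEQUENTIAL tells the kernel the access pattern so it can read ahead; it is only advice, and whether mmap actually beats read() for sequential I/O depends on the system and the workload.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: add up every byte of a file by walking an
 * mmap()ed view from start to end. Returns 0 on success, -1 on error. */
static int sum_file(const char *path, uint64_t *sum)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }

    *sum = 0;
    if (st.st_size == 0) { /* mmap() cannot map zero bytes */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    void *m = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (m == MAP_FAILED)
        return -1;

    /* Advice only: lets the kernel read ahead aggressively. */
    posix_madvise(m, len, POSIX_MADV_SEQUENTIAL);

    const unsigned char *p = m;
    for (size_t i = 0; i < len; i++)
        *sum += p[i];

    munmap(m, len);
    return 0;
}
```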
So, go the mmap route. It is portable to any Unix/Linux and even
Windows (which offers the same facility through CreateFileMapping() and
MapViewOfFile()). Not that the latter is any concern, I guess.
>
> As always, thanks ....
>
More information about the Linux-users mailing list