Running zodbshootout

Executable

zodbshootout can be executed in one of two ways. The first and most common is via the zodbshootout script created by pip or buildout:

# In an active environment with zodbshootout on the path
$ zodbshootout ...arguments...
# In a non-active virtual environment
$ path/to/venv/bin/zodbshootout ...arguments...

zodbshootout can also be invoked directly as a module, using the Python interpreter of the environment where it is installed:

python -m zodbshootout

This documentation will simply refer to the zodbshootout script, but both forms are equivalent.

Tip

For the most repeatable, stable results, it is important to choose a fixed value for the hash seed used for Python’s built-in objects (str, bytes, etc). On CPython 2, this means not passing the -R argument to the interpreter and not setting the PYTHONHASHSEED environment variable to random. On CPython 3, this means setting the PYTHONHASHSEED environment variable to a fixed value. On a Unix-like system, this invocation will work for both versions:

$ PYTHONHASHSEED=0 zodbshootout ...arguments...
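
The effect of a fixed hash seed can be checked directly. The following is a minimal sketch for Python 3 and is not part of zodbshootout; the helper name string_hash_with_seed is hypothetical. It starts a fresh interpreter with PYTHONHASHSEED set and reports the hash of a string, which should be the same in every process when the seed is fixed:

```python
import os
import subprocess
import sys


def string_hash_with_seed(seed, text="zodbshootout"):
    """Return hash(text) as computed by a fresh interpreter with PYTHONHASHSEED=seed.

    Hypothetical helper, for illustration only.
    """
    # Copy the current environment and override only the hash seed.
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env,
    )
    return int(out.decode().strip())


# Two separate interpreter processes agree when the seed is fixed.
first = string_hash_with_seed(0)
second = string_hash_with_seed(0)
print(first == second)
```

Without a fixed seed, Python 3 picks a new random seed for each process, so the two values would normally differ.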

Configuration File

Caution

zodbshootout packs each of the databases specified in the configuration file. This permanently deletes historical revisions and, if the database is part of a multi-database (mount points), could result in POSKeyError and broken links. Do not configure it to open production databases!

The zodbshootout script requires the name of a database configuration file. The configuration file contains a list of databases to test, in ZConfig format. The script packs each of the databases, then writes and reads the databases while taking measurements. Finally, the script produces a tabular summary of objects written or read per second in each configuration. zodbshootout uses the names of the databases defined in the configuration file as the table column names.

An example of a configuration file testing the built-in ZODB file storage, a few variations of ZEO, and RelStorage would look like this:

# This configuration compares a database running raw FileStorage
# (no ZEO), along with a database running FileStorage behind ZEO
# with a persistent ZEO cache, with some other databases.
#
# *This test can only run with a concurrency level of 1 if using
# multiple processes. To use higher concurrency levels, you need to
# use ``--threads``.*

%import relstorage

<zodb fs>
    <filestorage>
        path var/Data2.fs
    </filestorage>
</zodb>

<zodb zeofs_pcache>
    <zeoclient>
        server localhost:24003
        client 0
        var var
        cache-size 200000000
    </zeoclient>
</zodb>

<zodb zeo_fs>
    <zeoclient>
        server localhost:24003
    </zeoclient>
</zodb>

<zodb mysql_hf>
    <relstorage>
        keep-history false
        poll-interval 5
        <mysql>
            db relstoragetest_hf
            user relstoragetest
            passwd relstoragetest
        </mysql>
    </relstorage>
</zodb>

<zodb mysql_hf_mc>
    <relstorage>
        keep-history false
        poll-interval 5
        cache-module-name relstorage.pylibmc_wrapper
        cache-servers localhost:24005
        <mysql>
            db relstoragetest_hf
            user relstoragetest
            passwd relstoragetest
        </mysql>
    </relstorage>
</zodb>

The corresponding ZEO configuration file would look like this:

<zeo>
  address 24003
  read-only false
  invalidation-queue-size 100
  pid-filename var/zeo.pid
  # monitor-address PORT
  # transaction-timeout SECONDS
</zeo>

<filestorage 1>
  path var/Data.fs
</filestorage>

Note

If you’ll be using RelStorage, you’ll need to have the appropriate RDBMS processes installed, running, and properly configured. Likewise, if you’ll be using ZEO, you’ll need to have the ZEO server running. For pointers to more information, see Installation.
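
Because the database names become the table column names, it can be handy to preview them before a run. The following is a rough sketch, not how zodbshootout itself reads the file (it uses ZConfig); the function name database_names and the regular expression are assumptions made for illustration, and the sample text is a shortened version of the configuration above:

```python
import re

# Matches an opening <zodb NAME> tag at the start of a line (leading
# whitespace allowed). This is a rough pattern, not a ZConfig parser.
_ZODB_SECTION = re.compile(r"^\s*<zodb\s+([^\s>]+)\s*>", re.MULTILINE)


def database_names(config_text):
    """Return the names of the <zodb> sections in order of appearance."""
    return _ZODB_SECTION.findall(config_text)


example = """
%import relstorage

<zodb fs>
    <filestorage>
        path var/Data2.fs
    </filestorage>
</zodb>

<zodb zeo_fs>
    <zeoclient>
        server localhost:24003
    </zeoclient>
</zodb>
"""

print(database_names(example))
```

For the shortened sample this lists fs and zeo_fs, in that order. Closing </zodb> tags and comment lines are not matched by the pattern.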

Options

The zodbshootout script accepts the following options. A description of each option follows the text output.

$ zodbshootout --help
usage: zodbshootout [-h] [-n COUNTS] [-s OBJECT_SIZE] [--btrees [{IO,OO}]]
                    [--zap] [--min-objects MIN_OBJECT_COUNT] [--blobs]
                    [-r REPETITIONS] [--test-reps TEST_REPS] [-c CONCURRENCY]
                    [--threads [{shared,unique}]]
                    [--log [{CRITICAL,ERROR,WARNING,INFO,DEBUG}]]
                    [--dump-json [DUMP_JSON]] [-p PROFILE_DIR] [-l]
                    config_file

positional arguments:
  config_file

optional arguments:
  -h, --help            show this help message and exit

Profiling:
  Control over profiling the database

  -p PROFILE_DIR, --profile PROFILE_DIR
                        Profile all tests and output results to the specified
                        directory
  -l, --leaks           Check for object leaks after every repetition. This
                        only makes sense with --threads

Objects:
  Control the objects put in ZODB

  -n COUNTS, --object-counts COUNTS
                        Object counts to use (default 1000). Use this option
                        as many times as you want.
  -s OBJECT_SIZE, --object-size OBJECT_SIZE
                        Size of each object in bytes (estimated, default
                        approx. 128)
  --btrees [{IO,OO}]    Use BTrees. An argument, if given, is the family name
                        to use, either IO or OO. Specifying --btrees by itself
                        will use an IO BTree; not specifying it will use
                        PersistentMapping.
  --zap                 Zap the entire RelStorage before running tests. This
                        will destroy all data.
  --min-objects MIN_OBJECT_COUNT
                        Ensure the database has at least this many objects
                        before running tests.
  --blobs               Use Blobs instead of pure persistent objects.

Concurrency:
  Control over concurrency

  -c CONCURRENCY, --concurrency CONCURRENCY
                        Concurrency levels to use. Default is 2. Use this
                        option as many times as you want.
  --threads [{shared,unique}]
                        Use threads instead of multiprocessing. If you don't
                        give an argument or you give the 'shared' argument,
                        then one DB will be used by all threads. If you give
                        the 'unique' argument, each thread will get its own
                        DB.

Repetitions:
  Control over test repetitions

  -r REPETITIONS, --repetitions REPETITIONS
                        Number of repetitions of the complete test. The
                        average values out of this many repetitions will be
                        displayed. Default is 3.
  --test-reps TEST_REPS
                        Number of repetitions of individual tests (such as
                        add/update/cold/warm). The average times of this many
                        repetitions will be used. Default is 20.

Output:
  Control over the output

  --log [{CRITICAL,ERROR,WARNING,INFO,DEBUG}]
                        Enable logging in the root logger at the given level
                        (INFO)
  --dump-json [DUMP_JSON]
                        Dump the results in JSON to the specified file. Use
                        '-' for stdout (or if no path is given). NOTE: The
                        JSON format is undocumented and subject to change at
                        any time. It is intended to capture more information
                        than the printed CSV summary can in order to enable
                        better statistical analysis.

Objects

These options control the objects put in the database.

  • -n (--object-counts) specifies how many persistent objects to write or read per transaction. The default is 1000. An interesting value to use is 1, causing the test to primarily measure the speed of opening connections and committing transactions.

    Changed in version 0.6: Specify this option more than once to run the tests with different object counts.

  • --btrees causes the data to be stored in the BTrees optimized for ZODB usage (without this option, a PersistentMapping will be used). This is an advanced option that may be useful when tuning particular applications and usage scenarios. This adds additional objects to manage the buckets that make up the BTree. However, if IO BTrees are used (the default when this option is specified) internal storage of keys as integers may reduce pickle times and sizes (and thus improve cache efficiency). This option can take an argument of either IO or OO to specify the type of BTree to use.

    This option is especially interesting on PyPy or when comparing the pure-Python implementation of BTrees to the C implementation.

    New in version 0.6.

  • --zap recreates the tables and indexes for a RelStorage database. This option completely destroys any existing data. You will be prompted to confirm that you want to do this for each database that supports it. This is handy for comparing Python 2 and Python 3 (which can’t otherwise use the same database schemas).

    Caution

    This option destroys all data in the relevant database.

    New in version 0.6.

  • --min-objects ensures that at least the specified number of objects exist in the database independently of the objects being tested. If the database packs away objects or if --zap is used, this option will add back the necessary number of objects. If there are more objects, nothing will be done. This option is helpful for testing for scalability issues.

    New in version 0.7.

  • --blobs causes zodbshootout to read and write blobs instead of simple persistent objects. This can be useful for testing options like shared blob dirs on network filesystems, or RelStorage’s blob-chunk-size, or for diagnosing performance problems. If objects have to be added to meet the --min-objects count, they will also be blobs. Note that because of the way blobs work, there will be two times the number of objects stored as specified in --object-counts. Expect this option to cause the test to be much slower.

    New in version 0.7.
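
The arithmetic behind --min-objects and --blobs can be sketched as follows. This is illustrative only and is not taken from zodbshootout's source; both function names are hypothetical:

```python
def objects_to_add(existing, min_objects):
    """Objects needed to reach min_objects; zero if there are already enough."""
    return max(0, min_objects - existing)


def objects_stored(object_count, blobs=False):
    """Objects stored for a given --object-counts value; blobs double the count."""
    return object_count * 2 if blobs else object_count


# A database holding 250 objects, run with --min-objects 1000.
print(objects_to_add(250, 1000))
# The default object count of 1000, run with --blobs.
print(objects_stored(1000, blobs=True))
```

The first call shows that only the shortfall is added; a database that already holds more than the minimum is left alone.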

Concurrency

These options control the concurrency of the testing.

  • -c (--concurrency) specifies how many tests to run in parallel. The default is 2. Each of the concurrent tests runs in a separate process to prevent contention over the CPython global interpreter lock. In single-host configurations, the performance measurements should increase with the concurrency level, up to the number of CPU cores in the computer. In more complex configurations, performance will be limited by other factors such as network latency.

    Changed in version 0.6: Specify this option more than once to run the tests with different concurrency levels.

  • --threads uses in-process threads for concurrency instead of multiprocessing. This can demonstrate how the GIL affects various database adapters under RelStorage, for instance. It can also demonstrate the difference that warmup time makes for things like PyPy’s JIT.

    By default or if you give the shared argument to this option, all threads will share one ZODB DB object and re-use Connections from the same pool; most threaded applications will use ZODB in this manner. If you specify the unique argument, then each thread will get its own DB object. In addition to showing how the thread locking strategy of the underlying storage affects things, this can also highlight the impact of shared caches.

    New in version 0.6.

  • --gevent monkey-patches the system and uses cooperative greenlet concurrency in a single process (like --threads, which it implies; you can specify --threads unique to change the database sharing).

    This option is only available if gevent is installed.

    Note

    Not all storage types will work properly with this option. RelStorage will, but make sure you select a gevent-compatible driver like PyMySQL or pg8000 for best results. If your driver is not compatible, you may experience timeouts and failures, including UnexpectedChildDeathError. zodbshootout attempts to compensate for this, but may not always be successful.

    New in version 0.6.
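
Since -n and -c may each be given more than once, a command line that sweeps several object counts and concurrency levels can be assembled programmatically. The sketch below uses only options shown in the help output above; the function name build_command and the file name shootout.conf are hypothetical:

```python
import shlex


def build_command(config_file, object_counts=(), concurrency=(), threads=None):
    """Build a zodbshootout argument list, repeating -n and -c once per value.

    threads may be None (use processes), "shared", or "unique".
    """
    args = ["zodbshootout"]
    for count in object_counts:
        args += ["-n", str(count)]
    for level in concurrency:
        args += ["-c", str(level)]
    if threads is not None:
        if threads not in ("shared", "unique"):
            raise ValueError("threads must be 'shared' or 'unique'")
        # Always pass the value explicitly so the positional config file
        # that follows cannot be mistaken for the --threads argument.
        args += ["--threads", threads]
    args.append(config_file)
    return args


command = build_command("shootout.conf", object_counts=[1, 1000],
                        concurrency=[1, 4], threads="unique")
print(" ".join(shlex.quote(arg) for arg in command))
```

The resulting list can be passed to subprocess.run, or the printed line can be pasted into a shell.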

Repetitions

These options control how many times tests are repeated.

  • -r (--repetitions) determines how many iterations of the complete test suite will be compared together to find the best time. Higher values can reduce jitter. Higher values are especially useful on platforms that have a warmup period (like PyPy’s JIT). The default is 3.

    New in version 0.6.

  • --test-reps determines how many times each individual test (such as add/update/cold/warm) will be repeated. The average time of these repetitions is used. The default is 20.

    New in version 0.6.

Profiling

  • -p (--profile) enables the Python profiler while running the tests and outputs a profile for each test in the specified directory. Note that the profiler typically slows the database down significantly, so the measurements are not representative. This option is intended to help developers isolate performance bottlenecks.

    New in version 0.6.

  • --leaks prints a summary of possibly leaking objects after each test repetition. It only makes sense in combination with --threads. This is useful for storage and ZODB developers.

    New in version 0.6.

Output

These options control the output produced.

  • --log enables logging to the console at the specified level. If no level is specified but this option is given, then INFO logging will be enabled. This is useful for details about the workings of a storage and the effects various options have on it.

    New in version 0.6.

  • --dump-json writes a JSON structure containing the raw data collected to the file given (or if no file is given, to stdout). This can be useful for doing a more sophisticated analysis.

    Note

    The JSON structure is subject to change at any time.

    New in version 0.6.
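
Because the JSON structure is undocumented, any script that reads it should avoid assuming a particular layout. The sketch below only inspects the top level of a dumped file; the function name describe_results is hypothetical, and the sample data written here is a made-up stand-in, not real zodbshootout output:

```python
import json
import os
import tempfile


def describe_results(path):
    """Load a --dump-json file and describe its top level without assuming a schema."""
    with open(path) as f:
        data = json.load(f)
    if isinstance(data, dict):
        return "object with keys: " + ", ".join(sorted(data))
    if isinstance(data, list):
        return "array with %d items" % len(data)
    return "scalar of type " + type(data).__name__


# Demonstrate on a stand-in file; a real run would pass the path given
# to --dump-json instead.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "results.json")
    with open(sample_path, "w") as f:
        json.dump({"hypothetical": [1, 2, 3]}, f)
    summary = describe_results(sample_path)

print(summary)
```

Inspecting the structure first makes it easier to notice when a new zodbshootout version changes the format.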

You should write a configuration file that models your intended database and network configuration. Running zodbshootout may reveal configuration optimizations that would significantly increase your application’s performance.