Running zodbshootout

Executable

zodbshootout can be executed in one of two ways. The first and
most common is via the zodbshootout script created by pip or
buildout:

# In an active environment with zodbshootout on the path
$ zodbshootout ...arguments...
# In a non-active virtual environment
$ path/to/venv/bin/zodbshootout ...arguments...

zodbshootout can also be invoked directly as a module using the
Python interpreter where it is installed:

python -m zodbshootout

This documentation will simply refer to the zodbshootout script,
but both forms are equivalent.
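The equivalence of a console script and `python -m` invocation can be illustrated with a standard-library module. Here `json.tool` serves as a stand-in for zodbshootout (which may not be installed in every environment); both kinds of invocation dispatch to the same entry point:

```python
import subprocess
import sys

# A console script and "python -m" dispatch to the same entry point.
# json.tool is a stdlib stand-in for zodbshootout here (illustration
# only): it reads JSON on stdin and pretty-prints it.
result = subprocess.run(
    [sys.executable, "-m", "json.tool"],
    input='{"a": 1}',
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout)
```

The same pattern applies to `path/to/venv/bin/zodbshootout` versus `path/to/venv/bin/python -m zodbshootout`.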
Tip

For the most repeatable, stable results, it is important to choose
a fixed value for the hash seed used for Python’s builtin objects
(str, bytes, etc.). On CPython 2, this means not passing the -R
argument to the interpreter and not having the PYTHONHASHSEED
environment variable set to random. On CPython 3, this means
having the PYTHONHASHSEED environment variable set to a fixed
value. On a Unix-like system, this invocation works for both
versions:

$ PYTHONHASHSEED=0 zodbshootout ...arguments...
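Why this matters can be sketched with a small experiment: with a fixed PYTHONHASHSEED, every fresh interpreter computes the same string hashes, so dict and set iteration order (and anything downstream of it) is stable from run to run. This is an illustrative sketch, not part of zodbshootout itself:

```python
import os
import subprocess
import sys

def child_hash(seed):
    """Return hash('zodb') as computed by a fresh child interpreter.

    PYTHONHASHSEED is set in the child's environment, which controls
    the randomization of str/bytes hashes on CPython 3.
    """
    env = dict(os.environ, PYTHONHASHSEED=seed)
    out = subprocess.run(
        [sys.executable, "-c", "print(hash('zodb'))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(out.stdout)

# With a fixed seed, every interpreter run computes the same hash,
# so hash-dependent behavior is repeatable across benchmark runs.
assert child_hash("0") == child_hash("0")
```

Without a fixed seed (or with PYTHONHASHSEED=random), consecutive runs would almost always disagree, adding noise to the benchmarks.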
Configuration File

The zodbshootout script requires the name of a database
configuration file. The configuration file contains a list of
databases to test, in ZConfig format. The script then writes and reads
each of the databases while taking measurements. During this process,
the measured times are output for each test of each database; there
are a number of command-line options to control the output or save it
to files for later analysis. (See the pyperf user guide for
information on configuring the output and adjusting the benchmark
process.)

An example configuration file testing the built-in ZODB file storage, a few variations of ZEO, and RelStorage looks like this:
# This configuration compares a database running raw FileStorage
# (no ZEO) with a database running FileStorage behind ZEO with a
# persistent ZEO cache, plus some other databases.
#
# *This test can only run with a concurrency level of 1 if using
# multiple processes. To use higher concurrency levels, you need to
# use ``--threads``.*

%import relstorage

<zodb fs>
  <filestorage>
    path var/Data2.fs
  </filestorage>
</zodb>

<zodb zeofs_pcache>
  <zeoclient>
    server localhost:24003
    client 0
    var var
    cache-size 200000000
  </zeoclient>
</zodb>

<zodb zeo_fs>
  <zeoclient>
    server localhost:24003
  </zeoclient>
</zodb>

<zodb mysql_hf>
  <relstorage>
    keep-history false
    poll-interval 5
    <mysql>
      db relstoragetest_hf
      user relstoragetest
      passwd relstoragetest
    </mysql>
  </relstorage>
</zodb>

<zodb mysql_hf_mc>
  <relstorage>
    keep-history false
    poll-interval 5
    cache-module-name relstorage.pylibmc_wrapper
    cache-servers localhost:24005
    <mysql>
      db relstoragetest_hf
      user relstoragetest
      passwd relstoragetest
    </mysql>
  </relstorage>
</zodb>
The corresponding ZEO configuration file would look like this:

<zeo>
  address 24003
  read-only false
  invalidation-queue-size 100
  pid-filename var/zeo.pid
  # monitor-address PORT
  # transaction-timeout SECONDS
</zeo>

<filestorage 1>
  path var/Data.fs
</filestorage>
Note

If you’ll be using RelStorage, you’ll need to have the appropriate RDBMS processes installed, running, and properly configured. Likewise, if you’ll be using ZEO, you’ll need to have the ZEO server running. For pointers to more information, see Installation.
Options

The zodbshootout script accepts the following options. A
description of each option follows the help output.

$ zodbshootout --help
usage: zodbshootout [-h] [--rigorous] [--fast] [--debug-single-value]
[-p PROCESSES] [-n VALUES] [-w WARMUPS] [-l LOOPS] [-v]
[-q] [--pipe FD] [-o FILENAME] [--append FILENAME]
[--min-time MIN_TIME] [--worker] [--worker-task TASK_ID]
[--calibrate-loops] [--recalibrate-loops]
[--calibrate-warmups] [--recalibrate-warmups] [-d]
[--metadata] [--hist] [--stats] [--affinity CPU_LIST]
[--inherit-environ VARS] [--no-locale] [--python PYTHON]
[--compare-to REF_PYTHON]
[--python-names REF_NAME:CHANGED_NAMED]
[--tracemalloc | --track-memory]
[--object-counts [OBJECTS_PER_TXN]] [-s OBJECT_SIZE]
[--btrees [{IO,OO}]] [--zap [ZAP]]
[--min-objects MIN_OBJECT_COUNT] [--blobs] [--pack]
[-c [CONCURRENCY]] [--threads [{shared,unique}]]
[--fail-fast] [--log [LOG]] [--profile PROFILE_DIR]
[--profiler {cProfile}]
[--include-mapping [INCLUDE_MAPPING]] [--leaks]
config_file
[{all,add,cold,ex_commit,im_commit,conflicts,hot,new_oid,prefetch_cold,readCurrent,steamin,store,update,warm,tpc,-add,-cold,-ex_commit,-im_commit,-conflicts,-hot,-new_oid,-prefetch_cold,-readCurrent,-steamin,-store,-update,-warm,-tpc} [{all,add,cold,ex_commit,im_commit,conflicts,hot,new_oid,prefetch_cold,readCurrent,steamin,store,update,warm,tpc,-add,-cold,-ex_commit,-im_commit,-conflicts,-hot,-new_oid,-prefetch_cold,-readCurrent,-steamin,-store,-update,-warm,-tpc} ...]]
Benchmark
positional arguments:
config_file
{all,add,cold,ex_commit,im_commit,conflicts,hot,new_oid,prefetch_cold,readCurrent,steamin,store,update,warm,tpc,-add,-cold,-ex_commit,-im_commit,-conflicts,-hot,-new_oid,-prefetch_cold,-readCurrent,-steamin,-store,-update,-warm,-tpc}
optional arguments:
-h, --help show this help message and exit
--rigorous Spend longer running tests to get more accurate
results
--fast Get rough answers quickly
--debug-single-value Debug mode, only compute a single value
-p PROCESSES, --processes PROCESSES
number of processes used to run benchmarks (default:
20)
-n VALUES, --values VALUES
number of values per process (default: 3)
-w WARMUPS, --warmups WARMUPS
number of skipped values per run used to warmup the
benchmark
-l LOOPS, --loops LOOPS
number of loops per value, 0 means automatic
calibration (default: 0)
-v, --verbose enable verbose mode
-q, --quiet enable quiet mode
--pipe FD Write benchmarks encoded as JSON into the pipe FD
-o FILENAME, --output FILENAME
write results encoded to JSON into FILENAME
--append FILENAME append results encoded to JSON into FILENAME
--min-time MIN_TIME Minimum duration in seconds of a single value, used to
calibrate the number of loops (default: 100 ms)
--worker Worker process, run the benchmark.
--worker-task TASK_ID
Identifier of the worker task: only execute the
benchmark function TASK_ID
--calibrate-loops calibrate the number of loops
  --recalibrate-loops recalibrate the number of loops
--calibrate-warmups calibrate the number of warmups
--recalibrate-warmups
recalibrate the number of warmups
-d, --dump display benchmark run results
--metadata, -m show metadata
--hist, -g display an histogram of values
--stats, -t display statistics (min, max, ...)
--affinity CPU_LIST Specify CPU affinity for worker processes. This way,
benchmarks can be forced to run on a given set of CPUs
to minimize run to run variation. By default, worker
processes are pinned to isolate CPUs if isolated CPUs
are found.
--inherit-environ VARS
Comma-separated list of environment variables
inherited by worker child processes.
--no-locale Don't copy locale environment variables like LANG or
LC_CTYPE.
--python PYTHON Python executable (default: use running Python,
sys.executable)
--compare-to REF_PYTHON
Run benchmark on the Python executable REF_PYTHON, run
benchmark on Python executable PYTHON, and then
compare REF_PYTHON result to PYTHON result
--python-names REF_NAME:CHANGED_NAMED
option used with --compare-to to name PYTHON as
CHANGED_NAME and REF_PYTHON as REF_NAME in results
--tracemalloc Trace memory allocations using tracemalloc
--track-memory Track memory usage using a thread
Profiling:
Control over profiling the database
--profile PROFILE_DIR
Profile all tests and output results to the specified
directory
--profiler {cProfile}
The profiler to use. Must be specified with 'profile-dir'
--include-mapping [INCLUDE_MAPPING]
Benchmark a MappingStorage. This serves as a
floor. Default is true; use any value besides 'true',
'yes' or 'on' to disable.
--leaks Check for object leaks after every repetition. This
only makes sense with --threads
Objects:
Control the objects put in ZODB
--object-counts [OBJECTS_PER_TXN]
Object counts to use (default 1000).
-s OBJECT_SIZE, --object-size OBJECT_SIZE
Size of each object in bytes (estimated, default
approx. 300)
--btrees [{IO,OO}] Use BTrees. An argument, if given, is the family name
to use, either IO or OO. Specifying --btrees by itself
will use an IO BTree; not specifying it will use
PersistentMapping.
--zap [ZAP] Zap the entire RelStorage before running tests. This
will destroy all data. An argument of 'force' does
this without prompting for all databases. An argument
that is a comma-separated list of databases will zap
those database without prompting.
--min-objects MIN_OBJECT_COUNT
Ensure the database has at least this many objects
before running tests.
--blobs Use Blobs instead of pure persistent objects.
--pack Pack the storage before populating it.
Concurrency:
Control over concurrency
-c [CONCURRENCY], --concurrency [CONCURRENCY]
Concurrency level to use. Default is 2.
--threads [{shared,unique}]
Use threads instead of multiprocessing. If you don't
give an argument or you give the 'shared' argument,
then one DB will be used by all threads. If you give
the 'unique' argument, each thread will get its own
DB.
--fail-fast Fail at the first benchmark failure instead of
continuing.
Output:
Control over the output
--log [LOG] Enable logging in the root logger at the given level.
Without an argument, the default is INFO. You may
specify DEBUG, ERROR, CRITICAL, etc. Or you may give a
path to a file that can be used by ZConfig to
configure the loggers. See
https://zconfig.readthedocs.io/en/latest/using-logging.html
Changed in version 0.7: You can now specify just a subset of benchmarks to run by giving their names as extra command line arguments after the configuration file.
Objects

These options control the objects put in the database.

--object-counts specifies how many persistent objects to write or read per transaction. The default is 1000.

Changed in version 0.7: The old alias of -n is no longer accepted; pyperf uses that to determine the number of loop iterations. Also, this option can now only be used once.

Changed in version 0.6: Specify this option more than once to run the tests with different object counts.
--btrees causes the data to be stored in BTrees optimized for ZODB usage (without this option, a PersistentMapping will be used). This is an advanced option that may be useful when tuning particular applications and usage scenarios. It adds additional objects to manage the buckets that make up the BTree. However, if IO BTrees are used (the default when this option is specified), internal storage of keys as integers may reduce pickle times and sizes (and thus improve cache efficiency). This option can take an argument of either IO or OO to specify the type of BTree to use.

This option is especially interesting on PyPy or when comparing the pure-Python implementation of BTrees to the C implementation.

New in version 0.6.
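The pickle-size effect of integer keys mentioned above can be sketched without BTrees at all. Here plain dicts stand in for BTree buckets (an illustration only, not how zodbshootout or BTrees actually store data):

```python
import pickle

# Plain dicts stand in for BTree buckets here (illustration only).
# Small integers pickle to 2-3 bytes each, while their string
# equivalents need a length byte plus the characters, so the
# integer-keyed mapping produces a smaller pickle overall.
int_keyed = {i: b"x" * 8 for i in range(1000)}
str_keyed = {str(i): b"x" * 8 for i in range(1000)}

int_size = len(pickle.dumps(int_keyed, protocol=3))
str_size = len(pickle.dumps(str_keyed, protocol=3))
assert int_size < str_size  # smaller pickles can improve cache efficiency
```

Smaller pickles mean less data to move through storage and caches, which is the mechanism behind the potential IO BTree advantage.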
--zap recreates the tables and indexes for a RelStorage database or a ZODB FileStorage. This option completely destroys any existing data. You will be prompted to confirm that you want to do this for each database that supports it. This is handy for comparing Python 2 and Python 3 (which can’t otherwise use the same database schemas).

Caution

This option destroys all data in the relevant database.

Changed in version 0.7: You can now specify an argument of force to disable the prompt and zap all databases. You can also give a comma-separated list of database names to zap; only those databases will be cleared (without prompting).

New in version 0.6.
--min-objects ensures that at least the specified number of objects exist in the database, independently of the objects being tested. If the database packs away objects or if --zap is used, this option will add back the necessary number of objects. If there are already more objects, nothing will be done. This option is helpful for testing scalability issues.

New in version 0.7.
--blobs causes zodbshootout to read and write blobs instead of simple persistent objects. This can be useful for testing options like shared blob dirs on network filesystems, or RelStorage’s blob-chunk-size, or for diagnosing performance problems. If objects have to be added to meet the --min-objects count, they will also be blobs. Note that because of the way blobs work, twice the number of objects specified in --object-counts will be stored. Expect this option to make the test much slower.

New in version 0.7.
Concurrency

These options control the concurrency of the testing.

-c (--concurrency) specifies how many tests to run in parallel. The default is 2. Each of the concurrent tests runs in a separate process to prevent contention over the CPython global interpreter lock. In single-host configurations, the performance measurements should increase with the concurrency level, up to the number of CPU cores in the computer. In more complex configurations, performance will be limited by other factors such as network latency.

Changed in version 0.7: This option can only be used once.

Changed in version 0.6: Specify this option more than once to run the tests with different concurrency levels.
--threads uses in-process threads for concurrency instead of multiprocessing. This can demonstrate how the GIL affects various database adapters under RelStorage, for instance. It can also demonstrate the difference that warmup time makes for things like PyPy’s JIT.

By default, or if you give the shared argument to this option, all threads will share one ZODB DB object and re-use Connections from the same pool; most threaded applications will use ZODB in this manner. If you specify the unique argument, each thread will get its own DB object. In addition to showing how the thread-locking strategy of the underlying storage affects things, this can also highlight the impact of shared caches.

New in version 0.6.
--gevent monkey-patches the system and uses cooperative greenlet concurrency in a single process (like --threads, which it implies; you can specify --threads unique to change the database sharing).

This option is only available if gevent is installed.

Note

Not all storage types will work properly with this option. RelStorage will, but make sure you select a gevent-compatible driver like PyMySQL or pg8000 for best results. If your driver is not compatible, you may experience timeouts and failures, including UnexpectedChildDeathError. zodbshootout attempts to compensate for this, but may not always be successful.

New in version 0.6.
Repetitions

These options control how many times tests are repeated.

Changed in version 0.7: The old -r and --test-reps options were removed. Instead, use the --loops, --values and --processes options provided by pyperf.
Profiling

--profile enables the Python profiler while running the tests and outputs a profile for each test in the specified directory. Note that the profiler typically reduces the database speed by a lot. This option is intended to help developers isolate performance bottlenecks.

New in version 0.6.
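Conceptually, profiling a test means wrapping its body in Python’s cProfile and writing out the statistics. The following standalone sketch shows the mechanism; the workload function is a hypothetical stand-in for a zodbshootout test body:

```python
import cProfile
import io
import pstats

def workload():
    # Hypothetical stand-in for a benchmark body (e.g. reading and
    # writing objects in a ZODB database).
    return sum(i * i for i in range(10_000))

# Profile the workload, as --profile does around each test.
profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Render the collected statistics, sorted by cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
assert "workload" in report  # the profiled function appears in the stats
```

zodbshootout writes one such profile per test into the directory given to --profile, so the slowest code paths of each storage can be inspected afterwards.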
--leaks prints a summary of possibly leaking objects after each test repetition. This is useful for storage and ZODB developers.

Changed in version 0.7: The old -l alias is no longer accepted.

New in version 0.6.
Output

These options control the output produced.

Changed in version 0.7: The --dump-json argument was removed in favor of pyperf’s native output format, which enables much better analysis using pyperf show.

If the -o argument is specified, then in addition to creating a
single file containing all the test runs, a file will be created
for each database, allowing for direct comparisons using pyperf’s
compare_to command.

--log enables logging to the console at the specified level. If no level is specified but this option is given, then INFO logging will be enabled. This is useful for details about the workings of a storage and the effects various options have on it.

Changed in version 0.8: This option can also take a path to a ZConfig logging configuration file.

New in version 0.6.
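For the file form of --log, ZConfig’s logging component accepts sections like the following. This is a minimal, untested sketch based on the ZConfig logging documentation linked above; the exact handler names and keys accepted depend on the schema, so consult that documentation before relying on it:

```
<logger>
  level DEBUG
  <logfile>
    path STDOUT
    format %(asctime)s %(levelname)-8s %(name)s %(message)s
  </logfile>
</logger>
```

A file like this gives finer control than the single root-logger level of the plain --log LEVEL form, for example sending DEBUG output to a file while keeping the console quiet.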
You should write a configuration file that models your intended
database and network configuration. Running zodbshootout
may reveal
configuration optimizations that would significantly increase your
application’s performance.