Notes for Contributers
If you intend to contribute to the development of HTSeq, these notes will
help you to get started.
Source code
The source code is on an Subversion repository, hosted on SourceForge.
To check out the repository, use
svn co https://htseq.svn.sourceforge.net/svnroot/htseq/trunk htseq
To browse the repository, see here.
Languages
A good part of HTSeq is actually not written in Python but in
Cython. In case you don’t know it yet: Cython, a fork from Pyrex, is a
kind of Python compiler. You annotate Python code with additional type
informations (the lines starting with cdef in the source code). Cython
will then transform the Cython source file (with extension pyx) into
a C file, which calls the appropriate funnctions of Python’s C API. Without
type annotation, this looks and feels the same as normal Python and is not
really faster, either. With type annotation, significant performance gains
are possible, especially in inner loops.
A small part, namely the StepVector class, is written in C++ and exported with
SWIG. (SWIG, the “Simple Wrapper and Interface Generator” is a very useful
tool to generate C/C++ code to wrap an existing C/C++ library such that it
becomes accessible as a native library within a number of scripting languages.)
I am not so happy with this any more (the abstraction panelty of the object-oriented
SWIG wrapping turned out to be a bit high) and ultimatively want to rewrite this
part.
Build process
I do not want to burden the user with having to install
SWIG or Cython. Both these tools work by generating C/C++ code which then can
be compiled without the need of any files from SWIG or Cython. Hence, I’ve
divided the build process into two steps:
- Step 1: Generate the C/C++ files from the SWIG and Cython source files.
This is done by the calling make in the src directory. Note that
the Makefile there contains only calls to cython and swig but
not to the C compiler. (Note: I am using Cython 0.11. Compiling with
Cython 0.12 does not work at the moment, but I will update at some point.)
- Step 2: The C files are compiled and copied together with the Python source
files into a build directory. This is done by calling python setup.py build
in the root directory. It creates (as usual for a setup.py script) a new
directory build and in it a subdirectory for the machine architecture,
which then contains the package directory.
To test during development, set the PYTHONPATH to point to the maschine-specific
directory in the build directory, so that Python can find the HTSeq directory
that setup.py build puts there. Whenever you make a change, call the shell
script build_it, which contains just two lines: the first calls make in src,
the second calls setup.py build.
Distributing
To wrap up a package, call build_it (or at least make in src)
and then setup.py sdist. This makes a directory dists and in there,
a tarball with all the source files (Python and C/C++) and all kinds of other stuff
(unfortunately including the example_files directory, that I hence always delete manually
before running setup.py sdist to keep the package lean). The tarball contains, when unpacked
the setup.py script, which allows installing with setup.py install.
I am using setuptools (and not distutils) so that I can make Python eggs with
setup.py bdist_egg. For Windows binaries, I use setup.py bdist_wininst --compiler=mingw32
on my virtual Windows box.
Files
The package contains the following source files:
- HTSeq/__init__.py:
- The outer face of HTSeq. This file defines the name space of HTSeq and contains
the definition of all classes without performance-critical methods. The file
imports _HTSeq in its own namespace, so that, for the user, it does not matter
whether an object is defined here or in _HTSeq.pyx.
- src/HTSeq/_HTSeq.pyx:
- The core of HTSeq. All classes with perfomance-critical methods are defined here.
For most of it, this file looks as a normal Python file. Only where performance
is critical, type annotation has been added. See the Cython manual for details.
- src/HTSeq/_HTSeq.pxd:
- The “header file” for _HTSeq.pyx. It contains the type annotation for all the
fields of the classes defined in _HTSeq.pyx. If a user would want to write her
own Cython code, she could use Cython’s cimport directive to import this header
file and so make Cython aware of the typed definitions of fields and methods in
_HTSeq.pyx, which may improve performance because it allows Cython to kick out
all unnecessary type checking.
- HTSeq/_HTSeq_internal.py:
- There are a few limitation to the standard Python code allowed in Cython files;
most importantly, the yield statement is not yet supported. Hence, _HTSeq.pyx
imports this file, and whenever a method in _HTSeq.pyx needs a yield,
it calls a function which is put in here.
- src/step_vector.h:
- The C++ step_vector class template. As this is a pure template, there is no
step_vector.cc file with definitions. If you want to use a step_vector in
a C++ project, this is all you need.
- src/StepVector.i:
- An input file to SWIG, which produces the Python wrapper around step_vector.h, i.e.,
the StepVector module containing the StepVector class. Note that this file contains
not only SWIG directives but also Python and come C++ code.
- src/AutoPyObjPtr.i:
- A very small SWIG library that allows SWIG-wrapped C++ container classes to
store Python objects in a way that Python’s garbage collector is happy with.
- HTSeq/scripts/count.py and HTSeq/scripts/qa.py:
- The source code for the stand-alone scripts htseq-count and htseq-qa.
They reside in the sub-package HTSeq.scripts, allowing to call the scripts
with, e.g., python -m HTSeq.scripts.qa.
- scripts/htseq-count and scripts/htseq-qa:
- Short stubs to call the scripts from the command line simply as, e.g., htseq-qa.
Furthermore, there are these files to support development:
- src/Makefile:
- Generates C/C++ files from SWIG and Cython source files but does no C/C++ compiling.
- setup.py:
- A typical setuptools setup.py file.
- build_it:
- A three-line shell script that
* generates a file HTSeq/_version.py from the file VERSION.
* calls make in src to process src/Makefile
* runs setup.py build (see above)
- clean:
- Another two-line shell script to first call make clean in src and then
delete whatever setup.py may have written.
- test.py
- Performs all the deoctests in the documentation, using the example data in the
example_data directory.
Finally, there are these files
- VERSION:
- a one-line text-fil with the version number. It is read by setup.py, used
by build_it to generate the one-line Python file HTSeq/_version.py and
also used when building the documentation.
- MANIFEST.in:
- Brings some files to the attention of setup.py sdist which would otherwise not
be included
- LICENCE:
- The GPL, v3
- README:
- Points the user to the web site.
and these directories
- doc/:
- this documentation, in Sphinx reStructuredText format, and a Makefile to drive
Sphinx.
- example_files/:
- a few example files to be use by the doctests in the documentation.