# Uploading to PyPI

I found myself within a forest dark, For the straightforward pathway had been lost…

Edit: I have been kindly informed that there is, in fact, an official guide for uploading packages to PyPI, located here.

I’m a research assistant in a Speech Informatics lab at the University of Minnesota. One of my tasks recently has been to take a Python script written by a former student, make it more flexible and turn it into a legit Python package. I spent a few weeks working on the code, making sure it worked as we needed it to, de-duplicating code, validating it, etc. It installed fine locally. Then two days ago I tried to register the package at pypi.python.org so we could install the package using pip, along with the rest of our analysis code.

I think I have it working now, but honestly, it’s been a terrible experience. I’ve had to pull resources from documentation, message boards (both stackexchange and google groups), blogs, and other assorted places on the internet. Why isn’t this all in one place somewhere? In the end, many of the things that worked ended up being the results of trial-and-error. Part of the problem is that my package is a little complicated: it requires including some datasets that should be installed in the same directory. It also requires compiling a stand-alone C file. But really, that shouldn’t be a big deal.

In order to document the process in case I have to repeat it in the future, and to help guide any poor soul who is embarking on this perilous journey with no prior experience, I’m writing up my notes in detail and sharing them with the world.

I’ve been using Python for various projects for a few years now. Still, I’m a scientist, not a software engineer, so many of the following specifics were not obvious to me. Apologies if something that follows is not the most efficient way to do something - I’m simply writing the documentation I wish I’d had two days ago in hopes that it might be useful to others.

# Package structure

This has always been a little confusing to me, but this project has cleared it up a little bit. Your package should be structured something like this:

YOUR-PROJECT-FOLDER
├── CHANGES.txt (OPTIONAL)
├── LICENSE.txt
├── MANIFEST.in
├── README
├── docs (FOLDER)
├── setup.py
└── PACKAGENAME (FOLDER)
    ├── __init__.py
    ├── Makefile (OPTIONAL)
    ├── FILE1.py
    ├── FILE2.py
    ├── data (FOLDER, OPTIONAL)
    │   └── included_data.dat
    └── example (FOLDER)
        └── EXAMPLE.txt

A few important things here are:

• the name of the project folder on your computer doesn’t really matter. It doesn’t have to be the same as the package name. To me, it’s less confusing if the two names are different.

• the PACKAGENAME folder is the name that you’ll use to import your package into Python. So if your package is my_awesome_package, the folder with your code needs to be my_awesome_package too. Note that Pythonistas recommend short, one-word package names with no capital letters.

Now for the key files:

• CHANGES.txt: (optional) keep notes on package versions, etc.

• LICENSE.txt: The license that you’re using for releasing your package. If you want to share it and want people to use it, you should include a license. As far as I understand, not including one is the same thing as claiming “All Rights Reserved”, meaning people can’t legally re-share your code. Disclaimer: I am not a lawyer, and I don’t know a ton about the available licenses.

• MANIFEST.in: When setup.py builds your package, it includes *.py files in your package folder by default. If you want any other files included in the .tar.gz file that gets created and uploaded to PyPI (more on this later), you need to list those filenames in MANIFEST.in. Here’s what the contents of mine look like:

#documentation
recursive-include docs/_build/html *

#data used for clustering
recursive-include vfclust/data *

#phonetic representation
include vfclust/t2p/t2p.c
include vfclust/t2p/t2pin.tmp
include vfclust/Makefile

#example files
include vfclust/example/EXAMPLE.*
include vfclust/example/EXAMPLE_sem.*

#Misc
include CHANGES.txt
include LICENSE.txt

My package name is vfclust, so I’m including the contents of certain folders within the package folder that my package requires in order to run (data files, etc). I also include CHANGES.txt and LICENSE.txt, since I don’t think those are included by default (?). Finally, notice the recursive-include syntax. The way I wrote it, everything in the specified folders is included as well.

• README: can also be README.md or README.rst. The format needs to be rst (reStructured Text) for it to display appropriately on PyPI. I go into this further down.

• docs: (optional) this is where I put the formatted documentation. More on this later.

• setup.py: This is where the magic happens - where your package is defined, how it’s setup, etc. Much more on this later.

• PACKAGENAME: This is where your actual package code goes. The FILE1.py, FILE2.py, etc. are the package files that do the work of your package.

• the __init__.py file tells the world that PACKAGENAME is a Python package that is importable using >>> import PACKAGENAME. This is important: __init__.py turns a folder full of Python scripts into a package that can be imported. It can be an empty file, or not, but whatever is inside will get executed when you import the package. If you want to use the other modules (files) in the folder from within Python, and you probably do, import them inside the __init__.py file. In my case the main module has the same name as the package, so from PACKAGENAME import * does the job on Python 2; on Python 3 the equivalent is the explicit relative form, from .MODULENAME import *.

• data and example: These folders could be named anything, or could be omitted. Any data files you want to distribute along with your package should be in a folder like this WITHIN the package folder.
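Because the data folder lives inside the package folder, the package code can find its data files relative to its own location at runtime. Here is a minimal sketch of that idea; data_path is a made-up helper name and the data/ folder name just follows the layout above, so treat it as an illustration rather than what vfclust actually does:

```python
import os


def data_path(package_file, filename):
    """Return the absolute path of a file in the package's data/ folder.

    package_file is the __file__ of any module that sits directly inside the
    package folder; the data/ folder is assumed to sit next to that module.
    """
    package_dir = os.path.dirname(os.path.abspath(package_file))
    return os.path.join(package_dir, "data", filename)


# Typical use from a module inside the package:
#     path = data_path(__file__, "included_data.dat")
```

This only works if the data files really are installed next to the code.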

## .tar.gz vs site-packages

This was a serious source of confusion for me. When you build your package (more on this below), setup.py creates a dist/ directory in your project directory and puts everything in the specified packages (more on this below also) along with everything in the MANIFEST.in file in a single .tar.gz file in that directory. You can then upload the .tar.gz file to PyPI and make it available there. However, when you use pip install PACKAGENAME, only *.py files will be installed into site-packages (where pip puts packages you download). This means that any data files, C files, or anything else you want available when using your package must be explicitly included in BOTH MANIFEST.in AND setup.py. As far as I understand, there are things (like formatted documentation) that you may want to include with the full *.tar.gz package but might not want to bury within site-packages. This makes some sense, but it’s still somewhat obnoxious to have to specify included files in two places. The setuptools documentation says that you only have to specify files in setup.py, but that didn’t work for me.
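One way to see exactly what ended up in the .tar.gz, as opposed to what pip later installs, is to list the archive's contents from Python. This is a minimal sketch using the standard-library tarfile module; list_sdist is a made-up helper name, and the path in the usage comment is a placeholder:

```python
import tarfile


def list_sdist(path):
    """Return the sorted names of every regular file stored in a .tar.gz archive."""
    with tarfile.open(path, "r:gz") as archive:
        return sorted(m.name for m in archive.getmembers() if m.isfile())


# Typical use after running "python setup.py sdist":
#     for name in list_sdist("dist/PACKAGENAME-VERSION.tar.gz"):
#         print(name)
```

If a data file is missing from this listing, the problem is on the MANIFEST.in side. If it is listed but missing after pip install, the problem is on the setup.py side.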

## setup.py

This file is like magic, and is key to creating and installing your package, as well as keeping track of the version and formatting for PyPI. There are two commonly-used tools for creating your setup.py file: distutils and setuptools. The internet tells me that setuptools is more modern and fixes some of the problems with distutils, but I don’t really understand what. The syntax for both is nearly identical, so it’s easy to switch back and forth.

The setup.py file must, at a minimum, include something like the following. This is modified from this project, which I found to be a helpful resource.

from setuptools import setup, find_packages  # Always prefer setuptools over distutils
from codecs import open  # To use a consistent encoding
from os import path

here = path.abspath(path.dirname(__file__))

# Get the long description from the relevant file
with open(path.join(here, 'README'), encoding='utf-8') as f:
    long_description = f.read()

setup(
    name='sample',

    # Versions should comply with PEP440.  For a discussion on single-sourcing
    # the version across setup.py and the project code, see
    # http://packaging.python.org/en/latest/tutorial.html#version
    version='1.2.0',

    description='A sample Python project',
    long_description=long_description,  #this is the

    # The project's main homepage.
    url='https://github.com/whatever/whatever',

    # Author details
    author='yourname',
    author_email='your@address.com',

    # Choose your license
    license='MIT',

    # See https://PyPI.python.org/PyPI?%3Aaction=list_classifiers
    classifiers=[
        # How mature is this project? Common values are
        #   3 - Alpha
        #   4 - Beta
        #   5 - Production/Stable
        'Development Status :: 3 - Alpha',

        # Indicate who your project is intended for
        'Intended Audience :: Developers',
        'Topic :: Software Development :: Build Tools',

        # Pick your license as you wish (should match "license" above)
        'License :: OSI Approved :: MIT License',

        # Specify the Python versions you support here. In particular, ensure
        # that you indicate whether you support Python 2, Python 3 or both.
        'Programming Language :: Python :: 2.7',
    ],

    # What does your project relate to?
    keywords='sample setuptools development',

    packages=["MY-PACKAGE"],

)

setup() is just a function called when you run python setup.py install within the project directory. install is only one of the things it can do - you can also run python setup.py sdist to build your package into a .tar.gz file, python setup.py develop to tell Python to look in the project directory rather than site-packages when doing import, etc.

I’ll go through these setup() arguments and some others in turn.

• name: This is what your package will be called, in big bold letters, on the new PyPI page for your package. Pick something you like. This will also be the root name for the .tar.gz files created. Those are formatted like PACKAGENAME-VERSION.tar.gz. Which brings us to…

• version: This is the version number, obviously. Note that PyPI forces you to make a new version for each new upload. So you MUST change this for any new upload to PyPI (unlike git or other version control systems, where you can make a new commit without worrying about version numbers). I ended up with version numbers that look like 0.1.0.12 until I got finished debugging my package on PyPI. Luckily you can go on PyPI and remove unwanted versions.

• description: Short description used on PyPI.

• long_description: Written on the main page of your PyPI package, meaning it should be formatted as reStructuredText. I just used my README file.

• author etc.: Used to list authors, contact information, etc on PyPI.

• classifiers: Used to categorize your package on PyPI, so people can find it while browsing or searching.

• packages: Your distribution may have more than one package. That is, when you install this package, you may want to be able to do import package1 and import package2. You can also have packages and sub-packages. All of these should be listed here. You can also do something like:

packages = find_packages(exclude=['build', 'docs', 'templates'])
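Because PyPI insists on a new version number for every upload, it can help to keep that number in exactly one place. Here is a minimal sketch of one common approach, which assumes you define a __version__ string in your package's __init__.py; neither that convention nor the helper name read_version comes from the setup above:

```python
import re


def read_version(init_text):
    """Extract the value of __version__ from the text of an __init__.py file."""
    match = re.search(
        r"^__version__\s*=\s*['\"]([^'\"]+)['\"]", init_text, re.MULTILINE
    )
    if match is None:
        raise RuntimeError("No __version__ string found")
    return match.group(1)


# Typical use inside setup.py (PACKAGENAME is a placeholder):
#     with open("PACKAGENAME/__init__.py") as f:
#         version = read_version(f.read())
#     setup(name="PACKAGENAME", version=version, ...)
```

Reading the file as text, rather than importing the package, means setup.py still runs before the package's own dependencies are installed.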

Those are the basics. The next few sections detail other arguments I ended up using.

## Include data files

• package_data: As mentioned earlier, if you want any non *.py files to be installed by pip and available to your package, specify them here. Messing around with this led me to the conclusion that all included data must be in subfolders of the package folder (NOT the higher-level project folder). For example, if I had “data” and “my-package” as subfolders in “my-project-folder”, and then included “data” in package_data, then pip would install “data” and “my-package” both as separate folders in “site-packages”. Not really what I want. After much fiddling, my directory structure looks like this:
.
├── CHANGES.txt
├── LICENSE.txt
├── MANIFEST.in
├── README
├── docs (COLLAPSED)
├── setup.py
└── vfclust
    ├── Makefile
    ├── TextGridParser.py
    ├── __init__.py
    ├── data
    │   ├── EOWL
    │   │   ├── EOWL Version Notes.txt
    │   │   ├── The English Open Word List.pdf
    │   │   └── english_words.txt
    │   ├── animals_lemmas.dat
    │   ├── animals_names.dat
    │   ├── animals_names_raw.dat
    │   ├── animals_term_vector_dictionaries
    │   │   ├── term_vectors_dict91.dat
    │   │   └── term_vectors_dict91_cpickle.dat
    │   ├── cmudict.0.7a.tree
    │   ├── modified_cmudict.dat
    │   └── t2pin.tmp
    ├── example
    │   ├── EXAMPLE.TextGrid
    │   ├── EXAMPLE.csv
    │   ├── EXAMPLE_sem.TextGrid
    │   └── EXAMPLE_sem.csv
    ├── t2p
    │   ├── t2p.c
    │   └── t2pin.tmp
    └── vfclust.py

Here’s what I ended up with:

include_package_data=True,
# relative to the vfclust directory
package_data={
    'vfclust': [
        'Makefile'],
    'data': [
        'data/animals_lemmas.dat',
        'data/animals_names.dat',
        'data/animals_names_raw.dat',
        'data/cmudict.0.7a.tree',
        'data/modified_cmudict.dat',
        'data/animals_term_vector_dictionaries/term_vectors_dict91_cpickle.dat',
    ],
    'data/EOWL': [
        'data/EOWL/english_words.txt',
        'data/EOWL/EOWL Version Notes.txt',
        'data/EOWL/The English Open Word List.pdf'
    ],
    'example': [
        'example/EXAMPLE.csv',
        'example/EXAMPLE.TextGrid',
        'example/EXAMPLE_sem.csv',
        'example/EXAMPLE_sem.TextGrid'],
    't2p': [
        't2p/t2p.c',
        't2p/t2pin.tmp'
    ],
},

As far as I can tell, package_data is a dictionary where each key corresponds to a subfolder of the installed package, and each value is supposed to be a list of the files you want to go into that subfolder. In practice, however, I couldn’t get everything situated properly in the final installation until I moved all my data files into subfolders of the package folder. If you know a better and more flexible way to do this, please enlighten me (not being sarcastic here, really do e-mail me! I want to know!). With this setup, the following gets installed in site-packages when I do pip install vfclust:

vfclust
├── Makefile
├── TextGridParser.py
├── __init__.py
├── data
│   ├── EOWL
│   │   ├── EOWL Version Notes.txt
│   │   ├── The English Open Word List.pdf
│   │   └── english_words.txt
│   ├── animals_lemmas.dat
│   ├── animals_names.dat
│   ├── animals_names_raw.dat
│   ├── animals_term_vector_dictionaries
│   │   ├── term_vectors_dict91.dat
│   │   └── term_vectors_dict91_cpickle.dat
│   ├── cmudict.0.7a.tree
│   ├── modified_cmudict.dat
│   └── t2pin.tmp
├── example
│   ├── EXAMPLE.TextGrid
│   ├── EXAMPLE.csv
│   ├── EXAMPLE_sem.TextGrid
│   └── EXAMPLE_sem.csv
└── t2p
    ├── t2p
    ├── t2p.c
    └── t2pin.tmp

which is what I wanted. data_files is another argument used to include data, but this puts your data files in /Library/Frameworks/Python.framework/Versions/2.7/my-subfolder, at least on my system. I don’t know why you’d want to put data there. At any rate I didn’t.
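On the package_data side, the setuptools documentation describes the keys as importable package names and the values as lists of paths (or glob patterns) relative to that package's folder. Going by that description, one possibly more flexible option is to build the list by walking the package's subfolders instead of typing every filename. This is a sketch under that assumption, not what the setup above actually uses; collect_package_data is a made-up helper name:

```python
import os


def collect_package_data(package_dir, subfolders):
    """List every file under the given subfolders, relative to package_dir.

    Paths are returned with forward slashes, which is the form setuptools
    documents for package_data entries.
    """
    paths = []
    for subfolder in subfolders:
        root = os.path.join(package_dir, subfolder)
        for dirpath, _dirnames, filenames in os.walk(root):
            for filename in filenames:
                full_path = os.path.join(dirpath, filename)
                relative = os.path.relpath(full_path, package_dir)
                paths.append(relative.replace(os.sep, "/"))
    return sorted(paths)


# Possible use in setup.py:
#     package_data={
#         'vfclust': ['Makefile'] + collect_package_data(
#             'vfclust', ['data', 'example', 't2p']),
#     },
```

The trade-off is that everything in those folders gets shipped, so stray temporary files would be included too.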

## Run your package as a script

If it makes sense for your package to run as a command-line script instead of (or as well as) an importable Python package, you can ask setup.py to add your script to the system path during installation. To do this, include an entry_points argument, similar to this:

entry_points={
'console_scripts': [
'vfclust = vfclust.vfclust:main',
],
}

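If I understand correctly, the thing on the left of the equals sign is what you call from the command line, and the thing on the right is the module and function (in the form package.module:function) that gets run when you call it.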
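For that entry point to work, vfclust/vfclust.py needs a main function that can be called with no arguments. Here is a minimal sketch of what such a function might look like; the input_file argument and the printed message are hypothetical and not taken from vfclust:

```python
import argparse


def main(argv=None):
    """Command-line entry point; argv=None makes argparse read sys.argv."""
    parser = argparse.ArgumentParser(
        prog="vfclust", description="Hypothetical command-line wrapper."
    )
    # Hypothetical positional argument, for illustration only.
    parser.add_argument("input_file", help="file to process")
    args = parser.parse_args(argv)
    print("Would process: " + args.input_file)
    return 0
```

Accepting an optional argv list makes the function easy to call from tests or from the interpreter, for example main(["EXAMPLE.csv"]), while the installed command still picks up the real command-line arguments.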

#### Make a folder for documentation, and go there:

$ mkdir docs
$ cd docs

$ sphinx-quickstart

You’re already in docs/, so leave the first question as the default. Answer the next few questions by filling in your name, project name, etc. I left all the rest at their default values, EXCEPT

autodoc: automatically insert docstrings from modules (y/n) [n]: y

Selecting y here lets Sphinx pull information from your docstrings, as promised above, and make it into a nice little easy-to-read version in HTML or whatever.

Sphinx made a Makefile in the docs/ directory that will generate your documentation for you. You can type things like make html or make latexpdf to generate documentation. If you try that now (as I did), your documentation will be empty. Another bummer.

#### Tell Sphinx to generate your documentation by pulling from your docstrings.

$ sphinx-apidoc -o <dest-directory> <source-directory>

You should already be in your “docs” directory. My “docs” directory is at the same level as my package directory with my nice docstrings, so I did:

$ sphinx-apidoc -o . ../vfclust

#### Finally, edit the new index.rst file to change from:

Welcome to VFClust's documentation!
===================================

Contents:

.. toctree::
   :maxdepth: 2

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`

to

Welcome to VFClust's documentation!
===================================

.. include:: ../README

Contents:

.. toctree::
   :maxdepth: 2

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`

Your project name will be different, but the important part is the .. include:: ../README line, assuming your README is located in the root project directory (one up from “docs”). That’ll put the README as the landing page of your documentation. You can click on the “Index”, “Module Index” or “Search” links to see documentation for your functions/methods.

#### Generate documentation.

Finally! Type:

$ make

to see the available options for output formatting. Create your HTML documentation using:

$ make html

This will generate a set of html files in “docs/_build/html/”, assuming you didn’t change the “_build” default during questioning. Open up “docs/_build/html/index.html” to see your nice new documentation.

# PyPI

Alright, the package has the correct directory structure, the documentation is formatted, and we’re ready to share it with the world so that anyone can install it using pip install mypackage.

## Make a test-build

Navigate to the directory of your setup.py file. First, make sure everything is configured properly using

$ python setup.py test

Hopefully you get no errors.

Then, create the distribution you’re going to send to PyPI using:

## Create a test upload

This has two steps: first, package the file using sdist. Then, upload that file to the server.

$ python setup.py sdist  # creates .tar.gz, puts it in dist/ folder
$ twine upload -r test dist/PACKAGENAME-VERSION.tar.gz

The -r test uploads it to the server named in the [test] part of the .pypirc file.
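To make the link between -r test and the .pypirc file concrete, here is a minimal sketch that looks up which repository URL a given server name maps to. The sample file contents are an assumption (only the test-server URL appears elsewhere in this post), repository_for is a made-up helper name, and twine's own lookup may differ in its details:

```python
import configparser

# Assumed example of a .pypirc file; the username is a placeholder.
SAMPLE_PYPIRC = """\
[distutils]
index-servers =
    pypi
    test

[test]
repository = https://testpypi.python.org/pypi
username = your-username
"""


def repository_for(pypirc_text, server_name):
    """Return the repository URL listed under [server_name] in a .pypirc."""
    parser = configparser.RawConfigParser()
    parser.read_string(pypirc_text)
    return parser.get(server_name, "repository")


print(repository_for(SAMPLE_PYPIRC, "test"))
```

RawConfigParser is used so that a % character in a stored password would not trip up value interpolation.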

## Create a test install

Python has a neat way of making sandboxed environments for you to work in, called virtual environments. I’d suggest creating a new one to test your installation, thereby guaranteeing a fresh install.

Navigate to some folder on your hard drive and type

$ virtualenv MY_VIRTUALENV

where you obviously use some other name than MY_VIRTUALENV. Then fire it up by typing

$ cd MY_VIRTUALENV
$ source bin/activate

You’re now in a little sandboxed Python installation. Install your package from the PyPI test servers:

$ pip install -i https://testpypi.python.org/pypi PACKAGENAME

You may need to use sudo, I can’t seem to figure out why. Anyway, your package should now be installed! Test it out:

>>> import PACKAGENAME

If you included a way to run it as a script in setup.py, that should work now too.
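Since a successful import doesn't prove that the data files made it into site-packages, a quick check from inside the virtualenv can save another upload cycle. Here is a minimal sketch; missing_files is a made-up helper name, and the file names in the usage comment are placeholders for whatever your package ships:

```python
import os


def missing_files(package_dir, expected):
    """Return the expected relative paths that do not exist under package_dir."""
    return [
        relative
        for relative in expected
        if not os.path.isfile(os.path.join(package_dir, relative))
    ]


# Typical use after "pip install PACKAGENAME" in the fresh virtualenv:
#     import PACKAGENAME
#     installed_dir = os.path.dirname(PACKAGENAME.__file__)
#     print(missing_files(installed_dir, ["data/included_data.dat",
#                                         "example/EXAMPLE.txt"]))
```

An empty list means everything you expected was installed alongside the code.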

## Upload for real

If everything seems to be working (good for you if it is, it took me 2 days to get to this point), you can upload it to the PyPI servers:

$ twine upload dist/PACKAGENAME-VERSION.tar.gz

And voila, you’re live! Now you can go to https://pypi.python.org/pypi and search for your package. You should also be able to install it using

$ pip install PACKAGENAME

See, nothing to it! (Haha…ha…) I hope this was helpful for someone!