First Steps with GitPython

Post updated by Matt Makai on November 30, 2017. Originally posted on November 29, 2017.

GitPython is a Python code library for programmatically reading from and writing to Git source control repositories.

Let's learn how to use GitPython by quickly installing it and reading from a locally cloned Git repository.

Our Tools

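This tutorial should work with either Python 2.7 or 3, but Python 3, especially 3.6+, is strongly recommended for all new applications. I used Python 3.6.3 to write this post. In addition to Python, throughout this tutorial we will also use the following application dependencies:

  1. Git, the source control system
  2. GitPython version 2.1.7
  3. pip and virtualenv (the venv module), which come with Python 3, to install GitPython and isolate it from your other Python projects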

Take a look at this guide for setting up Python 3 and Flask on Ubuntu 16.04 LTS if you need specific instructions to get a base Python development environment set up.

All code in this blog post is available open source under the MIT license on GitHub under the first-steps-gitpython directory of the blog-code-examples repository. Use and abuse the source code as you like for your own applications.

Install GitPython

Start by creating a new virtual environment for your project. My virtualenv is named gitpy, but you can name yours whatever matches the project you are creating.

python3 -m venv gitpy

Activate the newly created virtualenv.

source gitpy/bin/activate

The virtualenv's name will be prepended to the command prompt after activation.
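
The prompt is the quickest way to confirm that the virtualenv is active, but Python can also tell you directly. The following is a minimal sketch that assumes Python 3.3+, where `sys.base_prefix` exists. The function name `in_virtualenv` is hypothetical and used only for illustration:

```python
import sys


def in_virtualenv():
    """Return True when running inside a venv/virtualenv (hypothetical helper).

    Inside an environment created with `python3 -m venv`, sys.prefix points
    at the environment directory while sys.base_prefix still points at the
    base Python installation, so the two differ.
    """
    # older releases of the third-party virtualenv tool set sys.real_prefix
    return hasattr(sys, 'real_prefix') or sys.prefix != sys.base_prefix


if __name__ == '__main__':
    print('virtualenv active: {}'.format(in_virtualenv()))
```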


Now that the virtualenv is activated, we can use the pip command to install GitPython.

pip install gitpython==2.1.7

Run the pip command. After everything is installed, you should see output similar to the following, ending with a "Successfully installed" message.

(gitpy) $ pip install gitpython==2.1.7
Collecting gitpython==2.1.7
  Downloading GitPython-2.1.7-py2.py3-none-any.whl (446kB)
    100% |████████████████████████████████| 450kB 651kB/s 
Collecting gitdb2>=2.0.0 (from gitpython==2.1.7)
  Downloading gitdb2-2.0.3-py2.py3-none-any.whl (63kB)
    100% |████████████████████████████████| 71kB 947kB/s 
Collecting smmap2>=2.0.0 (from gitdb2>=2.0.0->gitpython==2.1.7)
  Downloading smmap2-2.0.3-py2.py3-none-any.whl
Installing collected packages: smmap2, gitdb2, gitpython
Successfully installed gitdb2-2.0.3 gitpython-2.1.7 smmap2-2.0.3
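
You can also check which version landed in the virtualenv from Python code by using the standard library's importlib.metadata module. This sketch requires Python 3.8 or newer, which is newer than the 3.6.3 used for this post. On older versions, `pip show gitpython` answers the same question from the shell. The helper name `installed_version` is hypothetical:

```python
from importlib import metadata  # standard library in Python 3.8+


def installed_version(package):
    """Return the installed version string for `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == '__main__':
    # GitPython's distribution name on PyPI is "GitPython"
    print(installed_version('GitPython'))
```

Inside the activated gitpy virtualenv, this should print the pinned 2.1.7.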

With GitPython installed, we can start programmatically interacting with Git repositories from our Python applications.

Clone Repository

GitPython can work with remote repositories, but for simplicity in this tutorial we'll use a cloned repository on our local system.

Clone a repository you want to work with to your local system. If you don't have a specific one in mind, use the open source Full Stack Python Git repository hosted on GitHub.

git clone git@github.com:mattmakai/fullstackpython.com fsp

Take note of where you cloned the repository, because we need that path to tell GitPython which repository to work with. Change into the new Git repository's directory with cd, then run the pwd (print working directory) command to get the full path.

cd fsp
pwd

You will see some output like /Users/matt/devel/py/fsp. This path is your absolute path to the base of the Git repository.

Use the export command to set an environment variable for the absolute path to the Git repository.

export GIT_REPO_PATH='/Users/matt/devel/py/fsp' # make sure this is your own path
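
Our script will read this variable with os.getenv, which returns None when the variable is not set. When GitPython's Repo receives None, it typically falls back to the GIT_DIR variable or the current working directory. That fallback can hide a forgotten export. The sketch below is a hypothetical helper named `get_repo_path`, not part of GitPython, that fails early with a readable message instead:

```python
import os


def get_repo_path(var_name='GIT_REPO_PATH'):
    """Read a repository path from an environment variable and validate it.

    Hypothetical helper: raises RuntimeError instead of letting an unset
    variable fall through to Repo() as None.
    """
    raw_path = os.getenv(var_name)
    if not raw_path:
        raise RuntimeError('{} is not set; export it first'.format(var_name))
    # expand a leading ~ and normalize to an absolute path
    path = os.path.abspath(os.path.expanduser(raw_path))
    if not os.path.isdir(path):
        raise RuntimeError('{} does not exist or is not a directory'.format(path))
    return path
```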

Our Git repository and path environment variable are all set so let's write the Python code that uses GitPython.

Read Repository and Commit Data

Create a new Python file named read_repo.py and open it so we can start to code up a simple script.

Start with a couple of imports and a constant:

import os
from git import Repo


COMMITS_TO_PRINT = 5

The os module makes it easy to read environment variables, such as the GIT_REPO_PATH variable we set earlier. from git import Repo gives our application access to GitPython's Repo class, which we use to create a Repo object. COMMITS_TO_PRINT is a constant that limits how many commits our script prints information about. Full Stack Python has over 2,250 commits, so there would be a whole lot of output if we printed every one.
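
Limiting the output is also a chance to limit the work. GitPython's iter_commits returns an iterator, so taking the first N items with itertools.islice stops early. By contrast, list(...)[:N] would first build a list of all 2,250+ commits. The sketch below uses a hypothetical stand-in generator, `fake_commit_log`, instead of a real repository so it runs anywhere:

```python
from itertools import islice

COMMITS_TO_PRINT = 5


def fake_commit_log(total):
    """Hypothetical stand-in for repo.iter_commits(): yields newest first."""
    for number in range(total, 0, -1):
        yield 'commit #{}'.format(number)


# islice pulls only COMMITS_TO_PRINT items from the generator and stops,
# whereas list(generator)[:COMMITS_TO_PRINT] would consume all 2,250 first
recent = list(islice(fake_commit_log(2250), COMMITS_TO_PRINT))
print(recent)
```

With a real repository, the equivalent would be islice(repo.iter_commits('master'), COMMITS_TO_PRINT). Another option is repo.iter_commits('master', max_count=COMMITS_TO_PRINT), which passes the limit through to git rev-list.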

Next within our read_repo.py file create a function to print individual commit information:

def print_commit(commit):
    print('----')
    print(str(commit.hexsha))
    print("\"{}\" by {} ({})".format(commit.summary,
                                     commit.author.name,
                                     commit.author.email))
    print(str(commit.authored_datetime))
    print(str("count: {} and size: {}".format(commit.count(),
                                              commit.size)))

The print_commit function takes in a GitPython commit object and prints the 40-character SHA-1 hash for the commit followed by:

  1. the commit summary
  2. author name
  3. author email
  4. commit date and time
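  5. count (the number of commits reachable from this commit, including itself) and size (the size of the commit object in bytes)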
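
Because print_commit only reads a handful of attributes, you can see and test its output format without a repository by passing it a stand-in object. This sketch assumes a hypothetical variant named format_commit that returns the text instead of printing it. It also uses a SimpleNamespace mock with the attribute names GitPython's Commit exposes. The mock values come from the sample output later in this post, and the email address is a placeholder:

```python
from datetime import datetime, timedelta, timezone
from types import SimpleNamespace


def format_commit(commit):
    """Build the same lines print_commit prints, returned as one string."""
    return '\n'.join([
        '----',
        str(commit.hexsha),
        '"{}" by {} ({})'.format(commit.summary,
                                 commit.author.name,
                                 commit.author.email),
        str(commit.authored_datetime),
        'count: {} and size: {}'.format(commit.count(), commit.size),
    ])


# Hypothetical stand-in for a git Commit object (email is a placeholder)
fake_commit = SimpleNamespace(
    hexsha='1fa2de70aeb2ea64315f69991ccada51afac1ced',
    summary='update latest blog post with code',
    author=SimpleNamespace(name='Matt Makai', email='matt@example.com'),
    authored_datetime=datetime(2017, 11, 30, 17, 15, 14,
                               tzinfo=timezone(timedelta(hours=-5))),
    count=lambda: 2256,  # Commit.count() is a method in GitPython
    size=254,
)

print(format_commit(fake_commit))
```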

Below the print_commit function, create another function named print_repository to print details of the Repo object:

def print_repository(repo):
    print('Repo description: {}'.format(repo.description))
    print('Repo active branch is {}'.format(repo.active_branch))
    for remote in repo.remotes:
        print('Remote named "{}" with URL "{}"'.format(remote, remote.url))
    print('Last commit for repo is {}.'.format(str(repo.head.commit.hexsha)))

print_repository is similar to print_commit but instead prints the repository description, active branch, all remote Git URLs configured for this repository and the latest commit.
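
One caveat with print_repository is that GitPython's repo.active_branch raises a TypeError when HEAD is detached, for example after checking out a specific commit. That error would stop the script. Below is a hedged sketch of a guard that uses a hypothetical helper named describe_active_branch. It also includes a stand-in repo class so the example runs without Git:

```python
from types import SimpleNamespace


def describe_active_branch(repo):
    """Return the active branch name, or a note when HEAD is detached."""
    try:
        return str(repo.active_branch)
    except TypeError:
        # GitPython raises TypeError when HEAD does not point at a branch
        return 'detached HEAD at {}'.format(repo.head.commit.hexsha)


class DetachedRepo(object):
    """Hypothetical stand-in that mimics a repo with a detached HEAD."""

    head = SimpleNamespace(
        commit=SimpleNamespace(hexsha='1fa2de70aeb2ea64315f69991ccada51afac1ced'))

    @property
    def active_branch(self):
        raise TypeError('HEAD is a detached symbolic reference')


print(describe_active_branch(DetachedRepo()))
```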

Finally, we need a "main" block that runs when we invoke the script from the terminal using the python command. Round out our read_repo.py file with the following code:

if __name__ == "__main__":
    repo_path = os.getenv('GIT_REPO_PATH')
    # Repo object used to programmatically interact with Git repositories
    repo = Repo(repo_path)
    # a bare repository has no working directory; a normal clone is not bare
    if not repo.bare:
        print('Repo at {} successfully loaded.'.format(repo_path))
        print_repository(repo)
        # iterate lazily and stop after COMMITS_TO_PRINT commits on master
        for commit in repo.iter_commits('master', max_count=COMMITS_TO_PRINT):
            print_commit(commit)
    else:
        print('Could not load repository at {} :('.format(repo_path))

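The main block reads the GIT_REPO_PATH environment variable and creates a Repo object from that path. If the path does not point to a Git repository, Repo raises an exception such as git.exc.InvalidGitRepositoryError or git.exc.NoSuchPathError instead of returning an object.

If the repository is not bare (a bare repository has no working directory, while a normal clone does), the print_repository and print_commit functions are called to show the repository data. Otherwise the script prints an error message.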
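
To make the bare versus non-bare distinction concrete, compare how the two store their data. A normal clone keeps Git's data in a .git entry inside the working directory. A bare repository, such as one created with git clone --bare, stores HEAD, objects/ and refs/ directly at its top level and has no working files. The heuristic below checks for that layout using only the standard library. It is a hypothetical pre-flight check, not how GitPython itself decides, and it does not cover every setup (GIT_DIR overrides, for instance):

```python
import os


def classify_repo_dir(path):
    """Guess whether `path` is a work tree, a bare repository, or neither.

    Heuristic only: `.git` may be a directory (normal clone) or a file
    (worktrees and submodules point elsewhere with a "gitdir:" line).
    """
    if os.path.exists(os.path.join(path, '.git')):
        return 'work tree'
    bare_markers = ('HEAD', 'objects', 'refs')
    if all(os.path.exists(os.path.join(path, m)) for m in bare_markers):
        return 'bare'
    # neither layout found: Repo() would most likely raise here
    return None
```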

If you want to copy and paste all of the code found above at once, take a look at the read_repo.py file on GitHub.

Time to test our GitPython-using script. Invoke the read_repo.py file using the following command.

(gitpy) $ python read_repo.py

If the virtualenv is activated and the GIT_REPO_PATH environment variable is set properly, we should see output similar to the following.

Repo at ~/devel/py/fsp/ successfully loaded.
Repo description: Unnamed repository; edit this file 'description' to name the repository.
Repo active branch is master
Remote named "origin" with URL "git@github.com:mattmakai/fullstackpython.com"
Last commit for repo is 1fa2de70aeb2ea64315f69991ccada51afac1ced.
----
1fa2de70aeb2ea64315f69991ccada51afac1ced
"update latest blog post with code" by Matt Makai ([email protected])
2017-11-30 17:15:14-05:00
count: 2256 and size: 254
----
1b026e4268d3ee1bd55f1979e9c397ca99bb5864
"new blog post, just needs completed code section" by Matt Makai ([email protected])
2017-11-30 09:00:06-05:00
count: 2255 and size: 269
----
2136d845de6f332505c3df38efcfd4c7d84a45e2
"change previous email newsletters list style" by Matt Makai ([email protected])
2017-11-20 11:44:13-05:00
count: 2254 and size: 265
----
9df077a50027d9314edba7e4cbff6bb05c433257
"ensure picture sizes are reasonable" by Matt Makai ([email protected])
2017-11-14 13:29:39-05:00
count: 2253 and size: 256
----
3f6458c80b15f58a6e6c85a46d06ade72242c572
"add databases logos to relational databases pagem" by Matt Makai ([email protected])
2017-11-14 13:28:02-05:00
count: 2252 and size: 270

The specific commits you see will vary based on the last 5 commits I've pushed to the GitHub repository, but if you see something like the output above that is a good sign everything worked as expected.

What's next?

We just cloned a Git repository and used the GitPython library to read a slew of data about the repository and its most recent commits.

GitPython can do more than just read data, though: it can also create and write to Git repositories. Take a look at the modifying references documentation page in the official GitPython tutorial, or check back here in the future when I get a chance to write up a more advanced GitPython walkthrough.

Questions? Let me know via a GitHub issue ticket on the Full Stack Python repository, on Twitter @fullstackpython or @mattmakai.



Matt Makai 2012-2022