Learning the basics of the distributed version control system Git
Using a version control system is an absolute requirement in programming and research. This is the tool that makes it just about impossible to lose one's work. In this recipe, we will cover the basics of Git.
Getting ready
Notable distributed version control systems include Git, Mercurial, and Bazaar, among others. In this chapter, we will use the popular Git system. You can download the Git program and Git GUI clients from http://git-scm.com.
An online provider allows you to host your code in the cloud. You can use it as a backup of your work and as a platform to share your code with your colleagues. These services include GitHub (https://github.com), GitLab (https://gitlab.com), and Bitbucket (https://bitbucket.org). All of these websites offer free and paid plans with unlimited public and/or private repositories.
GitHub offers desktop applications for Windows and macOS at https://desktop.github.com/.
This book's code is stored on GitHub. Most Python libraries are also developed on GitHub.
How to do it...
- The very first thing to do when starting a new project or computing experiment is to create a new folder locally:
$ mkdir myproject
$ cd myproject
- We initialize a Git repository:
$ git init
Initialized empty Git repository in ~/git/cookbook-2nd/chapter02/myproject/.git/
$ pwd
~/git/cookbook-2nd/chapter02/myproject
$ ls -a
. .. .git
Git created a .git subdirectory that contains all the parameters and history of the repository.
- Let's set our name and email address globally:
$ git config --global user.name "My Name"
$ git config --global user.email "me@home.com"
- We create a new file, and we tell Git to track it:
$ echo "Hello world" > file.txt
$ git add file.txt
- Let's create our first commit:
$ git commit -m "Initial commit"
[master (root-commit) 02971c0] Initial commit
 1 file changed, 1 insertion(+)
 create mode 100644 file.txt
- We can check the list of commits:
$ git log
commit 02971c0e1176cd26ec33900e359b192a27df2821
Author: My Name <me@home.com>
Date:   Tue Dec 12 10:50:37 2017 +0100

    Initial commit
- Next, we edit the file by appending an exclamation mark:
$ echo "Hello world!" > file.txt
$ cat file.txt
Hello world!
- We can see the differences between the current state of our working directory and the state in the last commit:
$ git diff
diff --git a/file.txt b/file.txt
index 802992c..cd08755 100644
--- a/file.txt
+++ b/file.txt
@@ -1 +1 @@
-Hello world
+Hello world!
The output of git diff shows that the contents of file.txt were changed from "Hello world" to "Hello world!". Git compares the states of all tracked files and computes the differences between the files.
- We can also get a summary of the changes as follows:
$ git status
On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)

        modified:   file.txt

no changes added to commit (use "git add")
$ git diff --stat
 file.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
The git status command summarizes the state of the working directory since the last commit: staged changes, unstaged changes, and untracked files. The git diff --stat command shows, for each modified text file, the number of inserted and deleted lines.
- Finally, we commit our change with a shortcut that automatically adds all changes in the tracked files (-a option):
$ git commit -am "Add exclamation mark to file.txt"
[master 045df6a] Add exclamation mark to file.txt
 1 file changed, 1 insertion(+), 1 deletion(-)
$ git log
commit 045df6a6f8a62b19f45025d15168d6d7382a8429
Author: My Name <me@home.com>
Date:   Tue Dec 12 10:59:39 2017 +0100

    Add exclamation mark to file.txt

commit 02971c0e1176cd26ec33900e359b192a27df2821
Author: My Name <me@home.com>
Date:   Tue Dec 12 10:50:37 2017 +0100

    Initial commit
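The steps above can be replayed non-interactively as a single script. This is a sketch that assumes Git and a POSIX shell with mktemp are available. It works in a throwaway temporary directory rather than the book's folder. It also sets the name and email with a repository-local git config (no --global), so your real configuration is left untouched. The name and email are placeholders.

```shell
#!/bin/sh
set -e

# Throwaway directory instead of ~/git/... (placeholder location).
repo=$(mktemp -d)
cd "$repo"
git init -q

# Repository-local identity: unlike --global, this only affects this repository.
git config user.name "My Name"
git config user.email "me@home.com"

# First commit: create a file, stage it, and commit it.
echo "Hello world" > file.txt
git add file.txt
git commit -q -m "Initial commit"

# Edit the tracked file, inspect the change, then commit it with -a.
echo "Hello world!" > file.txt
git diff --stat
git status --short
git commit -q -am "Add exclamation mark to file.txt"

# One line per commit, most recent first.
git log --oneline
```

Using a local identity here is only a convenience for experiments. For day-to-day work, the global settings shown in the recipe are what you want.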
How it works...
When you start a new project or a new computing experiment, create a new folder on your computer. You will eventually add code, text files, datasets, and other resources in this folder. The distributed version control system keeps track of the changes you make to your files as your project evolves. It is more than a simple backup, as every change you make on any file can be saved along with the corresponding timestamp. You can even revert to a previous state at any time; never be afraid of breaking your code anymore!
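As a minimal sketch of going back to a previous state, the script below commits two versions of a file and then restores the first one. It assumes a POSIX shell, and the file name and contents are illustrative. The restore uses git checkout <commit> -- <path>, which copies the file as it was in that commit into both the working directory and the staging area. HEAD~1 means the parent of the latest commit. On Git 2.23 or later, git restore --source=HEAD~1 file.txt is a newer command for a similar purpose, although by default it only updates the working directory.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "My Name"
git config user.email "me@home.com"

# Two successive versions of the same file.
echo "Hello world" > file.txt
git add file.txt
git commit -q -m "Initial commit"
echo "Hello world!" > file.txt
git commit -q -am "Add exclamation mark to file.txt"

# Bring back file.txt as it was one commit ago (HEAD~1).
# This overwrites the working copy and stages the old version.
git checkout -q HEAD~1 -- file.txt
cat file.txt

# Record the restoration as a new commit: history is kept, not rewritten.
git commit -q -m "Restore the original greeting"
git log --oneline
```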
Tip
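Git works best with text files. It can handle binary files, but with limitations. For example, it cannot show meaningful line-by-line differences for them, and every version of a large binary file is kept in the history, which makes the repository grow quickly. For large binary files, it is better to use a separate system such as Git Large File Storage, or Git LFS (see https://git-lfs.github.com/).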
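One concrete limitation is that Git's line-based diffs do not apply to binary content. The sketch below assumes a POSIX shell, and data.bin is a made-up file. Because Git detects the NUL bytes in data.bin and treats it as binary, git diff --numstat reports line counts for the text file but only dashes for the binary one.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q

# A small text file and a small binary file (it contains NUL bytes).
echo "Hello world" > file.txt
printf 'abc\000\001\002' > data.bin
git add file.txt data.bin

# Added/deleted line counts per staged file.
# For binary files, Git prints "-" instead of numbers.
git diff --cached --numstat
```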
Specifically, you can take a snapshot of your project at any time by doing a commit. The snapshot records all tracked files, including the changes you have staged since the previous commit. You are in total control of which files and changes will be tracked. With Git, you stage a file (or the changes made to it) for your next commit with git add, before committing your changes with git commit. The git commit -a command stages and commits all changes in the files that are already being tracked; new, untracked files still have to be added with git add.
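The sketch below shows the difference between staging explicitly and using git commit -a. It assumes a POSIX shell, and notes.txt is a hypothetical new file. The script edits a tracked file and creates an untracked one. git commit -a commits the modification of the tracked file but leaves the new file out until it is staged with git add.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "My Name"
git config user.email "me@home.com"

echo "Hello world" > file.txt
git add file.txt
git commit -q -m "Initial commit"

# Modify the tracked file and create a new, untracked file.
echo "Hello world!" > file.txt
echo "Some notes" > notes.txt

# -a only picks up changes to files Git already tracks.
git commit -q -am "Add exclamation mark to file.txt"
git status --short    # prints "?? notes.txt": still untracked

# A new file must be staged explicitly before it can be committed.
git add notes.txt
git commit -q -m "Add notes"
```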
When committing, you should provide a clear and short message describing the changes you made. This makes the repository's history considerably more informative than a generic message such as "work in progress". If the commit message is long, write a short title (fewer than 50 characters), leave a blank line, and then write a longer description.
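From the command line, one way to write such a message is to pass -m twice. This is a sketch that assumes a POSIX shell, and the message text is illustrative. Git joins multiple -m values as separate paragraphs, so the first becomes the title and the second the body, with a blank line between them. Running git commit without -m opens your editor instead.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "My Name"
git config user.email "me@home.com"

echo "Hello world" > file.txt
git add file.txt

# First -m: short title. Second -m: longer description (body).
git commit -q -m "Add greeting file" \
    -m "The file contains a single line used to try out Git."

git log -1 --format=%s   # title (subject line)
git log -1 --format=%b   # body
```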
Tip
How often should you commit?
The answer is very often. Git only takes responsibility for your work when you commit changes. What happens between two commits may be lost, so it's better to commit very regularly. Besides, commits are quick and cheap as they are local; that is, they do not involve any remote communication with an external server.
Git is a distributed version control system; your local repository does not need to synchronize with an external server. However, you should synchronize if you need to work on several computers, or if you want a remote backup. Once you have set up remotes, synchronization with a remote repository is done with git push (send your local commits to the remote repository), git fetch (download remote branches and objects without changing your local branches or working files), and git pull (fetch the remote changes and integrate them into your current local branch).
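A remote does not have to be on a server. In the self-contained sketch below, a local bare repository plays the role of the remote, and two clones stand in for two computers. One clone pushes commits, and the other fetches and pulls them. The sketch assumes a POSIX shell, and all paths and names are placeholders. The --ff-only option makes git pull refuse to create a merge commit, which is enough here because the second clone has no local commits.

```shell
#!/bin/sh
set -e

base=$(mktemp -d)

# A bare repository (no working directory) plays the role of the remote server.
git init -q --bare "$base/remote.git"

# Computer 1: clone the (empty) remote, commit, and push.
git clone -q "$base/remote.git" "$base/laptop"
cd "$base/laptop"
git config user.name "My Name"
git config user.email "me@home.com"
echo "Hello world" > file.txt
git add file.txt
git commit -q -m "Initial commit"
git push -q -u origin HEAD    # send local commits, remember the upstream

# Computer 2: clone the remote.
git clone -q "$base/remote.git" "$base/desktop"

# Meanwhile, computer 1 makes and pushes another commit.
cd "$base/laptop"
echo "Hello world!" > file.txt
git commit -q -am "Add exclamation mark to file.txt"
git push -q

# Computer 2 catches up.
cd "$base/desktop"
git fetch -q              # download new objects; local branch unchanged
git status -sb            # reports that the local branch is behind
git pull --ff-only -q     # fetch again, then fast-forward the branch
cat file.txt
```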
There's more...
We can also create a new repository on an online Git provider such as GitHub.
On the main web page of the newly created project, click on the Clone or download button to get the repository URL, and type the following in a terminal:
$ git clone https://github.com/mylogin/myproject.git
If the local repository already exists, do not tick the Initialize this repository with a README box on the GitHub page, and add the remote with git remote add origin https://github.com/mylogin/myproject.git. See https://help.github.com/articles/adding-a-remote/ for more details.
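Adding a remote to an existing repository only records a name and a URL in the repository configuration; nothing is sent until you push. The sketch below adds the remote and checks it without any network access. It assumes a POSIX shell and reuses the placeholder mylogin/myproject URL from above. The first push needs network access, credentials, and an existing GitHub repository, so it is only shown as a comment. The branch name master matches the recipe's output but may differ on your system.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q

# Register the GitHub repository under the conventional name "origin".
git remote add origin https://github.com/mylogin/myproject.git

# List remotes with their fetch and push URLs (no network access needed).
git remote -v

# The first push would also set the upstream branch (not run here):
# git push -u origin master
```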
The simplistic workflow shown in this recipe is linear. In practice, though, workflows with Git are typically nonlinear; this is the concept of branching. We will describe this idea in the next recipe, A typical workflow with Git branching.
Here are some references on Git:
- Hands-on tutorial, available at https://try.github.io
- Git, a simple guide by Roger Dudler, available at http://rogerdudler.github.io/git-guide/
- Git Immersion, a guided tour, available at http://gitimmersion.com
- Atlassian Git tutorial, available at http://www.atlassian.com/git
- Online Git course, available at http://www.codeschool.com/courses/try-git
- Git tutorial by Lars Vogel, available at http://www.vogella.com/tutorials/Git/article.html
- GitHub and Git tutorial, available at http://git-lectures.github.io
- Intro to Git for scientists, available at http://karthik.github.io/git_intro/
- GitHub help, available at https://help.github.com
See also
- The A typical workflow with Git branching recipe