Creating a Repository

Last updated on 2024-12-04 | Edit this page

Estimated time: 10 minutes

Overview

Questions

  • Where does Git store information?

Objectives

  • Create a local Git repository.
  • Describe the purpose of the .git directory.

Once Git is configured, we can start using it.

To demonstrate the use of git, we will build a data dictionary for the data we obtained from UKHSA.

First, let’s create a new directory in the Desktop folder for our work and then change the current working directory to the newly created one:

BASH

$ cd ~/Desktop
$ mkdir data-dictionary
$ cd data-dictionary

Then we tell Git to make data-dictionary a repository – a place where Git can store versions of our files:

BASH

$ git init

It is important to note that git init will create a repository that can include subdirectories and their files – there is no need to create separate repositories nested within the data-dictionary repository, whether subdirectories are present from the beginning or added later. Also, note that the creation of the data-dictionary directory and its initialization as a repository are completely separate processes.

If we use ls to show the directory’s contents, it appears that nothing has changed:

BASH

$ ls

But if we add the -a flag to show everything, we can see that Git has created a hidden directory within data-dictionary called .git:

BASH

$ ls -a

OUTPUT

.	..	.git

Git uses this special subdirectory to store all the information about the project, including the tracked files and sub-directories located within the project’s directory. If we ever delete the .git subdirectory, we will lose the project’s history.

Next, we will change the default branch to be called main. This might be the default branch depending on your settings and version of git. See the setup episode for more information on this change.

BASH

$ git checkout -b main

OUTPUT

Switched to a new branch 'main'

We can now start using one of the most important git commands, which is particularly helpful to beginners. git status tells us the status of our project, and better, a list of changes in the project and options on what to do with those changes. We can use it as often as we want, whenever we want to understand what is going on.

BASH

$ git status

OUTPUT

On branch main

No commits yet

nothing to commit (create/copy files and use "git add" to track)

If you are using a different version of git, the exact wording of the output might be slightly different.

Places to Create Git Repositories

Along with tracking information about the data dictionary (the project we have already created), we would also like to track information about related datasets. Despite any concerns, we create a related-data project inside the data-dictionary project with the following sequence of commands:

BASH

$ cd ~/Desktop   # return to Desktop directory
$ cd data-dictionary     # go into data-dictionary directory, which is already a Git repository
$ ls -a          # ensure the .git subdirectory is still present in the data-dictionary directory
$ mkdir related-data    # make a subdirectory data-dictionary/related-data
$ cd related-data       # go into related-data subdirectory
$ git init       # make the related-data subdirectory a Git repository
$ ls -a          # ensure the .git subdirectory is present indicating we have created a new Git repository

Is the git init command, run inside the related-data subdirectory, required for tracking files stored in the related-data subdirectory?

No. We do not need to make the related-data subdirectory a Git repository because the data-dictionary repository can track any files, sub-directories, and subdirectory files under the data-dictionary directory. Thus, in order to track all information about related data, we only needed to add the related-data subdirectory to the data-dictionary directory.

Additionally, Git repositories can interfere with each other if they are “nested”: the outer repository will try to version-control the inner repository. Therefore, it’s best to create each new Git repository in a separate directory. To be sure that there is no conflicting repository in the directory, check the output of git status. If it looks like the following, you are good to go to create a new repository as shown above:

BASH

$ git status

OUTPUT

fatal: Not a git repository (or any of the parent directories): .git

Correcting git init Mistakes

Now that we know that a nested repository is redundant and may cause confusion down the road, we would like to go back to a single git repository. How can we undo our last git init in the related-data subdirectory?

Background

Removing files from a Git repository needs to be done with caution. But we have not learned yet how to tell Git to track a particular file; we will learn this in the next episode. Files that are not tracked by Git can easily be removed like any other “ordinary” files with

BASH

$ rm filename

Similarly a directory can be removed using rm -r dirname. If the files or folder being removed in this fashion are tracked by Git, then their removal becomes another change that we will need to track, as we will see in the next episode.

Solution

Git keeps all of its files in the .git directory. To recover from this little mistake, we can remove the .git folder in the related-data subdirectory by running the following command from inside the data-dictionary directory:

BASH

$ rm -rf related-data/.git

But be careful! Running this command in the wrong directory will remove the entire Git history of a project you might want to keep. In general, deleting files and directories using rm from the command line cannot be reversed. Therefore, always check your current directory using the command pwd.

Key Points

  • git init initializes a repository.
  • Git stores all of its repository data in the .git directory.