It happens to all of us: we accidentally commit something to a git repo that we didn’t mean to, or, even worse, we also push it to the main repository on GitLab or GitHub.
TL;DR (too long; didn’t read)
Replace the current “master” branch with another “backup” branch.
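Local (run this while a branch other than master is checked out, for example backup, because git refuses to force-update the currently checked-out branch):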
git branch -f master backup
Remote (via Stack Overflow):
git push origin +backup:master
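The two TL;DR commands can be tried safely in a throwaway repository before running them on a real project. The following is a minimal sketch, assuming a Unix-like shell with git installed; the bare repository origin.git is a local stand-in for GitLab or GitHub, and the file names, commit messages and identity are made up for the demo.

```shell
set -euo pipefail
# Throwaway identity so the demo commits work on any machine.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

cd "$(mktemp -d)"
git init -q --bare origin.git            # stands in for the GitLab/GitHub repository
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/master  # name the first branch master regardless of local defaults
git remote add origin ../origin.git

echo "good" > app.txt
git add app.txt
git commit -q -m "Good commit"
git branch backup                        # backup points at the last good commit

echo "password=hunter2" > secret.txt
git add secret.txt
git commit -q -m "Oops: committed a secret"
git push -q origin master                # the mistake is now on the remote too

# Local: move off master first, then force master to point where backup points.
git checkout -q backup
git branch -f master backup

# Remote: the leading + allows a non-fast-forward update of master.
git push -q origin +backup:master
```

Afterwards both the local and the remote master point at the good commit, and secret.txt is no longer part of master on either side.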
Why does it matter if we commit undesired content into a repository?
Good question. We use git to keep track of our source code, facilitate collaboration and preserve how the project evolved to its current state. Having the history at our fingertips is helpful when developing new features or trying to understand a specific section of the code, provided the commit messages are meaningful.
However, there are a few things we do not want to track in git. These include, but are not limited to:
- temporary files
- the built software (binaries, executables, …)
- personal information
- credentials
- proprietary information
Sensitive information such as personal information, credentials and proprietary information can end up in a repository accidentally or unknowingly, and it may harm the business should it ever be leaked. This applies to public and private repositories alike: private repositories sometimes become public or get accessed by various people.
Can’t I just undo the change or roll back to a previous commit?
If you have already pushed it to the master branch of your project, it is likely too late for that. Reverting the commit, or committing an updated version with the undesired information removed, still leaves traces in the git history: the original commit, including the information, remains part of it.
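To see why a revert is not enough, here is a small sketch in a throwaway repository, assuming a Unix-like shell with git installed; the file config.txt and the value abc123 are made-up stand-ins for a real credential.

```shell
set -euo pipefail
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

cd "$(mktemp -d)"
git init -q
echo "# Demo" > README.md
git add README.md
git commit -q -m "Initial commit"

# The mistake: a credential gets committed.
echo "api_key=abc123" > config.txt
git add config.txt
git commit -q -m "Add config"

# "Undo" it with a revert: a new commit that deletes the file again.
git revert --no-edit HEAD > /dev/null

ls                               # config.txt is gone from the working tree...
git show HEAD~1:config.txt       # ...but the previous commit still contains it
git log --oneline -S abc123      # pickaxe search: the commits that added and removed the string
```

Anyone with access to the repository can run the same pickaxe search, which is why the history itself has to be replaced.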
Strategies on how to remove unwanted information
Here are a few strategies, depending on how big a correction is needed:
L1 – committing unwanted information to a local branch
Committing to a local branch is the easiest case to resolve: you can either amend the latest commit and remove the undesired changes, or branch off from the commit before the mistake and then delete the erroneous branch.
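Both L1 options can be sketched with plain git commands. This assumes a Unix-like shell and a throwaway repository; the branch names feature and experiment and the files .env and secret.txt are hypothetical. Amending only helps when the unwanted change is in the latest commit.

```shell
set -euo pipefail
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

cd "$(mktemp -d)"
git init -q
echo "v1" > app.txt
git add app.txt
git commit -q -m "Initial commit"

# Option 1: amend the latest commit.
git checkout -q -b feature
echo "v2" > app.txt
echo "token=xyz" > .env                 # should never have been committed
git add app.txt .env
git commit -q -m "Update app"
git rm -q --cached .env                 # unstage .env but keep the file on disk
git commit -q --amend --no-edit         # rewrite the commit without .env

# Option 2: branch off from the commit before the mistake.
git checkout -q -b experiment
echo "password=hunter2" > secret.txt
git add secret.txt
git commit -q -m "Oops"
git checkout -q -b experiment-fixed experiment~1
git branch -q -D experiment             # delete the erroneous branch
git branch -m experiment-fixed experiment   # optional: reuse the old name
```

Both options rewrite history, so they are only this simple while the branch has not been pushed. The dropped commits stay reachable through `git reflog` in this clone until garbage collection removes them.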
L2 – pushing unwanted commits to a shared repository, feature branch
As soon as commits are pushed to a shared (origin/remote) repository, things get tricky. If the commits are on a feature branch or any other non-main (aka master) branch, follow these steps:
- Notify anyone who might be collaborating on this branch that they should refrain from pushing/pulling
- Follow the L1 steps and create a new branch that contains only the commits before the erroneous commit.
- Push the new branch and delete the erroneous branch locally and on the shared repository
- Optional: rename the new branch to have the same name as the original branch
- Notify colleagues to resume work as usual
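The git side of the L2 steps might look like the following sketch. It assumes a Unix-like shell; origin.git is a local bare repository standing in for the shared one, and the branch names feature and feature-fixed are hypothetical. The notification steps obviously happen outside of git.

```shell
set -euo pipefail
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

cd "$(mktemp -d)"
git init -q --bare origin.git           # stands in for the shared repository
git init -q repo
cd repo
git remote add origin ../origin.git

echo "base" > app.txt
git add app.txt
git commit -q -m "Base"

git checkout -q -b feature
echo "feature work" >> app.txt
git commit -q -am "Feature work"
echo "password=hunter2" > secret.txt
git add secret.txt
git commit -q -m "Oops: add secret"
git push -q -u origin feature           # the mistake is now shared

# Create a branch holding only the commits before the erroneous one.
git checkout -q -b feature-fixed feature~1

# Push the new branch, then delete the erroneous one locally and remotely.
git push -q origin feature-fixed
git branch -q -D feature
git push -q origin --delete feature

# Optional: give the fixed branch the original name again.
git branch -m feature-fixed feature
git push -q -u origin feature
git push -q origin --delete feature-fixed
```

Collaborators who already fetched the old branch should reset their local copy to the new remote branch rather than push their old copy, otherwise the erroneous commit comes right back.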
L3 – pushing unwanted commits to a shared repository, main branch
This one gets very nerve-wracking to resolve. Main branches are any branches that are sensitive for production or development. Often their names contain words like master, development or release, but this depends on the practices of your team.
- Notify the project lead and the coworkers who are actively working on the project to stop pulling from and pushing to the main branch
- If you have a Continuous Integration and Continuous Deployment (CI/CD) pipeline running, check with your colleagues whether it has to be paused or whether active jobs have to be aborted
- Follow the L1 steps and create a new branch that contains only the commits before the erroneous commit.
- Replace the main “master” branch with the “fixed” branch using:
git push origin +fixed:master
Depending on what your remote repository is named, you might need to replace origin; origin is the name git gives the remote by default when you clone a repository.
- Notify the project lead and coworkers to resume work, and reactivate any suspended CI/CD pipelines
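The following sketch walks through the L3 replacement with two clones of the same remote, assuming a Unix-like shell; alice and bob are hypothetical coworkers, and origin.git is a local bare repository standing in for GitLab or GitHub. If master is a protected branch on the real server, the force push will most likely be rejected until force pushing is allowed for it.

```shell
set -euo pipefail
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

cd "$(mktemp -d)"
git init -q --bare origin.git
git --git-dir=origin.git symbolic-ref HEAD refs/heads/master   # clones check out master

# Alice pushes a good commit followed by an erroneous one to master.
git init -q alice
git -C alice symbolic-ref HEAD refs/heads/master
git -C alice remote add origin ../origin.git
echo "good" > alice/app.txt
git -C alice add app.txt
git -C alice commit -q -m "Good commit"
echo "password=hunter2" > alice/secret.txt
git -C alice add secret.txt
git -C alice commit -q -m "Oops: add secret"
git -C alice push -q origin master

# Bob pulled before anyone noticed.
git clone -q origin.git bob

# Alice creates "fixed" from the last good commit and replaces master on the remote.
git -C alice branch fixed master~1
git -C alice push -q origin +fixed:master

# Once notified, Bob drops the erroneous commit by resetting to the new remote master.
git -C bob fetch -q origin
git -C bob reset -q --hard origin/master
```

In this sketch Alice's own master still contains the erroneous commit; she can run `git reset --hard fixed` on master so that nobody accidentally pushes it again. Note that `git reset --hard` discards uncommitted local changes, so coworkers should stash or commit their work first.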
L4 – accidentally rewriting the history of a main branch
What if a branch that contains a lot of undesired commits and changes, for example an experimental branch, gets merged into a main branch? Merging that many changes mixes a lot of unwanted commits into the main branch’s history, which can make going back to a previous commit and re-applying the changes so tedious that it is practically impossible.
Fear not! There might still be ways to salvage this. Check the following places:
- Do you have a local copy or a very recent branch/fork of the main branch that could be used, even if it needs a few edits?
- Backup software often runs hourly, so a backup might contain an older version of the branch in question that could be restored
- Coworkers might not have pulled the latest main branch yet, or might have a branch that has diverged only minimally from the original main branch. Get a copy from them and, if needed, re-apply the latest commits
- If nothing else works, you could explore using the BFG Repo-Cleaner to scrub sensitive information from the project. However, using the tool is very tedious and the history might still not be completely clean. Furthermore, it will result in creating a new repository and having everyone switch over to that one
Hopefully a backup can be found and the main branch can be restored by following the steps outlined in L3.
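In the simplest case, where the accident created a merge commit and nothing else was committed on top of it, the main branch’s previous state is still in the history: it is the first parent of the merge commit. The sketch below assumes a Unix-like shell and a throwaway repository with a hypothetical experiment branch. If the merge was a fast-forward, there is no merge commit; in your own clone, `git reflog show master` (or ORIG_HEAD right after the merge) still records the old tip.

```shell
set -euo pipefail
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master

echo "v1" > app.txt
git add app.txt
git commit -q -m "Release 1.0"

# An experimental branch with commits that should never reach master.
git checkout -q -b experiment
for i in 1 2 3; do
  echo "experiment $i" >> exp.txt
  git add exp.txt
  git commit -q -m "Experiment $i"
done

git checkout -q master
echo "v2" > app.txt
git commit -q -am "Release 1.1"

# The accident: experiment gets merged into master.
git merge -q --no-ff --no-edit experiment

# The merge commit's first parent is where master pointed before the merge.
git branch fixed master^1
git log --oneline -1 fixed
```

The resulting fixed branch can then replace master on the remote as described in L3.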
How can we prevent accidental merges from happening in the future?
It is good to know that we can fix these mistakes, but the approaches covered earlier are tedious and nerve-wracking. Are there better ways to prevent these mishaps from occurring in the first place?
Fortunately, there are ways to reduce the risk of accidentally committing to the most sensitive branches. Let me suggest the following actions:
- Do not push to main branches directly. This is the most obvious takeaway. We’ve all done it and we’ve told others not to do it. It happens to me too, and I have had to fix main branches in the past
- Use a branching model like Git-Flow or GitLab Flow. These approaches are detailed, and even using a customized variation that works for you and your team is possible and worthwhile
- Protect important branches – see GitLab protected branches and GitHub protected branches
- Require merge request reviews (pull request reviews on GitHub)
- Use secret detection: platforms like GitLab support secret detection and security scanning
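As a local complement to protected branches and server-side secret detection, client-side git hooks can catch the two most common slips before they leave your machine. The following is a minimal sketch, assuming a Unix-like shell: a pre-push hook that rejects pushes targeting master or main, and a crude pre-commit hook that only looks for private key headers. Both the hooks and the pattern are hypothetical examples. Hooks are not copied when a repository is cloned and can be skipped with `--no-verify`, so they do not replace the server-side measures above.

```shell
set -euo pipefail
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

cd "$(mktemp -d)"
git init -q --bare origin.git
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/master
git remote add origin ../origin.git
git config core.hooksPath .git/hooks    # ignore any global hooksPath for this demo

echo "hello" > app.txt
git add app.txt
git commit -q -m "Initial commit"

# pre-push: reject any push that targets master or main on the remote.
cat > .git/hooks/pre-push <<'EOF'
#!/bin/sh
# git feeds one line per pushed ref: <local ref> <local sha> <remote ref> <remote sha>
while read -r local_ref local_sha remote_ref remote_sha; do
  case "$remote_ref" in
    refs/heads/master|refs/heads/main)
      echo "pre-push: direct pushes to ${remote_ref#refs/heads/} are blocked" >&2
      exit 1 ;;
  esac
done
exit 0
EOF

# pre-commit: crude scan of the lines added in the staged diff.
cat > .git/hooks/pre-commit <<'EOF'
#!/bin/sh
if git diff --cached -U0 | grep '^+' | grep -q 'PRIVATE KEY-----'; then
  echo "pre-commit: staged changes look like a private key" >&2
  exit 1
fi
exit 0
EOF
chmod +x .git/hooks/pre-push .git/hooks/pre-commit

# Committing something that looks like a private key is blocked...
printf '%s\n' '-----BEGIN OPENSSH PRIVATE KEY-----' 'not-a-real-key' > id_demo
git add id_demo
if git commit -q -m "Add key" 2>/dev/null; then key_committed=yes; else key_committed=no; fi
git reset -q -- id_demo                 # unstage the file again

# ...and so is pushing straight to master, while a feature branch is fine.
if git push -q origin master 2>/dev/null; then pushed_master=yes; else pushed_master=no; fi
git push -q origin master:feature
echo "key committed: $key_committed, pushed to master: $pushed_master"
```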
Conclusion
Protecting main branches like master, and establishing a workflow in which commits only reach these important branches through reviews, helps prevent accidental commits of secrets, sensitive information and errors. Even with this model, mistakes can still happen, and we have covered ways to remedy undesired commits, replace entire branches and remove sensitive information.
Please let us all know in the comments below which other approaches there are and which ones worked for you, and share your scary git stories with us.
References
- Stack Overflow: How to replace a branch
- BFG Repo-Cleaner
- How to Write a useful Commit Message: A Git Guide
- GitLab: How (and why!) to keep your Git commit history clean
- Git branching: Git-Flow
- Git branching: GitLab Flow
Also published on Medium.