Git excels at tracking text files but struggles with large binary files. Every clone downloads the entire history, and binary files rarely compress or diff efficiently, so each revision is stored almost in full. Git Large File Storage (LFS) solves this by replacing large files with lightweight pointers while storing the actual content separately. This guide covers setup, workflows, migration, and alternatives for managing large files in Git.
Why Git Struggles with Large Files
┌─────────────────────────────────────────────────────────────┐
│ GIT WITHOUT LFS │
├─────────────────────────────────────────────────────────────┤
│ │
│ Repository: 50MB code + 200MB images (current versions) │
│ │
│ Clone operation: │
│ ├── Download all commits │
│ ├── Download ALL versions of ALL images │
│ └── Total: 8GB (historical versions) │
│ │
│ Time: 20+ minutes │
│ │
├─────────────────────────────────────────────────────────────┤
│ GIT WITH LFS │
├─────────────────────────────────────────────────────────────┤
│ │
│ Repository: 50MB code + pointer files │
│ │
│ Clone operation: │
│ ├── Download all commits (code + pointers) │
│ └── Download only CURRENT version of images │
│ │
│ Total: 250MB │
│ Time: 2 minutes │
│ │
└─────────────────────────────────────────────────────────────┘
When to Use Git LFS
| File Type | Typical Size | LFS Recommended |
|---|---|---|
| Source code | < 100KB | No |
| Config files | < 1MB | No |
| Small images | < 500KB | Optional |
| PSD/AI files | 10-500MB | Yes |
| Video files | 100MB+ | Yes |
| ML models | 100MB+ | Yes |
| Game assets | 10MB+ | Yes |
| Compiled binaries | 10MB+ | Yes |
| Datasets | 10MB+ | Yes |
Rule of thumb: Track files with LFS if they're binary AND (larger than 1MB OR change frequently).
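To apply the rule of thumb to an existing working tree, a quick size scan is often enough. The following is a minimal sketch, assuming a shell with a `find` that understands the `k` size suffix (GNU and BSD both do); the function name `list_large_files` and its threshold argument are illustrative and not part of Git or Git LFS.

```shell
# list_large_files [DIR] [MIN_KB]
# Print files under DIR that are larger than MIN_KB kibibytes
# (default 1024, i.e. 1 MiB), skipping anything inside a .git directory.
list_large_files() {
  dir=${1:-.}
  min_kb=${2:-1024}
  find "$dir" -type f -size +"${min_kb}"k ! -path '*/.git/*' | sort
}

# Example: list everything over 1 MiB in the current repository
#   list_large_files . 1024
```

The scan only measures size; whether a hit is binary or changes often still needs a human decision.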
Setting Up Git LFS
Installation
# macOS
brew install git-lfs
# Ubuntu/Debian
sudo apt install git-lfs
# Windows
# Download from https://git-lfs.github.com/
# Or use: choco install git-lfs
# Initialize Git LFS for your user
git lfs install
Repository Setup
# Navigate to your repository
cd my-repo
# Track file types with LFS
git lfs track "*.psd"
git lfs track "*.mp4"
git lfs track "*.zip"
git lfs track "assets/large/**"
# Check tracked patterns
git lfs track
# This creates/updates .gitattributes
cat .gitattributes
# *.psd filter=lfs diff=lfs merge=lfs -text
# *.mp4 filter=lfs diff=lfs merge=lfs -text
# ...
# Commit the tracking configuration
git add .gitattributes
git commit -m "Configure Git LFS tracking"
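To confirm that a tracking pattern really covers a given path, Git's own `git check-attr` can be queried. It reads `.gitattributes` directly, so it works even before the file exists. A small sketch follows; the helper name `is_lfs_tracked` is hypothetical.

```shell
# is_lfs_tracked PATH
# Succeeds (exit 0) if .gitattributes assigns the "lfs" filter to PATH.
# Must be run inside a Git repository.
is_lfs_tracked() {
  # git check-attr prints lines of the form "<path>: filter: <value>"
  git check-attr filter -- "$1" | grep -q ': filter: lfs$'
}

# Example:
#   is_lfs_tracked design.psd && echo "design.psd goes through LFS"
```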
How LFS Works
┌─────────────────────────────────────────────────────────────┐
│ GIT LFS FLOW │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────┐ │
│ │ git add │──► LFS filter detects tracked file │
│ │ logo.psd │ │
│ └──────┬──────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────┐ │
│ │ 1. Calculate SHA-256 of file content │ │
│ │ 2. Store file in .git/lfs/objects/ │ │
│ │ 3. Create pointer file for staging │ │
│ └─────────────┬───────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────┐ │
│ │ Pointer file content: │ │
│ │ version https://git-lfs.github.com/... │ │
│ │ oid sha256:abc123... │ │
│ │ size 15728640 │ │
│ └─────────────┬───────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────┐ ┌────────────────────┐ │
│ │ git commit │──► │ Commit contains │ │
│ │ │ │ only pointer │ │
│ └──────┬──────┘ └────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────┐ ┌────────────────────┐ │
│ │ git push │──► │ Pointer to GitHub │ │
│ │ │ │ File to LFS server │ │
│ └─────────────┘ └────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
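The pointer-creation step in the flow above can be reproduced by hand, which helps when checking what Git actually stores. This is a sketch only: the helper name `make_lfs_pointer` is made up, and the version line is the one defined by the Git LFS pointer specification. Git LFS itself does this work inside its clean filter.

```shell
# make_lfs_pointer FILE
# Print the three-line pointer text that corresponds to FILE:
# spec version, SHA-256 object ID, and size in bytes.
make_lfs_pointer() {
  file=$1
  # Use sha256sum where available (Linux), otherwise shasum (macOS).
  if command -v sha256sum >/dev/null 2>&1; then
    oid=$(sha256sum < "$file" | cut -d' ' -f1)
  else
    oid=$(shasum -a 256 < "$file" | cut -d' ' -f1)
  fi
  # wc pads its output on some systems, so strip spaces.
  size=$(wc -c < "$file" | tr -d ' ')
  printf 'version https://git-lfs.github.com/spec/v1\noid sha256:%s\nsize %s\n' "$oid" "$size"
}

# Example: compare with what Git has staged for a tracked file
#   make_lfs_pointer logo.psd
#   git show :logo.psd
```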
Common Workflows
Adding New Large Files
# Ensure file type is tracked (this updates .gitattributes)
git lfs track "*.psd"
# Add the file together with the updated .gitattributes, then commit normally
git add .gitattributes design.psd
git commit -m "Add design file"
git push
Cloning Repositories with LFS
# Standard clone (downloads LFS files automatically)
git clone https://github.com/org/repo.git
# Clone without LFS files (faster for large repos)
GIT_LFS_SKIP_SMUDGE=1 git clone https://github.com/org/repo.git
cd repo
# Download specific files later
git lfs pull --include="assets/needed/*"
# Or download all LFS files
git lfs pull
Checking LFS Status
# List tracked patterns
git lfs track
# List all LFS files in repository
git lfs ls-files
# List LFS files with their full SHA-256 object IDs
git lfs ls-files -l
# Check LFS status
git lfs status
# Verify LFS files
git lfs fsck
Fetching and Pulling
# Fetch LFS objects (download without checkout)
git lfs fetch
# Fetch specific paths only
git lfs fetch --include="assets/textures/*"
# Pull (fetch + checkout)
git lfs pull
# Fetch from specific remote
git lfs fetch origin
# Fetch all refs (branches, tags)
git lfs fetch --all
Migrating Existing Files to LFS
Track New Files Going Forward
# Track pattern before adding files
git lfs track "*.psd"
git add .gitattributes
git commit -m "Track PSD files with LFS"
# Now add the files
git add designs/*.psd
git commit -m "Add design files"
Migrate Files Already in History
Warning: This rewrites Git history. Coordinate with your team.
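# See what would be migrated (scans all refs, not only the current branch)
git lfs migrate info --everything --include="*.psd"
# Migrate files in history
git lfs migrate import --include="*.psd" --everything
# For specific branches only
git lfs migrate import --include="*.psd" --include-ref=main --include-ref=develop
# Force push every rewritten branch after migration
git push --force-with-lease --all
# Rewritten tags have no remote-tracking ref to lease against, so force them
git push --force --tags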
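When choosing `--include` patterns for a migration, it can help to see which blobs dominate history. The pipeline below uses only core Git commands (`git rev-list` and `git cat-file`); the wrapper name `largest_blobs` is an illustrative assumption.

```shell
# largest_blobs [N]
# Print the N largest blobs reachable from any ref as "<bytes> <path>",
# biggest first (default N=10). Run inside a Git repository.
largest_blobs() {
  n=${1:-10}
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    awk '$1 == "blob" { size = $2; sub(/^[^ ]+ [^ ]+ /, ""); print size, $0 }' |
    sort -rn |
    head -n "$n"
}

# Example: show the 20 largest blobs in history
#   largest_blobs 20
```

Each historical version of a file is a separate blob, so the same path can appear several times.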
Cleaning Up After Migration
# Remove old objects from local repo
git reflog expire --expire=now --all
git gc --prune=now --aggressive
# Team members must re-clone
# Old clones still have bloated history
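To check that the cleanup actually shrank the local repository, the packed size can be read from `git count-objects -v`. A short sketch, with the hypothetical helper name `repo_pack_kib`:

```shell
# repo_pack_kib
# Print the size of packed objects in KiB, taken from the "size-pack"
# field of `git count-objects -v`. Run inside a Git repository.
repo_pack_kib() {
  git count-objects -v | awk '/^size-pack:/ { print $2 }'
}

# Example: record the value before and after the gc step
#   before=$(repo_pack_kib)
#   git gc --prune=now --aggressive
#   echo "packed size: ${before} KiB -> $(repo_pack_kib) KiB"
```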
CI/CD Integration
GitHub Actions
name: Build
on: push
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout with LFS
        uses: actions/checkout@v4
        with:
          lfs: true
      - name: Build
        run: npm run build
Optimized with caching:
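jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Write the sorted LFS object IDs to a file so the cache key changes with the assets
      - name: Create LFS file list
        run: git lfs ls-files -l | cut -d' ' -f1 | sort > .lfs-assets-id
      - name: Cache LFS objects
        uses: actions/cache@v4
        with:
          path: .git/lfs
          key: lfs-${{ hashFiles('.lfs-assets-id') }}
          restore-keys: lfs-
      - name: Pull LFS files
        run: git lfs pull
      - name: Build
        run: npm run build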
Selective LFS pull (faster):
steps:
  - uses: actions/checkout@v4
  - name: Pull only needed LFS files
    run: |
      git lfs install
      git lfs pull --include="src/assets/images/*" --exclude="*.psd"
GitLab CI
build:
  variables:
    GIT_LFS_SKIP_SMUDGE: "1" # Skip automatic LFS download at checkout
  script:
    - git lfs pull --include="needed-files/*"
    - npm run build
Storage and Hosting
Provider Comparison
| Provider | Free Storage | Free Bandwidth | Paid Plans |
|---|---|---|---|
| GitHub | 1 GB | 1 GB/month | $5/50GB pack |
| GitLab.com | 5 GB | 10 GB/month | Additional storage from $60/year |
| Bitbucket | 1 GB | 1 GB/month | Varies by plan |
| Self-hosted | Unlimited | Unlimited | Storage costs |
Self-Hosted LFS Server
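Using lfs-test-server (the Git LFS project's reference server, meant for testing rather than production):
# Install
go install github.com/git-lfs/lfs-test-server@latest
# Configure storage and admin credentials
# (lfs-test-server stores objects on local disk; it has no S3 backend,
# so S3-backed hosting needs a different server implementation)
export LFS_CONTENTPATH=lfs-content
export LFS_ADMINUSER=admin
export LFS_ADMINPASS=secret
# Run server
lfs-test-server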
Configure repository:
# Point repo to custom LFS server
git config lfs.url https://my-lfs-server.com/org/repo
Using .lfsconfig (committed to repo):
[lfs]
url = https://my-lfs-server.com/org/repo
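Because `.lfsconfig` uses Git's config file syntax, `git config --file` can read it back, which is a quick way to check what a repository will use. A minimal sketch; the helper name `lfs_url_from_config` is hypothetical.

```shell
# lfs_url_from_config [FILE]
# Print the lfs.url value stored in FILE (default: .lfsconfig).
# Exits non-zero if the key is absent.
lfs_url_from_config() {
  git config --file "${1:-.lfsconfig}" --get lfs.url
}

# Example:
#   lfs_url_from_config            # value committed in .lfsconfig
#   git lfs env | grep Endpoint    # endpoint Git LFS will actually use
```

A value set with `git config lfs.url` in the local repository takes precedence over the committed `.lfsconfig`, so the two outputs can differ.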
Troubleshooting
Common Issues
Problem: "This repository is over its data quota"
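# Check how much space LFS objects use locally
# (remote quota usage is shown in your hosting provider's settings)
du -sh .git/lfs
# Prune old versions from the local cache (frees local disk only, not remote quota)
git lfs prune
# Stop tracking a pattern (files already committed as LFS pointers stay in LFS)
git lfs untrack "*.old"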
Problem: LFS files showing as pointer text
# Make sure the LFS filters and hooks are installed
git lfs install
# Re-checkout LFS files
git lfs checkout
# Or pull all LFS content
git lfs pull
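To find out which files are still unsmudged pointers, each file can be tested against the pointer format: a small text file whose first line is the spec version. This is a sketch under that assumption; `is_lfs_pointer` is an illustrative name, not a Git LFS command.

```shell
# is_lfs_pointer FILE
# Succeeds if FILE looks like a Git LFS pointer rather than real content.
is_lfs_pointer() {
  [ -f "$1" ] || return 1
  # Pointer files are tiny (the spec caps them below 1024 bytes).
  [ "$(wc -c < "$1" | tr -d ' ')" -lt 1024 ] || return 1
  [ "$(head -n 1 "$1")" = "version https://git-lfs.github.com/spec/v1" ]
}

# Example: report PSD files that were not replaced with real content
#   for f in designs/*.psd; do
#     is_lfs_pointer "$f" && echo "still a pointer: $f"
#   done
```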
Problem: Slow clone/pull
# Clone without LFS, then selective pull
GIT_LFS_SKIP_SMUDGE=1 git clone <url>
cd repo
git lfs pull --include="needed/**"
# Raise the number of parallel transfers (the default is 8)
git config lfs.concurrenttransfers 16
git lfs pull
Problem: File too large for GitHub
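# GitHub rejects files over 100 MB unless they are tracked with LFS
# GitHub's LFS limit is 2 GB per file on Free and Pro plans (higher on paid team and enterprise plans)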
# Split large files or use different storage
# Check file sizes
git lfs ls-files -s
Debugging
# Verbose output
GIT_TRACE=1 GIT_TRANSFER_TRACE=1 git lfs pull
# Check LFS configuration
git lfs env
# Verify file integrity
git lfs fsck
Alternatives to Git LFS
Comparison
┌─────────────────────────────────────────────────────────────┐
│ LARGE FILE SOLUTIONS │
├─────────────────────────────────────────────────────────────┤
│ │
│ Git LFS │
│ ├── Best for: General binary file versioning │
│ ├── Pros: Simple, well-supported, integrated │
│ └── Cons: Bandwidth costs, requires LFS support │
│ │
│ DVC (Data Version Control) │
│ ├── Best for: ML datasets, pipelines, experiments │
│ ├── Pros: ML-focused features, remote storage options │
│ └── Cons: Separate tool, learning curve │
│ │
│ git-annex │
│ ├── Best for: Complex storage backends, partial sync │
│ ├── Pros: Flexible, works with any storage │
│ └── Cons: Complex setup, different mental model │
│ │
│ Partial Clone + Sparse Checkout │
│ ├── Best for: Huge monorepos with no LFS support │
│ ├── Pros: Native Git, no extra tools │
│ └── Cons: Limited to recent Git versions │
│ │
│ External Storage (S3 + references) │
│ ├── Best for: Truly massive files (10GB+) │
│ ├── Pros: No size limits, cheap storage │
│ └── Cons: Manual management, no versioning │
│ │
└─────────────────────────────────────────────────────────────┘
Git Partial Clone (Native Alternative)
# Clone without blob content
git clone --filter=blob:none https://github.com/org/repo.git
cd repo
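# Blobs for the checked-out commit are fetched during checkout;
# blobs from older commits are downloaded on demand when accessed
git show HEAD~1:large-file.bin > /dev/null # Downloaded now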
# Sparse checkout for large repos
git sparse-checkout init
git sparse-checkout set src/ docs/
DVC for ML Projects
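# Install DVC (the s3 extra adds support for the S3 remote used below)
pip install "dvc[s3]"
# Initialize in repo
dvc init
# Track large file
dvc add data/training-set.parquet
# Commit the small .dvc metadata file that now stands in for the data
git add data/training-set.parquet.dvc data/.gitignore
git commit -m "Track training set with DVC"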
# Configure remote storage
dvc remote add -d myremote s3://my-bucket/dvc
# Push data
dvc push
# Pull data on another machine
dvc pull
Best Practices
.gitattributes Patterns
# Images
*.png filter=lfs diff=lfs merge=lfs -text
*.jpg filter=lfs diff=lfs merge=lfs -text
*.gif filter=lfs diff=lfs merge=lfs -text
*.psd filter=lfs diff=lfs merge=lfs -text
*.ai filter=lfs diff=lfs merge=lfs -text
# Videos
*.mp4 filter=lfs diff=lfs merge=lfs -text
*.mov filter=lfs diff=lfs merge=lfs -text
*.avi filter=lfs diff=lfs merge=lfs -text
# Audio
*.mp3 filter=lfs diff=lfs merge=lfs -text
*.wav filter=lfs diff=lfs merge=lfs -text
# Archives
*.zip filter=lfs diff=lfs merge=lfs -text
*.tar.gz filter=lfs diff=lfs merge=lfs -text
# Binaries
*.exe filter=lfs diff=lfs merge=lfs -text
*.dll filter=lfs diff=lfs merge=lfs -text
*.so filter=lfs diff=lfs merge=lfs -text
# Data
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
# Game/3D assets
*.fbx filter=lfs diff=lfs merge=lfs -text
*.blend filter=lfs diff=lfs merge=lfs -text
*.unitypackage filter=lfs diff=lfs merge=lfs -text
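Long pattern lists like the one above are easy to mistype by hand. One option is to generate the lines from a list of extensions; the sketch below assumes a hypothetical helper named `lfs_attr_lines` and emits the same attribute set that `git lfs track` writes.

```shell
# lfs_attr_lines EXT...
# Print one .gitattributes line per extension given on the command line.
lfs_attr_lines() {
  for ext in "$@"; do
    printf '*.%s filter=lfs diff=lfs merge=lfs -text\n' "$ext"
  done
}

# Example: append rules for three formats
#   lfs_attr_lines psd mp4 zip >> .gitattributes
```

Running `git lfs track` once per pattern gives the same result and also skips patterns that are already present.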
Repository Organization
project/
├── src/ # Regular Git (code)
├── docs/ # Regular Git (documentation)
├── assets/ # LFS tracked
│ ├── images/
│ ├── videos/
│ └── designs/
├── data/ # LFS tracked (or DVC for ML)
│ ├── raw/
│ └── processed/
├── .gitattributes # LFS tracking patterns
└── .lfsconfig # LFS server configuration