Git LFS: Managing Large Files in Git Repositories

Learn how to use Git Large File Storage (LFS) to manage large binary files, images, videos, and datasets in your Git repositories without slowing down operations.

By Inventive HQ Team
Git excels at tracking text files but struggles with large binary files. Every clone downloads the entire history, and binary files rarely compress or diff well, so each revision of a large asset adds close to its full size to the repository. Git Large File Storage (LFS) solves this by replacing large files with lightweight pointers while storing the actual content separately. This guide covers setup, workflows, migration, and alternatives for managing large files in Git.

Why Git Struggles with Large Files

┌─────────────────────────────────────────────────────────────┐
│              GIT WITHOUT LFS                                 │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│   Repository: 50MB code + 2GB images                        │
│                                                              │
│   Clone operation:                                           │
│   ├── Download all commits                                  │
│   ├── Download ALL versions of ALL images                   │
│   └── Total: 8GB (historical versions)                      │
│                                                              │
│   Time: 20+ minutes                                          │
│                                                              │
├─────────────────────────────────────────────────────────────┤
│              GIT WITH LFS                                    │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│   Repository: 50MB code + pointer files                     │
│                                                              │
│   Clone operation:                                           │
│   ├── Download all commits (code + pointers)                │
│   └── Download only CURRENT version of images               │
│                                                              │
│   Total: 250MB                                               │
│   Time: 2 minutes                                            │
│                                                              │
└─────────────────────────────────────────────────────────────┘
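
Before adopting LFS it helps to know which files are actually inflating the history. The sketch below lists the largest blobs reachable from any ref, using only plain Git commands. The function name `largest_blobs` is made up for this example, and the sizes are uncompressed blob sizes, not on-disk pack sizes.

```shell
# List the N largest blobs anywhere in history (default: 10).
# Output: "<size in bytes> <path>", largest first.
largest_blobs() {
  local count="${1:-10}"
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    sed -n 's/^blob //p' |     # keep blobs only, drop trees and commits
    sort -rn -k1,1 |
    head -n "$count"
}
```

Running `largest_blobs 20` inside a repository gives a quick shortlist of file types worth tracking with LFS.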

When to Use Git LFS

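| File Type | Typical Size | LFS Recommended |
| --- | --- | --- |
| Source code | < 100KB | No |
| Config files | < 1MB | No |
| Small images | < 500KB | Optional |
| PSD/AI files | 10-500MB | Yes |
| Video files | 100MB+ | Yes |
| ML models | 100MB+ | Yes |
| Game assets | 10MB+ | Yes |
| Compiled binaries | 10MB+ | Yes |
| Datasets | 10MB+ | Yes |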

Rule of thumb: Track files with LFS if they're binary AND (larger than 1MB OR change frequently).
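
As a rough way to apply the size half of this rule, the following sketch prints files that are both binary and larger than 1 MiB. The function name `lfs_candidates` is hypothetical, the binary check relies on `grep -I` (which treats files containing NUL bytes as binary), and the "changes frequently" criterion is not covered here.

```shell
# Print files under DIR (default: current directory) that are binary
# and larger than 1 MiB. The .git directory is skipped.
lfs_candidates() {
  find "${1:-.}" -type d -name .git -prune -o -type f -size +1M -print |
    while IFS= read -r file; do
      # With -I, grep reports "no match" for binary files, so a failed
      # match on the empty pattern means the file is binary.
      if ! grep -Iq '' "$file"; then
        printf '%s\n' "$file"
      fi
    done
}
```

The output is only a starting point: review the list before turning it into `git lfs track` patterns.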

Setting Up Git LFS

Installation

# macOS
brew install git-lfs

# Ubuntu/Debian
sudo apt install git-lfs

# Windows
# Download from https://git-lfs.github.com/
# Or use: choco install git-lfs

# Initialize Git LFS for your user
git lfs install

Repository Setup

# Navigate to your repository
cd my-repo

# Track file types with LFS
git lfs track "*.psd"
git lfs track "*.mp4"
git lfs track "*.zip"
git lfs track "assets/large/**"

# Check tracked patterns
git lfs track

# This creates/updates .gitattributes
cat .gitattributes
# *.psd filter=lfs diff=lfs merge=lfs -text
# *.mp4 filter=lfs diff=lfs merge=lfs -text
# ...

# Commit the tracking configuration
git add .gitattributes
git commit -m "Configure Git LFS tracking"

How LFS Works

┌─────────────────────────────────────────────────────────────┐
│                    GIT LFS FLOW                              │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│   ┌─────────────┐                                           │
│   │  git add    │──► LFS filter detects tracked file        │
│   │  logo.psd   │                                           │
│   └──────┬──────┘                                           │
│          │                                                   │
│          ▼                                                   │
│   ┌─────────────────────────────────────────┐               │
│   │  1. Calculate SHA-256 of file content   │               │
│   │  2. Store file in .git/lfs/objects/     │               │
│   │  3. Create pointer file for staging     │               │
│   └─────────────┬───────────────────────────┘               │
│                 │                                            │
│                 ▼                                            │
│   ┌─────────────────────────────────────────┐               │
│   │  Pointer file content:                   │               │
│   │  version https://git-lfs.github.com/... │               │
│   │  oid sha256:abc123...                    │               │
│   │  size 15728640                           │               │
│   └─────────────┬───────────────────────────┘               │
│                 │                                            │
│                 ▼                                            │
│   ┌─────────────┐      ┌────────────────────┐               │
│   │ git commit  │──►   │  Commit contains   │               │
│   │             │      │  only pointer      │               │
│   └──────┬──────┘      └────────────────────┘               │
│          │                                                   │
│          ▼                                                   │
│   ┌─────────────┐      ┌────────────────────┐               │
│   │  git push   │──►   │ Pointer to GitHub  │               │
│   │             │      │ File to LFS server │               │
│   └─────────────┘      └────────────────────┘               │
│                                                              │
└─────────────────────────────────────────────────────────────┘
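
To make the pointer format concrete, here is a minimal sketch that builds the same three-line pointer by hand from a file's SHA-256 and byte size. The function name `lfs_pointer` is made up for illustration, and the version line assumes the v1 pointer spec URL; this mirrors what the LFS clean filter writes into the commit, but it does not store the object anywhere.

```shell
# Print the LFS pointer text for FILE: spec version, SHA-256 object ID, size.
lfs_pointer() {
  local file="$1" oid size
  if command -v sha256sum >/dev/null 2>&1; then
    oid=$(sha256sum "$file" | cut -d ' ' -f 1)
  else
    oid=$(shasum -a 256 "$file" | cut -d ' ' -f 1)   # fallback, e.g. on macOS
  fi
  size=$(( $(wc -c < "$file") ))   # arithmetic expansion strips wc padding
  printf 'version https://git-lfs.github.com/spec/v1\noid sha256:%s\nsize %s\n' \
    "$oid" "$size"
}
```

With Git LFS installed, `git lfs pointer --file=logo.psd` should print the same text, which makes a handy cross-check.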

Common Workflows

Adding New Large Files

# Ensure file type is tracked (this updates .gitattributes)
git lfs track "*.psd"

# Add the tracking config and the file, then commit normally
git add .gitattributes design.psd
git commit -m "Add design file"
git push

Cloning Repositories with LFS

# Standard clone (downloads LFS files automatically)
git clone https://github.com/org/repo.git

# Clone without LFS files (faster for large repos)
GIT_LFS_SKIP_SMUDGE=1 git clone https://github.com/org/repo.git
cd repo
# Download specific files later
git lfs pull --include="assets/needed/*"

# Or download all LFS files
git lfs pull

Checking LFS Status

# List tracked patterns
git lfs track

# List all LFS files in repository
git lfs ls-files

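# Show full object IDs (SHA-256) instead of the short form
git lfs ls-files -l

# Show LFS files with staged, unstaged, or unpushed changes
git lfs status

# Check LFS objects in the current HEAD for corruption
git lfs fsck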

Fetching and Pulling

# Fetch LFS objects (download without checkout)
git lfs fetch

# Fetch specific paths only
git lfs fetch --include="assets/textures/*"

# Pull (fetch + checkout)
git lfs pull

# Fetch from specific remote
git lfs fetch origin

# Fetch all refs (branches, tags)
git lfs fetch --all

Migrating Existing Files to LFS

Track New Files Going Forward

# Track pattern before adding files
git lfs track "*.psd"
git add .gitattributes
git commit -m "Track PSD files with LFS"

# Now add the files
git add designs/*.psd
git commit -m "Add design files"
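
A common mistake is adding files before the pattern takes effect, which commits them as regular Git blobs. The sketch below asks Git which filter applies to a path, based only on `.gitattributes`; the function name `uses_lfs` is hypothetical, while `git check-attr` is a standard Git command.

```shell
# Succeeds when PATH would be routed through the LFS filter.
# Only .gitattributes is consulted, so the file need not exist yet.
uses_lfs() {
  [ "$(git check-attr filter -- "$1" | sed 's/.*: filter: //')" = "lfs" ]
}
```

For example, `uses_lfs designs/logo.psd && git add designs/logo.psd` adds the file only when the tracking pattern already covers it.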

Migrate Files Already in History

Warning: This rewrites Git history. Coordinate with your team.

# See what would be migrated
git lfs migrate info --include="*.psd"

# Migrate files in history
git lfs migrate import --include="*.psd" --everything

# For specific branches only
git lfs migrate import --include="*.psd" --include-ref=main --include-ref=develop

# Force push after migration (add --all to push every rewritten branch)
git push --force-with-lease

Cleaning Up After Migration

# Remove old objects from local repo
git reflog expire --expire=now --all
git gc --prune=now --aggressive

# Team members must re-clone
# Old clones still have bloated history
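
To confirm that the cleanup actually shrank the local repository, compare the packed object size before and after. This is a small sketch around `git count-objects`; the helper name `repo_pack_kib` is made up for this example.

```shell
# Print the total size of packed objects in KiB for the current repository.
repo_pack_kib() {
  git count-objects -v | awk '/^size-pack:/ { print $2 }'
}
```

Record `repo_pack_kib` before the migration and again after `git gc`; `git count-objects -vH` shows the same figures in human-readable units.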

CI/CD Integration

GitHub Actions

name: Build
on: push

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout with LFS
        uses: actions/checkout@v4
        with:
          lfs: true

      - name: Build
        run: npm run build

Optimized with caching:

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
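      - uses: actions/checkout@v4

      # Hash input for the cache key: the sorted object IDs of all LFS files
      - name: Create LFS file list
        run: git lfs ls-files -l | cut -d' ' -f1 | sort > .lfs-assets-id

      - name: Cache LFS objects
        uses: actions/cache@v4
        with:
          path: .git/lfs
          key: lfs-${{ hashFiles('.lfs-assets-id') }}
          restore-keys: lfs-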

      - name: Pull LFS files
        run: git lfs pull

      - name: Build
        run: npm run build

Selective LFS pull (faster):

steps:
  - uses: actions/checkout@v4

  - name: Pull only needed LFS files
    run: |
      git lfs install
      git lfs pull --include="src/assets/images/*" --exclude="*.psd"

GitLab CI

build:
  variables:
    GIT_LFS_SKIP_SMUDGE: "1"  # Skip automatic LFS
  script:
    - git lfs pull --include="needed-files/*"
    - npm run build

Storage and Hosting

Provider Comparison

| Provider | Free Storage | Free Bandwidth | Paid Plans |
| --- | --- | --- | --- |
| GitHub | 1 GB | 1 GB/month | $5/50GB pack |
| GitLab.com | 5 GB | 10 GB/month | $60/year more |
| Bitbucket | 1 GB | 1 GB/month | Varies by plan |
| Self-hosted | Unlimited | Unlimited | Storage costs |
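
Bandwidth is usually the quota that runs out first, because fresh clones and CI checkouts that pull LFS content each download it again. The arithmetic can be sketched as below; both function names are hypothetical, and the 200 MB payload and 1 GB (1024 MB) quota in the usage note are illustrative assumptions, not measurements.

```shell
# Monthly LFS bandwidth in MB: downloads per month x MB of LFS data per download.
lfs_bandwidth_mb() {
  echo $(( $1 * $2 ))
}

# Number of full downloads that fit into a quota (both values in MB).
downloads_within_quota() {
  echo $(( $1 / $2 ))
}
```

With these assumptions, `downloads_within_quota 1024 200` prints 5, while `lfs_bandwidth_mb 30 200` (one CI run per day) prints 6000, which is why caching or selective pulls matter in CI.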

Self-Hosted LFS Server

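Using lfs-test-server, the Git LFS reference server (it stores objects on local disk and has no S3 backend; its README describes it as a testing tool rather than a production server):

# Install
go install github.com/git-lfs/lfs-test-server@latest

# Configure local storage (example path) and admin credentials
export LFS_CONTENTPATH=/var/lib/lfs-content
export LFS_ADMINUSER=admin
export LFS_ADMINPASS=secret

# Run server
lfs-test-server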

Configure repository:

# Point repo to custom LFS server
git config lfs.url https://my-lfs-server.com/org/repo

Using .lfsconfig (committed to repo):

[lfs]
  url = https://my-lfs-server.com/org/repo

Troubleshooting

Common Issues

Problem: "This repository is over its data quota"

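# Quota usage is reported in your hosting provider's billing settings;
# locally, list LFS files with their sizes to find the biggest ones
git lfs ls-files -s

# Prune old versions locally (frees local disk only, not the remote quota)
git lfs prune

# Stop tracking a pattern (files already committed stay in LFS)
git lfs untrack "*.old"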

Problem: LFS files showing as pointer text

# Make sure the LFS filters and hooks are installed
git lfs install

# Re-checkout LFS files
git lfs checkout

# Or pull all LFS content
git lfs pull
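
To tell whether a given file in the working tree is still a pointer rather than real content, it is enough to look at its first line. The sketch below assumes the v1 pointer spec URL as the version line; the function name `is_lfs_pointer` is hypothetical.

```shell
# Succeeds when FILE starts with the LFS v1 pointer header.
# Only the first 42 bytes are read, so large binaries are cheap to check.
is_lfs_pointer() {
  [ "$(head -c 42 "$1" | tr -d '\000')" = "version https://git-lfs.github.com/spec/v1" ]
}
```

For example, `is_lfs_pointer logo.psd && git lfs pull --include="logo.psd"` downloads the content only when the file is still a pointer.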

Problem: Slow clone/pull

# Clone without LFS, then selective pull
GIT_LFS_SKIP_SMUDGE=1 git clone <url>
cd repo
git lfs pull --include="needed/**"

# Parallel downloads
git config lfs.concurrenttransfers 8
git lfs pull

Problem: File too large for GitHub

# GitHub caps each LFS file at 2GB on Free and Pro plans (higher on Team and Enterprise)
# Split large files or use different storage

# Check file sizes
git lfs ls-files -s
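
If splitting is the chosen route, the standard `split` and `cat` tools are enough. This is a sketch only: the helper names and the `.part-` suffix are made up for illustration, and the consumer of the file has to reassemble it before use.

```shell
# Split FILE into chunks of at most CHUNK_SIZE (e.g. 1900m),
# named FILE.part-aa, FILE.part-ab, ...
split_for_upload() {
  split -b "$2" "$1" "$1.part-"
}

# Reassemble FILE from its parts; the glob sorts parts in the right order.
join_parts() {
  cat "$1".part-* > "$1"
}
```

For example, `split_for_upload model.bin 1900m` keeps every chunk under a 2GB limit, and `join_parts model.bin` restores the original after checkout.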

Debugging

# Verbose output
GIT_TRACE=1 GIT_TRANSFER_TRACE=1 git lfs pull

# Check LFS configuration
git lfs env

# Verify file integrity
git lfs fsck

Alternatives to Git LFS

Comparison

┌─────────────────────────────────────────────────────────────┐
│                  LARGE FILE SOLUTIONS                        │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│   Git LFS                                                    │
│   ├── Best for: General binary file versioning              │
│   ├── Pros: Simple, well-supported, integrated              │
│   └── Cons: Bandwidth costs, requires LFS support           │
│                                                              │
│   DVC (Data Version Control)                                 │
│   ├── Best for: ML datasets, pipelines, experiments         │
│   ├── Pros: ML-focused features, remote storage options     │
│   └── Cons: Separate tool, learning curve                   │
│                                                              │
│   git-annex                                                  │
│   ├── Best for: Complex storage backends, partial sync      │
│   ├── Pros: Flexible, works with any storage                │
│   └── Cons: Complex setup, different mental model           │
│                                                              │
│   Partial Clone + Sparse Checkout                            │
│   ├── Best for: Huge monorepos with no LFS support          │
│   ├── Pros: Native Git, no extra tools                      │
│   └── Cons: Limited to recent Git versions                  │
│                                                              │
│   External Storage (S3 + references)                         │
│   ├── Best for: Truly massive files (10GB+)                 │
│   ├── Pros: No size limits, cheap storage                   │
│   └── Cons: Manual management, no versioning                │
│                                                              │
└─────────────────────────────────────────────────────────────┘

Git Partial Clone (Native Alternative)

# Clone without blob content
git clone --filter=blob:none https://github.com/org/repo.git
cd repo

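# Blobs for the checked-out commit are fetched during checkout;
# older versions are downloaded on demand when accessed
git show HEAD~1:large-file.bin > old-version.bin  # Older blob fetched now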

# Sparse checkout for large repos
git sparse-checkout init
git sparse-checkout set src/ docs/

DVC for ML Projects

# Install DVC
pip install dvc

# Initialize in repo
dvc init

# Track large file
dvc add data/training-set.parquet

# Configure remote storage
dvc remote add -d myremote s3://my-bucket/dvc

# Push data
dvc push

# Pull data on another machine
dvc pull

Best Practices

.gitattributes Patterns

# Images
*.png filter=lfs diff=lfs merge=lfs -text
*.jpg filter=lfs diff=lfs merge=lfs -text
*.gif filter=lfs diff=lfs merge=lfs -text
*.psd filter=lfs diff=lfs merge=lfs -text
*.ai filter=lfs diff=lfs merge=lfs -text

# Videos
*.mp4 filter=lfs diff=lfs merge=lfs -text
*.mov filter=lfs diff=lfs merge=lfs -text
*.avi filter=lfs diff=lfs merge=lfs -text

# Audio
*.mp3 filter=lfs diff=lfs merge=lfs -text
*.wav filter=lfs diff=lfs merge=lfs -text

# Archives
*.zip filter=lfs diff=lfs merge=lfs -text
*.tar.gz filter=lfs diff=lfs merge=lfs -text

# Binaries
*.exe filter=lfs diff=lfs merge=lfs -text
*.dll filter=lfs diff=lfs merge=lfs -text
*.so filter=lfs diff=lfs merge=lfs -text

# Data
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text

# Game/3D assets
*.fbx filter=lfs diff=lfs merge=lfs -text
*.blend filter=lfs diff=lfs merge=lfs -text
*.unitypackage filter=lfs diff=lfs merge=lfs -text
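
Lines like these follow one fixed template, so they can be generated from a list of extensions rather than typed by hand. A minimal sketch, with the hypothetical helper name `lfs_attr_lines`:

```shell
# Print one LFS tracking line per file extension given as an argument.
lfs_attr_lines() {
  local ext
  for ext in "$@"; do
    printf '*.%s filter=lfs diff=lfs merge=lfs -text\n' "$ext"
  done
}
```

For example, `lfs_attr_lines png jpg gif >> .gitattributes` appends three patterns at once; running `git lfs track` once per pattern has the same effect.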

Repository Organization

project/
├── src/                    # Regular Git (code)
├── docs/                   # Regular Git (documentation)
├── assets/                 # LFS tracked
│   ├── images/
│   ├── videos/
│   └── designs/
├── data/                   # LFS tracked (or DVC for ML)
│   ├── raw/
│   └── processed/
├── .gitattributes         # LFS tracking patterns
└── .lfsconfig             # LFS server configuration
