DP44/GIT Migration.md
2026-04-17 14:55:32 -04:00

SVN → Git Migration Plan: BRANCH_MAINT_4_04

Overview

Source: http://build:8080/svn/Software/Views/DTS.Suite/branches/BRANCH_MAINT_4_04
Target: New Git repository
Scope: Single project branch, full history preservation
Estimated Time: 1-2 hours


SVN Repository Structure

http://build:8080/svn/Software/
└── Views/
    └── DTS.Suite/
        └── branches/
            └── BRANCH_MAINT_4_04/   ← This working copy
                ├── Common/
                ├── DataPRO/
                ├── DataPRO_sql/
                ├── DTS Viewer/
                └── ...

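Note: This is a non-standard SVN layout (Views/... instead of trunk/branches/tags at the repository root), so the clone targets the branch URL directly instead of relying on git svn's --stdlayout option.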


Prerequisites

# Install tools on macOS (Homebrew ships git-svn as a separate formula)
brew install git subversion git-svn

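# Verify that the SVN server answers over HTTP (ping only tests ICMP, not port 8080)
svn info http://build:8080/svn/Software/Views/DTS.Suite/branches/BRANCH_MAINT_4_04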
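
A small pre-flight check can confirm the tools are on PATH before any time is spent on the clone. This is a minimal sketch; the function name check_tools is hypothetical and not part of any tool.

```bash
#!/usr/bin/env bash
# Report every required command that is not on PATH.
# Returns 0 when all are present, 1 otherwise.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `check_tools git svn && git svn --version`. The second command confirms that the git svn subcommand itself loads, which the PATH check alone cannot show.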

Phase 1: Author Mapping (~10 min)

1.1 Extract SVN Authors

svn log http://build:8080/svn/Software/Views/DTS.Suite/branches/BRANCH_MAINT_4_04 --quiet | grep "^r" | awk '{print $3}' | sort -u > svn-authors.txt
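
The one-liner above splits on whitespace, so a username that contains a space would be cut at its first word. If that is a concern, the log can be split on the " | " separator instead. A sketch, assuming the usual `svn log --quiet` line shape (`r123 | user | date`); the function name extract_svn_authors is hypothetical.

```bash
#!/usr/bin/env bash
# Read `svn log --quiet` output on stdin and print the unique author names.
# Splitting on " | " keeps names that contain spaces intact.
extract_svn_authors() {
  awk -F' [|] ' '/^r[0-9]+ [|] /{ print $2 }' | LC_ALL=C sort -u
}
```

Usage: `svn log --quiet <branch URL> | extract_svn_authors > svn-authors.txt`.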

1.2 Create authors.txt

Transform the extracted list into Git format:

svn_username = Full Name <email@example.com>
jdoe = John Doe <john@company.com>
tsmith = Tom Smith <tom@company.com>

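Format: svn_username = Git Name <email>. Every author that appears in the SVN log needs an entry; git svn stops at the first revision whose author is missing from the file.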
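
To avoid typing the skeleton by hand, the username list can be expanded into placeholder entries and then edited. A minimal sketch: the function name make_authors_template and the default domain example.com are assumptions, and the generated names and addresses are placeholders to be corrected manually.

```bash
#!/usr/bin/env bash
# Turn a list of SVN usernames (stdin) into authors-file lines.
# $1 is a placeholder e-mail domain; real names and addresses
# still have to be filled in by hand afterwards.
make_authors_template() {
  local domain="${1:-example.com}" user
  while IFS= read -r user; do
    [ -n "$user" ] || continue   # skip blank lines
    printf '%s = %s <%s@%s>\n' "$user" "$user" "$user" "$domain"
  done
}
```

Usage: `make_authors_template company.com < svn-authors.txt > authors.txt`.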


Phase 2: Binary Exclusion Strategy

2.1 Patterns to Exclude

Based on repository analysis (366 binary files identified):

| Category | Pattern | Reason |
|---|---|---|
| Build outputs | **/bin/, **/obj/ | Generated by build |
| DLLs | *.dll | NuGet restore or build output |
| Executables | *.exe | Build output or redistributable |
| Installers | *.msi | Build artifact |
| Packages | *.nupkg, **/packages/ | NuGet restore |
| Debug files | *.pdb | Build output |
| Database files | *.mdf, *.ldf | Development data |
| PDFs (large) | *.pdf | Documentation, not source |
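
The binary count can be re-derived from the SVN working copy with a quick scan. This is a sketch, not the analysis that produced the figure above; the function name count_binaries is hypothetical and the extension list simply mirrors the table.

```bash
#!/usr/bin/env bash
# Count files per binary extension under a directory,
# skipping .svn and .git metadata directories.
# Output: "<count> <extension>", largest count first.
count_binaries() {
  local root="${1:-.}"
  find "$root" \( -name .svn -o -name .git \) -prune -o -type f \
    \( -iname '*.dll' -o -iname '*.exe' -o -iname '*.msi' -o -iname '*.pdb' \
       -o -iname '*.nupkg' -o -iname '*.mdf' -o -iname '*.ldf' -o -iname '*.pdf' \) -print |
    awk -F. '{ n[tolower($NF)]++ } END { for (e in n) print n[e], e }' |
    sort -rn
}
```

Usage: run `count_binaries .` from the root of the working copy.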

2.2 .gitignore Template

# Build outputs
**/bin/
**/obj/
*.dll
*.exe
*.pdb

# Installers & redistributables
**/DataPRO Installer/**/*.msi
**/DataPRO Installer/**/*.exe
**/Redistributables/

# Database files
*.mdf
*.ldf

# Packages (restore via NuGet)
**/packages/
*.nupkg

# IDE & OS files
.vs/
.idea/
*.user
*.suo
.DS_Store

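# Generated files (WPF build output)
# *.Designer.cs is deliberately not listed: WinForms and resource designer
# files hold code the compiler needs and are not regenerated by the build.
*.g.cs
*.g.i.cs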

# AI enrichment (optional)
enriched/
enriched-qwen3-coder-next/
.vectordb/

2.3 Third-Party DLLs Decision

Question: Are any third-party DLLs required in source control (not available via NuGet)?

If yes, track exceptions:

!Common/DTS.CommonCore/lib/ThirdParty/required.dll
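
An exception line only takes effect if no parent directory of the file is itself excluded, so it is worth checking each one. A minimal sketch using git check-ignore; the helper name is_ignored is hypothetical.

```bash
#!/usr/bin/env bash
# Succeeds (exit 0) when Git would ignore the given path
# in the current repository, fails (exit 1) when it would not.
is_ignored() {
  git check-ignore -q -- "$1"
}
```

Usage, from the repository root: `is_ignored Common/DTS.CommonCore/lib/ThirdParty/required.dll || echo "tracked as intended"`.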

Phase 3: Migration (~30-90 min)

3.1 Clone SVN to Git

git svn clone \
  --authors-file=authors.txt \
  --no-metadata \
  --prefix=svn/ \
  http://build:8080/svn/Software/Views/DTS.Suite/branches/BRANCH_MAINT_4_04 \
  BRANCH_MAINT_4_04-git

Flags:

  • --authors-file: Maps SVN users → Git identities
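  • --no-metadata: Cleaner commits (no git-svn-id lines); intended for one-shot imports, since git svn relies on that metadata to keep syncing with SVN later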
  • --prefix=svn/: Remote-tracking branch naming
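
Because git svn aborts when it meets an author that is missing from the authors file, a long clone can fail late. Comparing the two files first is cheap. A sketch; the function name missing_authors is hypothetical, and it assumes one username per line in the first file.

```bash
#!/usr/bin/env bash
# Print usernames from file $1 (one per line) that have no
# "username = Name <email>" entry in authors file $2.
missing_authors() {
  local users="$1" map="$2"
  awk -F'[ \t]*=[ \t]*' '
    FILENAME == ARGV[1] { known[$1] = 1; next }   # first file: the mapping
    NF && !($0 in known)                          # second file: usernames
  ' "$map" "$users"
}
```

Usage: `missing_authors svn-authors.txt authors.txt`. No output means every author is mapped.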

3.2 Alternative: git svn init + fetch (for better control)

If the clone is interrupted or you need more control:

mkdir BRANCH_MAINT_4_04-git
cd BRANCH_MAINT_4_04-git
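git svn init --no-metadata --prefix=svn/ \
  http://build:8080/svn/Software/Views/DTS.Suite/branches/BRANCH_MAINT_4_04
# --authors-file is a fetch option (git svn init does not accept it)
git svn fetch --authors-file=../authors.txt
# If the fetch is interrupted, rerun the same fetch command to resume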
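
Since an interrupted fetch can simply be rerun, the retries can be automated on a flaky connection. A generic sketch; the helper name retry and the attempt and delay values in the usage line are assumptions.

```bash
#!/usr/bin/env bash
# Run a command up to $1 times, sleeping $2 seconds between attempts.
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
retry() {
  local attempts="$1" delay="$2" n=1
  shift 2
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts: $*" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}
```

Usage: `retry 5 30 git svn fetch --authors-file=../authors.txt`.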

Phase 4: Post-Clone Cleanup (~15 min)

4.1 Navigate to New Repo

cd BRANCH_MAINT_4_04-git

4.2 Add .gitignore

# Create .gitignore with content from Phase 2.2

4.3 Remove Binaries from Git Tracking

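# Remove build outputs from the index only (working files stay on disk).
# :(glob) makes ** match any directory depth, including the repository root;
# --ignore-unmatch keeps git rm from aborting when a pattern matches nothing.
git rm -r --cached --ignore-unmatch -- \
  ':(glob)**/bin/**' ':(glob)**/obj/**' ':(glob)**/packages/**'

# Remove binaries (a quoted *.ext pathspec matches at any depth)
git rm --cached --ignore-unmatch -- \
  '*.dll' '*.exe' '*.msi' '*.pdb' '*.nupkg' '*.mdf' '*.ldf'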
# Commit cleanup
git add .gitignore
git commit -m "Add .gitignore, remove binaries from tracking"
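
To confirm the cleanup caught everything, Git can list files that are still tracked even though .gitignore matches them. A minimal sketch; the function name tracked_but_ignored is hypothetical.

```bash
#!/usr/bin/env bash
# List files that are in the index although an ignore rule matches them.
# Empty output means tracking and .gitignore agree.
tracked_but_ignored() {
  git ls-files --cached --ignored --exclude-standard
}
```

Any path this prints is either a file to remove with git rm --cached or an ignore rule that is broader than intended.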

4.4 Clean Repository Size (Optional)

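# Remove large files from entire history (destructive: rewrites every commit ID).
# Run before the first push. The Git project now recommends git filter-repo over filter-branch.
git filter-branch --force --index-filter \
  'git rm --cached --ignore-unmatch -- "*.dll" "*.exe" "*.msi" "*.pdb" "*.mdf" "*.ldf"' \
  --prune-empty --tag-name-filter cat -- --all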

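Then drop the backup refs that filter-branch leaves under refs/original/ and garbage collect:

git for-each-ref --format='%(refname)' refs/original/ | xargs -n 1 git update-ref -d
git reflog expire --expire=now --all
git gc --prune=now --aggressive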
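
Whether the history rewrite is worth doing, and whether it worked, can be judged by listing the largest blobs reachable from any ref. A sketch; the function name largest_blobs is hypothetical.

```bash
#!/usr/bin/env bash
# Print the N largest blobs reachable from any ref as "<bytes> <path>".
largest_blobs() {
  local count="${1:-10}"
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    sed -n 's/^blob //p' |   # keep blobs only, drop the type column
    sort -rn |
    head -n "$count"
}
```

Usage: run `largest_blobs 20` before and after the cleanup, alongside `git count-objects -vH` for the overall repository size.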

Phase 5: Push to Remote (~5 min)

5.1 Create Remote Repository

Options:

  • GitHub: gh repo create datapro --private
  • GitLab: Create via UI
  • Self-hosted: git init --bare on server

5.2 Push

git remote add origin git@github.com:your-org/datapro.git
git branch -M main
git push -u origin main

Phase 6: Verification

# Check history
git log --oneline | head -20

# Check file count
git ls-files | wc -l

# Check for missed binaries (case-insensitive; no output means none remain)
git ls-files | grep -iE '\.(dll|exe|msi|pdb|nupkg|mdf|ldf)$'

# Verify author mapping
git log --format='%an <%ae>' | sort -u
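
A rough completeness check is to compare the number of SVN revisions with the number of Git commits. This sketch assumes the counts should be close, not necessarily identical, and it only makes sense before any history rewrite with --prune-empty. The function name count_svn_revisions is hypothetical.

```bash
#!/usr/bin/env bash
# Count revision header lines in `svn log --quiet` output read from stdin.
count_svn_revisions() {
  grep -c '^r[0-9][0-9]* | '
}

# Example (needs access to the SVN server, so it is left commented out):
#   svn_count=$(svn log --quiet "$SVN_URL" | count_svn_revisions)
#   git_count=$(git rev-list --count HEAD)
#   echo "svn=$svn_count git=$git_count"
```

SVN_URL here stands for the branch URL used in the clone. A large gap between the two numbers is worth investigating before the SVN branch is made read-only.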

Timeline Summary

| Step | Time | Risk |
|---|---|---|
| Install tools | 5 min | Low |
| Extract authors & create mapping | 10 min | Low |
| git svn clone | 30-90 min | Medium (network) |
| Cleanup & .gitignore | 15 min | Low |
| Push to remote | 5 min | Low |

Total: ~1-2 hours


Open Questions

  1. Migration machine: Run on the Mac, or on a machine on the same network as the SVN server?
  2. Git hosting: GitHub, GitLab, or self-hosted?
  3. Third-party DLLs: Any that must stay in source control?
  4. Private files: Any secrets/configs to exclude before push?

Rollback Plan

If migration fails:

  1. Original SVN working copy is unaffected
  2. Delete BRANCH_MAINT_4_04-git/ and retry
  3. SVN server remains authoritative until Git push succeeds

Post-Migration

  • Update CI/CD pipelines to use Git
  • Notify team of new repository location
  • Set SVN branch to read-only (optional)
  • Document new workflow in team wiki