DP44/GIT Migration.md
2026-04-17 14:55:32 -04:00


# SVN → Git Migration Plan: BRANCH_MAINT_4_04
## Overview
- **Source:** `http://build:8080/svn/Software/Views/DTS.Suite/branches/BRANCH_MAINT_4_04`
- **Target:** New Git repository
- **Scope:** Single project branch, full history preservation
- **Estimated Time:** 1-2 hours

---
## SVN Repository Structure
```
http://build:8080/svn/Software/
└── Views/
    └── DTS.Suite/
        └── branches/
            └── BRANCH_MAINT_4_04/   ← This working copy
                ├── Common/
                ├── DataPRO/
                ├── DataPRO_sql/
                ├── DTS Viewer/
                └── ...
```
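Note: This is a non-standard SVN layout (`Views/...` instead of `trunk/` and `branches/` at the root), so the clone in Phase 3 points `git svn` at the branch URL directly instead of using `--stdlayout`.

---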
## Prerequisites
```bash
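# Install tools on macOS (Homebrew ships git-svn as a separate formula)
brew install git subversion git-svn
# Verify the host is reachable (-c stops after 3 packets)
ping -c 3 build
# Verify the SVN server itself answers over HTTP
svn info http://build:8080/svn/Software/Views/DTS.Suite/branches/BRANCH_MAINT_4_04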
```
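
Before starting, it can help to confirm that every required command is actually on `PATH`. The helper below is a minimal sketch; the function name `missing_tools` is hypothetical. A typical call would be `missing_tools git svn`, followed by `git svn --version` to confirm that the `git svn` subcommand itself is installed.

```bash
# Print the name of every listed tool that is not on PATH.
# Returns non-zero if at least one tool is missing.
missing_tools() {
  local status=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
      status=1
    fi
  done
  return "$status"
}
```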
---
## Phase 1: Author Mapping (~10 min)
### 1.1 Extract SVN Authors
```bash
# Split on " | " so usernames containing spaces, such as "(no author)", survive intact
svn log --quiet http://build:8080/svn/Software/Views/DTS.Suite/branches/BRANCH_MAINT_4_04 | grep '^r' | awk -F' \\| ' '{print $2}' | sort -u > svn-authors.txt
```
### 1.2 Create authors.txt
Map each SVN username in `svn-authors.txt` to a Git identity in `authors.txt`, one entry per line:
```
svn_username = Full Name <email@example.com>
jdoe = John Doe <john@company.com>
tsmith = Tom Smith <tom@company.com>
```
**Format:** `svn_username = Git Name <email>`
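
The mapping file can be seeded from the extracted list and then corrected by hand. The sketch below assumes one username per line on stdin; the function name `authors_template` and the default `example.com` domain are placeholders, and the real names and addresses still have to be filled in. A typical call would be `authors_template company.com < svn-authors.txt > authors.txt`.

```bash
# Read SVN usernames (one per line) on stdin and print authors-file lines.
# The username is reused as a placeholder for both the display name and the mailbox.
authors_template() {
  local domain="${1:-example.com}"   # placeholder domain, not taken from the plan
  awk -v domain="$domain" 'NF { printf "%s = %s <%s@%s>\n", $0, $0, $0, domain }'
}
```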

---
## Phase 2: Binary Exclusion Strategy
### 2.1 Patterns to Exclude
Based on repository analysis (366 binary files identified):
| Category | Pattern | Reason |
|----------|---------|--------|
| Build outputs | `**/bin/`, `**/obj/` | Generated by build |
| DLLs | `*.dll` | NuGet restore or build output |
| Executables | `*.exe` | Build output or redistributable |
| Installers | `*.msi` | Build artifact |
| Packages | `*.nupkg`, `**/packages/` | NuGet restore |
| Debug files | `*.pdb` | Build output |
| Database files | `*.mdf`, `*.ldf` | Development data |
| PDFs (large) | `*.pdf` | Documentation, not source |
### 2.2 .gitignore Template
```gitignore
# Build outputs
**/bin/
**/obj/
*.dll
*.exe
*.pdb
# Installers & redistributables
**/DataPRO Installer/**/*.msi
**/DataPRO Installer/**/*.exe
**/Redistributables/
# Database files
*.mdf
*.ldf
# Packages (restore via NuGet)
**/packages/
*.nupkg
# IDE & OS files
.vs/
.idea/
*.user
*.suo
.DS_Store
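# Generated files (normally already under obj/)
# *.Designer.cs is deliberately not ignored: WinForms and resource designer
# files are needed to build and should stay in source control.
*.g.cs
*.g.i.cs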
# AI enrichment (optional)
enriched/
enriched-qwen3-coder-next/
.vectordb/
```
### 2.3 Third-Party DLLs Decision
**Question:** Are any third-party DLLs required in source control (not available via NuGet)?
If yes, add a negation rule for each one after the `*.dll` pattern in `.gitignore` (the last matching rule wins):
```
!Common/DTS.CommonCore/lib/ThirdParty/required.dll
```
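
Rule order matters here, and `git check-ignore` can confirm the outcome before anything is committed. The helper below is a sketch only: the name `would_ignore` is hypothetical, and it tests a candidate ignore file in a throwaway repository so the real working tree is not touched. For example, `would_ignore .gitignore Common/DTS.CommonCore/lib/ThirdParty/required.dll` should print `kept` once the exception is in place.

```bash
# Print "ignored" or "kept" for a path, judged against the given ignore file.
# Returns 2 if the throwaway repository cannot be set up.
would_ignore() {
  local gitignore="$1" path="$2" tmp result
  tmp=$(mktemp -d) || return 2
  if ! git init -q "$tmp" || ! cp "$gitignore" "$tmp/.gitignore"; then
    rm -rf "$tmp"
    return 2
  fi
  # check-ignore exits 0 only when the last matching rule ignores the path
  if git -C "$tmp" check-ignore -q "$path"; then
    result=ignored
  else
    result=kept
  fi
  rm -rf "$tmp"
  printf '%s\n' "$result"
}
```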
---
## Phase 3: Migration (~30-90 min)
### 3.1 Clone SVN to Git
```bash
git svn clone \
  --authors-file=authors.txt \
  --no-metadata \
  --prefix=svn/ \
  http://build:8080/svn/Software/Views/DTS.Suite/branches/BRANCH_MAINT_4_04 \
  BRANCH_MAINT_4_04-git
```
**Flags:**
- `--authors-file`: Maps SVN users → Git identities
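- `--no-metadata`: Cleaner commits (no `git-svn-id` lines); intended for one-shot imports, because later syncing with SVN is not reliable without that metadata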
- `--prefix=svn/`: Remote-tracking branch naming
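
When an authors file is supplied, `git svn` stops at the first SVN author that has no entry in it, so it is worth checking coverage before a long clone. The sketch below assumes the two files produced in Phase 1; the function name `unmapped_authors` is hypothetical.

```bash
# Print every username in the SVN list ($1) that has no "name = ..." entry
# in the authors file ($2). No output means the mapping is complete.
unmapped_authors() {
  local svn_list="$1" authors_file="$2"
  awk -F' = ' '
    FILENAME == ARGV[1] { mapped[$1] = 1; next }  # authors file: remember mapped names
    NF && !($0 in mapped)                         # SVN list: print names without a mapping
  ' "$authors_file" "$svn_list"
}
```

A typical call would be `unmapped_authors svn-authors.txt authors.txt`.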
### 3.2 Alternative: git svn init + fetch (for better control)
If the clone is interrupted or you need more control:
```bash
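mkdir BRANCH_MAINT_4_04-git
cd BRANCH_MAINT_4_04-git
git svn init --no-metadata --prefix=svn/ \
  http://build:8080/svn/Software/Views/DTS.Suite/branches/BRANCH_MAINT_4_04
# --authors-file is a fetch option, not an init option.
# Rerun this command to resume an interrupted import.
git svn fetch --authors-file=../authors.txt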
```
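
Because rerunning `git svn fetch` continues from the last imported revision, an import over a flaky connection can be wrapped in a simple retry loop. This is a generic sketch; the function name `retry` and the attempt and delay values are assumptions. A typical call would be `retry 5 30 git svn fetch --authors-file=../authors.txt`.

```bash
# Run a command until it succeeds, at most $1 times,
# sleeping $2 seconds between attempts.
retry() {
  local attempts="$1" delay="$2" n=1
  shift 2
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts: $*" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}
```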
---
## Phase 4: Post-Clone Cleanup (~15 min)
### 4.1 Navigate to New Repo
```bash
cd BRANCH_MAINT_4_04-git
```
### 4.2 Add .gitignore
```bash
# Create .gitignore with content from Phase 2.2
```
### 4.3 Remove Binaries from Git Tracking
```bash
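# Remove build outputs from the index (files stay on disk).
# :(glob) gives ** its directory-spanning meaning, including top-level bin/ and obj/;
# --ignore-unmatch avoids a fatal error when a pattern matches nothing.
git rm -r --cached --ignore-unmatch ':(glob)**/bin/**' ':(glob)**/obj/**' ':(glob)**/packages/**'
# Remove binaries
git rm --cached --ignore-unmatch '*.dll' '*.exe' '*.msi' '*.pdb' '*.mdf' '*.ldf' '*.nupkg'
# Re-stage any third-party DLLs kept under Phase 2.3 before committing
# Commit cleanup
git add .gitignore
git commit -m "Add .gitignore, remove binaries from tracking"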
```
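
To preview which tracked files the new `.gitignore` covers, `git ls-files` can list index entries that match the ignore rules. The wrapper below is a small sketch; the function name `tracked_but_ignored` is hypothetical. Empty output after the cleanup commit suggests nothing was missed.

```bash
# List files that are in the index but match the ignore rules
# (.gitignore, .git/info/exclude, and the global excludes file).
# Optional argument: path to the repository (defaults to the current directory).
tracked_but_ignored() {
  git -C "${1:-.}" ls-files --cached --ignored --exclude-standard
}
```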
### 4.4 Clean Repository Size (Optional)
```bash
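# Remove large files from the entire history (destructive: rewrites every commit hash).
# git filter-branch is deprecated upstream in favor of git-filter-repo, but still works for a one-off cleanup.
# The patterns are quoted so Git, not the shell, expands them across all directories.
git filter-branch --force --index-filter \
  'git rm --cached --ignore-unmatch -- "*.dll" "*.exe" "*.msi" "*.mdf"' \
  --prune-empty --tag-name-filter cat -- --all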
```
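Then delete the backup refs that `filter-branch` leaves under `refs/original/` (they keep the old objects alive) and garbage collect:
```bash
git for-each-ref --format='%(refname)' refs/original/ | xargs -n 1 git update-ref -d
git reflog expire --expire=now --all
git gc --prune=now --aggressive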
```
---
## Phase 5: Push to Remote (~5 min)
### 5.1 Create Remote Repository
Options:
- **GitHub:** `gh repo create datapro --private`
- **GitLab:** Create via UI
- **Self-hosted:** `git init --bare` on server
### 5.2 Push
```bash
git remote add origin git@github.com:your-org/datapro.git
git branch -M main
git push -u origin main
```
---
## Phase 6: Verification
```bash
# Check history
git log --oneline | head -20
# Check file count
git ls-files | wc -l
# Check for missed binaries
git ls-files | grep -E '\.(dll|exe|msi|pdb|mdf|ldf|nupkg)$'
# Verify author mapping
git log --format='%an <%ae>' | sort -u
```
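
A per-extension count of tracked files gives a quick view of what actually ended up in the repository and makes unexpected binary types easy to spot. The helper below is a sketch; the function name `count_by_extension` is hypothetical. A typical call would be `git ls-files | count_by_extension | head -20`.

```bash
# Read file paths on stdin and print "count extension" lines, most frequent first.
# Extensions are lowercased; files without a dot are counted as "(none)".
count_by_extension() {
  awk -F/ '
    {
      name = $NF                       # last path component
      n = split(name, parts, ".")
      ext = (n > 1) ? tolower(parts[n]) : "(none)"
      count[ext]++
    }
    END { for (e in count) printf "%d %s\n", count[e], e }
  ' | LC_ALL=C sort -k1,1nr -k2,2
}
```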
---
## Timeline Summary
| Step | Time | Risk |
|------|------|------|
| Install tools | 5 min | Low |
| Extract authors & create mapping | 10 min | Low |
| git svn clone | 30-90 min | Medium (network) |
| Cleanup & .gitignore | 15 min | Low |
| Push to remote | 5 min | Low |

**Total:** ~1-2 hours

---
## Open Questions
1. **Migration machine:** Run on the Mac, or on a machine on the same network as the SVN server?
2. **Git hosting:** GitHub, GitLab, or self-hosted?
3. **Third-party DLLs:** Any that must stay in source control?
4. **Private files:** Any secrets/configs to exclude before push?
---
## Rollback Plan
If migration fails:
1. Original SVN working copy is unaffected
2. Delete `BRANCH_MAINT_4_04-git/` and retry
3. SVN server remains authoritative until Git push succeeds
---
## Post-Migration
- [ ] Update CI/CD pipelines to use Git
- [ ] Notify team of new repository location
- [ ] Set SVN branch to read-only (optional)
- [ ] Document new workflow in team wiki