The World Before Git, and a few other thoughts on source control
https://ludovic.chabant.com/blog/2023/12/19/the-world-before-git/
@ludovic "a couple hundred GBs" for AAA data? If you include "raw" data we're waaaaaaaaaay past that now, I'm afraid.
Even our engine development branch is bigger than that now
@jon_valdes I don't include raw data because I've never seen it included -- it's always stored... somewhere else... I don't even know where :D
@ludovic I can tell you where. It's... all in Perforce
(at least at EA)
@jon_valdes @ludovic it's just excluded from programmers' workspaces because it's too big
@hugin @jon_valdes @ludovic even not including raw data the numbers are a bit optimistic perhaps. :D Also the main problem with binaries and whatnot isn't their raw size per se, it's that you do a bunch of CI builds per day and they all just get yeeted into P4. Without some way to limit the history on certain files it's not really possible at all to do that in Git.
@jon_valdes @dotstdy @hugin @ludovic I wonder how filesystem-level deduplication would fare here. Otherwise something like FS virtualization with a (deduplicated) cache might help as well
@RichardKogelnig @dotstdy @hugin @ludovic would only help if the 2 branches are closely related. If it's 2 different games, you'll deduplicate most of the engine code, but almost none of the data or the game code
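(A rough sketch of the point above, assuming whole-file content hashing; real filesystems usually deduplicate at block level, so this is only an approximation. The function names `file_digests` and `shared_fraction` are made up for illustration. It estimates what fraction of one branch's unique bytes already exist, byte for byte, in another branch.)

```python
import hashlib
from pathlib import Path


def file_digests(root):
    """Map SHA-256 digest -> file size for every regular file under root."""
    digests = {}
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Reads the whole file into memory; fine for a sketch,
            # a real tool would hash in chunks.
            data = path.read_bytes()
            digests[hashlib.sha256(data).hexdigest()] = len(data)
    return digests


def shared_fraction(root_a, root_b):
    """Fraction of root_b's unique bytes whose content also exists in root_a."""
    a = file_digests(root_a)
    b = file_digests(root_b)
    total = sum(b.values())
    if total == 0:
        return 0.0
    shared = sum(size for digest, size in b.items() if digest in a)
    return shared / total
```

(Run against two branches of the same game this would presumably come out high; against two different games sharing only an engine, the data and game code would drag it down, which is the concern raised above.)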
@jon_valdes @RichardKogelnig @hugin @ludovic filesystem transparent compression helps us a fair bit size-wise, but at a considerable performance hit. Tbh I really like the Guerrilla Games approach of giving all the workstations 10GbE links and serving every file directly over the network without caching from a large server. Then your local workspace could be a lot sparser.
@dotstdy @jon_valdes @hugin @ludovic yeah IOI was thinking about something like that as well. Maybe they implemented it in the last 6 years
@jon_valdes Post updated, for posterity :D