
On 2013-12-13 18:26, Tim Connors wrote:
git beginner here, so forgive me.
As part of change management for a couple hundred servers, we regularly run 'git add .' on a central repository of half a million files, followed by a bit of munging, then a 'git commit -a -m ...' (this is RHEL5, whose older git uses 'git -a' where newer versions use 'git -A').
'git add .' is very, very slow at finding and staging the additions and modifications (we don't care about the '-u' deletions at this stage, because of the munging that follows). I suspect it's actually staging every single file rather than just the changes.
'git status -s' on a freshly changed repository, before any git add, is really quite quick (no, it's not a cold or undersized cache issue), and it finds all of the additions, deletions and modifications. We could simply pipe that rather small output to 'git add' and it would be much quicker at staging them (er, I think; but also, a bit of a kludge).

Is there any known reason why git-add would appear to recurse through the entire tree, staging even unchanged files, rather than acting only on the changed files that git-status can evidently find very quickly? Is there some bit of git magic I'm missing?
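Concretely, the piping kludge above would be something like this (an untested sketch; assumes GNU xargs and no renames or filenames that git would quote):

    # stage only what 'git status' already found, skipping the
    # worktree deletions that our later munging takes care of
    git status --porcelain \
        | grep -v '^.D' \
        | sed 's/^...//' \
        | xargs -d '\n' git add --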
Hi Tim,

My first bet would be write load vs. read load. git add has to not only check the hash of each file; for every new file, after hashing it, it also has to add the content to its object database. git status, however, needs to do zero writing. I find, particularly when using git over NFS, that the add is far slower than the status. You could always run it under strace to find out what it's doing in more detail.

Keep in mind that by its very nature, git would not re-store unchanged files: it hashes the file, determines immediately via hash lookup that the object already exists in the object store, and doesn't bother storing it again. As far as I know, though, it *does* have to go through the entire process of *calculating* the hash for each file every time.

Hope this helps.

--
Regards,
Matthew Cengia
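P.S. If you do reach for strace, a per-syscall time summary is probably the quickest first look. Something like this (a sketch, assuming GNU strace on Linux; the output file names are just examples):

    # -c tallies time spent per syscall, -f follows git's child
    # processes, -o keeps the summary out of git's own output
    strace -c -f -o add-trace.txt git add .
    strace -c -f -o status-trace.txt git status -s

Comparing the two summaries should show whether the extra time in the add is going into open/write calls against the object database.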