On 2013-12-13 18:26, Tim Connors wrote:
git beginner here, so forgive me.
As part of change management of a couple hundred servers, we do a regular
'git add .' on a central repository of half a million files, followed by a
bit of munging, then a 'git commit -a -m ...' (rhel5, so 'git -a' is
'git -A' elsewhere).
'git add .' is very, very slow at finding the additions and modifications
(we don't care about the '-u' deletions at this stage, because of the
future munging required) and staging them. I suspect it's actually
staging every single file rather than just changes.
'git status -s' on a freshly changed repository prior to doing any git add
is really quite quick (no, it's not a cold or undersized cache issue), and
finds all additions, deletions and modifications. We could simply pipe
that rather small output to 'git add' and it would be much, much quicker at
staging them (er, I think; but also, a bit of a kludge). Any known reason
why git-add would appear to be recursing through the entire tree staging
even unchanged files, and not just acting on the changed files that
git-status obviously can find very quickly? Any missing bit of git magic
I could be applying?
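For concreteness, the pipe we have in mind would be something like the
following untested sketch (it assumes a git new enough for --porcelain,
the script-stable twin of -s, and ignores quoted filenames and renames):

    git status --porcelain | sed -n 's/^.[MA?] //p' | xargs -d '\n' -r git add --

i.e. keep everything except the deletions, and stage just those paths.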
Hi Tim,
My first bet would be write load vs. read load. git add not only has to
hash each file; for each new file, after hashing it, it also has to add
it to its object database. git status, however, needs to do zero
writing.
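You can watch that write side directly, for instance by counting files
in the object store before and after the add (a rough illustration;
loose and packed objects both show up as files):

    find .git/objects -type f | wc -l   # before
    git add .
    find .git/objects -type f | wc -l   # after: roughly one new loose
                                        # object per new or changed file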
I find, particularly when using git over NFS, that the add is far slower
than the status.
You could always do an strace to find out what it's doing in more
detail.
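For example, something like this (-c makes strace print a per-syscall
count summary, which is usually enough to see where the time goes):

    strace -f -c git status -s > /dev/null
    strace -f -c git add .

If the add really is write-bound, its summary should show far more
open()/write() activity than the status run does.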
Keep in mind that, by its very nature, git would not stage unchanged
files: it would hash the file, determine immediately via a hash lookup
that the hash already exists in the object store, and not bother to
store it again. It *does* have to go through the entire process of
*calculating* the hash for each file every time, though, as far as I
know.
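Per file, that amounts to roughly the following (a sketch of the logic,
not git's actual code path; 'somefile' stands in for each path):

    hash=$(git hash-object somefile)      # always computed
    if ! git cat-file -e "$hash"; then    # already in the object store?
        git hash-object -w somefile       # only new content gets written
    fi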
Hope this helps.
--
Regards,
Matthew Cengia