gsutil -m rsync + Invalid Unicode path encountered error

I am getting this error:

    Caught non-retryable exception while listing file:///home/backups/html/: CommandException: Invalid Unicode path encountered (u'/home/backups/499322._425050_.can._R\xe9 sum\xe9 - Jeremy Smith.pdf'). gsutil cannot proceed with such files present. Please remove or rename this file and try again. NOTE: the path printed above replaces the problematic characters with a hex-encoded printable representation. For more details (including how to convert to a gsutil-compatible encoding) see `gsutil help encoding`.

The filename is:

    499322._425050_.can._Résumé - Jeremy Smith.pdf

    # convmv -r -f ISO-8859-1 -t UTF-8 499322._425050_.can._Résumé\ -\ Jeremy\ Smith.pdf
    Skipping, already UTF-8: 499322._425050_.can._Résumé - Jeremy Smith.pdf

I am confused here: convmv says it's UTF-8 already, but gsutil throws the error. Using the convmv --nosmart flag screws up the file name.

Can anyone shed light on this? I have about 1 million files and I don't want to rename them one by one each time I get an error.

Thanks

Piers

On 05.03.16 14:08, Piers Rowan via luv-main wrote:
> The filename is;
> 499322._425050_.can._Résumé - Jeremy Smith.pdf

Well, a quick check of your post in vim, with "8g8", shows no illegal UTF-8 sequences anywhere. With "g8" you can read the UTF-8 bytes for the character under the cursor, e.g. é is "c3 a9".

I'd elide characters from a sample filename until gsutil stops complaining, then do a g8 on the offending character. Then you'd have specific fault diagnostics for a bug report, if that's the way it goes. (It's two-to-one in favour, so far.)

Erik
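For anyone without vim to hand, the same byte-level check can be approximated in Python. This is only a minimal sketch, assuming Python 3.8 or later; `describe_name` is a hypothetical helper name, not part of gsutil or convmv, and it only reports what bytes a name contains. It does not explain why gsutil rejects a name that convmv accepts.

```python
def describe_name(raw: bytes) -> str:
    """Return a one-line report on how a raw filename is encoded."""
    try:
        raw.decode("utf-8")
        verdict = "valid UTF-8"
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first byte that failed to decode.
        verdict = f"invalid UTF-8 at byte offset {exc.start}"
    # Collect only the non-ASCII bytes and print them in hex,
    # similar in spirit to reading "c3 a9" off vim's g8.
    high = bytes(b for b in raw if b >= 0x80).hex(" ")
    return f"{verdict}; non-ASCII bytes: {high or 'none'}"


if __name__ == "__main__":
    # The same visible name, once encoded as UTF-8 and once as ISO-8859-1.
    # Escapes are used so the script's own file encoding does not matter.
    print(describe_name("R\u00e9sum\u00e9.pdf".encode("utf-8")))
    print(describe_name("R\u00e9sum\u00e9.pdf".encode("latin-1")))
```

The first call reports `c3 a9 c3 a9`, matching the g8 reading above; the second reports a lone `e9` for each é, which is what an ISO-8859-1 name would look like on disk.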

On 05/03/16 15:30, Erik Christiansen via luv-main wrote:
> Well, a quick check of your post in vim, with "8g8" shows no illegal UTF characters anywhere. With "g8" you can read the UTF-8 for the character under the cursor, e.g. é is "c3 a9".

Thanks for the tip. I am not much of a vim specialist.

> I'd elide characters from a sample filename, until gsutil stops complaining. Then do a g8 on the offending character. Then you'd have have specific fault diagnostics for a bug report, if that's the way it goes. (It's two-to-one in favour, so far.)

As much as I would want to help Google write better tools for its customers*, I really need to get this done so that I can actually get a server migration done before tomorrow.

I have ended up using:

    # find filenames I want to change
    find . -regex '.*[é].*' > /root/bad.txt

    # use PHP to write a bash script
    ## Because I can code in PHP but it suffers from the same encoding issue

    # Execute a bash file to copy & delete files
    ~~~~

Elegant? No. Does the job? Yes.

* There are two faults that I have with the gsutil -m rsync application:

1) It does not skip the file: one broken file means backup no-worky.
2) The error message breaks the file name across two or more lines, so it does not paste right into the shell.

Thanks for your help

Cheers

Piers
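The find step above only matches é. A more general version of the same idea, listing every name that is not plain ASCII, could look like the sketch below. It is an illustration under stated assumptions rather than the script actually used: Python 3.7 or later, and the names `classify` and `find_suspect_names` are hypothetical. Paths are handled as raw bytes throughout, so the scan itself does not depend on the locale.

```python
import os


def classify(name: bytes):
    """Return None for a plain-ASCII name, otherwise a short reason string."""
    if name.isascii():
        return None
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        return "not valid UTF-8"
    return "non-ASCII, valid UTF-8"


def find_suspect_names(root: bytes):
    """Yield (path, reason) for each entry under root that is not plain ASCII."""
    # Passing bytes to os.walk makes it return bytes, so no name is
    # decoded (or mangled) before we get to look at it.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            reason = classify(name)
            if reason is not None:
                yield os.path.join(dirpath, name), reason
```

Looping over `find_suspect_names(b"/home/backups/html")` and writing each path to a file would give roughly the equivalent of bad.txt, with the added hint of whether each name decodes as UTF-8 at all.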
Participants (2): Erik Christiansen, Piers Rowan