
I'm trying to find a simple way to parse squid logfiles looking for CryptoLocker (http://en.wikipedia.org/wiki/CryptoLocker) URLs. The proxy in question denies these requests anyway, because the current version of CryptoLocker doesn't authenticate and this proxy requires authentication. So right now the denied requests are a useful trigger for noticing an infection after the fact, but before it has downloaded enough to start encrypting user files.
The URLs in question are <something>.<tld>, where the TLD is net, com, biz, etc. Some examples of the something are:
  qoemswifeitgetscytkircyfq
  diqkbihifambsnvbylvtdcyyd
  tlfmwcyfikzcuqoqgpzdpz
so they are random strings of varying length. The challenge is to find a way to identify them without an excessive amount of CPU time (e.g. no dictionary lookups).
Taking advantage of the fact that the requests are DENIED, and that the URL is http://<name>.<tld>/ with no further path, this gets relatively few false positives:

    zgrep DENIED /var/log/squid/access.log-201404* | egrep 'http://[^.]{10,}\.(com|biz|net)\/ '

but obviously it still hits on a few legitimate but long URLs. Given that it gets a tiny handful of hits for a non-infected computer, but hundreds and hundreds for an infected computer, it should be relatively easy to sift the results a bit and come up with something.

Further suggestions appreciated though!

Thanks
James
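
For what it's worth, here is a rough Python sketch of the sifting step: count the suspect denials per client and only report clients above a cut-off. It assumes squid's default native log format (client address in the third field, result code in the fourth, URL in the seventh). The function name `suspect_clients`, the threshold of 50 and the slightly stricter hostname character class are made up for illustration, not anything squid provides.

```python
import re
from collections import Counter

# Same idea as the egrep pattern: a bare http://<name>.<tld>/ URL with no
# path, where <name> is at least 10 characters long. The character class
# is an assumption (lowercase letters, digits, hyphen).
SUSPECT_URL = re.compile(r"^http://[a-z0-9-]{10,}\.(?:com|biz|net)/$")


def suspect_clients(lines, threshold=50):
    """Return {client: hits} for clients with at least `threshold` suspect denials.

    `lines` is any iterable of access.log lines in squid's native format.
    `threshold` is an arbitrary cut-off; tune it against real logs.
    """
    hits = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # skip truncated or malformed lines
        client, result, url = fields[2], fields[3], fields[6]
        if "DENIED" in result and SUSPECT_URL.match(url):
            hits[client] += 1
    return {client: n for client, n in hits.items() if n >= threshold}
```

The rotated logs can be read line by line with `gzip.open(path, "rt")` and passed straight in. This is one regex match per denied line and a counter per client, so it should stay cheap on CPU. Legitimate long hostnames will still match the pattern, but they should fall below the threshold.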