Proxy logs contain requests made to the internet by users, applications or services on your network.

There are various things we can look for in these requests that can indicate malicious intent, such as malware downloads and callbacks to command and control servers. There are a number of methods we can use to find this malicious behaviour, so let's go through them.


The first method we are going to use is looking for top-level domains (TLDs) which are known for selling domains to spammers and malware operators. The Spamhaus Project maintains a regularly updated list of the most abused TLDs.

For each of the ten TLDs on that list, between one third and two thirds of all registered domains are known for spam or malware. Search your logs for these TLDs, whitelist any legitimate domains your users visit, and anything left over should highlight nefarious communications.
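As a minimal sketch of this search, the filter below checks proxy-log domains against a suspect-TLD set. The TLDs shown are illustrative placeholders, not Spamhaus's actual current top ten, which changes over time.

```python
# Hypothetical sample of heavily abused TLDs -- substitute the current
# top-ten list from the Spamhaus Project before using this in anger.
SUSPECT_TLDS = {"gq", "ml", "cf", "tk", "top"}

def suspicious_tld(domain: str) -> bool:
    """Return True if the domain's TLD is on the suspect list."""
    tld = domain.rstrip(".").rsplit(".", 1)[-1].lower()
    return tld in SUSPECT_TLDS

# Example proxy-log domains
hits = [d for d in ["intranet.example.com", "free-gift.tk"] if suspicious_tld(d)]
```

Anything surviving after you whitelist known-good domains is worth a closer look.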


At times, attackers will encode data in Base64 within URLs to exfiltrate it. For example, when a host becomes infected, it will sometimes send its details out as Base64. There are two methods we can use to look for this kind of traffic.

Firstly, we can look for “Vm0wd” within a URL. When a string is Base64 encoded enough times, its first five characters will always become “Vm0wd”, and there is little legitimate reason for this string to appear within a URL. Search for it, and any match could highlight a host that has been infected in some way or another.
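This convergence is easy to verify for yourself: repeatedly Base64-encoding any input soon produces output beginning with “Vm0wd”.

```python
import base64

def encode_n(data: bytes, n: int) -> bytes:
    """Base64-encode `data` repeatedly, n times over."""
    for _ in range(n):
        data = base64.b64encode(data)
    return data

# Whatever we start with, the prefix settles to b"Vm0wd" after ~10 rounds.
result = encode_n(b"hello", 12)
```

Try a few different starting strings: after roughly ten rounds of encoding they all share the same prefix.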

The second method we can use is to look for any Base64 encoded string within a URL, using a regular expression.


Matching Base64 strings at the end of a URL in this way can be an indication of malicious behaviour.
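A sketch of such a pattern is below. This exact expression is my own approximation rather than a canonical one, and the 16-character minimum is an assumption to keep false positives down; tune it for your environment.

```python
import re

# Candidate Base64 blob at the end of a URL path or query value:
# at least 16 Base64 characters, optional '=' padding, then end of string.
# Note '/' is a legal Base64 character, so very long plain paths can also
# match -- expect to do some whitelisting.
B64_AT_END = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}$")

def url_ends_with_base64(url: str) -> bool:
    return B64_AT_END.search(url) is not None
```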



Looking for known bad URLs in logs is great, but an even better method is to look for common patterns within those URLs.

To do this, I first went to urlhaus and downloaded all submitted URLs for the past 30 days. I then ran through the 4322 URLs and looked for common elements within them which we can search for.


The first things to look for in your logs are URLs containing pastebin or github.

The only people making regular requests out to pastebin or github should be IT staff, so any connections outwith this should stand out. 54 of the URLs submitted to urlhaus in the past 30 days used either pastebin or github. If someone from HR or Finance starts making regular connections out to a random pastebin, it's probably not a good sign.
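As a trivial first-pass filter, a case-insensitive substring check is enough here; run it over your logs, then whitelist the handful of legitimate IT destinations it turns up.

```python
def flags_paste_or_code_host(url: str) -> bool:
    """Case-insensitive check for pastebin/github anywhere in the URL."""
    u = url.lower()
    return "pastebin" in u or "github" in u
```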


WordPress is very commonly used by attackers: they hijack legitimate WordPress sites and use them to host their malware without the owners' knowledge. wordpress, wp-admin, wp-includes, wp-content and wp-images together appeared in over 350 of the URLs reported to urlhaus in the past month.

To detect this, there are a few methods we can use. Firstly, we can check whether the URL matches a regular expression.


The regex should find any URL which contains wp- or wordpress and then finishes with a common file name.

Another method we can use is just to look for any URL which contains wp- or wordpress. This is more prone to false positives as it isn't looking for a file download; however, I personally like using this method too. It takes more work, as you have to whitelist any legitimate sites your users visit, but it will help you unearth threats within your network. 132 of the URLs reported to urlhaus in the past month contain wp- or wordpress but don't end with a filename.


Looking at the data from urlhaus, a quarter of the submissions this month ended with either /.i or /Mozi.m. This is due to the Hajime malware and the Mozi botnet. Again, we can look for these using a simple regular expression: (/Mozi\.m|/\.i)$

This will look for any URL ending with /Mozi.m or /.i which will help to highlight malware and botnet infections.
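Applied in a script, that expression looks like this (note the escaped dots, so a URL such as /index.html is not caught by /\.i):

```python
import re

# Payload names used by the Hajime malware and Mozi botnet.
MOZI_HAJIME = re.compile(r"(/Mozi\.m|/\.i)$")

def botnet_payload_url(url: str) -> bool:
    return MOZI_HAJIME.search(url) is not None
```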

Using the above three methods, we are able to detect almost half of the submitted URLs without relying on exact URL matches.



Another good way to find malicious or suspicious behaviour is to look for rare behaviour. Take your proxy logs for the past week or month and, one field at a time, count and sort the user agents, domains and destination IPs by how often they appear.

If a user agent, domain or destination has only appeared once in a week or month, that is very suspicious. This method is a great way of finding attacks because it does not require any prior knowledge of the threat.
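The counting itself can be sketched in a few lines; here the threshold of one occurrence is the interesting case, but raising it slightly can also surface low-and-slow activity.

```python
from collections import Counter

def rare_values(values, threshold=1):
    """Return the values appearing no more than `threshold` times."""
    counts = Counter(values)
    return sorted(v for v, c in counts.items() if c <= threshold)

# Example: one odd user agent among a week of browser traffic
agents = ["Mozilla/5.0", "Mozilla/5.0", "curl/7.68.0", "Mozilla/5.0"]
```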



Domain Generation Algorithm (DGA) domains are used by attackers to avoid detection by blacklists. These domains are rotated quickly, so even if one domain is blocked by a blacklist, the attacker has moved on to the next before it can be blocked.

The only issue with domains like this is that they stand out due to their randomness; in most cases they look nothing like a valid domain. Because of this, most major SIEM vendors have created built-in tools we can use to detect this behaviour.

Splunk –

QRadar –
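If your SIEM lacks a built-in detector, a crude entropy heuristic can flag random-looking domains. The scoring below and its 3.5 bits-per-character threshold are my own assumptions, not a vendor method, and the threshold needs tuning against your own traffic.

```python
import math
from collections import Counter

def shannon_entropy(s: str) -> float:
    """Bits of entropy per character in the string."""
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def looks_like_dga(domain: str, threshold: float = 3.5) -> bool:
    # Score only the registrable label; the 3.5 bits/char threshold is
    # an assumption -- tune it, and expect some false positives (CDNs,
    # tracking hosts) that you will need to whitelist.
    label = domain.rstrip(".").split(".")[-2] if "." in domain else domain
    return shannon_entropy(label) >= threshold
```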

Thanks for reading this post, and I hope some of these methods help you to detect malicious behaviour within your proxy logs. You may have noticed I never spoke about using URL blacklists to detect malicious behaviour; I skipped this as it is so common and already used by every SOC.

If you have any questions on how to implement any of the above, please contact me at
