Parsing Log Share JSON logs

Overview

Several tools can parse and analyze your logs once you have downloaded them via the REST API.

One such tool is jq. You can read about it and download it here: https://stedolan.github.io/jq/

jq is a powerful command-line tool for parsing JSON and allows for some detailed analysis. Long-term or more detailed analysis is better suited to a data analysis system such as Kibana, but jq is great for one-off analysis.

You can find details on downloading the log and the full log format in this article.

Aggregating fields

To aggregate a field in the logs, such as IP address, URI, or referrer, we can use a simple jq query. This is useful for spotting patterns in traffic, for example, to identify your most popular pages or to find the source of an attack so that you can block it.

$ jq -r .ClientRequestURI logs.json | sort -n | uniq -c | sort -n | tail
2 /nginx-logo.png
2 /poweredby.png
2 /testagain
3 /favicon.ico
3 /testing
3 /testing123
6 /test
7 /testing1234
10 /cdn-cgi/nexp/dok3v=1613a3a185/cloudflare/rocket.js
54 /
$ jq -r .ClientRequestUserAgent logs.json | sort -n | uniq -c | sort -n | tail
1 python-requests/2.9.1
2 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.56 Safari/537.17
4 Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36
5 curl/7.47.2-DEV
36 Mozilla/5.0 (X11; Linux x86_64; rv:44.0) Gecko/20100101 Firefox/44.0
51 curl/7.46.0-DEV
$ jq -r .ClientRequestReferer logs.json | sort -n | uniq -c | sort -n | tail
2 http://example.com/testagain
3 http://example.com/testing
5 http://example.com/
5 http://example.com/testing123
7 http://example.com/testing1234
77 null
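
The same counts can also be produced inside jq itself, without the sort and uniq steps. The following is a minimal sketch, assuming logs.json holds one JSON object per line (as the commands above imply); the function name count_by_field is hypothetical and not part of jq.

```shell
# count_by_field FIELD FILE
# Prints "<count> <value>" for each distinct value of FIELD, least frequent first.
# Assumes FILE holds one JSON object per line; -s (slurp) reads them into one array.
count_by_field() {
  jq -rs --arg f "$1" '
    group_by(.[$f])                           # bucket records by the field value
    | map({value: .[0][$f], count: length})   # one summary object per bucket
    | sort_by(.count)                         # ascending by count
    | .[] | "\(.count) \(.value)"
  ' "$2"
}
```

For example, `count_by_field ClientRequestURI logs.json | tail` lists the most requested URIs. Because -s loads the whole file into memory, the sort and uniq pipeline is likely the better choice for very large logs.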

Filtering fields

Another common case is filtering the data by a specific field and then aggregating the result. This allows you to answer questions like "which URLs saw the most 502 errors?":

$ jq 'select(.OriginResponseStatus == 502) | .ClientRequestURI' logs.json | sort -n | uniq -c | sort -n | tail
1 "/favicon.ico"
1 "/testing"
3 "/testing123"
6 "/test"
6 "/testing1234"
18 "/"

Or the IP addresses most often blocked by the WAF:

$ jq -r 'select(.WAFAction == "drop") | .ClientIP' logs.json | sort -n | uniq -c | sort -n
1 127.0.0.1
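
Conditions inside select can be combined with "and" to match a range rather than a single value. The sketch below counts URIs for any 5xx origin status; it assumes OriginResponseStatus is a number (as the 502 query above implies), and the function name top_5xx_uris is hypothetical.

```shell
# top_5xx_uris FILE
# Counts request URIs whose origin status is any 5xx code.
# Records with a null OriginResponseStatus are skipped, because jq orders
# null below every number, so the >= 500 test is false for them.
top_5xx_uris() {
  jq -r 'select(.OriginResponseStatus >= 500 and .OriginResponseStatus < 600)
         | .ClientRequestURI' "$1" | sort | uniq -c | sort -n | tail
}
```

Run it as `top_5xx_uris logs.json`.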

Understanding pathing

There are three pathing fields: op (EdgePathingOp), status (EdgePathingStatus), and src (EdgePathingSrc).

Pathing source is the system that last handled the request before it hit an error or was passed to the cache server. Typically, this will be the macro/reputation list. Possible pathing sources are err, sslv (SSL verification checker), bic (browser integrity check), hot (hotlink protection), macro (the reputation list), skip (Always Online or CDNJS resources), or user (user firewall rule).

$ jq -r .EdgePathingSrc logs.json | sort -n | uniq -c | sort -n | tail
1 err
5 user
93 macro

Pathing op indicates how the request was handled. "wl" is a request that passed all checks and went to your origin. Other possible values are: errHost (host header mismatch, DNS errors, etc.), ban (blocked by IP address, range, etc.), tempOk (challenge successfully completed), and chl (challenge issued).

$ jq -r .EdgePathingOp logs.json | sort -n | uniq -c | sort -n | tail
1 chl
1 errHost
97 wl

The pathing status is the value that the pathing source returns. With a pathing source of macro, user, or err, the pathing status indicates which list the IP address was found on. "nr" is usually the most common value and means the request was not flagged by a security check. Some values indicate the class of user, e.g. "se" means search engine. Others indicate that the visitor saw an error or a challenge, such as captchaNew or jschlOK.

$ jq -r .EdgePathingStatus logs.json | sort -n | uniq -c | sort -n | tail
1 captchaNew
1 dnsErr
5 ip
92 nr
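
Since the three pathing fields are easiest to interpret together, it can help to count each combination in one pass. This is a minimal sketch using only the fields shown above; the function name pathing_combos is hypothetical.

```shell
# pathing_combos FILE
# Counts each distinct "src / op / status" combination, least frequent first.
pathing_combos() {
  jq -r '"\(.EdgePathingSrc) / \(.EdgePathingOp) / \(.EdgePathingStatus)"' "$1" \
    | sort | uniq -c | sort -n
}
```

Run it as `pathing_combos logs.json`. Rare combinations appear at the top of the output and are often the ones worth investigating first.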

How does pathing map to Threat Analytics

Certain combinations of pathing have been labelled in our Threat Analytics. The mapping is as follows:


Understanding response fields

The response status is found in three places in a request: edgeResponse, cacheResponse, and originResponse. In your logs, the edge is what first accepts a visitor's request; the cache then accepts the request and either serves it from the cache or forwards it to your origin. It's possible to have a request that has only an edgeResponse, or a request that has an edgeResponse and a cacheResponse but no originResponse.

This allows you to see where a request terminates. Requests with only an edgeResponse likely hit a security check or processing error. Requests with an edgeResponse and a cacheResponse either were served from the cache or saw an error contacting your origin. Requests that have an originResponse went all the way to your origin and errors seen would have been served directly from there.

For example, this query would show the status code and pathing information for all requests that terminated at our edge:

$ jq -r 'select(.OriginResponseStatus == null) | select(.CacheResponseStatus == null) |"\(.EdgeResponseStatus) / \(.EdgePathingSrc) / \(.EdgePathingStatus) / \(.EdgePathingOp)"' logs.json | sort -n | uniq -c | sort -n
1 403 / macro / captchaNew / chl
1 403 / macro / nr / wl
1 409 / err / dnsErr / errHost
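
To get an overview of where all requests terminate, each record can be labelled by the last layer that returned a status. The sketch below assumes, as the query above does, that a layer the request never reached has a null status; the function name termination_points and the labels "edge", "cache", and "origin" are hypothetical.

```shell
# termination_points FILE
# Labels each request by the last layer that produced a response status,
# then counts the labels.
termination_points() {
  jq -r 'if .OriginResponseStatus != null then "origin"
         elif .CacheResponseStatus != null then "cache"
         else "edge" end' "$1" | sort | uniq -c | sort -n
}
```

Run it as `termination_points logs.json`.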

Showing cached requests

To see what your cache ratios are, use the following query:

$ jq -r '.CacheCacheStatus' logs.json | sort -n | uniq -c | sort -n
3 hit
3 null
3 stale
4 expired
6 miss
81 unknown
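
The counts can be turned into a single hit percentage inside jq. This is a rough sketch that treats the ratio as "hit" records divided by all records; whether to exclude statuses such as unknown or null from the denominator is a judgment call that this example does not make for you. It assumes logs.json holds one JSON object per line, and the function name cache_hit_ratio is hypothetical.

```shell
# cache_hit_ratio FILE
# Reports how many records have CacheCacheStatus == "hit" and the share
# of all records that they represent (rounded down to a whole percent).
cache_hit_ratio() {
  jq -rs 'if length == 0 then "no records"
          else (map(select(.CacheCacheStatus == "hit")) | length) as $hits
               | "\($hits) of \(length) requests were cache hits (\($hits * 100 / length | floor)%)"
          end' "$1"
}
```

Run it as `cache_hit_ratio logs.json`.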

Showing TLS versions

To see which TLS versions your visitors are using (for example, to make an informed decision on whether you can disable TLS versions older than 1.2), use this query:

$ jq -r '.ClientSSLProtocol' logs.json | sort -n | uniq -c | sort -n
42 none
58 TLSv1.2
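
Before disabling older TLS versions, it may help to see which clients would be affected. The sketch below lists user agents seen on any protocol other than "none" and "TLSv1.2". It also excludes "TLSv1.3", which is an assumed value string that does not appear in the sample output above, so check it against your own logs. The function name legacy_tls_agents is hypothetical.

```shell
# legacy_tls_agents FILE
# Counts "<protocol> <user agent>" pairs for requests that used a TLS
# version older than 1.2. "TLSv1.3" is an assumed value string.
legacy_tls_agents() {
  jq -r 'select(.ClientSSLProtocol != null
                and .ClientSSLProtocol != "none"
                and .ClientSSLProtocol != "TLSv1.2"
                and .ClientSSLProtocol != "TLSv1.3")
         | "\(.ClientSSLProtocol) \(.ClientRequestUserAgent)"' "$1" \
    | sort | uniq -c | sort -n
}
```

Run it as `legacy_tls_agents logs.json`. An empty result suggests that no logged visitor relied on an older TLS version during the period covered by the logs.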