Cloudflare’s Enterprise customers have access to the Enterprise Log Share service, a RESTful API that allows our customers to consume request logs over HTTP. Our REST API provides an easy method for customers to access a domain’s request logs using their existing client API key.
These logs contain data surrounding the connecting client, the request’s path through Cloudflare’s network, and the response from the origin, and are extremely useful for enriching existing logs on an origin server.
*Note: if you are currently using the old "/logs/requests" endpoint, information about it can be found here. However, please note that it has been deprecated and will be sunset in early 2018 (as announced here).
The "logs/received" REST API route exposes data by time received (the time the event was written to disk in our log aggregation system). This is in contrast to our "logs/requests" API documented here, which orders logs by request time. Ordering by log aggregation time instead of log generation time results in lower (faster) log share pipeline latency and deterministic log pulls. Functionally, it is similar to tailing a log file, or reading from rsyslog (albeit in chunks).
This means that if you want to obtain logs for a given time range, you can do so by issuing one call for each consecutive minute (or other time range). Because log lines are batched by time received and made available, there is no "late arriving data." A response for a given minute will never change. You do not have to repeatedly poll a given time range to receive logs as they converge on our aggregation system.
Recommended Access Pattern
Pick a time interval that works (we suggest starting with a minute for most request volumes; try a few intervals, if the interval returns too much data to easily handle, scale down), and then make requests with that fixed interval (not setting "count") for the whole time range in which you are interested. Once you read an entire 200 response for a request, you do not need to hit that url again.
cURL example from burritobot.com
- start: inclusive, either unix timestamp (which by definition is UTC) or time in rfc3339 (specifies time zone); must be at least 5 minutes in the past.
- end: exclusive, format same as start.
- count: return up to that many records; empty or negative means return all records.
- sample: return only a sample of records; sample=0.1 means return 10% (1 in 10) of records
- fields: comma separated list of fields to return; when empty default list is returned.
- timestamps (beta): format in which timestamp fields will be returned, one of: "unixnano" (default), "unix", "rfc3339"
curl -s -H "X-Auth-Email: [email protected]" -H "X-Auth-Key: banana12345" "https://api.cloudflare.com/client/v4/zones/4f8g90r42275bce56f242596aic6fc5b/logs/received?start=2017-07-18T22:00:00Z&end=2017-07-18T22:01:00Z&count=1&fields=RayID,ClientIP"
Once a 200 response is received and read completely for a given zone and time range, the following will be true for all subsequent requests:
- The number and content of returned records will be the same.
- The order of returned records may be (and is likely to) be different.
Requests will fail when:
- The rate of requests exceeds 1 request per zone per second (Cloudflare will return a HTTP 429)
- There are more than 5 requests to the endpoint for a given zone currently in progress.
- There is more than 1GB of uncompressed data in the response. Because responses are streamed, there is no way to determine response size ahead of time. After 1GB of data is streamed, the response will fail with a terminated connection. This allows users to tune their queries to a range that will not result in failures based on their request volume. We believe this is a better solution than hard-limiting requests to an arbitrary time range (e.g., 1 minute at a time).
- When set explicitly in the request url, the response fields will never change.
- When not set explicitly, default fields will be used; default fields may change at any time.
API Parameters In Depth
When "?count=" is provided, the response will contain up to count results. Because results are not sorted, you are likely to get different data for repeated requests.
When "?sample=" is provided, a sample of matching records is returned. If "sample=0.1", approximately 10% of records will be returned. Sampling is random: repeated calls will not only return different records, but likely will also vary slightly in number of returned records.
When "?count=" is also specified, count is applied to the number of returned records, not the sampled records. So, with "sample=0.05" and "count=7", when there is a total of 100 records available, approximately 5 will be returned. When there are 1,000 records, 7 will be returned. When there are 10,000 records, 7 will be returned.
If "fields" are not specified, by default a limited set of fields will be returned. This default field set may change at any time. The full list of all available fields can be found here:
Fields are passed to the request as a comma separated list. So, to have "ClientIP" and "RayID", use:
The order in which fields are specified doesn't matter, and the order of fields in the response is not specified. If you need fields added to the list of available fields, please contact your account manager.
Currently available fields (as of January 2018):
"CacheCacheStatus": "unknown | miss | expired | updating | stale | hit | ignored | bypass | revalidated",
"CacheResponseBytes": "Number of bytes returned by the cache",
"CacheResponseStatus": "HTTP status code returned by the cache to the edge: all requests (including non-cacheable ones) go through the cache: also see CacheStatus field",
"ClientASN": "Client AS number",
"ClientCountry": "Country of the client IP address",
"ClientDeviceType": "Client device type",
"ClientIP": "IP address of the client",
"ClientIPClass": "Client IP class",
"ClientRequestBytes": "Number of bytes in the client request",
"ClientRequestHost": "Host requested by the client",
"ClientRequestMethod": "HTTP method of client request",
"ClientRequestProtocol": "HTTP protocol of client request",
"ClientRequestReferer": "HTTP request referrer",
"ClientRequestURI": "URI requested by the client",
"ClientRequestUserAgent": "User agent reported by the client",
"ClientSSLCipher": "Client SSL cipher",
"ClientSSLProtocol": "Client SSL protocol",
"ClientSrcPort": "Client source port",
"EdgeColoID": "Cloudflare edge colo id",
"EdgeEndTimestamp": "Unix nanosecond timestamp the edge finished sending response to the client",
"EdgePathingOp": "Indicates what type of response was issued for this request (unknown = no specific action)",
"EdgePathingSrc": "Details how the request was classified based on security checks (unknown = no specific classification)",
"EdgePathingStatus": "Indicates what data was used to determine the handling of this request (unknown = no data)",
"EdgeResponseBytes": "Number of bytes returned by the edge to the client",
"EdgeResponseCompressionRatio": "Edge response compression ratio",
"EdgeResponseStatus": "HTTP status code returned by Cloudflare to the client",
"EdgeStartTimestamp": "Unix nanosecond timestamp the edge received request from the client",
"OriginIP": "IP of the origin server",
"OriginResponseBytes": "Number of bytes returned by the origin server",
"OriginResponseHTTPExpires": "Value of the origin 'expires' header in RFC1123 format",
"OriginResponseHTTPLastModified": "Value of the origin 'last-modified' header in RFC1123 format",
"OriginResponseStatus": "Status returned by the origin server",
"OriginResponseTime": "Number of nanoseconds it took the origin to return the response to edge",
"RayID": "Ray ID of the request",
"SecurityLevel": "The security level configured at the time of this request. This is used to determine the sensitivity of the IP Reputation system.",
"WAFAction": "Action taken by the WAF, if triggered",
"WAFFlags": "Additional configuration flags: simulate (0x1) | null",
"WAFMatchedVar": "The full name of the most-recently matched variable",
"WAFProfile": "WAF profile: low | med | high",
"WAFRuleID": "ID of the applied WAF rule",
"WAFRuleMessage": "Rule message associated with the triggered rule",
"ZoneID": "Internal zone ID"
By default timestamps in responses are returned as Unix nanosecond integers. The
timestamps= query parameter can be set to change the format in which response timestamps are returned. Possible values are: unix, unixnano, rfc3339. Note: unix and unixnano return timestamps as integers; rfc3339 returns timestamps as strings.
Data is timestamped by time it was written to disk at our log aggregation point.
Response data is returned in json, 1 json object (1 log message) per line. Sample log message with default fields:
Estimating daily data volumes
To quickly estimate the amount of data for a zone per day (the number of log lines and the amount of bytes they take up), request a 1 in a 1000 sample of data for a 24h period; note that
end=2017-09-11T00:00:00Z span a 24h period, and
curl -s -H"X-Auth-Email: MONKEY" -H"X-Auth-Key: BANANA" \ "https://api.cloudflare.com/client/v4/zones/<ZONE_TAG>/logs/received?start=2017-09-10T00:00:00Z&end=2017-09-11T00:00:00ZZ&sample=0.001" \ >sample.log ... $ wc -l sample.log 47146 sample.log ... $ ls -lh sample.log -rw-r--r-- 1 mik mik 15M Sep 11 18:56 sample.log
Based on this information, approximate number of messages /day is 47,146,000, and the byte size is 15GB. The size estimate is based on the default response field set; changing the response field set (see the "fields" section) will change the response size.
To get a good estimate of daily traffic it is best to query on entire 24h. If the response size is too small or to large, adjust the sample value, not the time range.
Responses are compressed by default (gzip). Curl transparently decompresses responses, unless called with
-H"accept-encoding: gzip". In that case, output remains gzipped. You should expect compressed data to be about 5-10% of uncompressed. This means that with the 1GB response size limit (uncompressed) compressed responses will be 50-100MB.