Cloudflare Enterprise customers have access to the Enterprise Log Share (ELS) service, a RESTful API for consuming request logs over HTTP. This API provides a method for customers to access a domain’s request logs using a client API key.
These logs contain data related to the connecting client, the request path through the Cloudflare network, and the response from the origin web server. This data is useful for enriching existing logs on an origin server.
Data retention period
You can query for log data starting at 5 minutes in the past (relative to the actual time the request is being made) and going back up to 72 hours.
Order of the data returned
The "logs/received" REST API route exposes data by time received, which is the time the event was written to disk in the Cloudflare log aggregation system.
Ordering by log aggregation time instead of log generation time results in lower (faster) log share pipeline latency and deterministic log pulls. Functionally, it is similar to tailing a log file or reading from rsyslog (albeit in chunks).
This means that if you want to obtain logs for a given time range, you can do so by issuing one call for each consecutive minute (or other time range). Because log lines are batched by time received and made available, there is no "late arriving data." A response for a given minute will never change. You do not have to repeatedly poll a given time range to receive logs as they converge on our aggregation system.
Format of the data returned
The Logpull REST API returns data in NDJSON format, in which each log line is a valid JSON object. Major analysis tools such as Google BigQuery and AWS Kinesis require this format.
If you'd like to turn the resulting log data into a JSON array with one array element per log line, you can use the jq tool: pipe the API response into jq with the slurp ("-s") flag:
<API request code> | jq -s
Recommended access pattern
The basic access pattern is "give me all the logs for zone Z for minute M," where the minute M refers to the time log records were written to disk in our log aggregation system. Try running your query every minute to start. If responses are too small, go up to 5 minutes (this will be appropriate for most zones). If responses are too large, try going down to 15 seconds. If your zone has so many logs that it takes longer than 1 minute to read 1 minute's worth of logs, run 2 workers staggered, each requesting 1 minute's worth of logs every 2 minutes.
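As a sketch of this access pattern (the zone tag and credentials are the placeholder values used elsewhere in this document, and `date -d` assumes GNU coreutils), a worker that pulls the minute ending 5 minutes ago might look like:

```shell
# Hypothetical zone tag and credentials, matching the examples in this doc.
ZONE="4f8g90r42275bce56f242596aic6fc5b"

# Print the start of the minute N minutes before now, in RFC 3339 (GNU date).
minute_start() {
  date -u -d "-$1 minutes" +%Y-%m-%dT%H:%M:00Z
}

# Fetch one minute of logs: the minute ending 5 minutes ago, since the API
# requires end to be at least 5 minutes earlier than now.
pull_minute() {
  start=$(minute_start 6)
  end=$(minute_start 5)
  curl -s -H "X-Auth-Email: [email protected]" -H "X-Auth-Key: banana12345" \
    "https://api.cloudflare.com/client/v4/zones/$ZONE/logs/received?start=$start&end=$end" \
    > "logs-$start.ndjson"
}

# Call pull_minute once per minute, e.g. from cron or a `while sleep 60` loop.
```

Because responses for a given minute never change, a worker that crashes can simply resume from the last minute it successfully wrote.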
Data returned by the API will not change on repeat calls. The order of messages in the response may be different, but the number and content of the messages will always be the same for a given query, if the response code is 200 and there is no error reading the response body.
Because data is ingested by our log processing system in batches, most zones seeing less than 1 million requests per minute will have "empty" minutes: queries for such a minute will result in responses with status 200, but no data in the body. This does not mean that there were no requests proxied by Cloudflare for that minute; it just means that our system did not process a batch of logs for that zone in that minute.
- start: inclusive; a timestamp in Unix (seconds, which by definition is UTC), Unix nanosecond, or RFC 3339 (specifies its own time zone) format; must be no more than 7 days earlier than now
- end: exclusive; same format as start; must be at least 5 minutes earlier than now and later than start
- X-Auth-Key (Global API key)
- count: Return up to that many records. To return all records, do not include the count parameter.
- sample: Return only a sample of records; sample=0.1 means return 10% (1 in 10) of records
- fields: Comma-separated list of fields to return; when empty, the default list (9 fields) is returned.
- timestamps (beta): Format in which timestamp fields will be returned, one of: "unixnano" (default), "unix", "rfc3339".
curl -s -H "X-Auth-Email: [email protected]" -H "X-Auth-Key: banana12345" "https://api.cloudflare.com/client/v4/zones/4f8g90r42275bce56f242596aic6fc5b/logs/received?start=2017-07-18T22:00:00Z&end=2017-07-18T22:01:00Z&count=1&fields=RayID,ClientIP"
The "rayids" route accepts only the "fields" and "timestamps" arguments. It will return 0, 1, or more records (Ray IDs are not guaranteed to be unique).
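For example, a lookup for a single Ray ID might be wrapped as follows; note that the exact logs/rayids/<rayid> path shape is an assumption (it mirrors the zone-scoped logs/received route), and the credentials are the placeholders used above:

```shell
# Fetch the log record(s) for one Ray ID from a zone. The logs/rayids/<rayid>
# path shape is an assumption; only "fields" and "timestamps" are accepted.
pull_rayid() {
  zone="$1"; rayid="$2"
  curl -s -H "X-Auth-Email: [email protected]" -H "X-Auth-Key: banana12345" \
    "https://api.cloudflare.com/client/v4/zones/$zone/logs/rayids/$rayid?fields=RayID,ClientIP"
}

# Hypothetical usage (zone tag from the examples above, made-up Ray ID):
# pull_rayid 4f8g90r42275bce56f242596aic6fc5b 3ed47e9eeb1a41f6
```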
Once a 200 response is received and read completely for a given zone and time range, the following will be true for all subsequent requests:
- The number and content of returned records will be the same.
- The order of returned records may be (and is likely to be) different.
- When set explicitly in the request URL, the response fields will never change, unless a field was designated "beta," in which case it may be removed at any time.
- When not set explicitly, default fields will be used; the set of default fields may change at any time.
- The effective rate limit is 1 request every 5 seconds per zone. Multiple requests can run at the same time, but a new request can be made only once every 5 seconds. Exceeding the limit results in a 429 error response.
- Maximum time range (difference between the start and end parameters) is 1 hour.
- Maximum response size is 1GB uncompressed for time ranges greater than 1 minute; for time ranges of 1 minute or less, there is no limit
- Because responses are streamed, there is no way to determine response size ahead of time. After 1GB of data is streamed, the response will fail with a terminated connection. This allows users to tune their queries to a range that will not result in failures based on their request volume. We believe this is a better solution than hard-limiting requests to an arbitrary time range (e.g., 1 minute at a time).
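Given the per-zone rate limit, a puller should back off when it sees a 429. One way to sketch this in shell is a wrapper around any command that prints an HTTP status code (for example, curl -s -o out.json -w '%{http_code}' ...):

```shell
# Run a command that prints an HTTP status code; sleep and retry while it
# reports 429 (rate limited), up to 5 attempts. Prints the final status.
retry_429() {
  for attempt in 1 2 3 4 5; do
    status=$("$@")
    if [ "$status" != "429" ]; then
      echo "$status"
      return 0
    fi
    sleep 5   # the effective limit is 1 request every 5 seconds per zone
  done
  echo "$status"
  return 1
}

# Hypothetical usage:
# retry_429 curl -s -o minute.ndjson -w '%{http_code}' "$URL"
```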
API Parameters in Depth
When "?count=" is provided, the response will contain up to count results. Because results are not sorted, you are likely to get different data for repeated requests.
When "?sample=" is provided, a sample of matching records is returned. If "sample=0.1", approximately 10% of records will be returned. Sampling is random: repeated calls will not only return different records, but likely will also vary slightly in number of returned records.
When "?count=" is also specified, count is applied to the number of returned records, not the sampled records. So, with "sample=0.05" and "count=7", when there is a total of 100 records available, approximately 5 will be returned. When there are 1,000 records, 7 will be returned. When there are 10,000 records, 7 will be returned.
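In other words, the expected number of returned records is approximately min(total × sample, count). That arithmetic can be sketched with awk (the function name here is made up for illustration):

```shell
# Expected records returned, given the total available, the sample rate,
# and the count cap: approximately min(total * sample, count).
expected_records() {
  awk -v total="$1" -v sample="$2" -v count="$3" \
    'BEGIN { e = total * sample; if (e > count) e = count; print int(e) }'
}

expected_records 100 0.05 7     # prints 5: sampling yields fewer than the cap
expected_records 10000 0.05 7   # prints 7: the count cap applies
```

Keep in mind that actual sampling is random, so real responses will vary around these expected values.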
If "fields" are not specified, a limited default set of fields is returned. This default field set may change at any time. The full list of available fields appears below under "Currently available fields."
Fields are passed to the request as a comma-separated list. So, to request "ClientIP" and "RayID", use "fields=ClientIP,RayID".
The order in which fields are specified doesn't matter, and the order of fields in the response is not specified.
Using Bash subshell and jq, you can download the logs with all available fields without manually copying and pasting the fields into the request like so:
curl -s -H "X-Auth-Email: [email protected]" -H "X-Auth-Key: banana12345" "https://api.cloudflare.com/client/v4/zones/4f8g90r42275bce56f242596aic6fc5b/logs/received?start=2017-07-18T22:00:00Z&end=2017-07-18T22:01:00Z&count=1&fields=$(curl -s -H "X-Auth-Email: [email protected]" -H "X-Auth-Key: banana12345" "https://api.cloudflare.com/client/v4/zones/4f8g90r42275bce56f242596aic6fc5b/logs/received/fields" | jq -r '. | to_entries[] | .key' | paste -sd "," -)"
Currently available fields (as of May 2018):
"CacheCacheStatus": "unknown | miss | expired | updating | stale | hit | ignored | bypass | revalidated"
"CacheResponseBytes":"Number of bytes returned by the cache",
"CacheResponseStatus":"HTTP status code returned by the cache to the edge; all requests (including non-cacheable ones) go through the cache; see also the CacheCacheStatus field"
"CacheTieredFill":"Tiered Cache was used to serve this request (beta)",
"ClientASN":"Client AS number",
"ClientCountry":"Country of the client IP address",
"ClientDeviceType":"Client device type",
"ClientIP":"IP address of the client",
"ClientIPClass":"Client IP class",
"ClientRequestBytes":"Number of bytes in the client request",
"ClientRequestHost":"Host requested by the client",
"ClientRequestMethod":"HTTP method of client request",
"ClientRequestProtocol":"HTTP protocol of client request",
"ClientRequestReferer":"HTTP request referrer",
"ClientRequestURI": "URI requested by the client",
"ClientRequestUserAgent":"User agent reported by the client",
"ClientSSLCipher":"Client SSL cipher",
"ClientSSLProtocol":"Client SSL (TLS) protocol",
"ClientSrcPort":"Client source port",
"EdgeColoID":"Cloudflare edge colo id",
"EdgeEndTimestamp":"Unix nanosecond timestamp the edge finished sending response to the client",
"EdgePathingOp":"Indicates what type of response was issued for this request (unknown = no specific action)",
"EdgePathingSrc":"Details how the request was classified based on security checks (unknown = no specific classification)",
"EdgePathingStatus":"Indicates what data was used to determine the handling of this request (unknown = no data)",
"EdgeRateLimitAction":"The action taken by the blocking rule; empty if no action taken (beta)",
"EdgeRateLimitID":"uint64: The internal rule ID of the rate-limiting rule that triggered a block (ban) or simulate action. 0 if no action taken. (beta)",
"EdgeRequestHost":"Host header on the request from the edge to the origin (beta)",
"EdgeResponseBytes":"Number of bytes returned by the edge to the client",
"EdgeResponseCompressionRatio":"Edge response compression ratio",
"EdgeResponseContentType":"Edge response Content-Type header value (beta)",
"EdgeResponseStatus":"HTTP status code returned by Cloudflare to the client",
"EdgeServerIP": "IP of the edge server making a request to the origin (beta)",
"EdgeStartTimestamp":"Unix nanosecond timestamp the edge received request from the client",
"OriginIP":"IP of the origin server",
"OriginResponseBytes":"Number of bytes returned by the origin server",
"OriginResponseHTTPExpires":"Value of the origin 'expires' header in RFC1123 format",
"OriginResponseHTTPLastModified":"Value of the origin 'last-modified' header in RFC1123 format",
"OriginResponseStatus":"Status returned by the origin server",
"OriginResponseTime":"Number of nanoseconds it took the origin to return the response to edge",
"OriginSSLProtocol":"SSL (TLS) protocol used to connect to the origin (beta)",
"RayID":"ID of the request",
"SecurityLevel":"The security level configured at the time of this request. This is used to determine the sensitivity of the IP Reputation system",
"WAFAction":"Action taken by the WAF, if triggered",
"WAFFlags":"Additional configuration flags: simulate (0x1) | null",
"WAFMatchedVar":"The full name of the most-recently matched variable",
"WAFProfile":"WAF profile: low | med | high",
"WAFRuleID":"ID of the applied WAF rule",
"WAFRuleMessage":"Rule message associated with the triggered rule",
"WorkerCPUTime": "Amount of time in microseconds spent executing a worker, if any (beta)",
"WorkerStatus": "Status returned from worker daemon as a string (beta)",
"WorkerSubrequest": "Whether or not this request was a worker subrequest (beta)",
"WorkerSubrequestCount": "Number of subrequests issued by a worker when handling this request (beta)",
"ZoneID":"Internal zone ID"
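Because the fields endpoint returns each field name with its description, you can filter the list locally, for example to see which fields are currently marked beta. A sketch using an abridged, illustrative copy of the response:

```shell
# fields.json stands in for the output of the logs/received/fields endpoint;
# this copy is abridged and illustrative.
cat > fields.json <<'EOF'
{
  "CacheTieredFill":"Tiered Cache was used to serve this request (beta)",
  "ClientIP":"IP address of the client",
  "EdgeRequestHost":"Host header on the request from the edge to the origin (beta)"
}
EOF

# Print the names of fields whose description mentions "(beta)".
grep '(beta)' fields.json | cut -d'"' -f2
# prints: CacheTieredFill
#         EdgeRequestHost
```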
By default, timestamps in responses are returned as Unix nanosecond integers. The timestamps= query parameter can be set to change the format in which response timestamps are returned. Possible values are: unix, unixnano, rfc3339. Note: unix and unixnano return timestamps as integers; rfc3339 returns timestamps as strings.
Data is timestamped by time it was written to disk at our log aggregation point.
Response data is returned as NDJSON: one JSON object (one log message) per line. A sample log message with the default fields:
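An illustrative example of one such line is below. The field values are invented, and this particular set of default fields is an assumption; request the fields endpoint to see the current defaults.

```json
{"ClientIP":"203.0.113.7","ClientRequestHost":"example.com","ClientRequestMethod":"GET","ClientRequestURI":"/","EdgeEndTimestamp":1500415381099000000,"EdgeResponseBytes":1024,"EdgeResponseStatus":200,"EdgeStartTimestamp":1500415381059000000,"RayID":"3ed47e9eeb1a41f6"}
```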
Estimating daily data volumes
To quickly estimate the amount of data for a zone per day (the number of log lines and the number of bytes they take up), request a 1-in-1,000 sample of data for a 24h period; note that start=2017-09-10T00:00:00Z and end=2017-09-11T00:00:00Z span a 24h period, and sample=0.001 returns 1 in 1,000 records:
$ curl -s -H "X-Auth-Email: MONKEY" -H "X-Auth-Key: BANANA" \
  "https://api.cloudflare.com/client/v4/zones/<ZONE_TAG>/logs/received?start=2017-09-10T00:00:00Z&end=2017-09-11T00:00:00Z&sample=0.001" \
  > sample.log
...
$ wc -l sample.log
47146 sample.log
...
$ ls -lh sample.log
-rw-r--r-- 1 mik mik 15M Sep 11 18:56 sample.log
Based on this information, the approximate number of messages/day is 47,146,000 (47,146 × 1,000), and the approximate size is 15GB (15MB × 1,000). The size estimate is based on the default response field set; changing the field set (see the "fields" section) will change the response size.
To get a good estimate of daily traffic, it is best to query an entire 24h period. If the response size is too small or too large, adjust the sample value, not the time range.
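The scale-up arithmetic above can be scripted. This sketch (the function name is made up for illustration) multiplies the sampled line and byte counts by the sampling denominator, e.g. 1,000 for sample=0.001:

```shell
# Estimate daily totals from a sampled log file.
# Usage: estimate_daily FILE DENOMINATOR (e.g. 1000 for sample=0.001)
estimate_daily() {
  lines=$(wc -l < "$1")
  bytes=$(wc -c < "$1")
  echo "approx. messages/day: $((lines * $2))"
  echo "approx. bytes/day: $((bytes * $2))"
}

# estimate_daily sample.log 1000
```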
Responses are compressed by default (gzip). curl transparently decompresses responses unless called with -H "Accept-Encoding: gzip", in which case the output remains gzipped. You should expect compressed data to be about 5-10% of the uncompressed size. This means that with the 1GB uncompressed response size limit, compressed responses will be roughly 50-100MB.
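To keep the compressed bytes on disk and decompress only when reading, pass the header explicitly. A sketch (the function name is made up, and the credentials are the placeholders used above):

```shell
# Save one minute of logs still gzip-compressed (roughly 5-10% of the
# uncompressed size), then decompress only when reading.
pull_compressed() {
  zone="$1"; start="$2"; end="$3"
  curl -s -H "X-Auth-Email: [email protected]" -H "X-Auth-Key: banana12345" \
    -H "Accept-Encoding: gzip" \
    "https://api.cloudflare.com/client/v4/zones/$zone/logs/received?start=$start&end=$end" \
    > "logs-$start.ndjson.gz"
}

# Read back without storing an uncompressed copy, e.g.:
# gunzip -c logs-2017-07-18T22:00:00Z.ndjson.gz | jq -s 'length'
```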
Features and fields marked as "beta" are still being tested and validated. They may be removed without notice.