Cloudflare Enterprise customers have access to the Enterprise Log Share (ELS) service, a RESTful API for consuming request logs over HTTP. This API provides a method for customers to access a domain’s request logs using a client API key.
These logs contain data related to the connecting client, the request path through the Cloudflare network, and the response from the origin web server. This data is useful for enriching existing logs on an origin server.
Data retention period
You can query for log data starting at 1 minute in the past (relative to the actual time the request is being made) and going back up to 72 hours.
Order of the data returned
The "logs/received" REST API route exposes data by time received, which is the time the event was written to disk in the Cloudflare log aggregation system.
Ordering by log aggregation time instead of log generation time results in lower (faster) log share pipeline latency and deterministic log pulls. Functionally, it is similar to tailing a log file or reading from rsyslog (albeit in chunks).
This means that if you want to obtain logs for a given time range, you can do so by issuing one call for each consecutive minute (or other time range). Because log lines are batched by time received and made available, there is no "late arriving data." A response for a given minute will never change. You do not have to repeatedly poll a given time range to receive logs as they converge on our aggregation system.
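The one-call-per-minute approach can be sketched as follows. This is only an illustration: the function name `minute_windows` and the use of Unix-second timestamps are assumptions made for the example, and the 60-second step is simply the starting point suggested in this document.

```python
from typing import Iterator, Tuple


def minute_windows(start: int, end: int, step: int = 60) -> Iterator[Tuple[int, int]]:
    """Yield consecutive (start, end) Unix-second windows covering [start, end).

    Each window's end is exclusive and equals the next window's start,
    so adjacent windows never overlap and never leave a gap.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    cursor = start
    while cursor < end:
        window_end = min(cursor + step, end)
        yield cursor, window_end
        cursor = window_end


if __name__ == "__main__":
    # 1500415200 is 2017-07-18T22:00:00Z expressed in Unix seconds.
    for lo, hi in minute_windows(1500415200, 1500415200 + 180):
        print(f"start={lo}&end={hi}")
```

Because a response for a given window never changes, each window only needs to be fetched once.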
Format of the data returned
The Logpull REST API returns data in NDJSON format, in which each log line is a valid JSON object. Major analysis tools like Google BigQuery and AWS Kinesis require this format.
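For readers consuming the response from a program rather than a shell, a minimal sketch of parsing an NDJSON body in Python might look like this. The helper name `ndjson_to_list` and the sample values are made up for illustration; an empty body simply yields an empty list.

```python
import json
from typing import Any, Dict, List


def ndjson_to_list(body: str) -> List[Dict[str, Any]]:
    """Parse an NDJSON body into a list with one element per non-empty line."""
    # Split on "\n" only: str.splitlines() also splits on characters such as
    # U+2028, which may legally appear unescaped inside a JSON string.
    return [json.loads(line) for line in body.split("\n") if line.strip()]


if __name__ == "__main__":
    # Hypothetical two-line response body.
    sample = '{"ClientIP":"192.0.2.1","RayID":"abc"}\n{"ClientIP":"192.0.2.2","RayID":"def"}\n'
    print(json.dumps(ndjson_to_list(sample)))
```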
If you'd like to turn the resulting log data into a JSON array with one array element per log line, you can use the jq tool. Pipe the API response into jq using the --slurp (or simply -s) flag:
<API request code> | jq -s .
Recommended access pattern
The basic access pattern is "give me all the logs for zone Z for minute M," where minute M refers to the time log records were written to disk in our log aggregation system. Try running your query every minute to start. If responses are too small, go up to 5 minutes (this will be appropriate for most zones). If the responses are too large, try going down to 15 seconds. If your zone has so many logs that it takes longer than 1 minute to read 1 minute's worth of logs, run 2 staggered workers, each requesting 1 minute's worth of logs every 2 minutes.
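One possible way to stagger workers is to assign minutes round-robin, as in the sketch below. The function name `worker_for_minute` and the modulo scheme are assumptions for illustration; any scheme that gives each minute to exactly one worker would serve.

```python
def worker_for_minute(minute_start: int, workers: int = 2) -> int:
    """Return the index of the worker that should fetch the minute
    beginning at minute_start (Unix seconds)."""
    if workers <= 0:
        raise ValueError("workers must be positive")
    return (minute_start // 60) % workers


if __name__ == "__main__":
    # With two workers, each one handles every other minute,
    # i.e. one minute's worth of logs every two minutes.
    for ts in (0, 60, 120, 180):
        print(ts, "-> worker", worker_for_minute(ts))
```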
Data returned by the API will not change on repeat calls. The order of messages in the response may be different, but the number and content of the messages will always be the same for a given query, if the response code is 200 and there is no error reading the response body.
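Since record order may differ between calls, comparing two responses for the same query has to ignore order. The sketch below shows one way to do that; the helper name `same_records` is hypothetical, and it also normalizes key order within each record because the order of fields in a response is not specified.

```python
import json
from collections import Counter


def same_records(body_a: str, body_b: str) -> bool:
    """Return True if two NDJSON bodies hold the same records, ignoring
    the order of lines and the order of keys within each line."""
    def canonical(body: str) -> Counter:
        return Counter(
            json.dumps(json.loads(line), sort_keys=True)
            for line in body.split("\n")
            if line.strip()
        )
    return canonical(body_a) == canonical(body_b)


if __name__ == "__main__":
    first = '{"a":1,"b":2}\n{"a":3,"b":4}\n'
    second = '{"b":4,"a":3}\n{"b":2,"a":1}\n'
    print(same_records(first, second))
```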
Because data is ingested by our log processing system in batches, most zones seeing less than 1 million requests per minute will have "empty" minutes: queries for such a minute will result in responses with status 200, but no data in the body. This does not mean that there were no requests proxied by Cloudflare for that minute; it just means that our system did not process a batch of logs for that zone in that minute.
- start: inclusive; timestamp formatted as unix (which by definition is UTC), unix nano, or rfc3339 (specifies time zone); must be no more than 7 days earlier than now
- end: exclusive; same format as start; must be at least 1 minute earlier than now and later than start
- X-Auth-Key (Global API key)
- count: Return up to that many records. To return all records, do not include the count parameter.
- sample: Return only a sample of records; sample=0.1 means return 10% (1 in 10) of records
- fields: Comma-separated list of fields to return; when empty, the default list is returned (9 fields).
- timestamps (beta): Format in which timestamp fields will be returned, one of: "unixnano" (default), "unix", "rfc3339".
curl -s -H "X-Auth-Email: firstname.lastname@example.org" -H "X-Auth-Key: banana12345" "https://api.cloudflare.com/client/v4/zones/4f8g90r42275bce56f242596aic6fc5b/logs/received?start=2017-07-18T22:00:00Z&end=2017-07-18T22:01:00Z&count=1&fields=RayID,ClientIP"
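The same request URL can be assembled programmatically. The following is a sketch only: `build_received_url` is a hypothetical helper, it covers just the query parameters listed above, and it does not send the request or set the authentication headers.

```python
from typing import List, Optional, Sequence, Tuple
from urllib.parse import urlencode

API_BASE = "https://api.cloudflare.com/client/v4"


def build_received_url(
    zone_id: str,
    start: str,
    end: str,
    count: Optional[int] = None,
    sample: Optional[float] = None,
    fields: Optional[Sequence[str]] = None,
    timestamps: Optional[str] = None,
) -> str:
    """Assemble a logs/received URL from the documented query parameters."""
    params: List[Tuple[str, str]] = [("start", start), ("end", end)]
    if count is not None:
        params.append(("count", str(count)))
    if sample is not None:
        params.append(("sample", str(sample)))
    if fields:
        params.append(("fields", ",".join(fields)))
    if timestamps is not None:
        params.append(("timestamps", timestamps))
    # Leave ':' and ',' unescaped so the result looks like the curl example.
    query = urlencode(params, safe=":,")
    return f"{API_BASE}/zones/{zone_id}/logs/received?{query}"


if __name__ == "__main__":
    print(build_received_url(
        "<ZONE_ID>", "2017-07-18T22:00:00Z", "2017-07-18T22:01:00Z",
        count=1, fields=["RayID", "ClientIP"],
    ))
```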
The only arguments accepted by the "rayids" route are "fields" and "timestamps". The rayids route will return 0, 1, or more records (Ray IDs may not be unique).
Once a 200 response is received and read completely for a given zone and time range, the following will be true for all subsequent requests:
- The number and content of returned records will be the same.
- The order of returned records may be (and is likely to be) different.
- When set explicitly in the request URL, the response fields will never change, unless a field was designated "beta," in which case it may be removed at any time.
- When not set explicitly, default fields will be used; the set of default fields may change at any time.
- There are two rate limits; exceeding either one will result in a 429 error response:
  - 15 requests/min per zone
  - 180 requests/min per user (email address)
- Maximum time range that can be queried (difference between start and end parameters) is 1 hour
- A maximum of 64M records (logs) can be returned in a single response
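A client can check a time range against these constraints before sending a request. The sketch below is an assumption-laden illustration: `range_problems` is a hypothetical helper, timestamps are Unix seconds, and the 7-day limit is taken from the description of the start parameter.

```python
from typing import List

MAX_RANGE_SECONDS = 3600           # at most 1 hour between start and end
MIN_END_LAG_SECONDS = 60           # end must be at least 1 minute in the past
MAX_START_AGE_SECONDS = 7 * 86400  # assumption: 7 days, per the start parameter


def range_problems(start: int, end: int, now: int) -> List[str]:
    """Return a list of human-readable problems; empty means the range looks valid."""
    problems = []
    if end <= start:
        problems.append("end must be later than start")
    if end > now - MIN_END_LAG_SECONDS:
        problems.append("end must be at least 1 minute earlier than now")
    if end - start > MAX_RANGE_SECONDS:
        problems.append("time range exceeds 1 hour")
    if start < now - MAX_START_AGE_SECONDS:
        problems.append("start is older than the retention window")
    return problems


if __name__ == "__main__":
    now = 1545000000  # arbitrary "current time" for the demonstration
    print(range_problems(now - 7200, now - 60, now))
```

A range that fails the 1-hour check can be split into smaller consecutive ranges and requested piece by piece.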
API Parameters in Depth
When "?count=" is provided, the response will contain up to count results. Because results are not sorted, you are likely to get different data for repeated requests.
When "?sample=" is provided, a sample of matching records is returned. If "sample=0.1", approximately 10% of records will be returned. Sampling is random: repeated calls will not only return different records, but likely will also vary slightly in number of returned records.
When "?count=" is also specified, count caps the number of records returned after sampling; it is not applied before sampling. So, with "sample=0.05" and "count=7", when there is a total of 100 records available, approximately 5 will be returned. When there are 1,000 records, 7 will be returned. When there are 10,000 records, 7 will be returned.
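The interaction of the two parameters can be written as a small calculation. This is a sketch of the arithmetic only: `expected_returned` is a hypothetical helper, and because sampling is random the real number of sampled records will vary around the value computed here.

```python
from typing import Optional


def expected_returned(total: int, sample: float, count: Optional[int] = None) -> int:
    """Approximate number of records returned for a given sample rate and count."""
    sampled = round(total * sample)  # expected size of the random sample
    return sampled if count is None else min(sampled, count)


if __name__ == "__main__":
    for total in (100, 1000, 10000):
        print(total, expected_returned(total, sample=0.05, count=7))
```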
If "fields" are not specified, by default a limited set of fields will be returned. This default field set may change at any time. The full list of all available fields can be found here:
https://api.cloudflare.com/client/v4/zones/<ZONE_ID>/logs/received/fields
Fields are passed to the request as a comma-separated list. So, to have "ClientIP" and "RayID", use:
?fields=ClientIP,RayID
The order in which fields are specified doesn't matter, and the order of fields in the response is not specified.
Using Bash subshell and jq, you can download the logs with all available fields without manually copying and pasting the fields into the request like so:
curl -s -H "X-Auth-Email: firstname.lastname@example.org" -H "X-Auth-Key: banana12345" "https://api.cloudflare.com/client/v4/zones/4f8g90r42275bce56f242596aic6fc5b/logs/received?start=2017-07-18T22:00:00Z&end=2017-07-18T22:01:00Z&count=1&fields=$(curl -s -H "X-Auth-Email: firstname.lastname@example.org" -H "X-Auth-Key: banana12345" "https://api.cloudflare.com/client/v4/zones/4f8g90r42275bce56f242596aic6fc5b/logs/received/fields" | jq -r 'to_entries[] | .key' | paste -sd "," -)"
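The jq and paste step can also be done in Python. The sketch below assumes, as the jq filter above implies, that the fields route returns a JSON object keyed by field name; `fields_param` is a hypothetical helper that only builds the parameter value and does not perform the HTTP request.

```python
import json


def fields_param(fields_response_body: str) -> str:
    """Join the keys of the fields-route JSON object into a fields= value."""
    available = json.loads(fields_response_body)
    return ",".join(available.keys())


if __name__ == "__main__":
    # Shortened, illustrative response body with two of the documented fields.
    body = '{"ClientIP":"IP address of the client","RayID":"ID of the request"}'
    print("fields=" + fields_param(body))
```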
Currently available fields (as of Dec 2018):
"CacheCacheStatus": "unknown | miss | expired | updating | stale | hit | ignored | bypass | revalidated",
"CacheResponseBytes":"Number of bytes returned by the cache",
"CacheResponseStatus":"HTTP status code returned by the cache to the edge: all requests (including non-cacheable ones) go through the cache: also see CacheStatus field",
"CacheTieredFill":"Tiered Cache was used to serve this request",
"ClientASN":"Client AS number",
"ClientCountry":"Country of the client IP address",
"ClientDeviceType":"Client device type",
"ClientIP":"IP address of the client",
"ClientIPClass":"Client IP class: unknown | clean | badHost | searchEngine | whitelist | greylist | monitoringService | securityScanner | noRecord | scan | backupService | mobilePlatform | tor",
"ClientRequestBytes":"Number of bytes in the client request",
"ClientRequestHost":"Host requested by the client",
"ClientRequestMethod":"HTTP method of client request",
"ClientRequestPath":"URI path requested by the client",
"ClientRequestProtocol":"HTTP protocol of client request",
"ClientRequestReferer":"HTTP request referrer",
"ClientRequestURI": "URI requested by the client",
"ClientRequestUserAgent":"User agent reported by the client",
"ClientSSLCipher":"Client SSL cipher",
"ClientSSLProtocol":"Client SSL (TLS) protocol",
"ClientSrcPort":"Client source port",
"EdgeColoID":"Cloudflare edge colo id",
"EdgeEndTimestamp":"timestamp the edge finished sending response to the client",
"EdgePathingOp":"Indicates what type of response was issued for this request (unknown = no specific action)",
"EdgePathingSrc":"Details how the request was classified based on security checks (unknown = no specific classification)",
"EdgePathingStatus":"Indicates what data was used to determine the handling of this request (unknown = no data)",
"EdgeRequestHost":"Host header on the request from the edge to the origin",
"EdgeResponseBytes":"Number of bytes returned by the edge to the client",
"EdgeResponseCompressionRatio":"Edge response compression ratio",
"EdgeResponseContentType":"Edge response Content-Type header value",
"EdgeResponseStatus":"HTTP status code returned by Cloudflare to the client",
"EdgeServerIP": "IP of the edge server making a request to the origin",
"EdgeStartTimestamp":"Unix nanosecond timestamp the edge received request from the client",
"OriginIP":"IP of the origin server",
"OriginResponseBytes (deprecated)":"Number of bytes returned by the origin server",
"OriginResponseHTTPExpires":"Value of the origin 'expires' header in RFC1123 format",
"OriginResponseHTTPLastModified":"Value of the origin 'last-modified' header in RFC1123 format",
"OriginResponseStatus":"Status returned by the origin server",
"OriginResponseTime":"Number of nanoseconds it took the origin to return the response to edge",
"OriginSSLProtocol":"SSL (TLS) protocol used to connect to the origin",
"RayID":"ID of the request",
"SecurityLevel":"The security level configured at the time of this request. This is used to determine the sensitivity of the IP Reputation system",
"WAFAction":"Action taken by the WAF, if triggered",
"WAFFlags":"Additional configuration flags: simulate (0x1) | null",
"WAFMatchedVar":"The full name of the most-recently matched variable",
"WAFProfile":"WAF profile: low | med | high",
"WAFRuleID":"ID of the applied WAF rule",
"WAFRuleMessage":"Rule message associated with the triggered rule",
"ZoneID":"Internal zone ID"
By default, timestamps in responses are returned as Unix nanosecond integers. The timestamps= query parameter can be set to change the format in which response timestamps are returned. Possible values are: unix, unixnano, rfc3339. Note: unix and unixnano return timestamps as integers; rfc3339 returns timestamps as strings.
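When working with the default Unix nanosecond integers, a client may want to convert them to a readable form locally. The following is a minimal sketch; `unixnano_to_rfc3339` is a hypothetical helper, and the nine-digit fractional part is a choice made for this example, not necessarily the exact string the API emits for timestamps=rfc3339.

```python
from datetime import datetime, timezone


def unixnano_to_rfc3339(ns: int) -> str:
    """Convert a Unix nanosecond timestamp to an RFC 3339 string in UTC."""
    seconds, nanos = divmod(ns, 1_000_000_000)
    dt = datetime.fromtimestamp(seconds, tz=timezone.utc)
    # datetime only carries microseconds, so the nanosecond part is
    # formatted separately to avoid losing precision.
    return dt.strftime("%Y-%m-%dT%H:%M:%S") + f".{nanos:09d}Z"


if __name__ == "__main__":
    print(unixnano_to_rfc3339(1500415200 * 10**9))
```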
Data is timestamped by the time it was written to disk at our log aggregation point.
Response data is returned in JSON, 1 JSON object (1 log message) per line. Sample log message with default fields:
Estimating daily data volumes
To estimate the amount of data for a zone per day (the number of log lines and the number of bytes they take up), request a 1% or 10% sample of data for a 1h period (use 10% if your volume is low). In the following request, start=2018-12-15T00:00:00Z and end=2018-12-15T01:00:00Z span a 1h period, and sample=0.1 requests a 10% sample.
$ curl -s -H"X-Auth-Email: [REDACTED]" -H"X-Auth-Key: [REDACTED]" \
    "https://api.cloudflare.com/client/v4/zones/<ZONE_ID>/logs/received?start=2018-12-15T00:00:00Z&end=2018-12-15T01:00:00Z&sample=0.1" \
    >sample.log
...
$ wc -l sample.log
83 sample.log
...
$ ls -lh sample.log
-rw-r--r-- 1 mik mik 25K Dec 17 15:49 sample.log
Based on this information, the approximate number of messages/day is 19,920 (83*10*24), and the byte size is 6MB (25K*10*24). The size estimate is based on the default response field set; changing the response field set (see the "fields" section) will change the response size.
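The extrapolation above is a simple scaling by the sample rate and the number of sampled hours in a day. A sketch of that arithmetic follows; `estimate_daily` is a hypothetical helper, and 25K is treated as 25,000 bytes to match the rounded figure in the text.

```python
from typing import Tuple


def estimate_daily(lines: int, size_bytes: int, sample: float, hours: float = 1.0) -> Tuple[int, int]:
    """Scale a sampled measurement over `hours` up to a full, unsampled day."""
    daily_lines = round(lines * 24 / hours / sample)
    daily_bytes = round(size_bytes * 24 / hours / sample)
    return daily_lines, daily_bytes


if __name__ == "__main__":
    # 83 lines and roughly 25,000 bytes from a 10% sample of one hour.
    lines, size = estimate_daily(83, 25_000, sample=0.1)
    print(f"{lines} messages/day, about {size / 1_000_000:.0f} MB/day")
```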
To get a good estimate of daily traffic, it is best to get at least 30 log lines in your hourly sample. If the response size is too small (or too large), adjust the sample value, not the time range.
Responses are compressed by default (gzip). Curl transparently decompresses responses, unless called with -H"accept-encoding: gzip". In that case, output remains gzipped. You should expect compressed data to be about 5-10% of uncompressed. This means that with a 1GB response size limit (uncompressed), compressed responses will be 50-100MB.
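A client that receives the body still gzipped has to decompress it before parsing the log lines. The following is a minimal sketch using Python's standard gzip module; `decode_body` is a hypothetical helper, and the sample bytes are compressed locally purely for demonstration.

```python
import gzip


def decode_body(raw: bytes, gzipped: bool) -> str:
    """Return the response body as text, decompressing it first if needed."""
    data = gzip.decompress(raw) if gzipped else raw
    return data.decode("utf-8")


if __name__ == "__main__":
    original = b'{"ClientIP":"192.0.2.1","RayID":"abc"}\n' * 1000
    compressed = gzip.compress(original)
    print(f"compressed to {len(compressed) / len(original):.1%} of original size")
    assert decode_body(compressed, gzipped=True) == original.decode("utf-8")
```

Highly repetitive demonstration data like this compresses far better than real logs, so the ratio printed here says nothing about the 5-10% figure quoted above.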
Features and fields marked as "beta" are still being tested and validated. They may be removed without notice.