Enterprise Log Share: Logpull REST API

Overview

Cloudflare’s Enterprise customers have access to the Enterprise Log Share service, a RESTful API for consuming request logs over HTTP. It lets customers access a domain’s request logs using their existing client API key.

These logs contain data about the connecting client, the request’s path through Cloudflare’s network, and the response from the origin. They are extremely useful for enriching existing logs on an origin server.

*Note: If you are currently using the older "/logs/requests" endpoint, information about it can be found here. That endpoint has been deprecated and will be sunset in early 2018 (as announced here).

Order of the data returned

The "logs/received" REST API route exposes data by time received (the time the event was written to disk in our log aggregation system). This is in contrast to our "logs/requests" API documented here, which orders logs by request time.

Ordering by log aggregation time instead of log generation time results in lower (faster) log share pipeline latency and deterministic log pulls. Functionally, it is similar to tailing a log file, or reading from rsyslog (albeit in chunks).

This means that to obtain logs for a given time range, you can issue one call for each consecutive minute (or other interval). Because log lines are batched and made available by time received, there is no "late-arriving data": a response for a given minute will never change. You do not have to repeatedly poll a given time range to receive logs as they converge on our aggregation system.
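The minute-by-minute pattern can be sketched as a loop that prints consecutive start/end pairs. This is a minimal sketch, assuming Unix timestamps in seconds; `logpull_windows` is a hypothetical helper name, and it only prints the windows rather than calling the API. Each window's end is the next window's start, which fits the inclusive start and exclusive end described under Access.

```shell
# Hypothetical helper: print consecutive [start, end) windows covering the
# range from $1 (inclusive) to $2 (exclusive) in steps of $3 seconds.
# All values are Unix timestamps in seconds.
logpull_windows() {
  local s=$1 stop=$2 step=${3:-60} e   # default step: one minute
  while [ "$s" -lt "$stop" ]; do
    e=$((s + step))
    # Clamp the last window so it never extends past the requested end.
    if [ "$e" -gt "$stop" ]; then e=$stop; fi
    printf '%s %s\n' "$s" "$e"
    s=$e
  done
}

# Usage (1526378400 is 2018-05-15T10:00:00Z): three one-minute windows.
# logpull_windows 1526378400 1526378580 60
```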

Format of the data returned

The Logpull REST API returns data in NDJSON (newline-delimited JSON) format, in which each log line is a valid JSON object. Major analysis tools such as Google BigQuery and AWS Kinesis require this format.

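If you'd like to turn the resulting log data into a JSON array with one array element per log line, you can use the jq tool: pipe the API response into jq with the slurp flag (--slurp, or simply -s) and the identity filter (.):

curl -s -H "X-Auth-Email: <email>" -H "X-Auth-Key: <api_key>" "https://api.cloudflare.com/client/v4/zones/<zone_id>/logs/received?start=<start>&end=<end>" | jq -s .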
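If jq is not installed, a rough alternative is to wrap the lines yourself. This sketch assumes each non-blank line is one complete JSON object, as the NDJSON format implies; it does not validate the JSON. `ndjson_to_array` is a hypothetical helper name.

```shell
# Hypothetical helper: read NDJSON on stdin and print one JSON array.
# Blank lines are skipped, so an empty response body becomes [].
ndjson_to_array() {
  awk 'BEGIN { printf "[" }
       NF    { if (n++) printf ","; printf "%s", $0 }
       END   { print "]" }'
}

# Usage: curl -s ... "https://api.cloudflare.com/client/v4/zones/<zone_id>/logs/received?..." | ndjson_to_array
```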

Recommended access pattern

The basic access pattern is "give me all the logs for zone Z for minute M," where minute M refers to the time log records were written to disk in our log aggregation system. Start by running your query every minute. If responses are too small, go up to 5 minutes (this will be appropriate for most zones). If responses are too large, try going down to 15 seconds. If your zone has so many logs that it takes longer than 1 minute to read 1 minute's worth of logs, run 2 staggered workers, each requesting 1 minute's worth of logs every 2 minutes.
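For the two-worker setup, one simple way to keep the workers from requesting the same minute (an assumption, not a documented scheme) is to assign each one-minute window by its minute index modulo the number of workers: worker 0 takes even minutes and worker 1 takes odd minutes. `logpull_worker_for` is a hypothetical name, and scheduling the workers themselves is left out.

```shell
# Hypothetical helper: given the Unix start time (seconds) of a one-minute
# window ($1) and the number of workers ($2), print the 0-based index of
# the worker that owns that window.
logpull_worker_for() {
  local minute_index=$(( $1 / 60 ))
  printf '%s\n' $(( minute_index % $2 ))
}

# Inside worker number $WORKER (0 or 1) of 2:
# if [ "$(logpull_worker_for "$start" 2)" = "$WORKER" ]; then <fetch window>; fi
```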

Data returned by the API will not change on repeat calls. As long as the response code is 200 and there is no error reading the response body, the number and content of the messages will always be the same for a given query, although their order may differ.

Because data is ingested by our log processing system in batches, most zones seeing fewer than 1 million requests per minute will have "empty" minutes: queries for such a minute will return status 200 with no data in the body. This does not mean that Cloudflare proxied no requests for that minute; it just means that our system did not process a batch of logs for that zone in that minute.

Access

/logs/received

https://api.cloudflare.com/client/v4/zones/<zone_id>/logs/received?start=<unix|rfc3339>&end=<unix|rfc3339>[&count=<int>][&sample=<float>][&fields=<fields>][&timestamps=<string>]

Required parameters:

  • start: inclusive; timestamp formatted as Unix (UTC by definition), Unix nano, or RFC 3339 (which specifies the time zone); must be no more than 7 days earlier than now
  • end: exclusive; same format as start; must be at least 5 minutes in the past

The time range from start to end cannot exceed 1 hour. Because start is inclusive and end is exclusive, consecutive windows can share a boundary without overlapping. For example, to get all data at a minutely cadence starting at 10:00 UTC, the proper values are start=2018-05-15T10:00:00Z&end=2018-05-15T10:01:00Z, then start=2018-05-15T10:01:00Z&end=2018-05-15T10:02:00Z, and so on; the shared boundary is handled properly.
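Before sending a request, a script can check a window against the range rule. This is a minimal sketch, assuming Unix timestamps in seconds; `logpull_window_ok` is a hypothetical name, and it deliberately checks only that end is after start and that the range is at most 1 hour, not how far in the past either timestamp lies.

```shell
# Hypothetical helper: succeed if the [start, end) window given in Unix
# seconds ($1, $2) is non-empty and no longer than one hour (3600 s).
logpull_window_ok() {
  [ "$2" -gt "$1" ] && [ $(( $2 - $1 )) -le 3600 ]
}

# logpull_window_ok 1526378400 1526378460 && echo "valid one-minute window"
```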

Required headers:

  • X-Auth-Email
  • X-Auth-Key (Global API key)

Optional parameters:

  • count: return up to that many records
  • sample: return only a sample of records; sample=0.1 means return 10% (1 in 10) of records
  • fields: comma-separated list of fields to return; when omitted, the default list (9 fields) is returned
  • timestamps (beta): format in which timestamp fields will be returned, one of: "unixnano" (default), "unix", "rfc3339"
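The URL template and parameters above can be assembled with a small function. This is a sketch; `logpull_received_url` is a hypothetical name, and values are inserted without URL-encoding. That is fine for Unix timestamps, RFC 3339 timestamps ending in Z, and comma-separated field lists, but an RFC 3339 offset such as +02:00 would likely need its + encoded as %2B.

```shell
# Hypothetical helper: build a /logs/received request URL.
#   $1 zone ID   $2 start   $3 end
#   $4 count, $5 sample, $6 fields, $7 timestamps (all optional; pass ''
#   to skip one). Values are inserted verbatim, without URL-encoding.
logpull_received_url() {
  local url="https://api.cloudflare.com/client/v4/zones/$1/logs/received?start=$2&end=$3"
  if [ -n "${4:-}" ]; then url="$url&count=$4"; fi
  if [ -n "${5:-}" ]; then url="$url&sample=$5"; fi
  if [ -n "${6:-}" ]; then url="$url&fields=$6"; fi
  if [ -n "${7:-}" ]; then url="$url&timestamps=$7"; fi
  printf '%s\n' "$url"
}

# logpull_received_url <zone_id> 2017-07-18T22:00:00Z 2017-07-18T22:01:00Z 1 '' RayID,ClientIP
```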

Sample cURL:

curl -s -H "X-Auth-Email: user@example.com" -H "X-Auth-Key: banana12345" "https://api.cloudflare.com/client/v4/zones/4f8g90r42275bce56f242596aic6fc5b/logs/received?start=2017-07-18T22:00:00Z&end=2017-07-18T22:01:00Z&count=1&fields=RayID,ClientIP"
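In scripts, it can help to read the credentials from the environment and to treat HTTP errors as failures. A minimal sketch follows; `logpull_get` and the CF_EMAIL / CF_API_KEY environment variable names are assumptions, not part of the API. With --fail, curl exits non-zero (code 22) on HTTP errors instead of printing the error body as if it were log data.

```shell
# Hypothetical wrapper around the sample request. $1 is the full request URL.
# CF_EMAIL and CF_API_KEY are assumed variable names chosen for this sketch.
logpull_get() {
  if [ -z "${CF_EMAIL:-}" ] || [ -z "${CF_API_KEY:-}" ]; then
    echo "logpull_get: set CF_EMAIL and CF_API_KEY" >&2
    return 2
  fi
  # -sS: no progress meter, but still print errors.
  # --fail: exit non-zero on HTTP status >= 400.
  curl -sS --fail \
    -H "X-Auth-Email: $CF_EMAIL" \
    -H "X-Auth-Key: $CF_API_KEY" \
    "$1"
}

# CF_EMAIL=user@example.com CF_API_KEY=<api_key> \
#   logpull_get "https://api.cloudflare.com/client/v4/zones/<zone_id>/logs/received?start=2017-07-18T22:00:00Z&end=2017-07-18T22:01:00Z&count=1&fields=RayID,ClientIP"
```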

/logs/rayids

https://api.cloudflare.com/client/v4/zones/<zone_id>/logs/rayids/<ray_id>[?fields=<fields>][&timestamps=<string>]

The rayids route accepts only the "fields" and "timestamps" parameters. It returns 0, 1, or more records (Ray IDs may not be unique).
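Because a Ray ID lookup can return zero, one, or several records, a script should count what comes back rather than assume a single line. This sketch uses hypothetical helper names (`logpull_rayid_url`, `count_records`) and treats each non-blank line as one NDJSON record.

```shell
# Hypothetical helper: build a /logs/rayids URL.
#   $1 zone ID   $2 Ray ID   $3 fields (optional, comma-separated)
logpull_rayid_url() {
  local url="https://api.cloudflare.com/client/v4/zones/$1/logs/rayids/$2"
  if [ -n "${3:-}" ]; then url="$url?fields=$3"; fi
  printf '%s\n' "$url"
}

# Count NDJSON records on stdin (non-blank lines); may be 0, 1, or more.
count_records() {
  awk 'NF { n++ } END { print n + 0 }'
}

# curl -s -H "X-Auth-Email: <email>" -H "X-Auth-Key: <api_key>" \
#   "$(logpull_rayid_url <zone_id> <ray_id>)" | count_records
```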

Service Expectations

Once a 200 response is received and read completely for a given zone and time range, the following will be true for all subsequent requests:

  • The number and content of returned records will be the same.
  • The order of returned records may be (and is likely to be) different.
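To check these guarantees yourself, you can compare two saved responses for the same zone and time range while ignoring line order. This sketch assumes each record is serialized identically in both responses (same field order within each JSON object); if that does not hold, normalize the records first, for example with jq's -S (sort keys) option. `same_records` is a hypothetical name.

```shell
# Hypothetical helper: succeed if two NDJSON files contain the same lines
# in any order. Duplicate lines are kept by sort, so counts must match too.
same_records() {
  [ "$(sort "$1")" = "$(sort "$2")" ]
}

# same_records pull1.ndjson pull2.ndjson && echo "same records"
```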

Response fields:

  • When set explicitly in the request URL, the response fields will never change, unless a field was designated "beta," in which case it may be removed at any time.
  • When not set explicitly, default fields will be used; the set of default fields may change at any time.

Limits:

  • 15 requests per minute per zone when using 1 persistent connection and up to 60 requests per minute per zone when using multiple connections (in practice, there's no need to call the API this often because the data is not processed this frequently and thus you're likely to receive empty responses)
  • Maximum time range (difference between the start and end parameters): 1 hour
  • Maximum response size is 1GB uncompressed for time ranges greater than 1 minute; for time ranges of 1 minute or less, there is no limit
    • Because responses are streamed, there is no way to determine response size ahead of time. After 1GB of data is streamed, the response will fail with a terminated connection. This allows users to tune their queries to a range that will not result in failures based on their request volume. We believe this is a better solution than hard-limiting requests to an arbitrary time range (e.g., 1 minute at a time).
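The per-zone request limits translate into a minimum spacing between calls. This is a sketch of that arithmetic only; `logpull_pace_seconds` is a hypothetical name, and it does not handle retries or the multi-connection case beyond the number you pass in.

```shell
# Hypothetical helper: seconds to wait between requests so that one client
# stays at or under $1 requests per minute (integer division, rounded up).
logpull_pace_seconds() {
  printf '%s\n' $(( (60 + $1 - 1) / $1 ))
}

# logpull_pace_seconds 15   -> 4 (single persistent connection limit)
# logpull_pace_seconds 60   -> 1
```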

 

API Parameters in Depth

count

When "?count=" is provided, the response will contain up to count results. Because results are not sorted, you are likely to get different data for repeated requests.

sample

When "?sample=" is provided, a sample of matching records is returned. If "sample=0.1", approximately 10% of records will be returned. Sampling is random: repeated calls will not only return different records, but likely will also vary slightly in number of returned records.

When "?count=" is also specified, count is applied after sampling: it caps the number of returned records, not the number of sampled records. So, with "sample=0.05" and "count=7", when a total of 100 records is available, approximately 5 will be returned. When there are 1,000 records, 7 will be returned; when there are 10,000 records, 7 will still be returned.
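The interaction of the two parameters amounts to min(records × sample, count). The sketch below computes that expected value; `logpull_expected_records` is a hypothetical name, and because sampling is random, real responses will vary around this number.

```shell
# Hypothetical helper: expected number of returned records for $1 available
# records, sample rate $2 and optional count $3 (0 or omitted = no cap).
# Sampling happens first; count then caps the sampled result.
logpull_expected_records() {
  awk -v total="$1" -v sample="$2" -v cap="${3:-0}" 'BEGIN {
    n = total * sample                  # expected size of the sample
    if (cap > 0 && n > cap) n = cap     # count limits what is returned
    printf "%d\n", n + 0.5              # round to the nearest integer
  }'
}

# logpull_expected_records 100 0.05 7     -> 5
# logpull_expected_records 10000 0.05 7   -> 7
```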

fields

If "fields" is not specified, a limited default set of fields is returned. This default field set may change at any time. The full list of available fields can be found here:

https://api.cloudflare.com/client/v4/zones/<zone_id>/logs/received/fields

Fields are passed to the request as a comma-separated list. For example, to get "ClientIP" and "RayID", use:

fields=ClientIP,RayID

The order in which fields are specified doesn't matter, and the order of fields in the response is not specified.
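When the field names come from a script variable or array, they can be joined into that comma-separated form with the shell's own word joining. `logpull_fields` is a hypothetical helper name; it runs in a subshell so the changed IFS does not leak into the caller.

```shell
# Hypothetical helper: join its arguments with commas for the fields
# parameter. "$*" joins with the first character of IFS.
logpull_fields() (
  IFS=,
  printf '%s\n' "$*"
)

# logpull_fields ClientIP RayID   -> ClientIP,RayID
```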

Using Bash command substitution and jq, you can request logs with all available fields without manually copying and pasting the field names into the request:

curl -s -H "X-Auth-Email: user@example.com" -H "X-Auth-Key: banana12345" "https://api.cloudflare.com/client/v4/zones/4f8g90r42275bce56f242596aic6fc5b/logs/received?start=2017-07-18T22:00:00Z&end=2017-07-18T22:01:00Z&count=1&fields=$(curl -s -H "X-Auth-Email: user@example.com" -H "X-Auth-Key: banana12345" "https://api.cloudflare.com/client/v4/zones/4f8g90r42275bce56f242596aic6fc5b/logs/received/fields" | jq '. | to_entries[] | .key' -r | paste -sd "," -)"

Currently available fields (as of May 2018):

{
  "CacheCacheStatus": "unknown | miss | expired | updating | stale | hit | ignored | bypass | revalidated",
  "CacheResponseBytes": "Number of bytes returned by the cache",
  "CacheResponseStatus": "HTTP status code returned by the cache to the edge: all requests (including non-cacheable ones) go through the cache: also see CacheStatus field",
  "CacheTieredFill": "Tiered Cache was used to serve this request (beta)",
  "ClientASN": "Client AS number",
  "ClientCountry": "Country of the client IP address",
  "ClientDeviceType": "Client device type",
  "ClientIP": "IP address of the client",
  "ClientIPClass": "Client IP class",
  "ClientRequestBytes": "Number of bytes in the client request",
  "ClientRequestHost": "Host requested by the client",
  "ClientRequestMethod": "HTTP method of client request",
  "ClientRequestProtocol": "HTTP protocol of client request",
  "ClientRequestReferer": "HTTP request referrer",
  "ClientRequestURI": "URI requested by the client",
  "ClientRequestUserAgent": "User agent reported by the client",
  "ClientSSLCipher": "Client SSL cipher",
  "ClientSSLProtocol": "Client SSL (TLS) protocol",
  "ClientSrcPort": "Client source port",
  "EdgeColoID": "Cloudflare edge colo id",
  "EdgeEndTimestamp": "Unix nanosecond timestamp the edge finished sending response to the client",
  "EdgePathingOp": "Indicates what type of response was issued for this request (unknown = no specific action)",
  "EdgePathingSrc": "Details how the request was classified based on security checks (unknown = no specific classification)",
  "EdgePathingStatus": "Indicates what data was used to determine the handling of this request (unknown = no data)",
  "EdgeRateLimitAction": "The action taken by the blocking rule; empty if no action taken (beta)",
  "EdgeRateLimitID": "uint64: The internal rule ID of the rate-limiting rule that triggered a block (ban) or simulate action. 0 if no action taken. (beta)",
  "EdgeRequestHost": "Host header on the request from the edge to the origin (beta)",
  "EdgeResponseBytes": "Number of bytes returned by the edge to the client",
  "EdgeResponseCompressionRatio": "Edge response compression ratio",
  "EdgeResponseContentType": "Edge response Content-Type header value (beta)",
  "EdgeResponseStatus": "HTTP status code returned by Cloudflare to the client",
  "EdgeServerIP": "IP of the edge server making a request to the origin (beta)",
  "EdgeStartTimestamp": "Unix nanosecond timestamp the edge received request from the client",
  "OriginIP": "IP of the origin server",
  "OriginResponseBytes": "Number of bytes returned by the origin server",
  "OriginResponseHTTPExpires": "Value of the origin 'expires' header in RFC1123 format",
  "OriginResponseHTTPLastModified": "Value of the origin 'last-modified' header in RFC1123 format",
  "OriginResponseStatus": "Status returned by the origin server",
  "OriginResponseTime": "Number of nanoseconds it took the origin to return the response to edge",
  "OriginSSLProtocol": "SSL (TLS) protocol used to connect to the origin (beta)",
  "RayID": "ID of the request",
  "SecurityLevel": "The security level configured at the time of this request. This is used to determine the sensitivity of the IP Reputation system",
  "WAFAction": "Action taken by the WAF, if triggered",
  "WAFFlags": "Additional configuration flags: simulate (0x1) | null",
  "WAFMatchedVar": "The full name of the most-recently matched variable",
  "WAFProfile": "WAF profile: low | med | high",
  "WAFRuleID": "ID of the applied WAF rule",
  "WAFRuleMessage": "Rule message associated with the triggered rule",
  "ZoneID": "Internal zone ID"
}

timestamps

By default timestamps in responses are returned as Unix nanosecond integers. The timestamps= query parameter can be set to change the format in which response timestamps are returned. Possible values are: unix, unixnano, rfc3339. Note: unix and unixnano return timestamps as integers; rfc3339 returns timestamps as strings.
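If you keep the default unixnano format, converting to and from whole Unix seconds is plain integer arithmetic. This sketch uses hypothetical helper names and relies on 64-bit shell arithmetic, which common shells such as bash and dash provide; nanosecond values in the current era fit comfortably.

```shell
# Hypothetical helpers: convert between unixnano and whole Unix seconds.
ns_to_unix() { printf '%s\n' $(( $1 / 1000000000 )); }
unix_to_ns() { printf '%s\n' $(( $1 * 1000000000 )); }

# With GNU date, the seconds can then be printed in RFC 3339 form:
# date -u -d "@$(ns_to_unix 1506702504461999900)" +%Y-%m-%dT%H:%M:%SZ
```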

Response

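Data is timestamped by the time it was written to disk at our log aggregation point.

Response data is returned as NDJSON: one JSON object (one log message) per line. Sample log message with default fields (pretty-printed here for readability; in the response, each message occupies a single line):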

{
"ClientIP": "89.163.242.206",
"ClientRequestHost": "www.theburritobot.com",
"ClientRequestMethod": "GET",
"ClientRequestURI": "/static/img/testimonial-hipster.png",
"EdgeEndTimestamp": 1506702504461999900,
"EdgeResponseBytes": 69045,
"EdgeResponseStatus": 200,
"EdgeStartTimestamp": 1506702504433000200,
"RayID": "3a6050bcbe121a87"
}
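The two default timestamps can be combined into a per-request duration: the interval between EdgeStartTimestamp (edge received the request) and EdgeEndTimestamp (edge finished sending the response). `edge_duration_ms` is a hypothetical helper name; it rounds to the nearest millisecond using integer arithmetic.

```shell
# Hypothetical helper: milliseconds between EdgeStartTimestamp ($1) and
# EdgeEndTimestamp ($2), both in unixnano, rounded to the nearest ms.
edge_duration_ms() {
  printf '%s\n' $(( ($2 - $1 + 500000) / 1000000 ))
}

# For the sample message above:
# edge_duration_ms 1506702504433000200 1506702504461999900   -> 29
```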

Estimating daily data volumes

To quickly estimate the amount of data for a zone per day (the number of log lines and the number of bytes they take up), request a 1-in-1,000 sample of data for a 24-hour period: start=2017-09-10T00:00:00Z and end=2017-09-11T00:00:00Z span 24 hours, and sample=0.001.

curl -s -H"X-Auth-Email: MONKEY" -H"X-Auth-Key: BANANA" \
    "https://api.cloudflare.com/client/v4/zones/<ZONE_TAG>/logs/received?start=2017-09-10T00:00:00Z&end=2017-09-11T00:00:00Z&sample=0.001" \
    >sample.log
...
$ wc -l sample.log
47146 sample.log
...
$ ls -lh sample.log
-rw-r--r-- 1 mik mik 15M Sep 11 18:56 sample.log

Based on this information, the approximate number of messages per day is 47,146,000 (47,146 × 1,000), and the byte size is about 15GB (15MB × 1,000). The size estimate is based on the default response field set; changing the response field set (see the "fields" section) will change the response size.
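The scaling step is a division by the sample rate. This sketch reads the saved sample file and prints both estimates; `logpull_estimate` is a hypothetical name, and it counts exact bytes with wc -c rather than the rounded ls -lh figure.

```shell
# Hypothetical helper: scale a sampled response up to a full-day estimate.
# $1 saved sample file, $2 sample rate used (e.g. 0.001).
# Prints "<estimated lines> <estimated bytes>".
logpull_estimate() {
  local lines bytes
  lines=$(wc -l < "$1")
  bytes=$(wc -c < "$1")
  awk -v l="$lines" -v b="$bytes" -v s="$2" \
    'BEGIN { printf "%.0f %.0f\n", l / s, b / s }'
}

# logpull_estimate sample.log 0.001
```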

To get a good estimate of daily traffic, it is best to query an entire 24-hour period. If the response size is too small or too large, adjust the sample value, not the time range.

Compression

Responses are compressed by default (gzip). cURL transparently decompresses responses unless called with -H"accept-encoding: gzip", in which case the output remains gzipped. You should expect compressed data to be about 5-10% of the uncompressed size. This means that with the 1GB (uncompressed) response size limit, compressed responses will be at most about 50-100MB.
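If you save responses gzipped, you can still inspect them without writing the uncompressed data to disk. `count_gz_records` is a hypothetical helper name; it decompresses from stdin so the file name needs no .gz suffix.

```shell
# Hypothetical helper: count NDJSON records (non-blank lines) in a
# gzipped response saved to file $1.
count_gz_records() {
  gzip -dc < "$1" | awk 'NF { n++ } END { print n + 0 }'
}

# curl -s -H"X-Auth-Email: <email>" -H"X-Auth-Key: <api_key>" -H"accept-encoding: gzip" \
#   "https://api.cloudflare.com/client/v4/zones/<zone_id>/logs/received?start=<start>&end=<end>" > logs.gz
# count_gz_records logs.gz
```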

Beta

Features and fields marked as "beta" are still being tested and validated. They may be removed without notice.
