Using Google Cloud Platform to Analyze Cloudflare Logs

This tutorial covers how to set up a process that obtains Cloudflare logs via the Cloudflare API and uses Google Cloud Platform (GCP) components: Google Cloud Storage to store the logs, Google BigQuery to import and query the data, and Google Data Studio to build visual reports.

$500 GCP credit
Google Cloud is offering a $500 credit towards a new Google Cloud account to help you get started. In order to receive a credit, please follow these instructions.

Cloudflare Enterprise customers have two options for setting this process up:
1. Manual setup for obtaining Cloudflare logs on demand
2. Automated setup for obtaining Cloudflare logs on a regular basis, using a cronjob task

Data Flow Diagram


Obtaining Data Manually

This setup is suited to occasional, on-demand checks of logs.

Requirements and Prerequisites

Install GoLang
Please make sure Go 1.7 or later is installed on your workstation or VM. We suggest using the latest version of Go, which at the time of writing is 1.9. Download here

Working with Google Cloud

  • Select or create a Google Cloud Platform project
  • To work with a Google Cloud project you will need to install the Google Cloud SDK, which you can download here:  
  • Make sure you have configured and enabled a Google Billing profile by following the instructions here:  
  • Make sure you have enabled the Google APIs for the following components here:

         - Google Cloud Storage
         - Google BigQuery
         - Cloud Functions

After a successful Google Cloud SDK installation, run the following command to initialize it. Please make sure you are running the latest version of the SDK (gcloud components update):

./google-cloud-sdk/bin/gcloud init
  • Configure Google Application Default Credentials
    The logshare-cli script will pull logs from Cloudflare and push them into your GCS bucket. This requires authentication between your CLI (where the script is run from) and GCS. Follow the instructions here to authenticate.

Configuring Google Application Default Credentials will open your default browser so you can authenticate against the Google Cloud SDK with the account you use for GCP. Run the following command to get started:

gcloud auth application-default login

Building the Environment
The process is divided into two phases:

  • In the first phase, we will set up a Cloud Function whose task is to import data from an existing GCP Storage bucket into a BigQuery table. This Cloud Function is triggered whenever a new Cloudflare log file is uploaded to the Google Cloud Storage bucket.
  • In the second phase, we will use the Logshare script, which downloads the log files in JSON format via the Cloudflare API and uploads them to a predefined Google Cloud Storage bucket.

Phase 1: Creating the Cloud Function
The Cloud Function will be triggered every time a new log file is uploaded to the storage bucket, importing the data from the log file into a BigQuery table.

Clone GCS-To-Big-Query to your local workstation:

git clone

Go to the GCS-To-Big-Query folder. config.json specifies the BigQuery dataset name (can be anything) and table name (auto-created in BigQuery) that will be used to import the data from the log files stored in the GCP Storage bucket. Please update it accordingly.

Example of a proper config.json file:
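The key names below are placeholders, so treat this as a minimal sketch and check the sample config shipped in the cloned repository for the exact schema:

```shell
# Write a minimal config.json. The key names here are illustrative --
# consult the sample config in the GCS-To-Big-Query repository for the
# exact schema before deploying.
cat > config.json <<'EOF'
{
  "dataset": "cloudflare_logs",
  "tableName": "cloudflare_els"
}
EOF
cat config.json
```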


Then run the following SDK command from your workstation to create the Google Cloud Function:

gcloud beta functions deploy <name of the cloud function> --trigger-resource <trigger-bucket-name> \
--trigger-event google.storage.object.finalize --entry-point jsonLoad \
--source=<path to gcsToBigQuery repository on your workstation> --stage-bucket <gs://gcs-bucket>


trigger-resource - the GCP Storage bucket to which Cloudflare log files will be uploaded.
stage-bucket - the GCP Storage bucket used to store and run the Cloud Function files.
entry-point - the hardcoded value "jsonLoad"

!Please note that the trigger-resource (storage bucket) should not be the same as the stage bucket:

gcloud beta functions deploy cflogs-cloud-function --trigger-resource cloudflare_logs_camiliame \
--trigger-event google.storage.object.finalize --entry-point jsonLoad --source=. --stage-bucket gs://cf-script-cloud-function

Once you have successfully created Cloud Function, you can move to the second phase.

Phase 2: Importing Cloudflare logs into Google Cloud Storage
Go to GitHub and install the Cloudflare logshare script by running the following command:

go get…

In order to run logshare-cli you will need to have the following information ready:

  • Cloudflare user account API Key (api-key)
  • Cloudflare user account email address (api-email)
  • Domain name (zone-name)
  • The timestamp to request logs from (start-time), in Unix seconds by default, or in RFC 3339 format (which specifies the time zone). Defaults to 30 minutes before the current time.
  • The timestamp (Unix/RFC 3339) to request logs to (end-time). Defaults to 20 minutes before the current time.

*The Google Data Studio reports were built with timestamps in RFC 3339 format. Please make sure you use this format as well; otherwise you might get an error.
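On Linux, RFC 3339 timestamps matching the default window (30 to 20 minutes behind the current time) can be generated with GNU date, for example:

```shell
# Generate RFC 3339 (UTC) timestamps suitable for --start-time and
# --end-time, matching the default 30/20-minute offsets (GNU date).
START=$(date -u -d '30 minutes ago' +%Y-%m-%dT%H:%M:%SZ)
END=$(date -u -d '20 minutes ago' +%Y-%m-%dT%H:%M:%SZ)
echo "start=$START end=$END"
```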

Instead of end-time you can also use count, the number of log lines to retrieve. Pass '-1' to retrieve all logs for the given time period (default: 1).

For more options, please refer to "GLOBAL OPTIONS" under the "Available Options" section here

logshare-cli --api-key=<api-key> --api-email=<email> --zone-name=<zone-name> \
--start-time <ts> --count <count> --google-storage-bucket=<trigger-bucket>


logshare-cli --api-key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx [email protected] --start-time=2018-01-18T22:00:00Z --count=100 \
--google-storage-bucket=cloudflare_logs_camiliame --google-project-id=google-project-111111

The "logs/received" REST API route exposes data by time received (the time the event was written to disk in our log aggregation system). This is in contrast to our "logs/requests" API documented here, which orders logs by request time. Ordering by log aggregation time instead of log generation time results in lower (faster) log share pipeline latency and deterministic log pulls. Functionally, it is similar to tailing a log file, or reading from rsyslog (albeit in chunks).

This means that if you want to obtain logs for a given time range, you can do so by issuing one call for each consecutive minute (or other time range). Because log lines are batched by time received and made available, there is no "late arriving data." A response for a given minute will never change. You do not have to repeatedly poll a given time range to receive logs as they converge on our aggregation system.
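Such minute-by-minute pulls can be scripted. The sketch below only prints one logshare-cli invocation per minute of a short sample window (GNU date assumed), so you can review the commands before running them:

```shell
# Print one logshare-cli invocation per consecutive minute of a sample
# range. The commands are echoed, not executed; flag names mirror the
# examples in this article.
base=$(date -u -d '2018-01-18T22:00:00Z' +%s)
minute=0
while [ "$minute" -lt 3 ]; do
  from=$((base + minute * 60))
  to=$((from + 60))
  echo "logshare-cli --api-key=<api-key> --api-email=<email>" \
       "--zone-name=<zone-name> --start-time=$from --end-time=$to" \
       "--google-storage-bucket=<trigger-bucket>"
  minute=$((minute + 1))
done
```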

After running the command successfully, you should see output similar to:

[logshare-cli] 18:41:59 Bucket cloudflare_logs_camiliame already exists.
[logshare-cli] 18:42:03 HTTP status 200 | 3865ms |
[logshare-cli] 18:42:03 Retrieved 100 logs

Under the defined GCP Storage bucket you will find the newly uploaded log file.


And under GCP BigQuery, the predefined table should be created and populated with the log data.


Now you can run queries against the table to pull required data for analysis and monitoring purposes.
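For example, a simple breakdown of requests by HTTP status code can be run with the bq CLI. The dataset and table names below are placeholders (substitute the ones from your config.json), and the sketch only prints the command:

```shell
# A sample BigQuery analysis query: requests per HTTP status code.
# Dataset/table names are placeholders; the bq command is echoed rather
# than executed here.
QUERY='SELECT EdgeResponseStatus, COUNT(*) AS requests FROM cloudflare_logs.cloudflare_els GROUP BY EdgeResponseStatus ORDER BY requests DESC'
echo "bq query --use_legacy_sql=false \"$QUERY\""
```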

Please note that by default the log file contains only the following fields:

  • EdgeStartTimestamp
  • EdgeResponseStatus
  • EdgeResponseBytes
  • EdgeEndTimestamp
  • ClientRequestURI
  • ClientRequestMethod
  • RayID
  • ClientRequestHost
  • ClientIP

To add more fields, you will need to specify each field individually under --fields in your logshare-cli command, including the default fields as well.

To include all fields in the log file, you can use the following command example:

logshare-cli --api-key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx [email protected] --start-time=2018-01-18T22:00:00Z --count=100 \
--fields CacheCacheStatus,CacheResponseBytes,CacheResponseStatus,ClientASN, \
WAFRuleID,WAFRuleMessage,ZoneID --google-storage-bucket=cloudflare_logs_camiliame

The full list of all fields, with descriptions:
"CacheCacheStatus": "unknown | miss | expired | updating | stale | hit | ignored | bypass | revalidated",
"CacheResponseBytes": "Number of bytes returned by the cache",
"CacheResponseStatus": "HTTP status code returned by the cache to the edge: all requests (including non-cacheable ones) go through the cache: also see CacheStatus field",
"ClientASN": "Client AS number",
"ClientCountry": "Country of the client IP address",
"ClientDeviceType": "Client device type",
"ClientIP": "IP address of the client",
"ClientIPClass": "Client IP class",
"ClientRequestBytes": "Number of bytes in the client request",
"ClientRequestHost": "Host requested by the client",
"ClientRequestMethod": "HTTP method of client request",
"ClientRequestProtocol": "HTTP protocol of client request",
"ClientRequestReferer": "HTTP request referrer",
"ClientRequestURI": "URI requested by the client",
"ClientRequestUserAgent": "User agent reported by the client",
"ClientSSLCipher": "Client SSL cipher",
"ClientSSLProtocol": "Client SSL protocol",
"ClientSrcPort": "Client source port",
"EdgeColoID": "Cloudflare edge colo id",
"EdgeEndTimestamp": "Unix nanosecond timestamp the edge finished sending response to the client",
"EdgePathingOp": "Indicates what type of response was issued for this request (unknown = no specific action)",
"EdgePathingSrc": "Details how the request was classified based on security checks (unknown = no specific classification)",
"EdgePathingStatus": "Indicates what data was used to determine the handling of this request (unknown = no data)",
"EdgeResponseBytes": "Number of bytes returned by the edge to the client",
"EdgeResponseCompressionRatio": "Edge response compression ratio",
"EdgeResponseStatus": "HTTP status code returned by Cloudflare to the client",
"EdgeStartTimestamp": "Unix nanosecond timestamp the edge received request from the client",
"OriginIP": "IP of the origin server",
"OriginResponseBytes": "Number of bytes returned by the origin server",
"OriginResponseHTTPExpires": "Value of the origin 'expires' header in RFC1123 format",
"OriginResponseHTTPLastModified": "Value of the origin 'last-modified' header in RFC1123 format",
"OriginResponseStatus": "Status returned by the origin server",
"OriginResponseTime": "Number of nanoseconds it took the origin to return the response to edge",
"RayID": "Ray ID of the request",
"WAFAction": "Action taken by the WAF, if triggered",
"WAFFlags": "Additional configuration flags: simulate (0x1) | null",
"WAFMatchedVar": "The full name of the most-recently matched variable",
"WAFProfile": "WAF profile: low | med | high",
"WAFRuleID": "ID of the applied WAF rule",
"WAFRuleMessage": "Rule message associated with the triggered rule",
"ZoneID": "Internal zone ID"


Obtaining Data Automatically
This setup is suited to monitoring requests in real time.

To automate the process of obtaining Cloudflare logs at predefined intervals, e.g. 1 minute (the default interval), 5 minutes, 30 minutes, 1 hour, 1 day, etc., please follow the process below.

Automated Process for Obtaining Cloudflare Access Logs

The script below uses several Google Cloud modules (Google Cloud Compute, Storage, Cloud Functions, BigQuery). For GitHub instructions, please click here
It will execute the following:

  • Create a VM micro-instance under Google Compute Engine and install all necessary components, such as the Go library, curl, python, etc.
  • Create a bucket under Google Cloud Storage to store and run the Cloud Function files.
  • Create another bucket under Google Cloud Storage to upload Cloudflare Enterprise access logs in JSON format.
  • Create a Cloud Function which imports Cloudflare access logs from the bucket into BigQuery. The Cloud Function is triggered every time a new log file is uploaded into the bucket.
  • Create a cronjob on the VM micro-instance which pulls Cloudflare access logs at repeated intervals (default: 1 minute) and uploads them to the bucket.
  • Create a BigQuery dataset and table to process the imported data.
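For reference, the cron entry created by the automation might look like the illustrative line below (the binary path and placeholder values are assumptions; the setup script writes the real entry for you):

```shell
# An illustrative crontab line: pull logs every minute and append the
# output to a log file. The path and placeholder values are assumptions;
# the automated setup writes the actual entry.
CRON_LINE='* * * * * /root/go/bin/logshare-cli --api-key=<api-key> --api-email=<email> --zone-name=<zone-name> --google-storage-bucket=<bucket> >> /var/log/logshare.log 2>&1'
echo "$CRON_LINE"
```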

Please follow the steps below to set up the whole process automatically: 

  1. Select or create a Google Cloud Platform Project
  2. Clone the GCS Automation Script on your local machine:
    git clone
  3. Enable the Service Management API
    Select the Project you are working on and enable the API here
  4. Make sure you have configured and enabled Google Billing profile by following instructions here  
  5. Make sure you have enabled the Google APIs for the following components here:
     - Google Cloud Storage,
     - Google BigQuery,
     - Cloud Function
  6. Rename config.default.json to config.json:
    mv config.default.json config.json
  7. Modify config.json with your Cloudflare account details:
     - cloudflare_api_key - Cloudflare API Key
     - cloudflare_api_email - Cloudflare user account email address
     - zone_name - Domain name
     - gcs_project_id - Google Cloud Project ID
  8. Run the main orchestration script:

Please allow 5-10 minutes for the VM and other components to be setup and configured.

After running the command successfully, you should see output similar to:
Python 2.7.11
Updates are available for some Cloud SDK components. To install them,
please run:
$ gcloud components update
GCloud SDK already installed. Skipping init configuration.
Updated property [core/project].

Updates are available for some Cloud SDK components. To install them,
please run:
$ gcloud components update

Creating gs://cf-els-vm-setupfiles-17415/...
Copying file://config.json [Content-Type=application/json]...
Copying file:// [Content-Type=application/x-sh]...
- [2 files][ 4.5 KiB/ 4.5 KiB]
Operation completed over 2 objects/4.5 KiB.
Creating VM...

Created [].
logshare-cli-cron-17415 us-central1-a f1-micro RUNNING

Successfully kicked off the VM provisioning steps. The VM takes between 4-6 minutes to fully provision.

If you are seeing any issues, please share them by submitting an issue to the repository. You can view the VM's startup script progress by tailing the syslog file:
tail -f /var/log/syslog


Script Monitoring
For monitoring the progress of the script, please SSH to your newly created VM micro-instance and use the following command:

tail -f /var/log/syslog

Please note that, for simplicity, the names of the VM, Storage bucket, Cloud Function, BigQuery dataset and table contain the same number. When troubleshooting, this helps identify that all of these components belong to the same group.

Compute Engine VM micro-instance: logshare-cli-cron-17415
Storage Bucket: cf-els-vm-setupfiles-17415
Cloud Function: cflogs_upload_bucket_17415
BigQuery dataset: cloudflare_logs_17415
BigQuery table: cloudflare_els_17415


Analyzing data in Data Studio

To analyze and visualize logs you can use Data Studio or any other 3rd-party service. Data Studio allows you, in a few simple steps, to generate graphs and charts with a BigQuery table as the input data source. These reports can refresh the data to provide real-time analytics.

We have created a Cloudflare logs insights template which is now part of the Data Studio Report Gallery.

Below is an example of reports built in Data Studio, in Edit mode and the final View mode.

Reports in Edit Mode


Reports in View Mode




Still not finding what you need?

The Cloudflare team is here to help. 95% of questions can be answered using the search tool, but if you can’t find what you need, submit a support request.
