Using Google Cloud Platform to Analyze Cloudflare Logs

Introduction
This tutorial covers how to set up delivery of Cloudflare logs via the Cloudflare Logshare API using the Logpush method, and how to use GCP (Google Cloud Platform) components to work with them: Google Cloud Storage for storing the logs, Google BigQuery for importing and querying the raw data, and Google Data Studio for running visual reports.

$500 GCP credit
Google Cloud is offering a $500 credit towards a new Google Cloud account to help you get started. In order to receive a credit, please follow these instructions.

Data Flow Diagram

Cloudflare_logpush_to_google_cloud_platform.png

Logpush

In order to enable Logpush, Cloudflare Enterprise customers should contact their account manager and provide the list of domains they wish to use Logpush for. Once the flag is set, customers can use the Cloudflare UI (bottom of the "Analytics" tab) or the Cloudflare Logshare API to create a Logpush job, which will start pushing Cloudflare logs in JSON format directly into the bucket.

Here is an example of a Cloudflare Logshare API call that creates a Logpush job listing all available fields, with timestamps in RFC 3339 format.

 

curl -s -H "X-Auth-Email: cloudflare_email" -H "X-Auth-Key: XXX" -XPOST https://api.cloudflare.com/client/v4/zones/ZONE_ID/logpush/jobs -d'{"job_name":"domain.com","is_enabled":true,"destination_conf":"gs://bucket_name/{DATE}/","logpull_options":"fields=CacheCacheStatus,CacheResponseBytes,CacheResponseStatus,CacheTieredFill,ClientASN,ClientCountry,ClientDeviceType,ClientIP,ClientIPClass,ClientRequestBytes,ClientRequestHost,ClientRequestMethod,ClientRequestProtocol,ClientRequestReferer,ClientRequestURI,ClientRequestUserAgent,ClientSSLCipher,ClientSSLProtocol,ClientSrcPort,EdgeColoID,EdgeEndTimestamp,EdgePathingOp,EdgePathingSrc,EdgePathingStatus,EdgeRateLimitAction,EdgeRateLimitID,EdgeRequestHost,EdgeResponseBytes,EdgeResponseCompressionRatio,EdgeResponseContentType,EdgeResponseStatus,EdgeServerIP,EdgeStartTimestamp,OriginIP,OriginResponseBytes,OriginResponseHTTPExpires,OriginResponseHTTPLastModified,OriginResponseStatus,OriginResponseTime,OriginSSLProtocol,ParentRayID,RayID,SecurityLevel,WAFAction,WAFFlags,WAFMatchedVar,WAFProfile,WAFRuleID,WAFRuleMessage,WorkerCPUTime,WorkerStatus,WorkerSubrequest,WorkerSubrequestCount,ZoneID&timestamps=rfc3339"}' | jq .

where

"ZONE_ID" can be found in Cloudflare Overview tab,

"job_name" can be used domain name,

"bucket-name" name of the Google Cloud Storage bucket, where Cloudflare logs will be pushed into. 

Currently, Logpush supports Google Cloud Storage buckets and AWS S3 buckets. Make sure you have a GCS bucket created and the right permissions set (example commands are shown after the steps below):

1. Create a GCS bucket.

2. In Storage > Browser > Bucket > Permissions, add the member logpush@cloudflare-data.iam.gserviceaccount.com with Storage Object Admin permissions.

3. Make sure no other member has read permissions for the bucket
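
If you prefer the command line, you can create the bucket and grant the permission from step 2 with gsutil from Cloud Shell (a sketch; "bucket_name" is a placeholder and should match the bucket used in the Logpush job, and the service account must match the member from step 2):

gsutil mb gs://bucket_name/

gsutil iam ch serviceAccount:logpush@cloudflare-data.iam.gserviceaccount.com:roles/storage.objectAdmin gs://bucket_name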

Please note, the logs will be pushed into the bucket every 1 minute.  
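After a few minutes you can confirm that logs are arriving by listing the bucket contents (a sketch; the log files appear under a folder named after the current date):

gsutil ls -r gs://bucket_name/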

 

Googel_Cloud_Storage_Bucket_Cloudflare_logpush.png

 

Creating the Cloud Function

Once the GCS bucket is being populated with data, the next step is to create a Cloud Function which will import the Cloudflare logs from the GCS bucket into BigQuery (a highly scalable cloud database with super-fast SQL queries). The Cloud Function is triggered whenever a new Cloudflare log file is uploaded to the Google Cloud Storage bucket.

To implement the Cloud Function we use Cloud Shell. Google Cloud Shell is an interactive shell environment for Google Cloud Platform. It makes it easy for you to manage your projects and resources without having to install the Google Cloud SDK and other tools on your system.

Googel_Cloud_Shell.png

Cloud Function files are stored in a Google Cloud Storage bucket (the "stage-bucket"). It should be a separate bucket, so you will need to have two buckets:

1. Bucket with Cloudflare logs ("trigger-resource") - already created for logpush

2. Bucket for storing Cloud Function script files ("stage-bucket")

 

Clone GCS-To-Big-Query to your Cloud Shell:

git clone https://github.com/cloudflare/GCS-To-Big-Query

Go to the GCS-To-Big-Query folder:

cd GCS-To-Big-Query/

   
The config.json file specifies the BigQuery dataset name (the name can be anything) and the table name (the table will be auto-created in BigQuery, or you can define your own name by modifying config.json) that will be used to import the data from the log files stored in the GCP Storage bucket, so please update it accordingly.

Example of a proper config.json file:

configjson_example.png
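As a rough text sketch only, such a file could look like the lines below; note that the exact key names are defined by the repository's own sample config.json (copy and edit that file rather than retyping it), and the dataset and table names here are just placeholders:

{
  "DATASET": "cloudflare_logs",
  "TABLE": "cloudflare_logs_table"
}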

Then run the following command in Cloud Shell to create Google Cloud Function:

gcloud beta functions deploy <name of the cloud function> --trigger-resource=<trigger-bucket-name> --trigger-event google.storage.object.finalize --entry-point=jsonLoad --source=<path to gcsToBigQuery repository on your workstation> --stage-bucket=<gs://gcs-bucket> 

where

trigger-resource - the GCP Storage bucket to which the Cloudflare log files are uploaded.
stage-bucket - the GCP Storage bucket which will be used to store and run the Cloud Function files.
entry-point - the hardcoded value is "jsonLoad".

Please note that the trigger-resource (storage bucket) must not be the same as the stage bucket.

Example:

gcloud beta functions deploy cflogs-cloud-function --trigger-resource=cloudflare_logs_camiliame --trigger-event google.storage.object.finalize --source=. --stage-bucket=gs://cf-script-cloud-function --entry-point=jsonLoad
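
To confirm that the function deployed and is firing when new log files land in the trigger bucket, you can check it from Cloud Shell (a sketch, using the example function name from the command above):

gcloud functions describe cflogs-cloud-function

gcloud functions logs read cflogs-cloud-function --limit 20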

 

Cloud_Function_Cloudflare_logs.png

After successful deployment of the Cloud Function, a predefined table should now be created under GCP BigQuery and populated with the log data every minute. Now you can query any requests or visualize the data with Data Studio or another third-party analytics tool that supports BigQuery as an input source.

Google_BigQuery.png
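For example, a quick sanity check from Cloud Shell could count requests per edge response status over the imported data (a sketch; replace the dataset and table names with the ones defined in your config.json):

bq query --use_legacy_sql=false 'SELECT EdgeResponseStatus, COUNT(*) AS requests FROM `my_dataset.my_table` GROUP BY EdgeResponseStatus ORDER BY requests DESC'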

 

Analyzing data in Data Studio

To analyze and visualize logs you can use Data Studio or any other third-party service. Data Studio allows you, within a few simple steps, to generate graphs and charts from a BigQuery table as the input data source. These reports have the option to refresh the data and get real-time analytics.

We have created a Cloudflare logs insights template, which is now part of the Data Studio Report Gallery.

Below is an example of reports built in Data Studio.

Cloudflare_logs_Data_Studio.png

 

Step 1 

In order to create a report from this template, click on the "Use Template" button.

Click_Template.png

 

Step 2 

In the pop-up window that appears, click on the drop-down menu:

Dropdown.png

and select the option "Create New Data Source":

create_new_data_source.png

 

Step 3 

Click on "Select" under Big Query tile:

Select_BigQuery.png

 

Step 4

Under "My projects" select your project following up with selection of created Big Query Data Set, Table and click button "Connect":

Connect.png

 

Step 5 

Make sure that you go through all of the fields and update the field types as shown below.

Fields 1 through 16:

1-16.png

Fields 17 through 32:

17-32.png

 

For field #34, "EdgeStartTimestamp", click the three vertical dots next to it and create a duplicate field, "Copy of Edge StartTimeStamp", with the Date Hour type:

Create a duplicate field:

34_timestamp_duplicate.png

Select "Date Hour (YYYYMMDDHH)" type:

Date_and_Hour.png

For field #45, "Client Country", please select the "Country" type:

Client_Country.png

 

Fields 33 through 48: 

33-48.png

 

Fields 49 through 56:

49-56.png

Once the field types have been updated, click the "Add to Report" button:

Add_to_report.png

 

Step 6 

Almost all of the reports should be available right away, except for three reports (Page 2 - "Status Codes Last 24 hours", and Page 5 - "Pathing Statuses" and "WAF Actions") where "Copy of Edge StartTimeStamp" is used.

To fix these three reports, select each affected report and, in the menu that appears on the right side of the screen, under the "Data" section's Dimension, click on "invalid Dimension" and search/type for "Copy of EdgeStartTimeStamp".

Fix_report.png

That is it. All reports should now be available.

 

 
