This tutorial covers how to set up delivery of Cloudflare logs through the Cloudflare logshare API using the "logpush" method and the following GCP (Google Cloud Platform) components: Google Cloud Storage for storing the logs, Google BigQuery for importing and querying the raw data, and Google Data Studio for building visual reports.
$500 GCP credit
Google Cloud is offering a $500 credit towards a new Google Cloud account to help you get started. To receive the credit, please follow these instructions.
Data Flow Diagram
To enable logpush, Cloudflare Enterprise customers should contact their account manager and provide the list of domains for which they wish to use logpush. Once the flag is set, customers can use the Cloudflare UI (bottom of the "Analytics" tab) or the Cloudflare logshare API to create a logpush job, which starts pushing Cloudflare logs in JSON format directly into the bucket.
Here is an example of a Cloudflare logshare API call that creates a logpush job, listing all available fields and using timestamps in RFC 3339 format.
"ZONE_ID" can be found in the Cloudflare Overview tab.
"job_name" is the name of the job; the domain name can be used.
"bucket-name" is the name of the Google Cloud Storage bucket into which Cloudflare logs will be pushed.
Currently logpush supports Google Cloud Storage (GCS) buckets and AWS S3 buckets. Make sure you have created a GCS bucket and set the right permissions:
1. Create a GCS bucket.
2. In Storage > Browser > Bucket > Permissions, add the member firstname.lastname@example.org with Storage Object Admin permissions.
3. Make sure no other member has read permissions for the bucket.
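The three steps above use the Cloud Console. As a rough command-line equivalent, the sketch below prints (but does not run) matching gsutil commands, which are available in Cloud Shell. The function name bucket_setup_cmds and the sample bucket name are illustrative assumptions. The member is passed with its IAM type prefix (user: or serviceAccount:) because the right prefix depends on the kind of account being added.

```shell
# Sketch: print the gsutil commands that mirror the console steps above.
# Nothing is executed; review the output before running it yourself.
#   $1 = bucket name (without gs://)
#   $2 = IAM member with type prefix, e.g. user:someone@example.org
bucket_setup_cmds() {
  # Step 1: create the bucket.
  printf 'gsutil mb gs://%s\n' "$1"
  # Step 2: grant Storage Object Admin (gsutil shorthand: objectAdmin).
  printf 'gsutil iam ch %s:objectAdmin gs://%s\n' "$2" "$1"
  # Step 3: print the bucket IAM policy so other readers can be spotted.
  printf 'gsutil iam get gs://%s\n' "$1"
}
```

For example, `bucket_setup_cmds my-cloudflare-logs user:firstname.lastname@example.org` prints three commands, one per step, that can be reviewed and then pasted into Cloud Shell.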
Please note that logs are pushed into the bucket every minute.
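With the bucket in place, the job-creation request mentioned earlier can be assembled. The following is only a sketch under assumptions: the JSON field names (name, destination_conf, logpull_options) and the /zones/ZONE_ID/logpush/jobs endpoint are assumptions based on Cloudflare's public logpush API documentation and should be checked against the current version, which may require additional fields (for example a destination ownership token). The function name logpush_job_payload and the sample field list are hypothetical.

```shell
# Sketch: build the JSON body for a logpush job (field names are assumptions).
#   $1 = job name (for example the domain name)
#   $2 = GCS bucket name
#   $3 = comma-separated list of log fields to include
logpush_job_payload() {
  printf '{"name":"%s","destination_conf":"gs://%s","logpull_options":"fields=%s&timestamps=rfc3339"}\n' \
    "$1" "$2" "$3"
}
```

The output could then be passed as the request body, for instance with `curl -s -X POST "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/logpush/jobs" -d "$(logpush_job_payload example.com my-cloudflare-logs RayID,EdgeStartTimestamp)"` plus whatever authentication headers your account uses.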
Creating the Cloud Function
Once the GCS bucket is being populated with data, the next step is to create a Cloud Function that imports Cloudflare logs from the GCS bucket into BigQuery (a highly scalable cloud database with fast SQL queries). The Cloud Function is triggered whenever a new Cloudflare log file is uploaded to the Google Cloud Storage bucket.
To implement the Cloud Function we use Cloud Shell. Google Cloud Shell is an interactive shell environment for Google Cloud Platform. It lets you manage your projects and resources without having to install the Google Cloud SDK and other tools on your system.
Cloud Function files are stored in a Google Cloud Storage bucket ("stage-bucket"), which should be separate from the log bucket. You will therefore need two buckets:
1. Bucket with Cloudflare logs ("trigger-resource") - already created for logpush
2. Bucket for storing Cloud Function script files ("stage-bucket")
Clone GCS-To-Big-Query to your Cloud Shell:
git clone https://github.com/cloudflare/GCS-To-Big-Query
Go to the GCS-To-Big-Query folder:
cd GCS-To-Big-Query
The config.json file specifies the BigQuery dataset name (any name of your choice) and the table name (the table is auto-created in BigQuery, or you can set your own name by modifying config.json) that are used to import the data from the log files stored in the GCS bucket. Please update it accordingly.
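The dataset name is your choice, but BigQuery dataset IDs may only contain letters, digits and underscores, up to 1,024 characters. The check below is a small sketch that applies this conservative rule to both the dataset and the table name before they go into config.json. The function name valid_bq_name is an illustrative assumption and is not part of the GCS-To-Big-Query repository.

```shell
# Sketch: return success only for names made of letters, digits and underscores.
#   $1 = candidate dataset or table name
valid_bq_name() {
  case "$1" in
    '' | *[!A-Za-z0-9_]*) return 1 ;;  # empty, or contains a disallowed character
  esac
  [ "${#1}" -le 1024 ]                  # enforce the length limit
}
```

For example, `valid_bq_name cloudflare_logs && echo ok` prints ok, while a name containing a dash such as cloudflare-logs is rejected.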
Example of a proper config.json file:
Then run the following command in Cloud Shell to create the Google Cloud Function:
gcloud beta functions deploy <name of the cloud function> --trigger-resource=<trigger-bucket-name> --trigger-event google.storage.object.finalize --entry-point=jsonLoad --source=<path to gcsToBigQuery repository on your workstation> --stage-bucket=<gs://gcs-bucket>
trigger-resource - the GCS bucket to which Cloudflare log files are uploaded.
stage-bucket - the GCS bucket used to store and run the Cloud Function files.
entry-point - always "jsonLoad" (hardcoded value).
Please note that the trigger-resource (storage bucket) must not be the same as the stage bucket.
Example:
gcloud beta functions deploy cflogs-cloud-function --trigger-resource=cloudflare_logs_camiliame --trigger-event google.storage.object.finalize --source=. --stage-bucket=gs://cf-script-cloud-function --entry-point=jsonLoad
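Because the two buckets must differ, it can help to check that before deploying. The sketch below is one possible pre-flight helper: it refuses identical bucket names and otherwise prints (without running) a deploy command of the same shape as the example above. The function name build_deploy_cmd is hypothetical, and --source=. assumes you run the command from inside the cloned repository folder.

```shell
# Sketch: refuse identical buckets, then print the deploy command.
#   $1 = Cloud Function name
#   $2 = trigger bucket (Cloudflare logs)
#   $3 = stage bucket (Cloud Function files)
# Bucket names may be given with or without the gs:// prefix.
build_deploy_cmd() {
  trigger=${2#gs://}
  stage=${3#gs://}
  if [ "$trigger" = "$stage" ]; then
    echo "trigger-resource and stage-bucket must be different buckets" >&2
    return 1
  fi
  printf 'gcloud beta functions deploy %s --trigger-resource=%s --trigger-event google.storage.object.finalize --source=. --stage-bucket=gs://%s --entry-point=jsonLoad\n' \
    "$1" "$trigger" "$stage"
}
```

Calling `build_deploy_cmd cflogs-cloud-function cloudflare_logs_camiliame cf-script-cloud-function` prints the example command shown above, which can then be reviewed and run.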
After the Cloud Function has been deployed successfully, the predefined table should be created in BigQuery and populated with log data every minute. You can now query the requests or visualize the data with Data Studio or any other third-party analytics tool that supports BigQuery as an input source.
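As an illustration of querying the imported logs, the sketch below builds a Standard SQL statement that counts requests per client country. It assumes the table has a ClientCountry column (the Cloudflare log field of that name); the function name top_countries_sql and the dataset and table names are placeholders for the values in your config.json.

```shell
# Sketch: build a Standard SQL query counting requests per client country.
#   $1 = BigQuery dataset name
#   $2 = BigQuery table name
top_countries_sql() {
  printf 'SELECT ClientCountry, COUNT(*) AS requests FROM `%s.%s` GROUP BY ClientCountry ORDER BY requests DESC LIMIT 10\n' \
    "$1" "$2"
}
```

In Cloud Shell the statement could be run with the bq tool, for example `bq query --use_legacy_sql=false "$(top_countries_sql my_dataset my_table)"`.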
Analyzing data in Data Studio
To analyze and visualize logs you can use Data Studio or any other third-party service. Data Studio lets you generate graphs and charts from a BigQuery table in a few simple steps. The reports can refresh their data, which gives you real-time analytics.
Below is an example of reports built in Data Studio.
To create a report from this template, click the "Use Template" button.
In the pop-up window that appears, click the drop-down menu:
and select the "Create New Data Source" option:
Click "Select" under the BigQuery tile:
Under "My projects", select your project, then select the BigQuery dataset and table you created, and click the "Connect" button:
Make sure you go through all the fields and update the field types as shown below.
Fields 1 through 16:
Fields 17 through 32:
For field #34, "EdgeStartTimestamp", click the three vertical dots next to it and create a duplicate field, "Copy of EdgeStartTimestamp", with the Date Hour type:
Create a duplicate field:
Select "Date Hour (YYYYMMDDHH)" type:
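Data Studio performs this conversion itself once the type is set. Purely to illustrate what a Date Hour value looks like, the sketch below reduces an RFC 3339 timestamp to YYYYMMDDHH. It assumes the fixed layout YYYY-MM-DDTHH:MM:SSZ in UTC; the function name to_date_hour is hypothetical.

```shell
# Sketch: convert an RFC 3339 timestamp (YYYY-MM-DDTHH:MM:SSZ) to YYYYMMDDHH
# by keeping only the year, month, day and hour characters.
#   $1 = timestamp string
to_date_hour() {
  printf '%s\n' "$1" | cut -c1-4,6-7,9-10,12-13
}
```

For example, `to_date_hour 2019-01-02T03:04:05Z` prints 2019010203.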
For field #45, "ClientCountry", select the "Country" type:
Fields 33 through 48:
Fields 49 through 56:
Once the field types have been updated, click the "Add to Report" button:
Almost all reports should be available right away, except for the three reports that use "Copy of EdgeStartTimestamp": "Status Codes Last 24 hours" on Page 2, and "Pathing Statuses" and "WAF Actions" on Page 5.
To fix these three reports, select each report in turn and, in the menu that appears on the right side of the screen, under the "Data" section > Dimension, click "invalid Dimension" and search for "Copy of EdgeStartTimestamp".
That's it. All reports should now be available.