Who can use this feature?
- Available with Anywhere: Warehouse, an add-on for Business, Advanced, and Enterprise plans.
- Requires an Admin or Architect role to configure.
Note: Databricks syncs are supported for Databricks workspaces hosted on Google Cloud Platform (GCP) and Amazon Web Services (AWS). Support for other cloud providers may be added in the future.
Databricks is a unified, open analytics platform for building, deploying, sharing, and maintaining enterprise-grade data, analytics, and AI solutions at scale. Fullstory customers with Anywhere: Warehouse can sync data to Databricks Unity Catalog with Ready to Analyze Views. This setup guide provides the following resources for getting started:
- Setting up Databricks (below)
- Setting up Databricks in Fullstory (below)
- Databricks developer documentation
- Ready to Analyze Views developer documentation
Setting up Databricks
Fullstory's Warehouse Data Sync for Databricks leverages Unity Catalog to manage data governance and Databricks SQL Warehouses to execute the loading processes.
The architecture utilizes Workload Identity Federation, allowing Fullstory to securely write data to a staging area and then load it into your Delta Lake managed tables without the need for long-lived static secrets.
Setting up account-level resources
1. Create a Service Principal.
In your Databricks Account Console, create a new Service Principal. This identity will be used by Fullstory to authenticate and perform sync operations.
2. Create a Federation Policy for the Service Principal.
To allow Fullstory to authenticate as your Service Principal, you must create a federation policy using the following configuration (a sketch of the resulting policy payload follows this list):
   - Issuer: https://accounts.google.com
   - Subject:
     - NA customers: 116984388253902328461
     - EU customers: 107589159240321051166
   - Audience: fullstory.com
3. Create a Workspace (optional).
If you wish to use an existing workspace, skip this step. Otherwise, create a new workspace. Note down your workspace URL (e.g., https://xxx.x.gcp.databricks.com).
4. Assign the Service Principal to the Workspace.
Add the Service Principal created in Step 1 as a User of the workspace that you'd like to use.
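For reference, the configuration above maps onto Databricks' OIDC federation policy format roughly as follows. This is a sketch only (the exact payload shape may vary with your Databricks API or CLI version), shown with the NA subject; EU customers would substitute theirs:

```json
{
  "oidc_policy": {
    "issuer": "https://accounts.google.com",
    "subject": "116984388253902328461",
    "audiences": ["fullstory.com"]
  }
}
```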
Setting up workspace-level resources
Navigate to your workspace using the workspace URL and perform the following actions.
1. Identify or create your SQL Warehouse.
Select an existing SQL warehouse or create a new one in your workspace. Note down the Warehouse ID.
2. Identify or create your Catalog.
Select an existing Catalog or create a new one in your workspace. Note down the Catalog Name.
3. Configure Unity Catalog Permissions.
For the Catalog you wish to use, assign the following permissions to your Service Principal (equivalent SQL statements are sketched after these steps):
   - USE CATALOG
   - USE SCHEMA
   - CREATE SCHEMA
   - CREATE TABLE
4. Set up Storage Credentials.
Fullstory uses a cloud storage staging layer to load data into Databricks. The setup differs depending on your cloud provider. Follow the section below that matches your Databricks deployment.

GCP - Set up Storage Credentials

If your Databricks workspace is hosted on GCP, Fullstory uses Google Cloud Storage (GCS) as a staging layer. To ensure Databricks can read these files, a Storage Credential is used.

- In Unity Catalog, create a new Storage Credential.
- Ensure you select GCP Service Account as the credential type.
- Assign the READ FILES permission to your Service Principal.
- Note down the Credential Name and the GCP Service Account email associated with it.
AWS - Set up Storage Credentials

If your Databricks workspace is hosted on AWS, Fullstory uses S3 as a staging layer. To ensure Databricks can read these files, an S3 bucket and a Storage Credential are used.

- In Unity Catalog, create a new Storage Credential.
- Ensure you select AWS IAM Role as the credential type.
- After the credential is created, add the Trust Policy from Databricks to the IAM Role.
- Additionally, add the following Trust Policy for Fullstory to the IAM Role, using the identifier for your Fullstory region:
  - NA: 116984388253902328461
  - EU: 107589159240321051166

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Federated": "accounts.google.com" },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "accounts.google.com:aud": "<identifier for your Fullstory region from above>"
        }
      }
    }
  ]
}
```

- Grant permissions to the IAM Role for your S3 bucket. Create an S3 bucket if one does not already exist, then grant the IAM Role access to the bucket following the instructions from Databricks:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket",
        "s3:GetBucketLocation",
        "s3:ListBucketMultipartUploads",
        "s3:ListMultipartUploadParts",
        "s3:AbortMultipartUpload"
      ],
      "Resource": ["arn:aws:s3:::<BUCKET>/*", "arn:aws:s3:::<BUCKET>"],
      "Effect": "Allow"
    },
    {
      "Action": ["kms:Decrypt", "kms:Encrypt", "kms:GenerateDataKey*"],
      "Resource": ["arn:aws:kms:<KMS-KEY>"],
      "Effect": "Allow"
    },
    {
      "Action": ["sts:AssumeRole"],
      "Resource": ["arn:aws:iam::<AWS-ACCOUNT-ID>:role/<AWS-IAM-ROLE-NAME>"],
      "Effect": "Allow"
    }
  ]
}
```

- Assign the READ FILES permission to your Service Principal.
- Note down the Credential Name, S3 Bucket, AWS Region, and the IAM Role ARN associated with it.
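If you prefer SQL to the Catalog Explorer UI, the grants from step 3 and the READ FILES grants above can also be issued from a SQL editor. The statements below are a sketch with placeholder names (my_catalog and fullstory_credential are illustrative); Databricks SQL references a service principal by its application ID in backticks:

```sql
-- Catalog privileges from step 3. Placeholders: my_catalog and the
-- Service Principal's application ID.
GRANT USE CATALOG   ON CATALOG my_catalog TO `<application-id>`;
GRANT USE SCHEMA    ON CATALOG my_catalog TO `<application-id>`;
GRANT CREATE SCHEMA ON CATALOG my_catalog TO `<application-id>`;
GRANT CREATE TABLE  ON CATALOG my_catalog TO `<application-id>`;

-- READ FILES on the Storage Credential from step 4, so the Service
-- Principal can read staged files. Placeholder: fullstory_credential.
GRANT READ FILES ON STORAGE CREDENTIAL fullstory_credential TO `<application-id>`;
```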
Setting up Databricks in Fullstory
- Log in to Fullstory and navigate to Settings > Anywhere > Warehouse.
- Select Databricks from the list of available destinations.
- Use the Choose your cloud provider selector to pick your provider (GCP or AWS), then fill in the following fields using the information noted from the above steps:

| GCP | AWS |
| --- | --- |
| Workspace URL | Workspace URL |
| Service Principal Client ID | Service Principal Client ID |
| Warehouse ID | Warehouse ID |
| Catalog Name | Catalog Name |
| Storage Credential Name | Storage Credential Name |
| GCP Service Account Email | S3 Bucket |
|  | IAM Role Amazon Resource Name (ARN) |
|  | Region |

(Screenshots: GCP Configuration for Databricks; AWS Configuration for Databricks.)

- Click Save.
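After clicking Save, you can confirm the connection is working from a Databricks SQL editor by looking for the objects Fullstory creates (the Service Principal was granted CREATE SCHEMA and CREATE TABLE above). A minimal sketch with placeholder names:

```sql
-- my_catalog is the Catalog Name configured above; the schema name is
-- a placeholder: check your catalog for the schema that appears after
-- the first sync completes.
SHOW SCHEMAS IN my_catalog;
SHOW TABLES IN my_catalog.<fullstory_schema>;
```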
Frequently Asked Questions
Does Fullstory use Databricks Delta Share?
No, Fullstory uses a traditional warehouse connection to sync data to Databricks, not Delta Share. With this approach:
- You own your data in your Databricks warehouse.
- You have full control over data retention and governance.
- You receive the same Ready to Analyze Views available for Snowflake, Redshift, and BigQuery.
What data does Fullstory sync to Databricks?
Fullstory syncs data using Ready to Analyze Views, which provides pre-built views optimized for analytics and business intelligence. This is the same data format used for Snowflake, Redshift, and BigQuery connections. For complete details about the data model, sample queries, and sync expectations, see the Ready to Analyze Views developer documentation.
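As an illustration only, a query against a synced view might look like the sketch below; the view and column names here are hypothetical, not Fullstory's actual schema, so consult the Ready to Analyze Views developer documentation for the real names:

```sql
-- Hypothetical names: my_catalog.fullstory.events, event_type, and
-- event_time are placeholders for the actual Ready to Analyze Views.
SELECT
  event_type,
  COUNT(*) AS event_count
FROM my_catalog.fullstory.events
WHERE event_time >= date_sub(current_date(), 7)
GROUP BY event_type
ORDER BY event_count DESC;
```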