Integrating with your Data Warehouse or Data Lake

Who can use this feature?
- Available with Data Direct.
- Requires an admin or architect role to configure. 

Fullstory ingests a massive amount of raw data from your site and mobile app, cleans it, and structures it into analytics-ready event data. Using out-of-the-box integrations with Data Direct, you can send this data to warehouses like Google BigQuery, Snowflake, or Amazon Redshift, or to cloud storage destinations like Amazon S3, Google Cloud Storage, or Azure Blob Storage, without building custom ETL pipelines.

This allows you to merge Fullstory's behavioral insights with data from your other essential business platforms to get a complete picture of your business's health. Every analyst at your company can explore Fullstory data directly through their preferred SQL interface or existing Business Intelligence tool, empowering them to perform robust analyses and model KPIs. 

How does it work?

Fullstory manages a regular sync process that pushes all captured user events directly into your warehouse. We handle all of the logic around retries and deduplication to ensure that you always have the most accurate data at your fingertips.

Enable your Integration 

Use the documentation below for setup steps specific to your warehouse or cloud storage location. 

After you've completed the setup and enabled your destination under Settings > Integrations > Destinations, the integration will push data to your destination for as long as it remains active. You may repeat these steps to enable different destinations if needed.

Explore the data schema

See Fullstory's Developer Documentation for an overview of the data schema. 

Fullstory's approach to data

Built to Support Planned Workflows

Data Direct is designed for customers with dedicated analytics support who want to automate their data ingestion and reporting process. This feature will only sync data captured when the integration is active and will not sync data retroactively. To ingest all the data Fullstory processes, you should activate a destination when installing the Fullstory JS tag or SDK.

Future-Proof Schemas

Fullstory has designed its data model around two main types of data: required fields and conditional fields. Fields that are required on every event are extracted into dedicated top-level columns. Our goal is to avoid sparse data in the raw table, so any field that is conditional on the event type is packed into the source_properties or event_properties columns as JSON objects. This data model allows Fullstory to add new event_types and properties without having to run costly data migrations every time a new event is defined. Fullstory will maintain a consistent and future-proof data model while enabling customers to handle denormalization and aggregation downstream.
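As a rough illustration of what handling denormalization downstream can look like, the sketch below unpacks the two JSON columns of a single row into flat, prefixed keys. It assumes the row arrives as a Python dictionary and that the JSON columns may be either text or already-parsed objects. The function name flatten_event and the sample values are hypothetical, not part of Data Direct.

```python
import json

PACKED_COLUMNS = ("source_properties", "event_properties")


def flatten_event(row: dict) -> dict:
    """Return a flat copy of `row` with the JSON property columns unpacked."""
    # Keep the required top-level columns as they are.
    flat = {k: v for k, v in row.items() if k not in PACKED_COLUMNS}
    for column in PACKED_COLUMNS:
        raw = row.get(column)
        # Assumption: a warehouse client may return JSON as text or as a dict.
        props = json.loads(raw) if isinstance(raw, str) else (raw or {})
        for key, value in props.items():
            flat[f"{column}.{key}"] = value
    return flat


# Hypothetical sample row; only the column names come from this article.
row = {
    "event_time": "2024-01-01T00:00:00Z",
    "event_type": "navigate",
    "source_properties": '{"page_definition_id": "p1"}',
    "event_properties": '{"navigate_reason": "reload"}',
}
flat = flatten_event(row)
```

Prefixing each unpacked key with its source column keeps source and event properties from colliding if they ever share a name.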

Moving Analysis to the View Layer

Our approach to supporting a future-proof data model means we push all of the data Fullstory captures into a pre-defined set of columns. This proves efficient from a data processing standpoint and sets clear expectations around the data contract. That said, it has some implications when approaching queries. Specifically, Fullstory's underlying events table must be unpacked and denormalized into analytics-ready models to simplify querying, especially for BI tools.

To help with this process, Fullstory has published a set of dbt SQL transformations that you can deploy in your warehouse. These views will not solve every possible use case, but they aim to provide context around how the data can be transformed and explain how many metrics within Fullstory can be calculated.

Event Source Info

Every event now contains contextual information about where that event came from and how it was passed into the Fullstory system. To keep your custom event payload from being overloaded with these extra parameters, we've implemented support for Source Properties. Source Properties make it transparent where the event came from (web, app, or server integration) and which Fullstory API endpoints were invoked to pass the event. This lets you QA your instrumentation by performing data quality checks against your other systems.

See the developer documentation for a description of the source information.

User Info

User data, such as the user ID or custom user properties you pass through FS.identify, is available in the Data Direct event data. All user data can be found within identify event types. This user information can be joined back onto all events using the device_id column. While this approach requires an additional join, it provides better support for sessions identified out of chronological order. If a session is identified after being synced to the warehouse, we emit one new identify event rather than updating existing records for that session.
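The join described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Fullstory's implementation: events are plain dictionaries, event_properties is already parsed, event_time values are ISO 8601 strings in a single format (so string comparison orders them), and the user_id key is a hypothetical property name. The sketch also assumes that the most recent identify event for a device is the one you want, which may not suit shared devices.

```python
def latest_identity_by_device(events):
    """Map each device_id to the properties of its most recent identify event."""
    latest = {}
    for event in events:
        if event["event_type"] != "identify":
            continue
        seen = latest.get(event["device_id"])
        if seen is None or event["event_time"] > seen["event_time"]:
            latest[event["device_id"]] = event
    return {device: e["event_properties"] for device, e in latest.items()}


def attach_users(events):
    """Join user properties onto every non-identify event via device_id."""
    users = latest_identity_by_device(events)
    return [
        {**event, "user": users.get(event["device_id"])}
        for event in events
        if event["event_type"] != "identify"
    ]


# Hypothetical sample data; the identify event arrives after the click.
events = [
    {"device_id": "d1", "event_type": "click",
     "event_time": "2024-01-01T10:00:00Z", "event_properties": {}},
    {"device_id": "d1", "event_type": "identify",
     "event_time": "2024-01-01T10:05:00Z", "event_properties": {"user_id": "u42"}},
    {"device_id": "d2", "event_type": "navigate",
     "event_time": "2024-01-01T11:00:00Z", "event_properties": {}},
]
joined = attach_users(events)
```

Because the join is computed at query time, a click that was synced before its session was identified still picks up the user once the later identify event lands.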

Defined Events, Pages & Elements

Fullstory now includes lookup tables in the warehouse for the defined events, pages, and elements you create in the Fullstory app so that you do not need to recreate these definitions in SQL logic.

The definition tables will be updated on each sync to reflect the current name and description set in the UI. As events are captured and processed by Fullstory's event stream, we will evaluate the definitions of all active data objects at that point in time and enrich the event data with all matching object IDs. This object ID can be used to join back to the corresponding lookup tables for names and descriptions. The object ID can also be used to group and filter event data for analysis. For example, you could query for unique visits to your checkout page or all click interactions with your "Sign Up" element.

Schema for Object Definitions

Object: Events
Lookup table: event_definitions
Object ID schema in events:
  • Most specific matching Event (used in Fullstory UI analysis): event_properties.event_definition_id
  • Additional IDs are available in the data to query other matching objects: event_properties.additional_event_definition_ids

Object: Pages
Lookup table: page_definitions
Object ID schema in events:
  • Fullstory supports one and only one active page definition per URL: source_properties.page_definition_id

Object: Elements
Lookup table: element_definitions
Object ID schema in events:
  • Most specific matching Element (used in Fullstory UI analysis). Element matching is available for all event_types that support the `target` object: event_properties.target.element_definition_id
  • Additional IDs are available in the data to query other matching objects: event_properties.target.additional_element_definition_ids

Due to the nature of the event stream, object IDs are only enriched from when they were created forward, similar to how Fullstory reports activity in Journeys. This means that changing the definition of an existing object will only match applicable events from the time of that change forward. It will not apply retroactively to the historical data in your warehouse. See below for a diagram explaining the flow of how data is processed.

 

Example Data Flow Diagram for Defined Events

[Figure: DXO Sync Flow Diagram]

These lookup tables may introduce a many-to-one relationship between object definitions and events, because a single event can match more than one definition. To account for this, we have separated the most specific match, the one that will be used for analysis in the Fullstory UI, from all other matching object definitions.

  • For defined events and defined elements, you will also see fields in the data for additional_event_definition_ids and additional_element_definition_ids, respectively. Since multiple defined events can point to the same underlying event and multiple defined elements can point to the same underlying selector, we expose these fields so that you can run analyses in the warehouse that differ from the behavior we apply in the Fullstory UI.
  • For the Fullstory UI, we choose the object with the most specific definition that matches the data, then aggregate the data based on that object alone (ignoring the additional matches to avoid duplicate counts). The ID that you see in event_definition_id and element_definition_id reflects the most specific ID, which is the one used for reporting in the Fullstory UI.
  • Multiple defined pages cannot point to the same URL.
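To make the difference between the two counting behaviors concrete, here is a small sketch. It assumes event_properties is already parsed into a dictionary and that additional_event_definition_ids holds a list of IDs. The lookup dictionary stands in for the event_definitions table, and its IDs and names are hypothetical.

```python
from collections import Counter


def count_by_most_specific(events):
    """Count each event once, under its most specific defined event."""
    return Counter(
        e["event_properties"]["event_definition_id"]
        for e in events
        if e["event_properties"].get("event_definition_id")
    )


def count_by_any_match(events):
    """Count each event under every defined event it matches."""
    counts = Counter()
    for e in events:
        props = e["event_properties"]
        ids = [props.get("event_definition_id"),
               *props.get("additional_event_definition_ids", [])]
        counts.update(i for i in ids if i)
    return counts


# Hypothetical stand-in for the event_definitions lookup table (id -> name).
event_definitions = {"ed1": "Checkout Completed", "ed2": "Any Purchase"}

events = [
    {"event_properties": {"event_definition_id": "ed1",
                          "additional_event_definition_ids": ["ed2"]}},
    {"event_properties": {"event_definition_id": "ed2"}},
    {"event_properties": {}},  # matches no defined event
]

most_specific = count_by_most_specific(events)
any_match = count_by_any_match(events)
# Join the object IDs back to the lookup names.
named = {event_definitions[i]: n for i, n in any_match.items()}
```

The first function avoids duplicate counts by using only the most specific match. The second deliberately counts an event once per matching definition, so its totals can exceed the number of events.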

Note: This feature is available by default for all customers who activate Data Direct after September 11, 2023. If you activated Data Direct prior to that date, please reach out to support to have this feature enabled.

FAQ

How often will data be synced between Fullstory and my data warehouse of choice?

The Fullstory pipeline syncs to your integrated warehouse hourly when your integration is active and correctly configured. Each sync contains the events processed by Fullstory in the prior hour. Due to the distributed nature of the Internet and Fullstory's capture pipeline, a complete set of events is not guaranteed to be available every hour. Any missing events will be reconciled automatically during subsequent syncs. We expose both event_time and updated_time columns in the data that can be used to tune your queries as needed.
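One way to tune a downstream job around late-arriving events is to track a watermark on updated_time instead of filtering on event_time. The sketch below assumes that updated_time reflects when a row was last written by a sync, which this article does not state explicitly. The function names and sample rows are hypothetical.

```python
from datetime import datetime, timezone


def rows_changed_since(events, watermark):
    """Return rows whose updated_time is later than the last processed watermark."""
    return [e for e in events if e["updated_time"] > watermark]


def next_watermark(events, watermark):
    """Advance the watermark to the newest updated_time seen, if any."""
    return max((e["updated_time"] for e in events), default=watermark)


def utc(hour, minute=0):
    """Build a hypothetical UTC timestamp on 2024-01-01 for the sample rows."""
    return datetime(2024, 1, 1, hour, minute, tzinfo=timezone.utc)


events = [
    {"event_time": utc(9, 58), "updated_time": utc(10)},
    # A late event: it happened before 10:00 but was written by a later sync.
    {"event_time": utc(9, 59), "updated_time": utc(11)},
]

watermark = utc(10)
changed = rows_changed_since(events, watermark)
watermark = next_watermark(changed, watermark)
```

Filtering on event_time alone would miss the second row in this sample, because its event_time falls in an hour that an earlier run already processed.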

Are there any plans to add other warehouses in the future?

Yes, we plan to add more warehouses based on customer demand. If there's a destination you'd like added, let us know.

I see more event types in the UI's event filter list than in the data. Is something missing from the data?

No. All captured data is included in the Data Direct sync. Some of the "events" that are shown in the UI are derived from the same underlying event_type. For example, "Refreshed URL" in the UI is a navigate event_type with event_properties:navigate_reason = 'reload' in the warehouse.
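As an illustration, the "Refreshed URL" example can be expressed as a predicate over warehouse rows. This sketch assumes each row is a dictionary whose event_properties column is already parsed. The function name is hypothetical, and the 'link' value in the sample data is invented for contrast.

```python
def is_refreshed_url(event):
    """True for navigate events whose navigate_reason is 'reload'."""
    props = event.get("event_properties") or {}
    return event.get("event_type") == "navigate" and props.get("navigate_reason") == "reload"


# Hypothetical sample rows.
events = [
    {"event_type": "navigate", "event_properties": {"navigate_reason": "reload"}},
    {"event_type": "navigate", "event_properties": {"navigate_reason": "link"}},
    {"event_type": "click", "event_properties": {}},
]
refreshed = [e for e in events if is_refreshed_url(e)]
```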

There are several events that have a NULL event_type. What are those records representing?

Fullstory's data capture pipeline is heavily distributed, so there is no guarantee that events arrive (through our ingestion pipeline) in chronological order. As our pipeline processes these events, it associates them with a view_id and orders them chronologically. If an event is synced to the warehouse and then Fullstory receives a new event for the same view_id with an older timestamp, both events will be updated so they are in order. This will create two new events in the warehouse and null out the previously synced event.
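In practice, this means analysis queries generally want to skip the nulled-out records. A minimal sketch, assuming rows are dictionaries and that a NULL event_type surfaces as None in Python (the function name and sample rows are hypothetical):

```python
def live_events(events):
    """Drop superseded records whose event_type was nulled out by a later sync."""
    return [e for e in events if e.get("event_type") is not None]


# Hypothetical sample: one record was nulled out after a reordering.
events = [
    {"view_id": "v1", "event_type": None},
    {"view_id": "v1", "event_type": "click"},
    {"view_id": "v1", "event_type": "navigate"},
]
current = live_events(events)
```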

Can we configure event types that we do not want to sync? For example, if we decide that we do not want "change" events, can we not have those sent?

Currently, Fullstory only supports syncing all captured events to your warehouse. Events that you are not interested in can easily be filtered out in your SQL analysis. We are exploring the possibility of adding event filters prior to the sync. If this is something you would be interested in, please reach out for more information.

I see two tables, one called events and another called raw_events, but raw_events is empty. Is that expected?

Yes, the raw_events table is used as a temporary staging table in the Snowflake data load process. This table may include some data while the hourly load process is running but should be empty at all other times. Consumers of the data can ignore this table and should only use the events table for operational queries.

Does Data Direct support a bidirectional sync?

Not yet, but we would love to hear more about any use cases you've got in mind or any workflows that you'd like to enable.

I see a slight discrepancy between Fullstory UI metrics and warehouse data.

All timestamps in the warehouse are in UTC whereas the Fullstory UI localizes metrics based on your browser's local timezone. To align these timezones, apply a timezone conversion from UTC to your local timezone in your SQL query.
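The effect is easiest to see with daily counts, where an event near midnight UTC lands on a different calendar day once it is converted. The sketch below uses a fixed UTC-5 offset purely as a hypothetical local timezone. A named zone such as zoneinfo.ZoneInfo("America/New_York") would also account for daylight saving time.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def events_per_local_day(event_times, tz):
    """Count events per calendar day after converting UTC timestamps to `tz`."""
    return Counter(t.astimezone(tz).date().isoformat() for t in event_times)


# Hypothetical event_time values as stored in the warehouse (UTC).
utc_times = [
    datetime(2024, 1, 2, 3, 30, tzinfo=timezone.utc),
    datetime(2024, 1, 2, 15, 0, tzinfo=timezone.utc),
]

local = timezone(timedelta(hours=-5))  # assumed fixed offset, ignores DST

by_utc_day = events_per_local_day(utc_times, timezone.utc)
by_local_day = events_per_local_day(utc_times, local)
```

Grouped by UTC day, both events fall on 2024-01-02. Grouped by the local day, the 03:30 UTC event moves to 2024-01-01, which is the kind of shift that produces small discrepancies against the UI.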

Data Security Warning: If Fullstory receives a data deletion request from a customer, and that customer's data has already been exported to their warehouse, Fullstory will not be responsible for deleting the previously exported data, once it is out of Fullstory's possession. Once the data has been synced to a customer's system, it is the responsibility of the customer to ensure the data is secure and that they adhere to all privacy, security, and other applicable regulations.
