Warehouse - Getting Started

Who can use this feature?
- Available with Anywhere: Warehouse, an add-on for Business, Advanced, and Enterprise plans.
- Requires an Admin or Architect role to configure.

Fullstory ingests a massive amount of raw data from your site and mobile app, cleans it, and structures it into analytics-ready event data. With the out-of-the-box integrations in Anywhere: Warehouse, you can send pre-transformed data to warehouses like Google BigQuery, Snowflake, or Amazon Redshift, or send raw event data to cloud storage destinations like Amazon S3, Google Cloud Storage, or Azure Blob Storage, all without building custom ETL pipelines.

This allows you to merge Fullstory's behavioral insights with data from your other essential business platforms to get a complete picture of your business's health. Every analyst at your company can explore Fullstory data directly through their preferred SQL interface or existing Business Intelligence tool, empowering them to perform robust analyses and model KPIs. 

How does it work?

Fullstory manages a regular hourly sync process that pushes all captured user events directly into your warehouse. We handle all of the logic around retries and deduplication to ensure that you always have the most accurate data at your fingertips.

Need data faster than hourly? Check out Anywhere: Activation. While Warehouse covers all of your behavioral data, Activation is designed to provide targeted streams of specific high-value events or patterns.

Warehouse provides data in two formats:

  • Raw Data: normalized event data without transformations.
  • Ready to Analyze Views: pre-transformed data ready for analysis.

Setting up your data transfer

Fullstory customers with Anywhere: Warehouse should start with the Setup Guide for their data destination.

Destination          | Format                 | Setup Guide | Developer Docs | Data Model
Amazon Redshift      | Ready to Analyze Views | Setup Guide | Developer Docs | Data Model
Amazon S3            | Raw Data               | Setup Guide | Developer Docs | Data Model
Azure Blob Storage   | Raw Data               | Setup Guide | Developer Docs | Data Model
BigQuery             | Ready to Analyze Views | Setup Guide | Developer Docs | Data Model
Google Cloud Storage | Raw Data               | Setup Guide | Developer Docs | Data Model
Snowflake            | Ready to Analyze Views | Setup Guide | Developer Docs | Data Model

After you've completed the setup and enabled your Destination under Settings > Anywhere > Warehouse, the integration will push data to your destination for as long as it remains active. You may repeat these steps to enable different destinations if needed. 

Developer Resources

See Fullstory's developer documentation for an overview of the data schema for each format (Raw Data and Ready to Analyze Views).

Additional resources, including sample queries, are available via the Setup Guide for each data destination.

Fullstory's approach to data

Built to Support Planned Workflows

Fullstory Anywhere is designed for customers with dedicated analytics support who want to automate their data ingestion and reporting process. This feature will only sync data captured when the integration is active and will not sync data retroactively. To ingest all the data Fullstory processes, you should activate a destination when installing the Fullstory JS tag or SDK.

Event Source Data

Every event now contains contextual information about where it came from and how it was passed into the Fullstory system. To keep these extra parameters from cluttering your custom event payload, we've implemented support for Source Properties. These make it clear where the event came from (web, app, or server integration) and which Fullstory API endpoints were invoked to pass the event. You can use this information to QA your instrumentation by performing data quality checks against your other systems.

See the developer documentation for a description of the source information.

User Data

User data refers to the user ID or custom user properties that you pass in through identity API methods, such as FS('setIdentity').

For Ready to Analyze Views destinations, you'll find a dedicated users table and user_properties table. You'll also find identify event types within the events table, just like you would for Raw Data destinations, as described below. However, with Ready to Analyze Views, the transformation is done automatically into the users table and user_properties table.

For Raw Data destinations, user data is available in the Warehouse event data. All user data can be found within identify event types. This user information can be joined back onto all events using the device_id column. While this approach requires an additional join, it provides better support for sessions identified out of chronological order. If a session is identified after being synced to the warehouse, we emit one new identify event rather than updating existing records for that session.
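The join described above can be sketched as follows. This is a minimal illustration using an in-memory SQLite database rather than a real warehouse. The flattened table layout and the user_id column are assumptions made for the example; only the identify event type and the device_id column come from this article, so check the data model for your destination before adapting it.

```python
import sqlite3

# Hypothetical, simplified events table. In a real Raw Data destination the
# identify payload is nested; here user_id is a plain column for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (
    event_id   TEXT,
    device_id  TEXT,
    event_type TEXT,
    event_time TEXT,
    user_id    TEXT
);
INSERT INTO events VALUES
    ('e1', 'dev-a', 'navigate', '2024-01-01T10:00:00Z', NULL),
    ('e2', 'dev-a', 'click',    '2024-01-01T10:01:00Z', NULL),
    ('e3', 'dev-a', 'identify', '2024-01-01T10:05:00Z', 'user-42'),
    ('e4', 'dev-b', 'click',    '2024-01-01T11:00:00Z', NULL);
""")

# Pick the most recent identify event per device, then join it back onto
# every other event from that device, including events captured earlier.
query = """
WITH latest_identify AS (
    SELECT i.device_id, i.user_id
    FROM events AS i
    JOIN (
        SELECT device_id, MAX(event_time) AS latest_time
        FROM events
        WHERE event_type = 'identify'
        GROUP BY device_id
    ) AS m
      ON m.device_id = i.device_id AND m.latest_time = i.event_time
    WHERE i.event_type = 'identify'
)
SELECT e.event_id, e.event_type, li.user_id
FROM events AS e
LEFT JOIN latest_identify AS li ON li.device_id = e.device_id
WHERE e.event_type <> 'identify'
ORDER BY e.event_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because the identify event is joined by device_id rather than by time order, the events captured before identification (e1 and e2) still pick up the user, while the never-identified device (dev-b) keeps a NULL user.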

Defined Events, Pages & Elements

Fullstory includes lookup tables in the warehouse for the defined events, pages, and elements you create in the Fullstory app so that you do not need to recreate these definitions in SQL logic.

The definition tables will be updated on each sync to reflect the current name and description set in the UI. As events are captured and processed by Fullstory's event stream, we will evaluate the definitions of all active data objects at that point in time and enrich the event data with all matching object IDs. This object ID can be used to join back to the corresponding lookup tables for names and descriptions. The object ID can also be used to group and filter event data for analysis. For example, you could query for unique visits to your checkout page or all click interactions with your "Sign Up" element.
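As a rough sketch of the checkout page example, the query below joins event rows to the page_definitions lookup table by object ID. It runs against an in-memory SQLite database with made-up rows. The id and name column names, the session_id column, and the flattened page_definition_id column are assumptions for illustration; the real field paths are listed in the schema sections of this article and in the developer documentation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical lookup table: one row per page defined in the Fullstory app.
CREATE TABLE page_definitions (id TEXT, name TEXT, description TEXT);
-- Hypothetical flattened events table enriched with the matching page ID.
CREATE TABLE events (
    event_id TEXT, session_id TEXT, event_type TEXT, page_definition_id TEXT
);
INSERT INTO page_definitions VALUES
    ('pg-1', 'Checkout', 'Checkout flow'),
    ('pg-2', 'Home',     'Landing page');
INSERT INTO events VALUES
    ('e1', 's1', 'navigate', 'pg-2'),
    ('e2', 's1', 'navigate', 'pg-1'),
    ('e3', 's1', 'click',    'pg-1'),
    ('e4', 's2', 'navigate', 'pg-2'),
    ('e5', 's3', 'navigate', 'pg-1');
""")

# Filter by the human-readable name from the lookup table instead of
# re-implementing the page's URL rules in SQL.
query = """
SELECT p.name, COUNT(DISTINCT e.session_id) AS sessions
FROM events AS e
JOIN page_definitions AS p ON p.id = e.page_definition_id
WHERE p.name = 'Checkout'
GROUP BY p.name
"""
checkout_visits = conn.execute(query).fetchall()
print(checkout_visits)
```

Counting distinct sessions is only one possible reading of "unique visits"; swap in whichever identifier matches your own definition.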

To dive deeper into the schema for defined events, pages, and elements, see the sections for Ready to Analyze Views and Raw Data immediately below.

Schema for Ready to Analyze Views: Customer-defined labels

See Ready to Analyze Views - Customer-defined labels for full developer documentation.

Object: Events
Lookup table: event_definitions
Object ID schema in events:
  • Most specific matching Event (used in Fullstory UI analysis): events.event_definition_id
  • Additional IDs available in the data to query other matching objects: events.additional_event_definition_ids

Object: Pages
Lookup table: page_definitions
Object ID schema in events:
  • Fullstory supports one and only one active page definition per URL: source_properties.page_definition_id

Object: Elements
Lookup table: element_definitions
Object ID schema in events:
  • Most specific matching Element (used in Fullstory UI analysis). Element matching is available for all event_types that support the `target` object: events.target_element_definition_id
  • Additional IDs available in the data to query other matching objects: events.target_additional_element_definition_ids

Schema for Raw Data: Defined Objects

See Raw Data - Defined Objects for full developer documentation.

Object: Events
Lookup table: event_definitions
Object ID schema in events:
  • Most specific matching Event (used in Fullstory UI analysis): event_properties.event_definition_id
  • Additional IDs available in the data to query other matching objects: event_properties.additional_event_definition_ids

Object: Pages
Lookup table: page_definitions
Object ID schema in events:
  • Fullstory supports one and only one active page definition per URL: source_properties.page_definition_id

Object: Elements
Lookup table: element_definitions
Object ID schema in events:
  • Most specific matching Element (used in Fullstory UI analysis). Element matching is available for all event_types that support the `target` object: event_properties.target.element_definition_id
  • Additional IDs available in the data to query other matching objects: event_properties.target.additional_element_definition_ids

Example Data Flow Diagram for Defined Events

Due to the nature of the event stream, object IDs are only enriched from when they were created forward, similar to how Fullstory reports activity in Journeys. This means that changing the definition of an existing object will only match applicable events from the time of that change forward. It will not apply retroactively to the historical data in your warehouse. See below for a diagram explaining the flow of how data is processed.

[Figure: DXO sync flow diagram]

These lookup tables may introduce a many-to-one relationship between events and object definitions. To account for this, we have separated the most specific match, the one that will be used for analysis in the Fullstory UI, from all other matching object definitions.

  • For defined events and defined elements, you will also see fields in the data for additional_event_definition_ids and additional_element_definition_ids respectively. Since multiple defined events can point to the same underlying event and multiple defined elements can point to the same underlying selector, we expose these additional IDs so that you can run analyses in the warehouse that differ from the behavior we apply in the Fullstory UI.
  • For the Fullstory UI we choose the object with the most specific definition that matches the data, then we aggregate the data based on that object alone (ignoring the additional matches to avoid duplicate counts). The ID that you see in event_definition_id and element_definition_id will reflect that most-specific ID that will be used for reporting in the Fullstory UI.
  • Multiple defined pages cannot point to the same URL.
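To make the difference between the two counting approaches concrete, here is a small sketch in plain Python. The event rows and definition IDs are invented for illustration, and the records are flattened dictionaries rather than real warehouse rows. Only the event_definition_id and additional_event_definition_ids field names come from this article.

```python
from collections import Counter

# Made-up events. Each carries the most specific matching definition plus
# any other definitions that also matched.
events = [
    {"event_id": "e1",
     "event_definition_id": "def-signup-click",
     "additional_event_definition_ids": ["def-any-click"]},
    {"event_id": "e2",
     "event_definition_id": "def-any-click",
     "additional_event_definition_ids": []},
    {"event_id": "e3",  # matched no defined event at all
     "event_definition_id": None,
     "additional_event_definition_ids": []},
]

# UI-style counting: each event counts once, toward its most specific match.
ui_counts = Counter(
    e["event_definition_id"] for e in events if e["event_definition_id"]
)

# Warehouse-style alternative: count an event toward every definition it matches.
all_counts = Counter()
for e in events:
    if e["event_definition_id"]:
        all_counts[e["event_definition_id"]] += 1
    all_counts.update(e["additional_event_definition_ids"])

print(dict(ui_counts))
print(dict(all_counts))
```

In this sketch the broader definition is counted once under the UI-style rule and twice when every match is included, which is why the two totals should not be expected to agree.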

FAQ

How often will data be synced between Fullstory and my data warehouse of choice?

The Fullstory pipeline syncs to your integrated warehouse hourly when your integration is active and correctly configured. Each sync contains the events processed by Fullstory in the prior hour. Due to the distributed nature of the Internet and Fullstory's capture pipeline, a complete set of events is not guaranteed to be available every hour. Any missing events will be reconciled automatically during subsequent syncs. We expose both event_time and updated_time columns in the data that can be used to tune your queries as needed.
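One way these two columns might be used is an incremental read that tracks a watermark on updated_time, so that late-arriving events are not skipped. The sketch below assumes that updated_time reflects when a row was written or last changed in the warehouse, which this article does not state explicitly. The rows and the rows_since helper are hypothetical.

```python
from datetime import datetime, timezone


def utc(text):
    """Parse an ISO timestamp and mark it as UTC."""
    return datetime.fromisoformat(text).replace(tzinfo=timezone.utc)


# Made-up rows: e2 happened at 09:50 but only arrived in the 11:00 sync.
rows = [
    {"event_id": "e1", "event_time": utc("2024-01-01T09:10:00"),
     "updated_time": utc("2024-01-01T10:00:00")},
    {"event_id": "e2", "event_time": utc("2024-01-01T09:50:00"),
     "updated_time": utc("2024-01-01T11:00:00")},
    {"event_id": "e3", "event_time": utc("2024-01-01T10:20:00"),
     "updated_time": utc("2024-01-01T11:00:00")},
]


def rows_since(all_rows, watermark):
    """Return rows written after the watermark, regardless of event_time."""
    return [r for r in all_rows if r["updated_time"] > watermark]


last_watermark = utc("2024-01-01T10:00:00")
new_rows = rows_since(rows, last_watermark)
new_ids = [r["event_id"] for r in new_rows]

# Advance the watermark only as far as the data actually read.
next_watermark = max(
    (r["updated_time"] for r in new_rows), default=last_watermark
)
print(new_ids, next_watermark.isoformat())
```

Filtering on event_time alone with the same cutoff would have missed e2, since it occurred before 10:00 but was delivered later.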

Are there any plans to add other warehouses in the future?

Yes, we plan to add more warehouses based on customer demand. If there's a destination you'd like added, let us know.

I see more event types in the UI's event filter list than in the data. Is something missing from the data?

No. All captured data is included in the Warehouse sync. Some of the "events" shown in the UI are derived from the same underlying event_type. For example, "Refreshed URL" in the UI is a navigate event_type with event_properties:navigate_reason = 'reload' in the warehouse.
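The "Refreshed URL" mapping can be sketched as a simple filter. The rows below are invented, event_properties is modeled as a plain dictionary, and "some-other-reason" is a placeholder rather than a documented navigate_reason value.

```python
# Made-up event rows; only the navigate event_type, the navigate_reason
# property, and the 'reload' value come from the article.
events = [
    {"event_id": "e1", "event_type": "navigate",
     "event_properties": {"navigate_reason": "reload"}},
    {"event_id": "e2", "event_type": "navigate",
     "event_properties": {"navigate_reason": "some-other-reason"}},
    {"event_id": "e3", "event_type": "click",
     "event_properties": {}},
]


def is_refreshed_url(event):
    """True when the row corresponds to the UI's "Refreshed URL" event."""
    return (
        event["event_type"] == "navigate"
        and event["event_properties"].get("navigate_reason") == "reload"
    )


refreshed_ids = [e["event_id"] for e in events if is_refreshed_url(e)]
print(refreshed_ids)
```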

Can we configure event types that we do not want to sync? For example, if we decide that we do not want "change" events, can we not have those sent?

Currently, Fullstory only supports syncing all captured events to your warehouse. Events that you are not interested in can easily be filtered out in your SQL analysis. We are exploring the possibility of adding event filters prior to the sync. If this is something you would be interested in, please reach out for more information.

I see two tables, one called events and another called raw_events, but raw_events is empty. Is that expected?

Yes, the raw_events table is used as a temporary staging table in the Snowflake data load process. This table may include some data while the hourly load process is running but should be empty at all other times. Consumers of the data can ignore this table and should only use the events table for operational queries.

Does Warehouse support bidirectional sync?

Not yet, but we would love to hear more about any use cases you've got in mind or any workflows that you'd like to enable.

I see a slight discrepancy between Fullstory UI metrics and warehouse data.

All timestamps in the warehouse are in UTC whereas the Fullstory UI localizes metrics based on your browser's local timezone. To align these timezones, apply a timezone conversion from UTC to your local timezone in your SQL query.
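The effect of the timezone difference on daily metrics can be illustrated with a short sketch. It assumes a fixed UTC-5 offset for the "local" timezone to keep the example deterministic. A real conversion should use a named timezone that handles daylight saving time, either in your warehouse's SQL dialect or in your analysis code.

```python
from datetime import datetime, timedelta, timezone

# Assumption for illustration: the viewer's browser is at a fixed UTC-5 offset.
LOCAL_TZ = timezone(timedelta(hours=-5))


def local_day(utc_timestamp):
    """Return the local calendar day for a warehouse (UTC) timestamp."""
    as_utc = datetime.fromisoformat(utc_timestamp).replace(tzinfo=timezone.utc)
    return as_utc.astimezone(LOCAL_TZ).date()


event_time = "2024-03-02T02:30:00"  # stored in UTC in the warehouse

utc_day = datetime.fromisoformat(event_time).date()
ui_day = local_day(event_time)

# The same event lands on different calendar days depending on the timezone.
print(utc_day.isoformat(), ui_day.isoformat())
```

An event shortly after midnight UTC falls on the previous day for a viewer west of UTC, so daily totals grouped on raw UTC timestamps can differ from the UI until the conversion is applied.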

Data Security Warning: If Fullstory receives a data deletion request from a customer, and that customer's data has already been exported to their warehouse, Fullstory will not be responsible for deleting the previously exported data once it is out of Fullstory's possession. Once the data has been synced to a customer's system, it is the responsibility of the customer to ensure the data is secure and that they adhere to all privacy, security, and other applicable regulations.

