
Data Activation (Reverse ETL)

Transformations in RudderStack allow you to intercept events as they move from a Source to a Destination. Using JavaScript, you can modify these events on the fly to filter data, parse information, or enrich events with external data before they reach their final destination.

There are two ways to access the transformation editor:

  • Via the Sidebar: Click the Transformations tab in the left-hand navigation menu to see a list of all existing transformations.
  • Via the Connections View: When viewing your visual data pipeline, click the Transformation node on the line connecting your Source to your Destination.

The transformation editor uses standard JavaScript. You can write logic to manipulate the event object however you need. Common use cases include:

  • Filtering (Allow/Block Lists): You can create logic to only allow specific events to pass through. This is useful for controlling costs in downstream tools (e.g., Braze) that charge based on event volume.
  • Enrichment: You can fetch data from external APIs or add validation data to the event payload.
  • Parsing: Restructuring the data format to match the requirements of the destination.
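
For example, an allow-list filter might look like the following sketch. It assumes the standard transformEvent entry point used by RudderStack JavaScript transformations; the event names and the enrichment field are placeholders.

// Allow-list sketch: drop any track event that is not explicitly allowed
const allowedEvents = ["Order Completed", "Signed Up"];

export function transformEvent(event, metadata) {
  if (event.type === "track" && !allowedEvents.includes(event.event)) {
    return; // returning nothing drops the event
  }
  // Optional enrichment: tag the event before it reaches the destination
  event.properties = event.properties || {};
  event.properties.processed_by = "allow_list_transformation";
  return event;
}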

Before deploying your code, you should validate it using the built-in testing tools located at the bottom of the editor.

Step 1: Import Test Data

You do not need to wait for live data to test your code.

  1. Click the Import Event button.
  2. Select a sample event type from the list (e.g., Track Event, Identify Event, Page Event).
  3. This will populate the input window with a sample JSON payload.
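
For reference, an imported Track event payload looks roughly like the following (illustrative only; the exact sample RudderStack generates may differ):

{
  "type": "track",
  "event": "Order Completed",
  "userId": "user-123",
  "properties": {
    "price": 49.99,
    "currency": "USD"
  },
  "context": {
    "app": { "name": "example-app" }
  }
}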

Step 2: Run the Test

  1. Once your code is written and your test event is imported, click the Run Test button.
  2. The editor will process the sample event through your JavaScript code.

Step 3: Analyze the Results

  1. Review the output window to ensure the JSON is formatted correctly.
  2. Click the Difference button (if available) or toggle between the input and output views to see exactly what changed between the original event and the transformed event.

Once you have verified that the transformation logic produces the desired output:

  1. Click the Save Transformation button.
  2. The transformation immediately begins applying to events flowing through that connection.

Adding a new user attribute to a Profiles project involves updating the configuration code, testing locally, deploying the changes to the RudderStack platform, and mapping the new field for activation.

You must edit the YAML files in your local rs-profiles repository to define where the data comes from and how it should be calculated.

1. Define the Data Sources (inputs.yaml)

The inputs.yaml file tells RudderStack which raw tables in your data warehouse (e.g., Databricks) to look at.

  • Open inputs.yaml in your code editor.
  • Check for existing sources: If the table containing your new data point is already defined, skip to step 2.
  • Add a new source (if necessary): If you are pulling from a new table, add a new block under inputs:
    - name: source_table_name
      table: catalog.schema.table_name
      occurred_at_col: timestamp_column_name
      ids:
        - select: "lower(email_column)"
          type: email
          entity: user
        - select: "phone_column"
          type: phone
          entity: user
    Ensure you define the occurred_at_col (for timeline stitching) and the unique identifiers (email/phone) to link this data to the Identity Graph.

2. Define the Feature Logic (profiles.yaml)

This file defines the specific attribute you want to attach to the user profile.

  • Open profiles.yaml.
  • Add a new entity_var: Specify the logic for the new attribute. Common logic includes selecting the first or last value seen.
    - entity_var:
        name: new_attribute_name # The name of the column in the final table
        select: last(source_column_name)
        from: inputs/source_table_name
        description: Description of what this attribute represents

Before deploying, verify that the configuration compiles and produces the expected output.

  1. Compile the Project: Run the following command in your terminal to check for syntax errors in your YAML:

    pb compile

    Ensure this completes without error.

  2. Run the Project (Optional but Recommended): Run the pipeline locally to generate a test table in your development environment:

    pb run

    Note: This process can take significant time depending on data volume.

  3. Verify in Warehouse: Navigate to your data warehouse (e.g., Databricks) and check the user_feature_view table (usually in a test schema like profiles_test). Confirm your new_attribute_name column exists and is populated.

Once the code is verified, you must push it to the remote repository and trigger a run in RudderStack.

  1. Commit and Push: Commit your changes to inputs.yaml and profiles.yaml and push them to your Git repository.

    git add .
    git commit -m "Added new user attribute"
    git push
  2. Fetch & Run in RudderStack:

    • Log in to the RudderStack dashboard.
    • Navigate to Unify -> Profiles -> [Your Project] -> Settings.
    • Click Fetch Latest to pull the new code from Git.
    • Navigate to the History tab.
    • Click Run.
    • Wait for the run status to show a green checkmark.

After the profile run completes, the data is available in the warehouse but needs to be mapped to the destination (Braze).

  1. Navigate to the Audience: Go to Activate -> Audiences and select the relevant audience (e.g., “Eligible for Messaging”).

  2. Update the Schema:

    • Click on the Schema tab.
    • Click Update.
    • Scroll to the bottom and click Map another field.
  3. Map the New Field:

    • Warehouse Column: Select your new_attribute_name from the dropdown.
    • Destination Field: Type the name of the custom attribute as it should appear in Braze.
    • Click Save.
  4. Trigger Sync:

    • Go to the Syncs tab.
    • Click Sync now.

Once the sync completes, the new custom attribute will be available on the user profiles in Braze.

Cloud Data Ingestion (CDI) allows Braze to sync data directly from a cloud data warehouse (e.g., Databricks, Snowflake). This is typically used to sync Catalogs (e.g., product inventory, store locations) or Events (e.g., purchase events).

Location in Braze: Data Settings > Cloud Data Ingestion


To view or edit an existing sync:

  1. Navigate to the Cloud Data Ingestion page.
  2. Select the specific sync you wish to investigate.
  3. Review the Connection Details.

Key Configuration Fields:

  • Catalog/Source: The database catalog (e.g., cleaned).
  • Schema: The specific schema within the database (e.g., braze_import).
  • Table: The table or view containing the prepared data (e.g., vehicles or locations).

For CDI to function correctly, the source table in your data warehouse must follow a specific schema. Braze does not ingest raw columns directly; it requires a packed JSON payload.

Required columns:

  • id (String/Int): The unique identifier for the catalog item or user.
  • updated_at (Timestamp): Critical. Used for watermarking; Braze checks this timestamp to determine whether the row has changed since the last sync.
  • payload (String, JSON): A JSON object containing all item attributes (e.g., description, price, image URL, category).
  • deleted (Boolean, optional): If true, the item is removed from the Braze Catalog.
  • Do not simply delete a row from the source table to remove the item from Braze; the sync will just ignore the missing row.
  • To delete an item, the row must exist in the table with deleted = true and a new updated_at timestamp.
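
For illustration, a prepared row in the source table might look like this (all values are hypothetical; payload is stored as a single JSON string):

id:         "veh-1042"
updated_at: 2024-05-01 10:15:00
deleted:    false
payload:    {"description": "Compact SUV", "price": 23999, "image_url": "https://example.com/img/veh-1042.jpg", "category": "suv"}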

The integration uses an incremental sync strategy:

  1. The system records the Last Updated At timestamp of the previous successful run.
  2. On the next run, it queries the source table.
  3. It ingests only rows where the updated_at value is greater than the previous Last Updated At value.

If you need to add new attributes to the catalog or change how data is formatted, you must update the upstream ETL process (usually a Databricks Notebook).

  1. Locate the Table: In your data warehouse (e.g., Databricks), find the table referenced in the Braze Connection Details.
  2. Trace Lineage: Use the “Lineage” tab to find the upstream Job or Notebook that writes to this table.
  3. Edit the Code: Open the notebook to modify the transformation logic.

The standard pattern involves selecting your raw columns and packing them into the payload column while setting the updated_at timestamp.

# Example based on the video walkthrough; table and column names are placeholders
from pyspark.sql.functions import col, current_timestamp, struct, to_json

# 1. Prepare the dataframe with the columns Braze CDI expects
df_transformed = source_df.select(
    col("unique_id_column").alias("id"),
    current_timestamp().alias("updated_at"),  # sets the watermark timestamp to now
    # 2. Pack all attribute columns into a single JSON string
    to_json(struct(
        col("attribute_1"),
        col("attribute_2"),
        col("price"),
        col("description")
    )).alias("payload")
)

# 3. Write to the table targeted by Braze
df_transformed.write.mode("overwrite").saveAsTable("braze_import.target_table")

Once the data table is updated, the sync needs to run to reflect changes in Braze.

The job will run automatically based on the frequency defined in the Braze settings (e.g., every 15 minutes, hourly).

If you have made changes and do not want to wait for the schedule:

  1. Go to Data Settings > Cloud Data Ingestion in Braze.
  2. Find your sync job.
  3. Click the Sync Now button (refresh icon).
  4. Monitor the “Sync Logs” at the bottom of the page for Success or Error statuses.