Data Activation (Reverse ETL)
RudderStack
Updating a Transformation
Transformations in RudderStack allow you to intercept events as they move from a Source to a Destination. Using JavaScript, you can modify these events on the fly to filter data, parse information, or enrich events with external data before they reach their final destination.
1. Accessing Transformations
There are two ways to access the transformation editor:
- Via the Sidebar: Click the Transformations tab in the left-hand navigation menu to see a list of all existing transformations.
- Via the Connections View: When viewing your visual data pipeline, click the Transformation node located on the line connecting your Source to your Destination.
2. Writing Transformation Logic
The transformation editor uses standard JavaScript. You can write logic to manipulate the event object however you need. Common use cases include:
- Filtering (Allow/Block Lists): You can create logic to only allow specific events to pass through (see the sketch after this list). This is useful for controlling costs in downstream tools (e.g., Braze) that charge based on event volume.
- Enrichment: You can fetch data from external APIs or add validation data to the event payload.
- Parsing: Restructuring the data format to match the requirements of the destination.
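To make the allow-list case concrete, here is a minimal sketch of that filtering logic. It is written in Python for consistency with the other sketches in this document; the RudderStack editor itself takes JavaScript as noted above, and the event names and field access below follow the standard event payload but should be treated as placeholders.

```python
# Minimal allow-list sketch (illustrative only; the editor uses JavaScript).
# Event names are hypothetical; replace them with events from your tracking plan.
ALLOWED_EVENTS = {"Order Completed", "Subscription Started"}

def transform_event(event):
    # Pass non-track calls (identify, page, etc.) through untouched
    if event.get("type") != "track":
        return event

    # Forward only allow-listed track events; returning None drops the event
    if event.get("event") in ALLOWED_EVENTS:
        return event
    return None
```

Returning the event forwards it to the destination; dropping everything else is what keeps low-value events out of volume-billed tools such as Braze.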
3. Testing the Transformation
Before deploying your code, you should validate it using the built-in testing tools located at the bottom of the editor.
Step 1: Import Test Data
You do not need to wait for live data to test your code.
- Click the Import Event button.
- Select a sample event type from the list (e.g., Track Event, Identify Event, Page Event).
- This will populate the input window with a sample JSON payload.
Step 2: Run the Test
- Once your code is written and your test event is imported, click the Run Test button.
- The editor will process the sample event through your JavaScript code.
Step 3: Analyze the Results
- Review the output window to ensure the JSON is formatted correctly.
- Click the Difference button (if available) or toggle between the input and output views to see exactly what changed between the original event and the transformed event.
4. Saving and Deploying
Once you have verified that the transformation logic produces the desired output:
- Click the Save Transformation button.
- The transformation will immediately begin applying to data flowing through that connection.
Adding a New Profile Attribute
This process involves updating the configuration code, testing locally, deploying the changes to the RudderStack platform, and mapping the new field for activation.
Phase 1: Code Configuration
You must edit the YAML files in your local rs-profiles repository to define where the data comes from and how it should be calculated.
1. Define the Input Source (inputs.yaml)
This file tells RudderStack which raw tables in your data warehouse (e.g., Databricks) to look at.
- Open inputs.yaml in your code editor.
- Check for existing sources: If the table containing your new data point is already defined, skip to step 2.
- Add a new source (if necessary): If you are pulling from a new table, add a new block under inputs::

```yaml
- name: source_table_name
  table: catalog.schema.table_name
  occurred_at_col: timestamp_column_name
  ids:
    - select: "lower(email_column)"
      type: email
      entity: user
    - select: "phone_column"
      type: phone
      entity: user
```

Ensure you define the occurred_at_col (for timeline stitching) and the unique identifiers (email/phone) to link this data to the Identity Graph.
2. Define the Feature Logic (profiles.yaml)
This file defines the specific attribute you want to attach to the user profile.
- Open profiles.yaml.
- Add a new entity_var: specify the logic for the new attribute. Common logic includes selecting the first or last value seen.

```yaml
- entity_var:
    name: new_attribute_name # The name of the column in the final table
    select: last(source_column_name)
    from: inputs/source_table_name
    description: Description of what this attribute represents
```
Phase 2: Local Testing & Verification
Before deploying, verify that the configuration compiles and produces the expected output.
- Compile the Project: Run the following command in your terminal to check for syntax errors in your YAML:

```bash
pb compile
```

Ensure this completes without error.
- Run the Project (Optional but Recommended): Run the pipeline locally to generate a test table in your development environment:

```bash
pb run
```

Note: This process can take significant time depending on data volume.
- Verify in Warehouse: Navigate to your data warehouse (e.g., Databricks) and check the user_feature_view table (usually in a test schema like profiles_test). Confirm your new_attribute_name column exists and is populated, for example with a query like the sketch below.
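As a hedged sketch, the check could look like this in a Databricks notebook (spark is the notebook's built-in SparkSession; the schema, table, and column names are the placeholders used above):

```python
# Spot-check the new attribute in the locally generated feature view.
# Names are placeholders; adjust them to your project's test schema and column.
df = spark.table("profiles_test.user_feature_view")

total = df.count()
populated = df.filter(df["new_attribute_name"].isNotNull()).count()
print(f"{populated} of {total} rows have new_attribute_name populated")

# Peek at a few non-null sample values
df.select("new_attribute_name").where(df["new_attribute_name"].isNotNull()).show(5)
```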
Phase 3: Deployment
Once the code is verified, you must push it to the remote repository and trigger a run in RudderStack.
- Commit and Push: Commit your changes to inputs.yaml and profiles.yaml and push them to your Git repository.

```bash
git add .
git commit -m "Added new user attribute"
git push
```

- Fetch & Run in RudderStack:
- Log in to the RudderStack dashboard.
- Navigate to Unify -> Profiles -> [Your Project] -> Settings.
- Click Fetch Latest to pull the new code from Git.
- Navigate to the History tab.
- Click Run.
- Wait for the run status to show a green checkmark.
Phase 4: Activation (Sync to Destination)
After the profile run completes, the data is available in the warehouse but needs to be mapped to the destination (Braze).
- Navigate to the Audience: Go to Activate -> Audiences and select the relevant audience (e.g., “Eligible for Messaging”).
- Update the Schema:
  - Click on the Schema tab.
  - Click Update.
  - Scroll to the bottom and click Map another field.
- Map the New Field:
  - Warehouse Column: Select your new_attribute_name from the dropdown.
  - Destination Field: Type the name of the custom attribute as it should appear in Braze.
  - Click Save.
- Trigger Sync:
  - Go to the Syncs tab.
  - Click Sync now.
Once the sync completes, the new custom attribute will be available on the user profiles in Braze.
Braze Integration
Updating a Catalog (Braze CDI)
1. Overview
Cloud Data Ingestion (CDI) allows Braze to sync data directly from a cloud data warehouse (e.g., Databricks, Snowflake). This is typically used to sync Catalogs (e.g., product inventory, store locations) or Events (e.g., purchase events).
Location in Braze: Data Settings > Cloud Data Ingestion
2. Braze Connection Configuration
To view or edit an existing sync:
- Navigate to the Cloud Data Ingestion page.
- Select the specific sync you wish to investigate.
- Review the Connection Details.
Key Configuration Fields:
- Catalog/Source: The database catalog (e.g., cleaned).
- Schema: The specific schema within the database (e.g., braze_import).
- Table: The table or view containing the prepared data (e.g., vehicles or locations).
3. Required Data Structure (Source Table)
For CDI to function correctly, the source table in your data warehouse must follow a specific schema. Braze does not ingest raw columns directly; it requires a packed JSON payload (see the example after the table below).
Standard Columns
| Column Name | Type | Description |
|---|---|---|
| id | String/Int | The unique identifier for the catalog item or user. |
| updated_at | Timestamp | Critical. Used for watermarking. Braze checks this timestamp to determine if the row has changed since the last sync. |
| payload | String (JSON) | A JSON object containing all item attributes (e.g., description, price, image URL, category). |
| deleted | Boolean | (Optional) If true, the item is removed from the Braze Catalog. |
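To make the packed payload concrete, here is a small sketch of how one row of the source table might look; the attribute names and values are hypothetical:

```python
import json
from datetime import datetime, timezone

# Hypothetical catalog row illustrating the column shape described above.
row = {
    "id": "vehicle-1042",
    "updated_at": datetime.now(timezone.utc).isoformat(),
    # Every item attribute is packed into the single JSON-encoded payload column
    "payload": json.dumps({
        "description": "2022 compact SUV",
        "price": 27950,
        "image_url": "https://example.com/img/vehicle-1042.png",
        "category": "suv",
    }),
    "deleted": False,
}
print(row["payload"])
```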
The “Deleted” Logic
- Do not simply remove a row from the source table to delete it from Braze; the sync will simply ignore it.
- To delete an item, the row must exist in the table with deleted = true and a new updated_at timestamp (see the sketch below).
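A minimal sketch of flagging an item as deleted in Databricks, assuming a Delta table and the example names used in this document (spark is the notebook's built-in SparkSession):

```python
# Mark one catalog item as deleted and refresh its watermark so the next sync picks it up.
# Catalog, schema, table, and id values are placeholders; adjust to your environment.
spark.sql("""
    UPDATE cleaned.braze_import.vehicles
    SET deleted = true,
        updated_at = current_timestamp()
    WHERE id = 'vehicle-1042'
""")
```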
4. Sync Logic (How it works)
The integration uses an incremental sync strategy:
- The system records the Last Updated At timestamp of the previous successful run.
- On the next run, it queries the source table.
- It ingests only rows where the updated_at value is greater than the previous Last Updated At value (conceptually, the query sketched below).
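Conceptually, each run boils down to a watermark filter like the hedged sketch below; the actual query Braze issues is not exposed, and the table and column names are the placeholders used in this document:

```python
# Conceptual sketch of the incremental pull; not the literal query Braze runs.
last_updated_at = "2024-01-01 00:00:00"  # watermark from the previous successful run (placeholder)

changed_rows = spark.sql(f"""
    SELECT id, updated_at, payload, deleted
    FROM cleaned.braze_import.vehicles
    WHERE updated_at > '{last_updated_at}'
""")
changed_rows.show()
```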
5. Updating Data Transformation Logic
If you need to add new attributes to the catalog or change how data is formatted, you must update the upstream ETL process (usually a Databricks Notebook).
Step-by-Step Update Process:
- Locate the Table: In your data warehouse (e.g., Databricks), find the table referenced in the Braze Connection Details.
- Trace Lineage: Use the “Lineage” tab to find the upstream Job or Notebook that writes to this table.
- Edit the Code: Open the notebook to modify the transformation logic.
Common Code Pattern (PySpark Example)
The standard pattern involves selecting your raw columns and packing them into the payload column while setting the updated_at timestamp.
```python
# Pseudo-code example based on video instructions
from pyspark.sql.functions import col, current_timestamp, to_json, struct

# 1. Prepare the dataframe with the necessary logic
df_transformed = source_df.select(
    col("unique_id_column").alias("id"),
    current_timestamp().alias("updated_at"),  # Updates the timestamp to now
    # 2. Pack all attribute columns into a JSON string
    to_json(struct(
        col("attribute_1"),
        col("attribute_2"),
        col("price"),
        col("description")
    )).alias("payload")
)

# 3. Write to the table targeted by Braze
df_transformed.write.mode("overwrite").saveAsTable("braze_import.target_table")
```

6. Execution and Troubleshooting
Once the data table is updated, the sync needs to run to reflect changes in Braze.
Scheduled Syncs
The job will run automatically based on the frequency defined in the Braze settings (e.g., every 15 minutes, hourly).
Manual Sync (Force Update)
If you have made changes and do not want to wait for the schedule:
- Go to Data Settings > Cloud Data Ingestion in Braze.
- Find your sync job.
- Click the Sync Now button (refresh icon).
- Monitor the “Sync Logs” at the bottom of the page for Success or Error statuses.