LogoLogo
HomeThe PlatformBlogSchedule a demo
  • Getting Started
    • Welcome to AdLibertas
  • The Platform
    • How it works
    • User-Level Audience Reporting
      • Creating Reports
        • Creating a New User Report
        • Creating Advanced User-Level Reports
        • Advanced Audience Builder
        • Custom Event Metrics
      • Report Layout
        • Report Module: Audience Filtering
        • Chart Type Module: Absolute vs. Relative Reports
        • Daily Totals, Per User, Cumulative Totals
        • Lifecycle Reports
        • Forecasting Module
        • Statistics Module
        • Measuring Confidence
      • Advanced Reporting Methods
        • User Measurement & Calculation Details
        • Date Ranges: Define Audience vs. Create Report
        • Exclude GAID tracking opt-outs
        • Scheduled Reports: Keep Updated & Rolling
        • Reporting on a Firebase AB test
        • Understanding “Audience Restraints”
        • Adding user time to your reports
    • Consolidated Revenue Reporting
      • Reporting Discrepancies
      • Reporting Availability & Timezones
      • Ad Network Re-Repost; Also: Revenue Reconciliation Accuracy
      • Consolidated Reporting vs. Consolidated Inventory Reporting
      • Reporting Table – Column Descriptions Common Metrics (Calculated Fields)
      • Facebook Reporting
      • Consolidated Ad Revenue with multiple mediators
    • Business Analytics
      • Analytics Layout
      • Understanding the "Explore Data" button
      • The Data Table
      • Asking a Question
      • Saving a Question
      • Creating a custom dimension
      • Setting up a pulse
    • Custom Dashboards
      • Custom Dashboard Filters
      • Combining data into a single chart
    • Direct SQL Access
    • Exporting Data
      • Ad Network Reports
      • Chart Reports
      • Custom API connections
      • Downloading & Scheduling Data Reports
      • Deprecated: Line Item Change Log
    • General
      • Change your Username & Password
      • Adding Users to your Account
      • Sharing Collaborative Links
      • AdLibertas Cost
  • Data Integrations
    • Connecting in 3 steps
    • Ad Impression-Level Revenue Connections
      • AppLovin Max User Revenue API
      • ironSource Ad Revenue Measurement Integration
      • Impression level tracking with Admob Mediation
      • Collecting MoPub Impression-Level Data as a Firebase Event
    • Ad Network & Store Connections
      • Adding Ad Network Credentials
      • How does App Store Reporting work?
      • Adding access to Google Play
      • Adding Sub User to App Store Connect
      • Getting the most from Ad Network Reports
    • Analytics Connections
      • Data Set Status
      • Connect AdLibertas to Firebase
      • Connecting AdLibertas to BigQuery
      • Firebase Install Counts in Audience Reporting
      • Setting User Campaigns in Firebase
      • Why use revenue to determine Firebase AB test winners?
      • Firebase Best Practices: keeping Google BigQuery Costs Down
    • Custom Integrations
      • Sending Events via Webhooks to AdLibertas
      • Impression level tracking with Admob Mediation
      • Connecting AdLibertas to BigQuery
      • Importing a custom data set
    • IAP Connections
      • Tracking IAP & Subscriptions in Firebase and BigQuery
      • RevenueCat Integration: WebHooks
      • RevenueCat: Setting Universal Identifiers
    • MMP Connections
      • Connecting Adjust
      • Connecting AppsFlyer
      • Connecting Kochava
  • FAQs
    • General
      • Why does AdLibertas need credentials?
    • Audience Reporting
      • Why doesn't my daily active user count match Firebase?
      • Why doesn’t my retention rate match?
      • Why aren't my install rates matching?
      • Why doesn't my relative user count match retention?
      • What is the probability projected LTV becomes actual LTV?
      • Why doesn’t Firebase and AdLibertas revenue match?
    • Reporting
      • What is “non_mopub” revenue
      • How do customers use AdLibertas?
  • Privacy & Security
    • Privacy & Security Details
Powered by GitBook
On this page
  1. Data Integrations
  2. Custom Integrations

Importing a custom data set

PreviousConnecting AdLibertas to BigQueryNextIAP Connections

Last updated 3 years ago

For most customers connecting data sources will be the primary source of data collection, for users with webhooks, we have a as well.

However, in some use-cases, you may want to import a custom dataset. This article covers the schema and procedure for importing a custom dataset to AdLibertas.

Uploading the data

To get started all we need is data uploaded to an s3 bucket, please reach out to your account manager for AWS credentials.

All files will be nested underneath an s3 folder in this format:

s3://adlibertas-<<company_name>>/athena/production/singulari/v1/s3_events_123456/

Partitioning:

If you have used hive or AWS glue/ athena before, our s3 paths should be familiar. Each upload should be categorized underneath a dataset_id and an event_date.

A dataset can be as large as you like and contain multiple apps. However, we ask that you separate uploads by an event_date (YYYY-MM-DD).

Breaking down the example file partitions, you can see the dataset_id=com.awesomeapp and event_date=2021-05-28 where *com.awesomeapp* is the dataset ID and *2021-05-28* is the event date.

Formatting:

Each file uploaded is in a JSONL file gzipped format. Please see for the schema of each JSON object (our schema is very close to the Firebase analytics schema). This is the schema used for the Hive data table. Do note most of the columns are not required, but to get the most out of the AdLibertas platform, we ask that you provide as much data as possible.

These columns should be provided at a minimum:

  • event_timestamp

  • event_name (ad_impression,in_app_purchase,user_engagement,etc)

  • event_params (this can be a revenue event [IAP or impression], engagment time in the app, user activity, etc)

  • user_id (your unique user ID)

  • user_pseudo_id (IDFV or Google App Set ID)

  • user_properties (assigned user properties, often AB test assignments)

  • user_first_touch_timestamp (Install timestamp)

  • device.vendor_id (IDFV) and/or device.advertising_id (IDFA or GAID)

  • geo.country (ISO2 country code)

  • app_info.id (app bundle ID)

  • platform (IOS or ANDROID)

  • traffic_source.name (campaign name for tracking LTV & ROAS by campaign)

Example SQL query

Here is the comprehensive schema of the Hive table that the new dataset needs to follow.

CREATE EXTERNAL TABLE `s3_events_123456`(
  `event_timestamp` bigint COMMENT 'from deserializer', 
  `event_name` string COMMENT 'from deserializer', 
  `event_params` array<struct<key:string,value:struct<string_value:string,int_value:bigint,float_value:double,double_value:double>>> COMMENT 'from deserializer', 
  `event_previous_timestamp` bigint COMMENT 'from deserializer', 
  `event_value_in_usd` double COMMENT 'from deserializer', 
  `event_bundle_sequence_id` bigint COMMENT 'from deserializer', 
  `event_server_timestamp_offset` bigint COMMENT 'from deserializer', 
  `user_id` string COMMENT 'from deserializer', 
  `user_pseudo_id` string COMMENT 'from deserializer', 
  `user_properties` array<struct<key:string,value:struct<string_value:string,int_value:bigint,float_value:double,double_value:double,set_timestamp_micros:bigint>>> COMMENT 'from deserializer', 
  `user_first_touch_timestamp` bigint COMMENT 'from deserializer', 
  `user_ltv` struct<revenue:double,currency:string> COMMENT 'from deserializer', 
  `device` struct<category:string,mobile_brand_name:string,mobile_model_name:string,mobile_marketing_name:string,mobile_os_hardware_model:string,operating_system:string,operating_system_version:string,vendor_id:string,advertising_id:string,language:string,is_limited_ad_tracking:string,time_zone_offset_seconds:bigint,browser:string,browser_version:string,web_info:string> COMMENT 'from deserializer', 
  `geo` struct<continent:string,country:string,region:string,city:string,sub_continent:string,metro:string> COMMENT 'from deserializer', 
  `app_info` struct<id:string,version:string,install_store:string,firebase_app_id:string,install_source:string> COMMENT 'from deserializer', 
  `traffic_source` struct<name:string,medium:string,source:string> COMMENT 'from deserializer', 
  `stream_id` string COMMENT 'from deserializer', 
  `platform` string COMMENT 'from deserializer', 
  `event_dimensions` string COMMENT 'from deserializer', 
  `ecommerce` string COMMENT 'from deserializer', 
  `items` array<string> COMMENT 'from deserializer')
PARTITIONED BY ( 
  `dataset_id` string, 
  `event_date` string)
ROW FORMAT SERDE 
  'org.openx.data.jsonserde.JsonSerDe' 
WITH SERDEPROPERTIES ( 
  'paths'='app_info,device,ecommerce,event_bundle_sequence_id,event_dimensions,event_name,event_params,event_previous_timestamp,event_server_timestamp_offset,event_timestamp,event_value_in_usd,geo,items,platform,stream_id,traffic_source,user_first_touch_timestamp,user_id,user_ltv,user_properties,user_pseudo_id') 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  's3://adlibertas-<<COMPANY>>/athena/production/singulari/v1/s3_events_123456'
custom webhook integration
Here is an example of the output
this example file