Prior to starting a historical data migration, ensure you do the following:
- Create a project on our US or EU Cloud.
- Sign up to a paid product analytics plan on the billing page (historic imports are free but this unlocks the necessary features).
- Raise an in-app support request with the Data pipelines topic detailing where you are sending events from, how, the total volume, and the speed. For example, "we are migrating 30M events from a self-hosted instance to EU Cloud using the migration scripts at 10k events per minute."
- Wait for the OK from our team before starting the migration process to ensure that it completes successfully and is not rate limited.
- Set the historical_migration option to true when capturing events in the migration.
Migrating from Amplitude is a two-step process:
1. Export your data from Amplitude using the organization settings export, Amplitude Export API, or the S3 export.
2. Import data into PostHog using PostHog's Python SDK or batch API with the historical_migration option set to true. Other libraries don't support historical migrations yet.
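If you use the batch API rather than the Python SDK, the request body might look roughly like the sketch below. The endpoint path, the top-level field names, and the placement of distinct_id inside properties are assumptions based on PostHog's public batch API and should be checked against the current API reference. build_batch_payload is a hypothetical helper, not part of any SDK.

```python
import json
from datetime import datetime, timezone

# Assumed endpoint for US Cloud; EU Cloud projects would use the eu.i.posthog.com host.
BATCH_URL = "https://us.i.posthog.com/batch/"


def build_batch_payload(api_key, events):
    """Wrap already-converted events in a batch request body (hypothetical helper)."""
    return {
        "api_key": api_key,
        "historical_migration": True,  # marks the whole batch as a historical import
        "batch": events,
    }


example_event = {
    "event": "$pageview",
    "properties": {"distinct_id": "user_123", "$current_url": "https://example.com/"},
    "timestamp": datetime(2024, 1, 15, 12, 30, tzinfo=timezone.utc).isoformat(),
}

body = json.dumps(build_batch_payload("<ph_project_api_key>", [example_event]))
# To send it, POST `body` to BATCH_URL with a "Content-Type: application/json" header.
```

The block only builds the JSON body, so you can inspect it before sending anything to your project.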
Exporting data from Amplitude
There are three ways to export data from Amplitude.
1. Organization settings export
The simplest way is to go to your project in your organization settings and click the Export Data button.
2. Export API
To export data using Amplitude's Export API, start by getting the API key and secret key for your project from your organization settings.
You can then use these in a request to get the data like this:
    curl --location --request GET 'https://amplitude.com/api/2/export?start=<starttime>&end=<endtime>' \
      -u '{api_key}:{secret_key}'
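The same request can be made from Python if you want to script the export. This is a minimal sketch using only the standard library. build_export_request and download_export are hypothetical helper names, and the YYYYMMDDTHH time format in the example is an assumption to verify against Amplitude's Export API documentation.

```python
import base64
import urllib.request

EXPORT_URL = "https://amplitude.com/api/2/export"


def build_export_request(api_key, secret_key, start, end):
    """Build a GET request with HTTP basic auth, equivalent to curl's -u flag."""
    url = f"{EXPORT_URL}?start={start}&end={end}"
    token = base64.b64encode(f"{api_key}:{secret_key}".encode()).decode()
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def download_export(request, destination="amplitude_export.zip"):
    """Stream the response body (a zipped archive) to disk in 1 MB chunks."""
    with urllib.request.urlopen(request) as response, open(destination, "wb") as out:
        while chunk := response.read(1024 * 1024):
            out.write(chunk)
    return destination


# Assumed time format: YYYYMMDDTHH (hour granularity).
request = build_export_request("<api_key>", "<secret_key>", "20240101T00", "20240101T23")
# download_export(request)  # uncomment once real keys are filled in
```

Streaming the response to disk avoids holding a large archive in memory.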
3. S3 export
If your data exceeds Amplitude's export size limitation, you can use their S3 export.
Importing Amplitude data into PostHog
Amplitude exports data in a zipped archive of JSON files. To get this data into PostHog, you need to:
1. Unzip and read the data
2. Convert the events from Amplitude's schema to PostHog's
3. Capture the events into PostHog using the historical_migration option
4. Alias device IDs to user IDs
Steps 1, 3, and 4 are relatively straightforward, but step 2 requires more explanation.
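Step 1 can be sketched as a generator that reads events straight out of the downloaded archive. It assumes the archive contains gzipped files ending in .json.gz with one JSON event per line, which matches what the example script later on this page reads. iter_export_events is a hypothetical helper name.

```python
import gzip
import json
import zipfile


def iter_export_events(zip_path):
    """Yield one event dict per line from every .json.gz file in the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if not name.endswith(".json.gz"):
                continue  # skip directories and anything that is not event data
            with archive.open(name) as member:
                with gzip.open(member, "rt", encoding="utf-8") as lines:
                    for line in lines:
                        if line.strip():
                            yield json.loads(line)
```

Because it yields events one at a time, you can feed it directly into a conversion and capture loop without extracting the archive to disk first.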
Converting Amplitude events
Although Amplitude events have a similar structure to PostHog events, you need to convert them to PostHog's schema. Many events and properties have different keys. For example, autocaptured events and properties in PostHog often start with $.
You can see Amplitude's event structure in their Export API documentation and PostHog's autocapture event structure in our autocapture docs.
Some conversions needed include:
- Changing event names like [Amplitude] Page Viewed to $pageview
- Changing event property keys like [Amplitude] Page Location to $current_url
- Translating EMPTY values in user_properties to null
- Changing event_time to an ISO 8601 formatted timestamp
- Using $set and $set_once for person properties
Converting the data ensures that it matches the data PostHog captures and can be integrated in analysis.
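The conversions listed above can be sketched as a few small functions. The two mapping tables contain only the examples named in the list and would need extending for real data. Treating event_time as UTC, sending initial_* user properties to $set_once, and sending all other user properties to $set are assumptions made for illustration. All function names are hypothetical.

```python
from datetime import datetime, timezone

# Illustrative mappings only; extend them with the keys present in your own export.
EVENT_NAME_MAP = {"[Amplitude] Page Viewed": "$pageview"}
PROPERTY_KEY_MAP = {"[Amplitude] Page Location": "$current_url"}


def clean_value(value):
    """Translate Amplitude's "EMPTY" placeholder to None."""
    return None if value == "EMPTY" else value


def to_iso_timestamp(event_time):
    """Convert an Amplitude event_time string to ISO 8601, assuming it is in UTC."""
    parsed = datetime.strptime(event_time, "%Y-%m-%d %H:%M:%S.%f")
    return parsed.replace(tzinfo=timezone.utc).isoformat()


def convert_event(entry):
    """Return a PostHog-shaped dict for one Amplitude event."""
    event_properties = entry.get("event_properties") or {}
    user_properties = entry.get("user_properties") or {}

    # Rename known property keys and pass unknown ones through unchanged.
    properties = {PROPERTY_KEY_MAP.get(k, k): v for k, v in event_properties.items()}

    # Assumption: initial_* values are set once, everything else is overwritten.
    properties["$set_once"] = {
        f"${k}": clean_value(v) for k, v in user_properties.items() if k.startswith("initial_")
    }
    properties["$set"] = {
        k: clean_value(v) for k, v in user_properties.items() if not k.startswith("initial_")
    }

    return {
        "event": EVENT_NAME_MAP.get(entry["event_type"], entry["event_type"]),
        "distinct_id": entry.get("user_id") or entry.get("device_id"),
        "properties": properties,
        "timestamp": to_iso_timestamp(entry["event_time"]),
    }
```

Keeping the mappings in dictionaries makes it easy to add keys as you find them in your export, without touching the conversion logic.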
Example Amplitude migration script
Below is a script that gets Amplitude data from the export folder, unzips it, converts the data to PostHog's schema, and then captures it in PostHog. It gives you a start, but likely needs to be modified to fit your infrastructure and data structure.
    from posthog import Posthog
    from datetime import datetime
    import json
    import os
    import gzip

    # PostHog Python client (use host='https://eu.i.posthog.com' for EU Cloud)
    posthog = Posthog(
        '<ph_project_api_key>',
        host='https://us.i.posthog.com',
        debug=True,
        historical_migration=True
    )

    def clean(value):
        # Amplitude writes missing user properties as the string "EMPTY"
        return None if value == "EMPTY" else value

    def to_int_if_possible(value):
        # os_version can be missing or non-integer (e.g. "17.1"); keep it as-is in that case
        try:
            return int(value)
        except (TypeError, ValueError):
            return value

    # Convert and capture Amplitude data
    def capture_entry(entry):
        distinct_id = entry.get("user_id") or entry.get("device_id")
        event_name = entry["event_type"]
        # Either properties object may be missing or null in the export
        event_properties = entry.get("event_properties") or {}
        user_properties = entry.get("user_properties") or {}

        # Skip Amplitude session_start events
        if event_name == "session_start":
            return
        if event_name == "[Amplitude] Page Viewed":
            event_name = "$pageview"
        if event_name in ["[Amplitude] Element Clicked", "[Amplitude] Element Changed"]:
            event_name = "$autocapture"

        timestamp = datetime.strptime(entry.get("event_time"), "%Y-%m-%d %H:%M:%S.%f")

        device_type = entry.get("device_type")
        if device_type in ("Windows", "Linux"):
            device_type = "Desktop"
        elif device_type in ("iOS", "Android"):
            device_type = "Mobile"
        else:
            device_type = None

        payload = {
            "event": event_name,
            "distinct_id": distinct_id,
            "properties": {
                # This script treats device_type as the OS and os_name as the browser;
                # check that this matches your own export
                "$os": entry.get("device_type"),
                "$browser": entry.get("os_name"),
                "$browser_version": to_int_if_possible(entry.get("os_version")),
                "$device_type": device_type,
                "$current_url": event_properties.get("[Amplitude] Page URL"),
                "$host": event_properties.get("[Amplitude] Page Domain"),
                "$pathname": event_properties.get("[Amplitude] Page Path"),
                "$viewport_height": event_properties.get("[Amplitude] Viewport Height"),
                "$viewport_width": event_properties.get("[Amplitude] Viewport Width"),
                "$referrer": event_properties.get("referrer"),
                "$referring_domain": event_properties.get("referring_domain"),
                "$device_id": entry.get("device_id"),
                "$ip": entry.get("ip_address"),
                "$geoip_city_name": entry.get("city"),
                "$geoip_subdivision_1_name": entry.get("region"),
                "$geoip_country_name": entry.get("country"),
                "$set_once": {
                    "$initial_referrer": clean(user_properties.get("initial_referrer")),
                    "$initial_referring_domain": clean(user_properties.get("initial_referring_domain")),
                    "$initial_utm_source": clean(user_properties.get("initial_utm_source")),
                    "$initial_utm_medium": clean(user_properties.get("initial_utm_medium")),
                    "$initial_utm_campaign": clean(user_properties.get("initial_utm_campaign")),
                    "$initial_utm_content": clean(user_properties.get("initial_utm_content")),
                },
                "$set": {
                    "$os": entry.get("device_type"),
                    "$browser": entry.get("os_name"),
                    "$device_type": device_type,
                    "$current_url": event_properties.get("[Amplitude] Page URL"),
                    "$pathname": event_properties.get("[Amplitude] Page Path"),
                    "$browser_version": entry.get("os_version"),
                    "$referrer": event_properties.get("referrer"),
                    "$referring_domain": event_properties.get("referring_domain"),
                    "$geoip_city_name": entry.get("city"),
                    "$geoip_subdivision_1_name": entry.get("region"),
                    "$geoip_country_name": entry.get("country"),
                }
            },
            "timestamp": timestamp
        }

        posthog.capture(
            event=payload["event"],
            distinct_id=payload["distinct_id"],
            properties=payload["properties"],
            timestamp=payload["timestamp"],
        )

    # Get Amplitude data from folder, unzip it, and use the capture function
    def get_entries_from_folder_and_capture(folder_name, limit=6):
        # limit caps the total number of events read, as a trial run;
        # pass limit=None to import everything
        count = 0
        for filename in os.listdir(folder_name):
            if filename.endswith('.json.gz'):
                file_path = os.path.join(folder_name, filename)
                with gzip.open(file_path, 'rt', encoding='utf-8') as f:
                    for line in f:
                        entry = json.loads(line)
                        capture_entry(entry)
                        count += 1
                        if limit is not None and count >= limit:
                            return

    folder_name = '609539'
    get_entries_from_folder_and_capture(folder_name)
This script may need modification depending on the structure of your Amplitude data, but it gives you a start.
Aliasing device IDs to user IDs
In addition to capturing the events, we want to combine anonymous and identified users. For Amplitude, events rely on the device ID before identification and the user ID after:
Event | User ID | Device ID |
---|---|---|
Application installed | null | 551dc114-7604-430c-a42f-cf81a3059d2b |
Login | 123 | 551dc114-7604-430c-a42f-cf81a3059d2b |
Purchase | 123 | 551dc114-7604-430c-a42f-cf81a3059d2b |
We want to attribute "Application installed" to the user with ID 123, so we need to also call alias with both the device ID and user ID:
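    from posthog import Posthog

    posthog = Posthog(
        '<ph_project_api_key>',
        host='https://us.i.posthog.com',
        debug=True,
        historical_migration=True
    )

    # Values taken from the example table above
    device_id = '551dc114-7604-430c-a42f-cf81a3059d2b'
    user_id = '123'

    posthog.alias(previous_id=device_id, distinct_id=user_id)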
Since you only need to do this once per user, ideally you'd store a record (e.g. a SQL table) of which users you've already aliased in PostHog, so that you don't end up sending the same alias calls multiple times.
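One way to keep such a record is a small SQLite table, sketched below. The table name aliased and the helpers collect_alias_pairs and alias_new_pairs are hypothetical. The client argument is assumed to be a Posthog client like the one created above, or any object with a compatible alias method.

```python
import sqlite3


def collect_alias_pairs(entries):
    """Return the unique (device_id, user_id) pairs seen on identified events."""
    pairs = set()
    for entry in entries:
        device_id, user_id = entry.get("device_id"), entry.get("user_id")
        if device_id and user_id:
            pairs.add((device_id, str(user_id)))  # store IDs as strings
    return pairs


def alias_new_pairs(client, connection, pairs):
    """Call alias once per pair, recording each pair so reruns skip it."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS aliased ("
        "device_id TEXT, user_id TEXT, PRIMARY KEY (device_id, user_id))"
    )
    sent = 0
    for device_id, user_id in sorted(pairs):
        seen = connection.execute(
            "SELECT 1 FROM aliased WHERE device_id = ? AND user_id = ?",
            (device_id, user_id),
        ).fetchone()
        if seen:
            continue  # this pair was aliased on an earlier run
        client.alias(previous_id=device_id, distinct_id=user_id)
        # Record the pair only after the alias call returns without raising.
        connection.execute("INSERT INTO aliased VALUES (?, ?)", (device_id, user_id))
        connection.commit()
        sent += 1
    return sent
```

To use it, open a file-backed database such as sqlite3.connect("aliases.db") so the record survives between runs, then pass in the pairs collected from your export.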