How to track Amundsen user metric

After you have deployed Amundsen into production, you want to track how user interacts with Amundsen for various reasons.

The easier way is to leverage Google Analytics for basic user tracking. You could first get the analytics token for your domain and put it as the frontend config

Besides implementing Google Analytics, we provide a way called action_logging to do fine grained user action tracking. The action_logging is a decorator to allow you to integrate user info and pipe it to your inhouse event tracking system(e.g Kafka).

You need to put the custom method into entry_points following this example.

And here is the IDL proto we used at Lyft to send the event message:

message UserAction {
    // Sending host name
    google.protobuf.StringValue host_name = 1;
    // start time in epoch ms
    google.protobuf.Int64Value start_epoch_ms = 2;
    // end time in epoch ms
    google.protobuf.Int64Value end_epoch_ms = 3;
    // json array contains positional arguments
    common.LongString pos_args_json = 4;
    // json object contains key word arguments
    common.LongString keyword_args_json = 5;
    // json object contains output of command
    common.LongString output = 6;
    // an error message or exception stacktrace
    common.LongString error = 7;
    // `user`
    google.protobuf.StringValue user = 8;
}

It matches the action log model defined in here.

Once you have the event in your data warehouse, you could start building different KPI user metric:

  1. WAU

Sample query if the event table named as default.event_amundsenfrontend_user_action

SELECT date_trunc('week', CAST("ds" AS TIMESTAMP)) AS "__timestamp",
   COUNT(DISTINCT user_value) AS "count_distinct_active_users"
FROM
(SELECT *
FROM default.event_amundsenfrontend_user_action
WHERE ds > '2019-09-01') AS "expr_qry"
WHERE "ds" >= '2020-04-21 00:00:00.000000'
AND "ds" <= '2020-10-21 05:31:14.000000'
GROUP BY date_trunc('week', CAST("ds" AS TIMESTAMP))
ORDER BY "count_distinct_active_users" DESC
LIMIT 10000

  1. DAU

Sample query if the event table named as default.event_amundsenfrontend_user_action

SELECT date_trunc('day', CAST("ds" AS TIMESTAMP)) AS "__timestamp",
   COUNT(DISTINCT user_value) AS "count_distinct_active_users"
FROM
(SELECT *
FROM default.event_amundsenfrontend_user_action
WHERE ds > '2019-09-01') AS "expr_qry"
WHERE "ds" >= '2020-07-21 00:00:00.000000'
AND "ds" <= '2020-10-21 00:00:00.000000'
GROUP BY date_trunc('day', CAST("ds" AS TIMESTAMP))
ORDER BY "count_distinct_active_users" DESC
LIMIT 50000

You could also exclude weekends:

SELECT date_trunc('day', CAST("ds" AS TIMESTAMP)) AS "__timestamp",
   COUNT(DISTINCT user_value) AS "count_distinct_active_users"
FROM
(SELECT *
FROM default.event_amundsenfrontend_user_action
WHERE ds > '2019-09-01') AS "expr_qry"
WHERE "ds" >= '2020-04-21 00:00:00.000000'
AND "ds" <= '2020-10-21 05:33:11.000000'
AND day_of_week(logged_at) NOT IN (6,
                                 7)
GROUP BY date_trunc('day', CAST("ds" AS TIMESTAMP))
ORDER BY "count_distinct_active_users" DESC
LIMIT 50000

  1. User Penetration per role

Sample query if the event table named as default.event_amundsenfrontend_user_action and a table for user:

SELECT "title" AS "title",
   COUNT(DISTINCT email) * 100 / MAX(role_count) AS "penetration_percent"
FROM
(SELECT e.occurred_at,
      u.email,
      u.title,
      tmp.role_count
FROM default.family_user u
JOIN default.event_amundsenfrontend_user_action e ON u.email = e.user_value
JOIN
 (SELECT title,
         count(*) role_count
  FROM default.family_user
  GROUP BY 1) as tmp ON u.title = tmp.title
where ds is not NULL) AS "expr_qry"
WHERE "occurred_at" >= from_iso8601_timestamp('2020-10-14T00:00:00.000000')
AND "occurred_at" <= from_iso8601_timestamp('2020-10-21T00:00:00.000000')
AND "role_count" > 20
GROUP BY "title"
ORDER BY "penetration_percent" DESC
LIMIT 100

  1. Usage breakdown per role_count

sample query:

SELECT "title" AS "title",
   count("email") AS "COUNT(email)"
FROM
(SELECT e.occurred_at,
      u.email,
      u.title,
      tmp.role_count
FROM default.family_user u
JOIN default.event_amundsenfrontend_user_action e ON u.email = e.user_value
JOIN
 (SELECT title,
         count(*) role_count
  FROM default.family_user
  GROUP BY 1) as tmp ON u.title = tmp.title
where ds is not NULL) AS "expr_qry"
WHERE "occurred_at" >= from_iso8601_timestamp('2020-10-14T00:00:00.000000')
AND "occurred_at" <= from_iso8601_timestamp('2020-10-21T00:00:00.000000')
GROUP BY "title"
ORDER BY "COUNT(email)" DESC
LIMIT 15

  1. search click through rate

sample query:

SELECT date_trunc('day', CAST("occurred_at" AS TIMESTAMP)) AS "__timestamp",
   SUM(CASE
           WHEN CAST(json_extract_scalar(keyword_args_json, '$.index') AS BIGINT) <= 3 THEN 1
           ELSE 0
       END) * 100 / COUNT(*) AS "click_through_rate"
FROM
(SELECT *
FROM default.event_amundsenfrontend_user_action
WHERE ds > '2019-09-01') AS "expr_qry"
WHERE "occurred_at" >= from_iso8601_timestamp('2020-09-21T00:00:00.000000')
AND "occurred_at" <= from_iso8601_timestamp('2020-10-21T00:00:00.000000')
AND "command" IN ('_get_table_metadata',
                '_get_dashboard_metadata',
                '_log_get_user')
AND json_extract_scalar(keyword_args_json, '$.source') IN ('search_results',
                                                         'inline_search')
GROUP BY date_trunc('day', CAST("occurred_at" AS TIMESTAMP))
ORDER BY "click_through_rate" DESC
LIMIT 10000

  1. Top 50 active user

  2. Top search term

  3. Top popular tables

  4. Search click index

  5. Metadata edits

  6. Metadata edit leaders

  7. Amundsen user per role (by joining with employee data)