Metadata API Structure

The Amundsen metadata service consists of three packages: API, Entity, and Proxy.

API package

The API package contains the Flask-RESTful resources that serve RESTful API requests. The API routes are also registered in this package.
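The sketch below shows how resources and a single route table fit together. It uses only the Python standard library (WSGI) as a stand-in for Flask-RESTful. The resource class TableDetailResource, the /table/<table_key> route, and the response fields are all hypothetical. In Flask-RESTful, the route table would be built with Api.add_resource instead.

```python
import json
import re
from http import HTTPStatus


class TableDetailResource:
    """Hypothetical resource that serves GET /table/<table_key>."""

    def get(self, table_key):
        # A real resource would ask the proxy for metadata; this returns a stub.
        return {"table_key": table_key, "description": None}, HTTPStatus.OK


# All routing is registered in one place, analogous to Flask-RESTful's
# api.add_resource(TableDetailResource, "/table/<table_key>").
ROUTES = [
    (re.compile(r"^/table/(?P<table_key>[^/]+)$"), TableDetailResource),
]


def app(environ, start_response):
    """Minimal WSGI app that dispatches a request to the matching resource."""
    path = environ.get("PATH_INFO", "/")
    method = environ.get("REQUEST_METHOD", "GET").lower()
    body, status = {"message": "not found"}, HTTPStatus.NOT_FOUND
    for pattern, resource_cls in ROUTES:
        match = pattern.match(path)
        if match:
            # Look up a method named after the HTTP verb (get, post, ...).
            handler = getattr(resource_cls(), method, None)
            if handler is None:
                body, status = {"message": "method not allowed"}, HTTPStatus.METHOD_NOT_ALLOWED
            else:
                body, status = handler(**match.groupdict())
            break
    payload = json.dumps(body).encode("utf-8")
    start_response(
        f"{status.value} {status.phrase}",
        [("Content-Type", "application/json"), ("Content-Length", str(len(payload)))],
    )
    return [payload]
```

Because every route lives in ROUTES, adding an endpoint means writing one resource class and registering it once. This mirrors the division of labor described above.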

Proxy package

The Proxy package contains proxy modules that talk to the Metadata service's dependencies. There are currently three modules in the Proxy package: Neo4j, Statsd, and Atlas.

The proxy to use (Neo4j or Atlas) is selected with the config variable PROXY_CLIENT, which takes the path to the class name of one of the proxy modules available here.
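A setting that names a class by its path, such as PROXY_CLIENT, can be resolved at startup with importlib. The helper load_proxy_class below is a hypothetical sketch of that lookup, not Amundsen's own code.

```python
import importlib


def load_proxy_class(dotted_path):
    """Resolve a 'package.module.ClassName' string to the class object.

    Hypothetical helper illustrating how a PROXY_CLIENT-style setting
    could be turned into a class.
    """
    module_path, _, class_name = dotted_path.rpartition(".")
    if not module_path:
        raise ValueError(f"expected 'module.ClassName', got {dotted_path!r}")
    module = importlib.import_module(module_path)
    try:
        return getattr(module, class_name)
    except AttributeError as exc:
        raise ImportError(f"{module_path!r} has no attribute {class_name!r}") from exc
```

For example, load_proxy_class("collections.OrderedDict") returns the OrderedDict class. A PROXY_CLIENT value would be resolved the same way. Keeping the choice in configuration means switching between Neo4j and Atlas requires no code change.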

Note: The proxy's host and port are configured using the config variables PROXY_HOST and PROXY_PORT, respectively. Both variables can be set through environment variables.
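Here is a minimal sketch of reading these two settings with environment-variable overrides. The fallback values localhost and 7687 (Neo4j's default Bolt port) are assumptions for illustration. proxy_settings is a hypothetical helper.

```python
import os


def proxy_settings(environ=None):
    """Return (host, port), letting PROXY_HOST / PROXY_PORT override defaults.

    The defaults below are hypothetical placeholders.
    """
    env = os.environ if environ is None else environ
    host = env.get("PROXY_HOST", "localhost")
    port = int(env.get("PROXY_PORT", "7687"))  # env values are strings
    return host, port
```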

Neo4j proxy module

The Neo4j proxy module serves various use cases for reading metadata from, and writing metadata into, Neo4j. Most of its methods define a Cypher query for the use case, execute the query, and transform the result into an entity.
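The query, execute, and transform pattern can be sketched as follows. The Cypher query, its node labels and relationship type, the Table dataclass, and the run_query callable are all hypothetical. run_query stands in for whatever executes Cypher against Neo4j (for instance, a wrapper around a driver session) and is assumed to return each record as a dict.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical Cypher query; labels and properties are illustrative only.
TABLE_QUERY = """
MATCH (tbl:Table {key: $table_key})
OPTIONAL MATCH (tbl)-[:DESCRIPTION]->(d:Description)
RETURN tbl.name AS name, d.description AS description
"""


@dataclass
class Table:
    """Simplified stand-in for an entity class."""
    key: str
    name: str
    description: Optional[str] = None


# Assumed signature: (query, parameters) -> list of records as dicts.
QueryRunner = Callable[[str, Dict[str, str]], List[Dict[str, Optional[str]]]]


def get_table(run_query: QueryRunner, table_key: str) -> Table:
    # 1. Execute the Cypher query with bound parameters rather than
    #    formatting user input into the query string.
    records = run_query(TABLE_QUERY, {"table_key": table_key})
    if not records:
        raise LookupError(f"table {table_key!r} not found")
    # 2. Transform the first record into an entity object.
    record = records[0]
    return Table(key=table_key, name=record["name"], description=record.get("description"))
```

Passing the executor in as a function keeps the transformation logic testable without a running Neo4j instance.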

Apache Atlas proxy module

The Apache Atlas proxy module serves all of the metadata from Apache Atlas, using pyatlasclient. More information on how to set up Apache Atlas to make it compatible with Amundsen can be found here.
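Like the Neo4j proxy, the Atlas proxy's core job is turning a backend response into an entity. The sketch below assumes an Atlas v2 entity payload of the form {"entity": {"typeName": ..., "attributes": {...}}} and uses a simplified Table dataclass. It does not call pyatlasclient, and the attribute names it reads are assumptions.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class Table:
    """Simplified stand-in for an entity class."""
    key: str
    name: str
    description: Optional[str] = None


def table_from_atlas(response: Dict[str, Any]) -> Table:
    """Map an (assumed) Atlas entity payload onto the Table entity."""
    entity = response["entity"]
    attrs = entity.get("attributes", {})
    return Table(
        key=attrs["qualifiedName"],           # e.g. "db.table@cluster"
        name=attrs["name"],
        description=attrs.get("description"),  # optional in the payload
    )
```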

Statsd utilities module

The Statsd utilities module provides methods and functions for publishing metrics to statsd. Statsd integration is disabled by default, and you can turn it on in the Metadata service configuration. Statsd-specific settings can be configured through environment variables.
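StatsD itself is a simple text-over-UDP protocol. A counter, for example, is sent as a line like name:value|c, so the publishing side can be sketched with the standard library alone. The environment variable names STATSD_ENABLED, STATSD_HOST, STATSD_PORT, and STATSD_PREFIX are assumptions for illustration, as are the helpers format_metric and publish. They are not the module's actual interface. Like the module described above, the sketch is off by default.

```python
import os
import socket
from typing import Optional


def format_metric(name: str, value: int, metric_type: str = "c", prefix: Optional[str] = None) -> bytes:
    """Build a StatsD line such as b'amundsen.get_table:1|c'.

    metric_type follows StatsD conventions: 'c' counter, 'ms' timer, 'g' gauge.
    """
    full_name = f"{prefix}.{name}" if prefix else name
    return f"{full_name}:{value}|{metric_type}".encode("ascii")


def publish(name: str, value: int = 1, metric_type: str = "c", environ=None) -> bool:
    """Send one metric over UDP if the (hypothetical) STATSD_ENABLED flag is 'true'.

    Returns True if a packet was sent. StatsD is fire-and-forget: no reply is read.
    """
    env = os.environ if environ is None else environ
    if env.get("STATSD_ENABLED", "false").lower() != "true":
        return False  # disabled unless explicitly turned on
    host = env.get("STATSD_HOST", "localhost")
    port = int(env.get("STATSD_PORT", "8125"))
    packet = format_metric(name, value, metric_type, env.get("STATSD_PREFIX"))
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (host, port))
    return True
```

UDP keeps metric publishing cheap and non-blocking. A missing or slow StatsD server never stalls a metadata request.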

Entity package

The Entity package contains many modules, each of which defines several Python classes. These classes serve as both schemas and data holders. All data exchanged within the Amundsen Metadata service uses the Entity classes, which helps ensure the data is valid and improves readability and maintainability.
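Here is a minimal sketch of a class acting as both schema and data holder, built with standard-library dataclasses. The Table and Column classes, their fields, and the validation rules are hypothetical simplifications. The real Entity classes may be defined differently.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Column:
    """Hypothetical column entity."""
    name: str
    col_type: str
    sort_order: int

    def __post_init__(self) -> None:
        # Reject obviously invalid data at construction time.
        if not self.name:
            raise ValueError("column name must be non-empty")
        if self.sort_order < 0:
            raise ValueError("sort_order must be >= 0")


@dataclass(frozen=True)
class Table:
    """Hypothetical table entity; frozen so instances cannot be mutated in transit."""
    database: str
    cluster: str
    schema: str
    name: str
    description: Optional[str] = None
    columns: Tuple[Column, ...] = ()

    def __post_init__(self) -> None:
        if not all((self.database, self.cluster, self.schema, self.name)):
            raise ValueError("database, cluster, schema and name are required")
```

Declaring fields and types in one place documents the shape of the data. The checks in __post_init__ mean an invalid object cannot be created in the first place.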


There are different settings you might want to change depending on the application environment, such as toggling debug mode, selecting the proxy, and other environment-specific options.
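One common way to organize environment-specific settings is a base configuration class with per-environment subclasses, chosen at startup. The class names, the APP_ENV variable, and the values below are hypothetical. With Flask, the chosen class could then be loaded with app.config.from_object(...).

```python
import os


class Config:
    """Settings shared by every environment."""
    DEBUG = False
    TESTING = False
    PROXY_HOST = "localhost"


class LocalConfig(Config):
    DEBUG = True  # verbose errors for local development


class ProductionConfig(Config):
    PROXY_HOST = "neo4j.prod.internal"  # hypothetical hostname


CONFIGS = {"local": LocalConfig, "production": ProductionConfig}


def select_config(environ=None):
    """Pick a config class from the (hypothetical) APP_ENV variable, defaulting to local."""
    env = os.environ if environ is None else environ
    name = env.get("APP_ENV", "local")
    try:
        return CONFIGS[name]
    except KeyError:
        raise ValueError(f"unknown environment {name!r}; expected one of {sorted(CONFIGS)}") from None
```

Subclasses override only what differs, so shared defaults live in one place and each environment's deltas stay easy to review.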