IDs: under the hood - Kloudless, a Netskope company

Few attributes are as routinely used in the Kloudless API as the ID. This string that uniquely identifies an object is core to all of Kloudless’ REST endpoints. After all, almost the entire Kloudless API centers around either CRUD operations on objects or queries that return an object or set of objects.

Best practices in the developer community call for certain core properties for IDs:

IDs are commonly expected to be immutable; they don’t change when the other attributes of an object do.
IDs are also expected to be unique. Two different IDs are assumed to represent different objects. Conversely, two different objects must not share the same ID.
IDs representing the same object type share the same data format. For example, all User IDs may be integers, and all File IDs may be strings. This property usually applies to other attributes of an object as well.

While these principles are straightforward to adhere to when the origin of data is entirely under one’s own control, they begin to prove challenging when dealing with myriad sources of data and a variety of third-party API implementations.

The Kloudless API inherently functions as an abstraction layer for multiple third-party APIs, some RESTful and some not. In this blog post, we’ll look at the challenges that arise with IDs in unified APIs such as Kloudless, and the steps we’ve taken to address them.

Challenges

Data Format

The Kloudless API attempts to impart a uniform format to IDs by symmetrically encrypting them in all API responses. This encoding ensures that IDs can always be treated as arbitrary strings. The encryption also makes it harder for malicious users to guess a valid ID or forge one to pass to the API.
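The post doesn’t describe Kloudless’s actual encryption scheme or key handling. As a rough sketch of the general idea, wrapping any raw ID (a plain string or a JSON object) in an opaque, URL-safe string, the code below uses only Python’s standard library. Because the standard library has no symmetric cipher, it swaps in an HMAC-SHA256 tag in place of encryption: the server can reject guessed or tampered IDs, but unlike encryption the payload stays readable to anyone who base64-decodes it. `SECRET_KEY`, `TAG_LEN`, `encode_id`, and `decode_id` are hypothetical names.

```python
import base64
import hashlib
import hmac
import json

# Hypothetical server-side secret; a real service would load this securely.
SECRET_KEY = b"example-secret-key"
TAG_LEN = 16  # bytes of the HMAC-SHA256 tag kept in each ID (assumption)


def encode_id(raw_id) -> str:
    """Wrap a raw upstream ID (string or JSON-serializable dict) in an opaque string.

    NOTE: this signs rather than encrypts. It prevents forged or tampered IDs,
    but the JSON payload is still visible after base64 decoding.
    """
    payload = json.dumps(raw_id, sort_keys=True, separators=(",", ":")).encode()
    tag = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()[:TAG_LEN]
    return base64.urlsafe_b64encode(tag + payload).decode().rstrip("=")


def decode_id(opaque: str):
    """Recover the raw ID, raising ValueError if this server did not issue it."""
    padded = opaque + "=" * (-len(opaque) % 4)  # restore stripped padding
    blob = base64.urlsafe_b64decode(padded)      # binascii.Error is a ValueError
    tag, payload = blob[:TAG_LEN], blob[TAG_LEN:]
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()[:TAG_LEN]
    if not hmac.compare_digest(tag, expected):
        raise ValueError("unrecognized or tampered ID")
    return json.loads(payload)
```

For example, `decode_id(encode_id("/test.png"))` returns `"/test.png"`, while flipping any character of the encoded ID makes `decode_id` raise. Matching the confidentiality the post describes would require a real authenticated cipher, such as AES-GCM from a vetted cryptography library.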

When decoded, the Kloudless ID could be anything from a simple string representing the third-party API’s object ID, to a complex JSON object Kloudless constructs to uniquely identify the object. The structure of the raw ID varies by service and object type. Check out the Raw Data encoding endpoint for a full description of how to convert raw upstream object IDs into Kloudless object IDs usable with the Kloudless API, as well as more information on the data that each raw ID includes.

Within each ID it constructs, Kloudless encodes any information the third-party API requires, in addition to the object’s own ID, to uniquely identify that object. As a result, Kloudless IDs can be used stand-alone.
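To illustrate what a stand-alone raw ID might bundle, suppose a hypothetical upstream service needs both a drive identifier and an item identifier to locate a file. The sketch below packs both into one JSON object and serializes it canonically (sorted keys, fixed separators), so the same object always produces the same raw ID string. The `service`, `drive_id`, and `item_id` fields are assumptions, not Kloudless’s actual raw ID schema.

```python
import json


def make_raw_id(service: str, **context: str) -> str:
    """Bundle everything the upstream API needs into one canonical JSON string.

    Sorting keys and fixing separators makes the output deterministic, so the
    same object always maps to the same raw ID regardless of argument order.
    """
    return json.dumps({"service": service, **context}, sort_keys=True, separators=(",", ":"))


def parse_raw_id(raw_id: str) -> dict:
    """Recover the individual fields needed to call the upstream API."""
    return json.loads(raw_id)


a = make_raw_id("example_drive", drive_id="d-123", item_id="i-456")
b = make_raw_id("example_drive", item_id="i-456", drive_id="d-123")
assert a == b  # argument order does not affect the raw ID
print(a)  # {"drive_id":"d-123","item_id":"i-456","service":"example_drive"}
```

Determinism matters here: if two serializations of the same object could differ, the encoded IDs would differ too, and apps would treat one object as two.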

Immutability

Usually, Kloudless retrieves the upstream service’s attributes for an object, including its ID, and maps them to the Kloudless unified API format. For an example, see our unified representation of a Calendar Event object. However, the upstream service may not provide unique, immutable IDs.

For example, the upstream service may be an SMB file share or WebDAV server. In this case, the mutable file path uniquely identifies a file or folder only at a given point in time. Since no other IDs are available, Kloudless encodes the path and returns it as the ID. If the file moves, however, its ID changes to reflect the new path, and the previous ID is no longer valid. This makes immutability hard to guarantee.
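The toy model below, a minimal sketch with a hypothetical `PathKeyedShare` class, shows why path-based IDs break immutability: once the path is the only identifier, moving a file leaves the old ID pointing at nothing. It keeps paths raw for readability, whereas Kloudless encodes them.

```python
from __future__ import annotations


class PathKeyedShare:
    """Toy stand-in for an SMB or WebDAV share where the path is the only identifier."""

    def __init__(self) -> None:
        self.files: dict[str, bytes] = {}

    def path_id(self, path: str) -> str:
        # With no upstream ID available, the path itself becomes the ID.
        # (Kloudless would encode it; this sketch keeps it raw.)
        return path

    def move(self, old_path: str, new_path: str) -> None:
        self.files[new_path] = self.files.pop(old_path)

    def get(self, file_id: str) -> bytes | None:
        return self.files.get(file_id)


share = PathKeyedShare()
share.files["/docs/report.txt"] = b"q3 numbers"
old_id = share.path_id("/docs/report.txt")
share.move("/docs/report.txt", "/archive/report.txt")
assert share.get(old_id) is None  # the old ID no longer resolves
assert share.get(share.path_id("/archive/report.txt")) == b"q3 numbers"
```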

Path-based IDs also appear elsewhere in the unified Cloud Storage API: in files’ “ancestors” data. Upstream APIs usually don’t return the IDs of every folder in the hierarchy leading to a file, so Kloudless often knows only the path to each parent folder. For example, if the path to a file is /A/B/C/D, Kloudless definitely knows the ID for D and sometimes the ID for C, but rarely knows anything about B or A beyond their paths, /A/B and /A respectively. Kloudless therefore uses those paths as the IDs in the parent folder sub-object metadata.
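Deriving the ancestor folders from a file’s path is mechanical. This sketch lists them nearest-first, matching the order of the ancestors field in the example response, and stops before the root. Encoding each path into an ID is left out, and `ancestors_from_path` is a hypothetical helper name.

```python
from pathlib import PurePosixPath


def ancestors_from_path(path: str) -> list[dict]:
    """Return each parent folder's name and path, nearest first, excluding "/".

    For "/A/B/C/D" the paths are "/A/B/C", "/A/B", and "/A": often the only
    information available for folders whose upstream IDs were never returned.
    """
    return [
        {"name": parent.name, "path": str(parent)}
        for parent in PurePosixPath(path).parents
        if str(parent) != "/"
    ]


print(ancestors_from_path("/All Files/Test folder/test.png"))
# [{'name': 'Test folder', 'path': '/All Files/Test folder'}, {'name': 'All Files', 'path': '/All Files'}]
```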

This means the API returns an ID-based ID in response to metadata requests for /A/B, but a path-based ID for that same folder when it is referred to in its child files’ metadata. Here’s an example:

{
    "id": "fL3Rlc3QucG5n",
    "id_type": "default",
    "name": "test.png",
    "size": 353953,
    "type": "file",
    "created": "2015-01-22T08:15:30.424173Z",
    "modified": "2015-03-17T20:42:18.627533Z",
    "account" : 123,
    "path": "/All Files/Test folder/test.png",
    "parent": {
        "id": "fL2Hp",
        "name": "Test folder"
    },
    "ancestors": [
        {
            "name": "Test folder",
            "id": "fL2Hp"
        },
        {
            "name": "All Files",
            "id": "FrZekE=="
        }
    ],
    "owner": {
        "id": "ua2xvdWRmJ7XFSHh4u2Bnwm9a38RoeUBnbWFpbC5jb20="
    },
    "mime_type": "image/png",
    "downloadable": true,
    "api": "storage",
    "ids": {
        "default": "fL3Rlc3QucG5n",
        "path": "FyeUi9p3CY_WMHfKToZSg50f2opUe0rQBoJ69ukvd188="
    }
}

In the example above, the Kloudless API returns an ID for each item in the ancestors field and uniquely identifies the file itself via the id attribute. The response also includes an ids attribute that lists the different kinds of IDs Kloudless returns to reference this specific object. This helps with our next challenge: uniqueness.

Uniqueness

The path-based IDs mentioned above are only unique at a single point in time and may overlap as files are renamed. In addition, paths are not unique across user accounts in an Enterprise File Sync and Share (EFSS) tenant. For example, two users within a single Dropbox Business account could each have a separate folder at /A, or both have the default file Getting Started.pdf in their personal root directories.
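The collision across users is easy to reproduce. In this sketch, keying records by path alone loses one user’s file, while qualifying the path with the account and user keeps both. `ScopedPath` and its fields are hypothetical; note that scoping fixes cross-user collisions but not the point-in-time problem caused by renames.

```python
from typing import NamedTuple


class ScopedPath(NamedTuple):
    """A path qualified by the account and user it was seen through (hypothetical shape)."""

    account: int
    user: str
    path: str


seen_by_path: dict = {}
seen_by_scope: dict = {}

# Two users in the same tenant each have their own file at the same path.
for user, content in [("alice", "alice's copy"), ("bob", "bob's copy")]:
    path = "/Getting Started.pdf"
    seen_by_path[path] = content  # the second user's file silently replaces the first
    seen_by_scope[ScopedPath(123, user, path)] = content

assert len(seen_by_path) == 1
assert len(seen_by_scope) == 2
```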

Although the Kloudless API tolerates overlapping IDs, apps built on the platform may not. Therefore, Kloudless indicates which kind of ID each object carries by including an id_type attribute whenever an id is present for an object with multiple IDs. id_type is only present if retrieving the object’s metadata would return an ids attribute with multiple values.
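On the app side, one way to stay safe when ID strings overlap is to key stored objects by the pair (id_type, id) rather than by id alone. This is a minimal sketch with hypothetical `remember` and `lookup` helpers; treating a missing id_type as `"default"` is an assumption, based on id_type being omitted for objects with a single ID.

```python
from __future__ import annotations

# Hypothetical app-side store keyed by (id_type, id) instead of id alone.
store: dict[tuple[str, str], dict] = {}


def remember(obj: dict) -> None:
    """Record an object under its (id_type, id) pair.

    Objects with a single ID may omit id_type; this sketch assumes "default".
    """
    key = (obj.get("id_type", "default"), obj["id"])
    store[key] = obj


def lookup(id_type: str, id_value: str) -> dict | None:
    return store.get((id_type, id_value))


# Two different objects whose ID strings happen to coincide under different ID types.
remember({"id": "X1", "id_type": "default", "name": "report.txt"})
remember({"id": "X1", "id_type": "path", "name": "Reports"})
assert len(store) == 2  # keyed by id alone, the second would overwrite the first
```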

Revisiting the Dropbox Business example above, users within a Dropbox Business account could also share a common folder /B. Dropbox provides a unique identifier for this folder within each user’s account, but also provides a common Shared ID that indicates it is the same folder regardless of which user the app accesses the folder through. The Kloudless API therefore returns yet another ID for Dropbox data, with the id_type set to shared.
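With a shared ID available, an app can recognize that two users’ views refer to the same folder. This sketch groups per-user IDs by their shared ID. It assumes the shared ID appears under the `shared` key of the ids attribute, which follows from the id_type naming above; the ID values themselves are made up.

```python
from __future__ import annotations


def group_shared_folders(listings: list[dict]) -> dict[str, list[str]]:
    """Map each shared ID to the per-user default IDs that refer to the same folder."""
    groups: dict[str, list[str]] = {}
    for item in listings:
        shared_id = item.get("ids", {}).get("shared")
        if shared_id is not None:
            groups.setdefault(shared_id, []).append(item["ids"]["default"])
    return groups


# Hypothetical metadata for /B as seen through two different users.
alice_view = {"ids": {"default": "alice-B", "shared": "shared-B"}}
bob_view = {"ids": {"default": "bob-B", "shared": "shared-B"}}
print(group_shared_folders([alice_view, bob_view]))  # {'shared-B': ['alice-B', 'bob-B']}
```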

Together, an object’s ids give apps a way to resolve the IDs returned by other API endpoints. Instead of storing just the id, apps can cache all of the ids when retrieving an object’s metadata and consult them later whenever any of those ID types appears in an API response. Apps can also resolve any ID returned by the API by using it in an API request, such as one to retrieve the object’s metadata, but this costs an extra API request just to resolve an ID and adds unnecessary latency.
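The caching strategy can be sketched as a small resolver that maps any (id_type, id) pair back to the object’s default ID and only falls back to an API request on a cache miss. `IdResolver` and `fetch_metadata` are hypothetical: `fetch_metadata` stands in for whatever client call retrieves an object’s metadata, and the sketch assumes the returned ids always contain a `default` entry.

```python
from __future__ import annotations

from typing import Callable


class IdResolver:
    """Map any (id_type, id) pair an app receives to the object's default ID."""

    def __init__(self, fetch_metadata: Callable[[str], dict]) -> None:
        # fetch_metadata is a hypothetical wrapper around a metadata API request.
        self._fetch = fetch_metadata
        self._canonical: dict[tuple[str, str], str] = {}

    def record(self, metadata: dict) -> None:
        """Cache every id_type: id mapping from a metadata response's ids."""
        default_id = metadata["ids"]["default"]
        for id_type, value in metadata["ids"].items():
            self._canonical[(id_type, value)] = default_id

    def resolve(self, id_type: str, value: str) -> str:
        """Return the default ID, paying for an API request only on a cache miss."""
        key = (id_type, value)
        if key not in self._canonical:
            metadata = self._fetch(value)  # the extra round trip caching avoids
            self.record(metadata)
            self._canonical[key] = metadata["ids"]["default"]
        return self._canonical[key]
```

For instance, after `record()` is called with the test.png metadata from the example, `resolve("path", "FyeUi9p3CY_WMHfKToZSg50f2opUe0rQBoJ69ukvd188=")` returns `"fL3Rlc3QucG5n"` without calling `fetch_metadata`.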

The current status quo

Today, Kloudless objects include the following attributes in addition to id, where applicable:

id_type: Refers to the type of id. Helpful for objects with multiple IDs.
ids: Only returned when querying the object’s metadata. Includes all possible id_type: id mappings the Kloudless API could return for the object.
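The two attributes imply a pair of invariants an app could check defensively: when both are present, ids[id_type] should equal id, and an ids attribute with several entries should come with an id_type. This sketch, using a hypothetical `check_id_fields` helper, encodes those rules as read from the descriptions above; it is not a Kloudless tool.

```python
def check_id_fields(meta: dict) -> list:
    """Return descriptions of inconsistencies between id, id_type, and ids (empty if none)."""
    problems = []
    ids = meta.get("ids")
    id_type = meta.get("id_type")
    if ids is not None and id_type is not None and ids.get(id_type) != meta["id"]:
        problems.append(f"ids[{id_type!r}] does not match id")
    if ids is not None and len(ids) > 1 and id_type is None:
        problems.append("id_type missing although ids has several entries")
    return problems


# The id-related fields from the test.png example above.
example = {
    "id": "fL3Rlc3QucG5n",
    "id_type": "default",
    "ids": {
        "default": "fL3Rlc3QucG5n",
        "path": "FyeUi9p3CY_WMHfKToZSg50f2opUe0rQBoJ69ukvd188=",
    },
}
assert check_id_fields(example) == []
```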

Unified APIs have no choice but to find creative ways to make data from third parties conform more closely to best practices. One such step Kloudless takes is to generalize IDs and include meta attributes that describe the IDs themselves. This ensures a uniform data representation regardless of which third-party API the data originates from under the hood.

Check out the Kloudless unified APIs in our docs for more information, and try us out to accelerate your time to market with integrations!
