Metrics and Monitoring at Kloudless

This post was written by our engineering intern, Matthew Soh.

Metrics and monitoring are important for any service. They provide critical information needed to detect and respond to incidents and issues. Kloudless deals with millions of requests, and keeping track of everything can be difficult. Kloudless has recently integrated a new metrics system to tackle this challenge.

Metrics from Kloudless can be sent to the new analytics platform for collection, analysis, and alerting. The analytics integration is available to all operators of Kloudless systems: both our own DevOps team administering our cloud version as well as operators of self-hosted Kloudless Enterprise appliances.

Usage

Let’s walk through a simple use case. Let’s say that your application uses the Kloudless Storage API and is failing to store files uploaded to your service. The potential issue could be anywhere in the stack. With the new metrics system, dashboards that display request metrics are easily accessible. Now you can quickly isolate the issue based on these metrics.

blog-post-dashboard

Chronograf dashboard with graphs of Kloudless metrics.

Here we have a simple dashboard. Core health checks to the Kloudless appliance appear to be fine. However, there is a sharp spike in the graph of Request Failures! This graph shows failures for outbound requests to upstream services. Hovering over the graph, we can look at the tags on each data series and see that there’s been a large increase in 500 errors to Box. This suggests that the issue is likely with the upstream service rather than the Kloudless API or the appliance itself. Knowing this allows us to narrow down which logs we need to look at to learn more about the error and take further steps to assess the root cause.

Request status is the mere tip of the iceberg when it comes to metrics provided by the Kloudless appliance. For a more detailed reference of available metrics, please refer to the Kloudless Enterprise Configuration guide.

How it works

The dashboard used above is built with Chronograf. It is one part of the metrics system deployed at Kloudless that uses the TICK stack, by InfluxData. TICK is comprised of Telegraf (collector), InfluxDB (datastore), Chronograf  (visualization), and Kapacitor (monitoring).

blog-post-influx

The TICK Stack. © 2017 InfluxData, Inc.

The metrics processing chain begins with Telegraf, the metrics collection daemon. Telegraf is designed to aggregate data from different sources and send them to various datastores. Sources include sysstat (a system information tool) and statsd (a common metrics daemon). The default datastore is InfluxDB, though others such as Graphite and CloudWatch can also be used. If required, the Telegraf output in the Kloudless appliance can be modified, enabling existing metrics collection or storage infrastructure to be used instead of InfluxDB.

The metrics then proceed to InfluxDB, which is designed for storing time-series data. This means that it has some neat features such as simple data retention policies and continuous queries. Data retention policies allow for time limits on metrics to expire old data. Continuous queries run at regular intervals on InfluxDB to summarise detailed data into broad overviews. For example, summations over counts of API requests are used to build daily summaries. Together, these features allow InfluxDB resource usage to be managed effectively.

Once the data is stored, Chronograf and Kapacitor work in tandem to help understand the collected metrics. Chronograf enables dashboard visualizations like the one described above to be built, while Kapacitor provides automatic monitoring so that there is no need to stare at the dashboard all day. Kapacitor’s monitoring and alerts are managed through TICKScripts which can be configured using either the Kapacitor command line client or Chronograf. We have provided sample TICKScripts which cover some common use cases.

Why we moved to the TICK stack

As the Kloudless Platform has grown, the volume and complexity of metrics data has grown with it. Previously, we were using with StatsD as a collector and Graphite to visualize metrics. This simple solution was easy to work with, but our needs have changed. Here are some of the unsupported scenarios we encountered:

  • StatsD doesn’t support tags. Tags are useful for providing context for the measurement, such as status or type of a request. Our workaround was to append tags onto the measurement name, similar to the tags used by DataDog’s DogStatsD. The downside of this approach was messy measurement names that were difficult to query.
  • Each StatsD measurement only has one value. It is sometimes useful to have a tuple of data grouped together in a single metric, or associate data such as application IDs to a metric. This isn’t possible with the StatsD+Graphite solution either.
  • The language used to query Graphite is limited to nesting of functions, and performing aggregations can be slow since the metrics are stored in flat files. This results in slow queries across multiple metrics and prevents easy use of complex operations.

These factors led us to look for a better alternative. InfluxDB seemed promising as its design was tailored for high volume time-series data and metrics collection. InfluxDB is built around measurements, which are in turn made up of many data points. A data point can have one or more values associated with it, and as many tags as needed. This addressed our first two issues right away and allowed for better querying.

For example, let’s try to determine which Kloudless applications were associated with failed API requests in the past day. We can do this with the following query:

select status, path, application_id from request_metrics_api_requests 
where time > now() - 24h and status=~/[^2][0-9]{2}/

In just one query, we’ve selected multiple values (status, path, application_id), filtered by tag values (status is not a 2XX code) and with a time frame limit (last 24 hours). This would have been messier and more tedious with our previous metrics system which did not allow associated metadata to be recorded. Additionally, InfluxDB’s time-series database design allows for tag-based queries to execute more efficiently since all tags are indexed.

Next Steps

We’ve been using the new metrics system at Kloudless to better understand, measure and monitor the performance of our hosted cloud platform. We’ve found that it has saved us time and effort when triaging issues, and we think you’ll find it useful as well. We’re excited to make this metrics system available to our enterprise customers using the Kloudless appliance. Customers using our cloud platform will also see analytics data for their Kloudless applications exposed in the developer portal in the upcoming months. As always, we would love to hear your thoughts and feedback on Twitter, comments below, or at hello@kloudless.com.

Calendar API: Availability Endpoint now… Available!

This post was written by our software engineer, Ryan Connor.

Finding an appropriate meeting time can take a lot of effort, even with just a few participants who have only a single calendar. Finding a meeting time that is sure to work for many participants with multiple calendars managed by several cloud services? Good luck!

Fortunately, Kloudless now helps you do just that. We are proud to announce that the Calendar Availability endpoint is online and ready to help your application find meeting times that work for any combination of user accounts and their calendars.

Using the Calendar Availability Endpoint

The Calendar Availability endpoint returns all available time windows among a set of calendars that match a specific meeting duration and desired time windows for the meeting. The endpoint also supports requests involving multiple calendars for multiple accounts.

To illustrate, imagine your app has 100 users who have two personal (Google) calendars and two work (Outlook) calendars for a total of four hundred calendars. One call to the Calendar Availability endpoint can retrieve available meeting times (if any) for all 100 users subject to all four of their calendars. Alternatively, you can retrieve available meeting times for just a subset of users and/or their calendars.

Kloudless seamlessly integrates with Google and Outlook Calendar behind the scenes—all you have to provide are Account IDs, Calendar IDs, a meeting duration, and time constraints. Let’s take a look at exactly how to do that.

Formatting the Request and Parsing the Response

Here are the details for the request and response format of the Calendar Availability endpoint based on our docs. You can scroll down further to see concrete examples.

Request Format

URL: https://api.kloudless.com/v1/accounts/{account_id,account_id,…}/cal/availability

Method: POST

Headers:

  • Authorization: Bearer [Token]
  • Content-Type: application/json

Body:

  • calendars: List of Calendar IDs. Uses the default calendar if empty. (Optional)
  • meeting_duration: ISO 8601 format for duration. (Required)
  • constraints: A dictionary of constraint keys and values. (Required)
    • time_windows: List of desired time slots with the following values.
      • start: ISO 8601 datetime format
      • end: ISO 8601 datetime format

Response Format

Headers:

  • Content-Type: application/json

Body:

  • time_windows: List of desired time slots with the following values.
    • start: ISO 8601 datetime format
    • end: ISO 8601 datetime format

The start and end times of each time_window in the response bookend (inclusively) the time periods in which all of the accounts are available given the constraints. Note that the times are returned in the GMT time zone, so you may want to convert to the time zone of your choice.

Concretely, if the response includes the “2 – 5 PM” time window and you wanted a 30-minute meeting, you can safely schedule the meeting to start and end at any time within 2 – 5 PM, inclusive (e.g., 2 – 2:30, 2:01-2:31, …, 4:29-4:59, 4:30-5:00).

Example Usage

Single account, two calendars specified

curl -H 'Authorization: Bearer [TOKEN]' \
    -H 'Content-Type: application/json' \
    -XPOST -d '{
        "calendars": ["ra5werWRsZXQzLcRik5BudGltb6RwwUNnbWFpbC5jb21=",
                      "fa2xvdWRsZXNzLnRlc3QudGltb3RoeUBnbWFpbC5jb20=”],
        "meeting_duration": "PT1H",
        "constraints": {
            "time_windows": [{
                "start": "2017-05-20T08:00:00+07:00",
                "end": "2017-05-20T12:00:00+07:00"
            },{
                "start": "2017-05-21T08:00:00+07:00",
                "end": "2017-05-21T12:00:00+07:00"
            }]
        }
    }' \
    'https://api.kloudless.com/v1/accounts/123/cal/availability'

 

{
  "time_windows": [
    {
      "start": "2017-05-20T02:00:00Z",
      "end": "2017-05-20T04:00:00Z"
    },
    {
      "start": "2017-05-21T03:00:00Z",
      "end": "2017-05-21T04:00:00Z"
    }
  ]
}

In this case, the requestor wants to find an appropriate meeting time for a one hour meeting during the 8 AM – 12 PM GMT+7 time window on either May 20, 2017 or May 21, 2017. The requestor is requiring one account as a participant in the meeting and further specifying two calendars from that account.

Note that the primary calendar within an account is automatically considered if no calendars are provided. But, if any calendars are provided, the primary calendar must be included to be considered. For example, if the primary calendar ID in this case is neither ra5werWRsZXQzLcRik5BudGltb6RwwUNnbWFpbC5jb21= nor fa2xvdWRsZXNzLnRlc3QudGltb3RoeUBnbWFpbC5jb20=, the primary calendar would be ignored when retrieving availability. To also consider the primary calendar, the requestor should provide it as a third calendar in the given calendar list.

The response indicates that any one-hour slot between either 9 AM – 11 AM GMT+7 on May 20, 2017 or 10 AM – 11 AM on May 21, 2017 works for the meeting. Of course, the desired meeting length is exactly as long as the boundaries of the second available time window, so there is only one slot that works on that date.

Multiple accounts, no calendars specified

curl -H 'Authorization: Bearer [TOKEN]' \
    -H 'Content-Type: application/json' \
    -XPOST -d '{
        "meeting_duration": "PT45M",
        "constraints": {
            "time_windows": [{
                "start": "2017-06-15T08:00:00-03:00",
                "end": "2017-06-15T17:00:00-03:00"
            }]
        }
    }' \
    'https://api.kloudless.com/v1/accounts/123,456,789/cal/availability'
{
  "time_windows": [
    {
      "start": "2017-06-15T05:00:00Z",
      "end": "2017-06-15T06:30:00Z"
    },
    {
      "start": "2017-06-15T07:45:00Z",
      "end": "2017-06-15T10:00:00Z"
    },
    {
      "start": "2017-06-15T12:30:00Z", 
      "end": "2017-06-15T14:00:00Z"
    }
  ]
}

Here, the requestor wants to find a time for a 45-minute meeting during the 8 AM – 5 PM GMT-3 time window on June 15, 2017. The requestor is requiring three accounts as participants in the meeting, and since no calendars are specified, is searching for availability based solely on each account’s primary calendar.

Note that you can pass time window start and end times in any time zone (and, in fact, differing time zones within the same request). Furthermore, the calendars to be searched on need not be in the same time zone as the constraints. The important point is that the times are passed as ISO 8601-formatted strings—Kloudless will handle the rest.

The response indicates that any 45-minute slot between 9 AM – 10:30 AM GMT-3, 0:45 AM – 1 PM GMT-3, or 3:30 – 5 PM GMT-3 on June 15, 2017 works for the meeting.

Looking to the Future

The Calendar Availability endpoint provides a simple way to coordinate meetings among multiple calendars and calendar accounts. Going forward, we will monitor usage of the endpoint and listen to your feedback to determine the best way to expand the endpoint’s functionality and flexibility. We already have some great suggestions for making the endpoint more customizable, including:

  • The start_times constraint. If the requestor provides start_times in addition to (or in lieu of) time_windows, the endpoint would return the subset of start_times that work as the start time for the meeting. Essentially, this lets the requestor ask “Which of these exact start times are OK for the meeting?”
  • The ​min_attend_percent constraint. Right now, a time window will only be included in the response if every given account is available in that window—that is, if one relevant calendar in one account is not available during a window, that account is not available and so that window is not returned. With the min_attend_percent constraint, a time window would be included if at least the specified percentage of accounts are available during that window.e
  • The work_or_personal constraint. The requestor could provide a work_or_personal argument to limit the times considered for work or personal hours. This could be especially useful when trying to coordinate across widely varying time zones. For example, a colleague across the world may technically be “available” for a meeting at the time the requestor desires, but that’s of little use if it’s 2 AM for that colleague!

Wrapping Up

We’re excited to rollout the Calendar Availability endpoint to the developers on our platform. What do you like about the endpoint right now, what questions do you have about it, and what do you wish it would do in the future? We would love to hear your thoughts and feedback on Twitter, our developer forums, comments below, or at hello@kloudless.com.

 

An Eventful Update

Kloudless developers can now manage their events even more efficiently using the new Events Endpoint updates. Check out what our engineers have been tinkering with below!

Kloudless Enterprise Events

Connect your Admin account and get access to organization-wide events. Enterprise Events can obtained through the normal events endpoint. The user responsible for the event is specified where applicable. 

Events Endpoint Pagination

The Events endpoint now supports requests of a specific page size and also returns the number of remaining events. It also supports only the retrieval events created after the cloud account has been connected to the Kloudless application. Additionally, a more granular list of event types is also now available, instead of + and -.

S3 Event Notifications

Event data and webhook notifications are now available for changes to data in S3 accounts. Any S3 accounts requiring this feature must be reconnected.

Whether you’re using the cloud, private installs, or Enterprise version of Kloudless, this new update enables your application to respond to activity in cloud storage more effectively.

Not a Kloudless developer yet? Click here to get started. Questions or feedback? Feel free to drop a line at hello@kloudless.com

Sharing files with Citrix ShareFile: a Look at the API

Disclaimer: This is coming from my personal experience with the Citrix ShareFile API and other cloud storage APIs. It is meant as a summary of the good aspects as well as the “gotchas” that I have encountered. Hopefully it will provide some insight into decisions that were made when designing the Kloudless API.
Developing for Enterprise Cloud Storage

Google and Dropbox are household names while Box is in the headlines for its ongoing IPO. However, the enterprise cloud storage space is a completely different landscape, with various companies like SugarSync, Egnyte, Bitcasa, and Citrix ShareFile all competing for companies’ cloud storage needs. What should you, as a developer, consider when addressing enterprise customers’ concerns?

Citrix ShareFile Features

ShareFile recently revamped their API, transitioning from an HTTPS endpoint to an ODATA specific HTTP Rest API. As a developer, the new API looks like many others, offering a familiarity and ease to integrate functionality. However, a few unique features separate ShareFile from the rest.

Control Planes (with Subdomains)
Like many other API providers, Citrix ShareFile implements the OAuth 2.0 protocol for authorization. ShareFile’s endpoints are:

  • Request Token
  • Access Token
  • Refresh Token
  • API requests

The authentication endpoint is separate from API requests based on Control Planes. The Control Plane separates user authentication, access control, reporting, and brokering from where any corporate data is stored. Enterprises can now feel safe about their data as Citrix’s service offers an API to interact with that data.  In addition, the subdomains allow for user creation, which is extremely important for CIOs, enterprises, and other groups. As a developer, I notice that the <appcp> corresponds to a specific control plane (sharefile.com, securevdr.com, etc.), which must be tracked.

On Premise Storage Zones
connectors

In this diagram, you’ll notice the second feature of Citrix ShareFile’s architecture: Storage Zones. Citrix ShareFile gives you the flexibility to choose where corporate data is stored with Citrix-managed Storage Zones or Customer-managed Storage Zones in two flavors: Amazon S3 or Microsoft Azure. Plainly, some companies want their corporate data on premise or on their own servers. This is a great feature for an Enterprise cloud storage provider. Now, as a developer, how does all of this affect me?

ShareFile API

The underlying product architecture of Citrix ShareFile gives insight into how the API is structured. Most endpoints look familiar, but I will highlight the key similarities and differences.

Items endpoint
The Items endpoint is the typical interface to a user’s files and folders. ShareFile has specifically exposed the following entities: File, Folder, Note, Link, and Symbolic Links. Each item entity has its own OData representation with the corresponding functions to create folders, retrieve folder contents, update an item, and even create links to specific items.

Storage Centers and Zones endpoint
The Zones and Storage Centers allow for interaction through the API. This is extremely important if companies want to deploy private storage centers or zones. Other cloud storage providers do not have or expose this functionality because of the architecture. One thing to keep in mind as a developer is that a user’s data may be spread across different storage centers and zones, but to a user, it appears as a single account.

Kloudless and ShareFile

At Sharefile’s Synergy Conference in early May 2014, interesting new features were announced. ShareFile can now connect not only to Sharepoint but a few other enterprise content platforms like Alfresco, Documentum, and Filenet. The connection theme continues, with the Kloudless API allowing developers to connect to enterprise and consumer cloud storage services through a standard API interface. Kloudless gives the developer flexibility in choosing what cloud storage features to integrate into their product including native functionality and user interface components. If you want to develop for users with both personal and company cloud storage accounts, you can get started quickly and easily with Kloudless — we’ll help!

Take a look at developers.kloudless.com as we continue to improve our developer friendly resources (SDKs, API mashups, and example apps)! Have any ideas or questions about the Kloudless API? Leave your questions and comments below, or drop a note to hello@kloudless.com.

Migrating Google Docs to Google Drive

Google’s Data Standardization

Google has been known as the king of (big) data, and Kloudless integrates with Gmail to move data from email to cloud storage. Google’s push for organizing the world’s information and making it universally accessible and useful was ahead of its time. This clearly shows in Google’s design of a data protocol for developers to develop products on Google’s platform. When Kloudless integrated Google Docs, it was part of a larger list of “GData” APIs.

The Documents List API was part of a greater set of APIs following the Google Data Protocol. Besides docs, there were:

  • analytics
  • apps
  • blogger
  • books
  • calendar
  • contacts
  • exif
  • finance
  • geo
  • health
  • marketplace
  • photos
  • sites
  • youtube

When Google Drive was introduced in 2012, Kloudless had an opportunity to retool its functionality. Recently, the rise of JSON has led to APIs moving to a different data standard.

Migrating from Documents List to Drive SDK

The Google Drive SDK uses the same infrastructure as the Documents List API; however, there are a few key differences beyond the inherent syntactical changes.

Authorization Mechanisms and Scopes

At Kloudless, we wanted to facilitate users’ account creation and management. Part of this process was to help users keep track of their identity with OpenID, OAuth 1.0 and OAuth 2.0. Switching from Docs to Drive meant that beyond just switching scopes, we would be moving to a pure OAuth 2.0 implementation to authorize users. Google overhauled their entire authentication system beyond just Google Docs to promote Google+ sign in. The OAuth protocol is an open standard for authorization and many services moved away from the OAuth 1.0 RFC specification to the OAuth 2.0 RFC. OAuth 2.0 focuses on client developer simplicity while providing specific authorization flows for web applications, desktop applications, mobile phones, and living room devices.

Design and Data types and Atom Pub XML to JSON

While Kloudless predominantly uses JSON in most of its API, Google’s Data protocol allows for differentiating data types more succinctly and creating resources with both JSON and Atom Pub XML.  The underlying file store of GDrive allows for more robust querying based on the numerous metadata attributes in the XML (now also in JSON).  Furthermore, Google Drive allows you to have multiple files with the same name and to have multiple parent folders.  The structure of Google Drive is extremely flexible although Collections are now deprecated.

File uploads and secure file storage!

Google Drive allows for 10GB uploads to their service.  While other services allow for unlimited file size like Dropbox (through chunked uploads / desktop client) and Bitcasa, Google Drive’s free tier has a 15GB limit with the cheapest options to upgrade for storage.

You also have the most comprehensive access controls for a consumer application of files with Google Drive.

ss1

ss2

Advantage: Kloudless

Kloudless works hard to have the most current API, so developers can work on user focused applications without worrying about features that no longer exist.  With every major revision to an underlying cloud storage API, Kloudless will update its back-end infrastructure to account for changes with seamless integration.  So for example, any user of the Kloudless product would not have noticed any change when moving from Google Docs to Google Drive.  We aim to provide the same seamless upgrade for all of our cloud storage services in our API.

Let us know if you’ve migrated from Google Docs to Google Drive and what you think!

A primer on debugging Native Client code in Chrome

This isn’t your father’s your average client-side app.

Chromium NaCl

Native what?

Yesterday, I was faced with an unfamiliar 10k line C program that did custom image manipulation. It takes in two arguments: an input image file and a destination for the resulting output file.

This is straightforward to run on the server-side, but I don’t want to maintain varying compute capacity just for an infrequently run, on-demand image conversion script! Before you mention it, no, I am not about to rewrite the entire program in JavaScript, no matter how fun that sounds.

Enter Chromium Native Client, an “open-source technology for running native compiled code in the browser”. Combined with Pepper.JS, I can run binaries in a browser window! Before you bring out the pitchforks, consider that the C program is almost a perfect fit for this! As long as I write a little code to manage that pesky file I/O, it should be smooth sailing from here, right? Not exactly. Turns out the Pepper.JS docs weren’t kidding about getting your hands dirty.

Continue reading

Automating Development Environments with Vagrant and Puppet

Vagrant and Puppet!

Your Situation

This could be quite varied, you could be:

  1. A solo developer looking for a fast/easy way to have a local dev environment that resembles your production environment (say you develop on OS X, but are deploying to an infrastructure running some distribution of Linux. As a bonus, there is also an easy way to deploy to Amazon’s EC2 if you have a solid setup locally.
  2. A member of a team, where everyone has their own development style and want to avoid the headaches of cross-platform support.
  3. Someone who normally sets up servers in a third-party hosting environment, but you want to test your deployment without paying a bunch of money in wasted servers (this is where I am!)

Continue reading