Lessons I’ve learned deploying SaaS software on-prem

If the best way to learn something is to teach it, then I’ve learned a surprising amount by guiding customers through on-premises deployments of our SaaS application.

Dealing with flaky DB migrations on Windows machines is a far cry from operating our fine-tuned AWS VPC. I’m glad we bit the bullet and deployed on-prem early on—it’s shaped our application and system design decisions in ways I couldn’t have imagined.

Background

We launched our integrations platform, Kloudless, in 2014 as a multi-tenant web app on AWS. As we began to work with our first enterprise customers, we quickly realized that our SaaS application didn’t come close to meeting the regulatory compliance and audit requirements they expected. Several organizations simply couldn’t use Kloudless until we certified compliance with PCI and HIPAA, or underwent a SOC 2 audit.

As CTO, I challenged my team to find a solution. We soon introduced Kloudless Enterprise, the on-premises version of our cloud solution, to allow our customers to run Kloudless as part of their on-prem stack. Today, Kloudless Enterprise Docker containers, AMIs, and OVAs power applications used by Fortune 50 companies and some of the nation’s largest banks. This year, we crossed 500M upstream API requests a month.

Here are some of the major lessons I’ve learned along the way.

Kloudless Enterprise (source)
\==============_=_/ ____.---'---`---._____
             \_ \    \----._________.----/
               \ \   /  /    `-_-'
           __,--`.`-'..'-_
          /____          ||
               `--.____,-'

Ops

Deploying is hard, so make it as easy as possible

Kloudless Enterprise is installed and updated often, so our customers require this process to be quick. Limiting the number of exposed moving parts enables us to provide updates that are as simple as deploying new Docker containers. Developers can start out using local PostgreSQL and Redis services and configure external ones later for scalable production deployments. GitLab’s Omnibus packages are a great example of a similar structure.

We tied licenses to individual release channels for images and containers since a patch release is sometimes required for specific customers or a specific version. Secret: We also provide Kloudless operators the ability to update individual application code packages to the next patch release within a container, immutability be damned. While this is an anti-pattern and is definitely not the recommended option, it’s occasionally proven to be the path of least resistance in dealing with necessary change management review processes.

Keeping all application state, such as licenses, encryption keys, and meta-configuration, outside of the container makes updates seamless. External storage, such as Docker volumes or network-attached storage like EBS, preserves that state across updates.
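
As a rough sketch of what this looks like in practice (the image name and paths below are illustrative, not our actual layout), application state lives on a named Docker volume that survives container replacement:

# Hypothetical example: state such as licenses, keys, and meta-configuration
# lives on a named volume, so swapping the container image leaves it intact.
docker volume create kloudless-config
docker run -d --name kloudless -p 443:443 \
    -v kloudless-config:/data/kloudless \
    example/kloudless-enterprise:1.2.3

# Updating then amounts to replacing the container and reattaching the volume.
docker rm -f kloudless
docker run -d --name kloudless -p 443:443 \
    -v kloudless-config:/data/kloudless \
    example/kloudless-enterprise:1.2.4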

You will thank your support tooling

SLAs for Enterprise and on-prem deployments often require the ability to quickly diagnose issues, ideally with self-serve tools like command-line utilities and monitoring integrations. Even so, our engineering team has had to dive into remotely deployed Kloudless development clusters to walk operators through configuration and diagnose hard-to-reproduce issues. Making this as easy as troubleshooting any cloud environment is a major benefit. When things don’t go as planned, support staff need tools on hand to assist customers.

We built command line utilities to do everything from reading and sending log data, to managing users, maintenance windows, and configuration updates. Our most frequently used tool authenticates an outbound connection from Kloudless Enterprise instances to a secure remote bastion in our network for engineers to assist via a reverse SSH tunnel. Definitely use autossh for this kind of stuff. This utility lets us access our customers’ instances behind firewalls at their request while also maintaining their compliance requirements.
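
A minimal sketch of that kind of outbound support tunnel, assuming a hypothetical bastion host and port assignments:

# Hypothetical example: open a reverse SSH tunnel from the customer's instance
# to a support bastion so an engineer can reach the instance's SSH port
# without any inbound firewall changes on the customer's side.
autossh -M 0 -f -N \
    -o "ServerAliveInterval 30" -o "ServerAliveCountMax 3" \
    -o "ExitOnForwardFailure yes" \
    -R 20022:localhost:22 \
    support-tunnel@bastion.example.com

# On the bastion, an engineer then connects back through the tunnel:
ssh -p 20022 operator@localhost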

Scaling is harder when everyone does it their own way

Private Kloudless deployments can span several machines running 100+ cores combined. You can prevent unnecessary stress down the line by providing canonical scaling guides and a list of items to consider.

Here are some issues we’ve seen while deploying privately hosted instances:

  • Postgres running out of disk due to insufficient capacity planning for activity stream data.
  • 10+ small AWS instances instead of 5 larger ones. Fun fact: AWS network throughput is based on instance vCPU, so adding more instances isn’t the answer to network bottlenecks!
  • Unhandled network partition issues. We quickly switched from a stateful, primary/secondary clustering format using etcd to a masterless model. This was not only simpler for our customers’ Ops teams, but turned out to be more robust and better able to handle large deployments.
  • Diverse deployment environments and tooling.
    Our initial version had an interactive setup process that required less preparation, but it turned out to be a stumbling block for large production environments. Non-interactive deployment improves automation as well as integration with existing tools like AWS CloudFormation and Docker Compose (see the sketch after this list).
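
To give a sense of what non-interactive deployment means here (the variable and file names are hypothetical, not our actual interface), all configuration is supplied up front so an instance comes up without prompts:

# Hypothetical example: configuration is provided before the container starts,
# so bringing up an instance requires no interactive setup.
docker run -d --name kloudless -p 443:443 \
    -v kloudless-config:/data/kloudless \
    -e KLOUDLESS_LICENSE_KEY="$LICENSE_KEY" \
    -e KLOUDLESS_DB_URL="postgres://kloudless:secret@db.internal:5432/kloudless" \
    example/kloudless-enterprise:1.2.3
# The same non-interactive approach drops straight into Docker Compose,
# CloudFormation user data, or any other existing automation.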

Documentation makes things official

While on the topic of writing, our on-prem customers have required an entirely distinct set of documentation. We expected to provide API docs and information around setup, but we’ve also had to create far more documentation, including:

  • SOPs and security policy docs to provide security and legal teams
  • HA, scalability, and monitoring guides for operations
  • Solution walkthroughs and examples of configuration combinations for engineering
  • Detailed release notes and update paths
  • System troubleshooting guides

It’s important to be upfront about all data transmissions and to ensure tools initiating outbound connections are opt-in rather than enabled by default, so as not to raise the ire of a customer’s security team.

The Application

On-prem vs. Cloud is best as a feature flag

The worst deployment strategy would be to fork the cloud application for on-prem deployments. In addition to increased maintenance costs, this results in bugs that dogfooding would otherwise have caught. Introducing a feature flag that indicates whether the application is running on-prem makes it easy to toggle functionality like license restrictions and an alternate web application UI.
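
As a sketch of the idea (the flag name and paths are made up for illustration), a single flag checked at startup is enough to switch on the on-prem code paths:

#!/bin/sh
# Hypothetical entrypoint sketch: toggle on-prem behavior with a single flag
# rather than maintaining a forked codebase.
if [ "${KLOUDLESS_ON_PREM:-0}" = "1" ]; then
    # On-prem-only concerns: license enforcement, alternate admin UI, etc.
    /opt/kloudless/bin/verify-license || exit 1
    export KLOUDLESS_UI=enterprise
fi
exec /opt/kloudless/bin/start-app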

Cloud to On-prem is an upgrade away

But wait, there’s more: using the same application and data stack makes it easy to transition cloud customers to private deployments or on-prem solutions without a complex data migration. While environment-specific items like encryption keys necessitate a documented upgrade procedure, it is possible to simplify the transition by limiting the deviation in application behavior. Plus, sales teams love the frictionless path to increasing revenue.

Configuration should stay as configuration

Ask the front-end design team to sit this one out. A friend of mine at GitHub contributes to GitHub Enterprise, which inspired a lot of our design choices when building out our own solution. We heeded one valuable bit of advice to reduce the overhead of configuration: we scrapped fancy settings dashboard web UIs and just provided a YAML file to edit (a sketch of what this can look like follows the list below). This saved us an inordinate amount of time shipping features. We’ve also:

  • had fewer questions than expected since the point-and-click folks stay away
  • accessed the full suite of YAML data structures to represent state
  • inherited configuration from org-wide, license-specific, and deployment-specific settings into each machine’s configuration
  • been able to access and manage configuration easily via utilities like Salt
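
A toy sketch of what such a file can look like (the keys and layering below are invented for illustration, not our actual schema):

# Hypothetical sketch: all settings live in one YAML file that operators edit
# directly; deployment-specific values simply override org-wide defaults.
sudo tee /etc/kloudless/kloudless.yml > /dev/null <<'EOF'
license_key: "XXXX-XXXX-XXXX"
database:
  url: postgres://kloudless:secret@db.internal:5432/kloudless
metrics:
  statsd_host: 10.0.0.5
  statsd_port: 8125
overrides:
  # deployment-specific values layered over org-wide and license-specific defaults
  worker_processes: 8
EOF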

Leverage open source

If you’d like a fun conversation with engineering and procurement teams at enterprises, then tell them they need to purchase even more on-prem software to actually view analytics, logs, monitoring, and more. Using third-party SaaS vendors works great for a cloud product, but on-prem software needs to come with batteries included for core functionality.

Fortunately, allowing a variety of open-source tools to be configured helps package functionality and prevent these concerns. For instance, Kloudless provides options to send system and application metrics to StatsD and InfluxDB as well as ship log data externally via standard protocols. This allows Kloudless to integrate with existing monitoring infrastructure with minimal effort.

Open source utilities also help users easily configure application security. For example, take advantage of Let’s Encrypt to provision SSL/TLS certificates for custom domain names quickly.
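
For instance, a standard certbot invocation (the domain below is a placeholder) issues a certificate for a custom domain in one step:

# Example: provision a certificate for a custom domain with Let's Encrypt's
# certbot client; the domain name here is a placeholder.
sudo certbot certonly --standalone -d api.example.com
# Certificates land under /etc/letsencrypt/live/api.example.com/ and can then
# be referenced from the appliance's web server configuration.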

Application Security is as good as its documentation

Security is like a sine wave that peaks throughout this blog post at periodic intervals.

Application security goes beyond technical solutions that satisfy engineering. For example, summaries of compliance with certifications and items like pen test results are equally valuable. Security posture is relevant at every stage since it is one of the primary reasons for deploying on-prem. Providing a well-written, thorough information security policy covering everything from SDLC processes like code reviews, to information on internal controls, helps satisfy customers’ voracious need for security-related collateral.

“Enterprise” is an anagram for “Re-enter IPs”

We’ve had multiple requests for items like SSO, access controls for team members, activity logs, IP whitelists, and more. Making these easily configurable allows for frictionless deployments. However, we’ve noticed that these haven’t been a deal-breaker in most cases—we released Kloudless Enterprise without several features popular with larger customers. This could be specific to our type of application, but I’m glad we chose to ship our product ASAP instead of building features we assumed would be required. Check out enterpriseready.io for more details on features frequently included in software for enterprise customers.

Side-note: We also adjusted our build pipeline to compile and obfuscate application code from the very beginning to protect a different kind of IP, but in hindsight, it is debatable as to whether this was even necessary.

Technical solutions for licensing save time till they don’t

At the end of the day, a customer is likely purchasing some concept of a license “key” to run the application privately. We’ve found that the minimum effort required here is to build in validation checks and detail what data they transmit.
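
One common, minimal approach (sketched here with openssl; the file names are illustrative, not our actual scheme) is to ship the license as a signed document the appliance can verify offline:

# Vendor side: sign the license file with a private key.
openssl dgst -sha256 -sign vendor_private.pem -out license.sig license.json

# Appliance side: verify the license against a bundled public key, so the
# validation check works without calling home.
openssl dgst -sha256 -verify /etc/kloudless/vendor_public.pem \
    -signature license.sig license.json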

Avoid technical solutions for payments early on

Let’s say you create a self-hosted version of your product and you have a great billing system running on your cloud version—even using the latest Stripe Billing API—with a clear licensing model. I’ve found it’s still best to invoice customers manually until you’ve identified a repeatable process. We avoided prematurely optimizing payments with click-through solutions that would get in the way of customized pricing schemes, especially since the volume of enterprise deals is relatively low early on. Unlike in SaaS apps, it usually isn’t an option to disable accounts due to a Net 30 invoice delayed a few days. License terms also vary from quarterly and annual subscriptions to multi-year contracts, so I recommend setting aside the monthly cron job or equivalent solution.

We built an internal dashboard to update licenses entirely independently of payment, and we avoided directing customers purchasing on-prem solutions to the billing options provided to cloud customers. It’s hard enough to build in feature flags to deal with varied license requirements; requiring a single payment flow will simply slow things down.

APIs all the way down

Kloudless integrates APIs, but also provides a Meta API to manage the API integration platform. My goal this year is to convince our engineering team to integrate Kloudless with itself. In all seriousness, programmatic manipulation of the application is a massive benefit for engineering teams deploying the solution on-prem. It enables workflows that the customer may not have even bothered bringing up earlier, like requiring key rotation every 90 days.

I highly recommend an API that manages all data in the application, especially provisioning and deprovisioning workflows.
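
As an illustration only (the endpoint and payload below are hypothetical, not the documented Kloudless Meta API), programmatic management means a routine task like rotating a key reduces to an API call that can be scheduled:

# Hypothetical illustration of managing the appliance through its own API,
# e.g. rotating a credential on a 90-day schedule; the endpoint is made up.
curl -XPOST \
    -H 'Authorization: Bearer [ADMIN_TOKEN]' \
    -H 'Content-Type: application/json' \
    -d '{"action": "rotate"}' \
    'https://kloudless.example.com/v1/meta/keys/123/rotate'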

Would we do it again?

Yes, but I’d probably charge more right off the bat. 🙂

Ultimately, our GTM strategy incorporated deploying our application where the customer felt comfortable using it: either our multi-tenant cloud, a private deployment managed by us, or entirely self-hosted VMs and containers. A key decision for vendors considering on-prem deployments is simply whether the overhead described above is worth the benefits it brings.

Kloudless provides unified APIs to integrate applications with several software services in one go. We deploy where software services run, either in the cloud or on-prem. Sound interesting? We’re hiring!

 

Metrics and Monitoring at Kloudless

This post was written by our engineering intern, Matthew Soh.

Metrics and monitoring are important for any service. They provide critical information needed to detect and respond to incidents and issues. Kloudless deals with millions of requests, and keeping track of everything can be difficult. Kloudless has recently integrated a new metrics system to tackle this challenge.

Metrics from Kloudless can be sent to the new analytics platform for collection, analysis, and alerting. The analytics integration is available to all operators of Kloudless systems: both our own DevOps team administering our cloud version as well as operators of self-hosted Kloudless Enterprise appliances.

Usage

Let’s walk through a simple use case. Let’s say that your application uses the Kloudless Storage API and is failing to store files uploaded to your service. The potential issue could be anywhere in the stack. With the new metrics system, dashboards that display request metrics are easily accessible. Now you can quickly isolate the issue based on these metrics.

Chronograf dashboard with graphs of Kloudless metrics.

Here we have a simple dashboard. Core health checks to the Kloudless appliance appear to be fine. However, there is a sharp spike in the graph of Request Failures! This graph shows failures for outbound requests to upstream services. Hovering over the graph, we can look at the tags on each data series and see that there’s been a large increase in 500 errors to Box. This suggests that the issue is likely with the upstream service rather than the Kloudless API or the appliance itself. Knowing this allows us to narrow down which logs we need to look at to learn more about the error and take further steps to assess the root cause.

Request status is the mere tip of the iceberg when it comes to metrics provided by the Kloudless appliance. For a more detailed reference of available metrics, please refer to the Kloudless Enterprise Configuration guide.

How it works

The dashboard used above is built with Chronograf. It is one part of the metrics system deployed at Kloudless that uses the TICK stack by InfluxData. TICK is composed of Telegraf (collector), InfluxDB (datastore), Chronograf (visualization), and Kapacitor (monitoring).

The TICK Stack. © 2017 InfluxData, Inc.

The metrics processing chain begins with Telegraf, the metrics collection daemon. Telegraf is designed to aggregate data from different sources and send them to various datastores. Sources include sysstat (a system information tool) and statsd (a common metrics daemon). The default datastore is InfluxDB, though others such as Graphite and CloudWatch can also be used. If required, the Telegraf output in the Kloudless appliance can be modified, enabling existing metrics collection or storage infrastructure to be used instead of InfluxDB.
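
For example, pointing Telegraf at an existing Graphite cluster instead of InfluxDB is a small change to its output configuration (the paths and hostnames below are illustrative):

# Illustrative snippet: add a Graphite output to Telegraf's configuration so
# metrics flow into existing infrastructure instead of (or alongside) InfluxDB.
sudo tee -a /etc/telegraf/telegraf.conf > /dev/null <<'EOF'
[[outputs.graphite]]
  servers = ["graphite.internal:2003"]
  prefix = "kloudless"
EOF
sudo systemctl restart telegraf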

The metrics then proceed to InfluxDB, which is designed for storing time-series data. This means that it has some neat features such as simple data retention policies and continuous queries. Data retention policies allow for time limits on metrics to expire old data. Continuous queries run at regular intervals on InfluxDB to summarise detailed data into broad overviews. For example, summations over counts of API requests are used to build daily summaries. Together, these features allow InfluxDB resource usage to be managed effectively.
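
A sketch of both features in InfluxQL (the database, policy, and measurement names are illustrative):

-- Expire raw metrics after 30 days.
CREATE RETENTION POLICY "thirty_days" ON "kloudless" DURATION 30d REPLICATION 1 DEFAULT

-- Roll detailed request counts up into daily summaries.
CREATE CONTINUOUS QUERY "daily_api_requests" ON "kloudless" BEGIN
  SELECT sum("count") INTO "api_requests_daily" FROM "request_metrics_api_requests" GROUP BY time(1d)
END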

Once the data is stored, Chronograf and Kapacitor work in tandem to help understand the collected metrics. Chronograf enables dashboard visualizations like the one described above to be built, while Kapacitor provides automatic monitoring so that there is no need to stare at the dashboard all day. Kapacitor’s monitoring and alerts are managed through TICKScripts which can be configured using either the Kapacitor command line client or Chronograf. We have provided sample TICKScripts which cover some common use cases.

Why we moved to the TICK stack

As the Kloudless Platform has grown, the volume and complexity of metrics data has grown with it. Previously, we were using StatsD as a collector and Graphite to visualize metrics. This simple solution was easy to work with, but our needs have changed. Here are some of the unsupported scenarios we encountered:

  • StatsD doesn’t support tags. Tags are useful for providing context for the measurement, such as status or type of a request. Our workaround was to append tags onto the measurement name, similar to the tags used by DataDog’s DogStatsD. The downside of this approach was messy measurement names that were difficult to query.
  • Each StatsD measurement only has one value. It is sometimes useful to have a tuple of data grouped together in a single metric, or associate data such as application IDs to a metric. This isn’t possible with the StatsD+Graphite solution either.
  • The language used to query Graphite is limited to nesting of functions, and performing aggregations can be slow since the metrics are stored in flat files. This results in slow queries across multiple metrics and prevents easy use of complex operations.

These factors led us to look for a better alternative. InfluxDB seemed promising as its design was tailored for high volume time-series data and metrics collection. InfluxDB is built around measurements, which are in turn made up of many data points. A data point can have one or more values associated with it, and as many tags as needed. This addressed our first two issues right away and allowed for better querying.

For example, let’s try to determine which Kloudless applications were associated with failed API requests in the past day. We can do this with the following query:

select status, path, application_id from request_metrics_api_requests 
where time > now() - 24h and status=~/[^2][0-9]{2}/

In just one query, we’ve selected multiple values (status, path, application_id), filtered by tag values (status is not a 2XX code) and with a time frame limit (last 24 hours). This would have been messier and more tedious with our previous metrics system which did not allow associated metadata to be recorded. Additionally, InfluxDB’s time-series database design allows for tag-based queries to execute more efficiently since all tags are indexed.

Next Steps

We’ve been using the new metrics system at Kloudless to better understand, measure and monitor the performance of our hosted cloud platform. We’ve found that it has saved us time and effort when triaging issues, and we think you’ll find it useful as well. We’re excited to make this metrics system available to our enterprise customers using the Kloudless appliance. Customers using our cloud platform will also see analytics data for their Kloudless applications exposed in the developer portal in the upcoming months. As always, we would love to hear your thoughts and feedback on Twitter, comments below, or at hello@kloudless.com.

Calendar API: Availability Endpoint now… Available!

This post was written by our software engineer, Ryan Connor.

Finding an appropriate meeting time can take a lot of effort, even with just a few participants who have only a single calendar. Finding a meeting time that is sure to work for many participants with multiple calendars managed by several cloud services? Good luck!

Fortunately, Kloudless now helps you do just that. We are proud to announce that the Calendar Availability endpoint is online and ready to help your application find meeting times that work for any combination of user accounts and their calendars.

Using the Calendar Availability Endpoint

The Calendar Availability endpoint returns all available time windows among a set of calendars that match a specific meeting duration and desired time windows for the meeting. The endpoint also supports requests involving multiple calendars for multiple accounts.

To illustrate, imagine your app has 100 users who have two personal (Google) calendars and two work (Outlook) calendars for a total of four hundred calendars. One call to the Calendar Availability endpoint can retrieve available meeting times (if any) for all 100 users subject to all four of their calendars. Alternatively, you can retrieve available meeting times for just a subset of users and/or their calendars.

Kloudless seamlessly integrates with Google and Outlook Calendar behind the scenes—all you have to provide are Account IDs, Calendar IDs, a meeting duration, and time constraints. Let’s take a look at exactly how to do that.

Formatting the Request and Parsing the Response

Here are the details for the request and response format of the Calendar Availability endpoint based on our docs. You can scroll down further to see concrete examples.

Request Format

URL: https://api.kloudless.com/v1/accounts/{account_id,account_id,…}/cal/availability

Method: POST

Headers:

  • Authorization: Bearer [Token]
  • Content-Type: application/json

Body:

  • calendars: List of Calendar IDs. Uses the default calendar if empty. (Optional)
  • meeting_duration: ISO 8601 format for duration. (Required)
  • constraints: A dictionary of constraint keys and values. (Required)
    • time_windows: List of desired time slots with the following values.
      • start: ISO 8601 datetime format
      • end: ISO 8601 datetime format

Response Format

Headers:

  • Content-Type: application/json

Body:

  • time_windows: List of desired time slots with the following values.
    • start: ISO 8601 datetime format
    • end: ISO 8601 datetime format

The start and end times of each time_window in the response bookend (inclusively) the time periods in which all of the accounts are available given the constraints. Note that the times are returned in the GMT time zone, so you may want to convert to the time zone of your choice.

Concretely, if the response includes the “2 – 5 PM” time window and you wanted a 30-minute meeting, you can safely schedule the meeting to start and end at any time within 2 – 5 PM, inclusive (e.g., 2 – 2:30, 2:01-2:31, …, 4:29-4:59, 4:30-5:00).

Example Usage

Single account, two calendars specified

curl -H 'Authorization: Bearer [TOKEN]' \
    -H 'Content-Type: application/json' \
    -XPOST -d '{
        "calendars": ["ra5werWRsZXQzLcRik5BudGltb6RwwUNnbWFpbC5jb21=",
                      "fa2xvdWRsZXNzLnRlc3QudGltb3RoeUBnbWFpbC5jb20=”],
        "meeting_duration": "PT1H",
        "constraints": {
            "time_windows": [{
                "start": "2017-05-20T08:00:00+07:00",
                "end": "2017-05-20T12:00:00+07:00"
            },{
                "start": "2017-05-21T08:00:00+07:00",
                "end": "2017-05-21T12:00:00+07:00"
            }]
        }
    }' \
    'https://api.kloudless.com/v1/accounts/123/cal/availability'

 

{
  "time_windows": [
    {
      "start": "2017-05-20T02:00:00Z",
      "end": "2017-05-20T04:00:00Z"
    },
    {
      "start": "2017-05-21T03:00:00Z",
      "end": "2017-05-21T04:00:00Z"
    }
  ]
}

In this case, the requestor wants to find an appropriate meeting time for a one hour meeting during the 8 AM – 12 PM GMT+7 time window on either May 20, 2017 or May 21, 2017. The requestor is requiring one account as a participant in the meeting and further specifying two calendars from that account.

Note that the primary calendar within an account is automatically considered if no calendars are provided. But, if any calendars are provided, the primary calendar must be included to be considered. For example, if the primary calendar ID in this case is neither ra5werWRsZXQzLcRik5BudGltb6RwwUNnbWFpbC5jb21= nor fa2xvdWRsZXNzLnRlc3QudGltb3RoeUBnbWFpbC5jb20=, the primary calendar would be ignored when retrieving availability. To also consider the primary calendar, the requestor should provide it as a third calendar in the given calendar list.

The response indicates that any one-hour slot between either 9 AM – 11 AM GMT+7 on May 20, 2017 or 10 AM – 11 AM on May 21, 2017 works for the meeting. Of course, the desired meeting length is exactly as long as the boundaries of the second available time window, so there is only one slot that works on that date.

Multiple accounts, no calendars specified

curl -H 'Authorization: Bearer [TOKEN]' \
    -H 'Content-Type: application/json' \
    -XPOST -d '{
        "meeting_duration": "PT45M",
        "constraints": {
            "time_windows": [{
                "start": "2017-06-15T08:00:00-03:00",
                "end": "2017-06-15T17:00:00-03:00"
            }]
        }
    }' \
    'https://api.kloudless.com/v1/accounts/123,456,789/cal/availability'
{
  "time_windows": [
    {
      "start": "2017-06-15T05:00:00Z",
      "end": "2017-06-15T06:30:00Z"
    },
    {
      "start": "2017-06-15T07:45:00Z",
      "end": "2017-06-15T10:00:00Z"
    },
    {
      "start": "2017-06-15T12:30:00Z", 
      "end": "2017-06-15T14:00:00Z"
    }
  ]
}

Here, the requestor wants to find a time for a 45-minute meeting during the 8 AM – 5 PM GMT-3 time window on June 15, 2017. The requestor is requiring three accounts as participants in the meeting, and since no calendars are specified, is searching for availability based solely on each account’s primary calendar.

Note that you can pass time window start and end times in any time zone (and, in fact, differing time zones within the same request). Furthermore, the calendars to be searched on need not be in the same time zone as the constraints. The important point is that the times are passed as ISO 8601-formatted strings—Kloudless will handle the rest.

The response indicates that any 45-minute slot between 9 AM – 10:30 AM GMT-3, 0:45 AM – 1 PM GMT-3, or 3:30 – 5 PM GMT-3 on June 15, 2017 works for the meeting.

Looking to the Future

The Calendar Availability endpoint provides a simple way to coordinate meetings among multiple calendars and calendar accounts. Going forward, we will monitor usage of the endpoint and listen to your feedback to determine the best way to expand the endpoint’s functionality and flexibility. We already have some great suggestions for making the endpoint more customizable, including:

  • The start_times constraint. If the requestor provides start_times in addition to (or in lieu of) time_windows, the endpoint would return the subset of start_times that work as the start time for the meeting. Essentially, this lets the requestor ask “Which of these exact start times are OK for the meeting?”
  • The min_attend_percent constraint. Right now, a time window will only be included in the response if every given account is available in that window—that is, if one relevant calendar in one account is not available during a window, that account is not available and so that window is not returned. With the min_attend_percent constraint, a time window would be included if at least the specified percentage of accounts are available during that window.
  • The work_or_personal constraint. The requestor could provide a work_or_personal argument to limit the times considered for work or personal hours. This could be especially useful when trying to coordinate across widely varying time zones. For example, a colleague across the world may technically be “available” for a meeting at the time the requestor desires, but that’s of little use if it’s 2 AM for that colleague!

Wrapping Up

We’re excited to roll out the Calendar Availability endpoint to the developers on our platform. What do you like about the endpoint right now, what questions do you have about it, and what do you wish it would do in the future? We would love to hear your thoughts and feedback on Twitter, our developer forums, comments below, or at hello@kloudless.com.

 

An Eventful Update

Kloudless developers can now manage their events even more efficiently using the new Events Endpoint updates. Check out what our engineers have been tinkering with below!

Kloudless Enterprise Events

Connect your Admin account and get access to organization-wide events. Enterprise Events can be obtained through the normal events endpoint. The user responsible for the event is specified where applicable.

Events Endpoint Pagination

The Events endpoint now supports requests of a specific page size and also returns the number of remaining events. It also supports retrieving only events created after the cloud account was connected to the Kloudless application. Additionally, a more granular list of event types is now available, instead of just + and -.

S3 Event Notifications

Event data and webhook notifications are now available for changes to data in S3 accounts. Any S3 accounts requiring this feature must be reconnected.

Whether you’re using the cloud, private installs, or Enterprise version of Kloudless, this new update enables your application to respond to activity in cloud storage more effectively.

Not a Kloudless developer yet? Click here to get started. Questions or feedback? Feel free to drop a line at hello@kloudless.com

Sharing files with Citrix ShareFile: a Look at the API

Disclaimer: This is coming from my personal experience with the Citrix ShareFile API and other cloud storage APIs. It is meant as a summary of the good aspects as well as the “gotchas” that I have encountered. Hopefully it will provide some insight into decisions that were made when designing the Kloudless API.

Developing for Enterprise Cloud Storage

Google and Dropbox are household names while Box is in the headlines for its ongoing IPO. However, the enterprise cloud storage space is a completely different landscape, with various companies like SugarSync, Egnyte, Bitcasa, and Citrix ShareFile all competing for companies’ cloud storage needs. What should you, as a developer, consider when addressing enterprise customers’ concerns?

Citrix ShareFile Features

ShareFile recently revamped their API, transitioning from an HTTPS endpoint to an OData-specific HTTP REST API. As a developer, the new API looks like many others, offering familiar conventions and functionality that is easy to integrate. However, a few unique features separate ShareFile from the rest.

Control Planes (with Subdomains)
Like many other API providers, Citrix ShareFile implements the OAuth 2.0 protocol for authorization. ShareFile’s endpoints are:

  • Request Token
  • Access Token
  • Refresh Token
  • API requests

The authentication endpoint is separate from API requests based on Control Planes. The Control Plane separates user authentication, access control, reporting, and brokering from where any corporate data is stored. Enterprises can feel safe about their data while Citrix’s service still offers an API to interact with it. In addition, the subdomains allow for user creation, which is extremely important for CIOs, enterprises, and other groups. As a developer, I note that the <appcp> corresponds to a specific control plane (sharefile.com, securevdr.com, etc.), which must be tracked.

On-Premises Storage Zones
(Diagram: ShareFile Storage Zones and connectors)

In this diagram, you’ll notice the second feature of Citrix ShareFile’s architecture: Storage Zones. Citrix ShareFile gives you the flexibility to choose where corporate data is stored: Citrix-managed Storage Zones, or customer-managed Storage Zones in two flavors, Amazon S3 or Microsoft Azure. Plainly, some companies want their corporate data on premises or on their own servers. This is a great feature for an enterprise cloud storage provider. Now, as a developer, how does all of this affect me?

ShareFile API

The underlying product architecture of Citrix ShareFile gives insight into how the API is structured. Most endpoints look familiar, but I will highlight the key similarities and differences.

Items endpoint
The Items endpoint is the typical interface to a user’s files and folders. ShareFile has specifically exposed the following entities: File, Folder, Note, Link, and Symbolic Links. Each item entity has its own OData representation with the corresponding functions to create folders, retrieve folder contents, update an item, and even create links to specific items.

Storage Centers and Zones endpoint
The Zones and Storage Centers endpoints allow for interaction through the API. This is extremely important if companies want to deploy private storage centers or zones. Other cloud storage providers do not have or expose this functionality because of their architecture. One thing to keep in mind as a developer is that a user’s data may be spread across different storage centers and zones, but to the user, it appears as a single account.

Kloudless and ShareFile

At ShareFile’s Synergy Conference in early May 2014, interesting new features were announced. ShareFile can now connect not only to SharePoint but also to a few other enterprise content platforms like Alfresco, Documentum, and FileNet. The connection theme continues, with the Kloudless API allowing developers to connect to enterprise and consumer cloud storage services through a standard API interface. Kloudless gives the developer flexibility in choosing what cloud storage features to integrate into their product, including native functionality and user interface components. If you want to develop for users with both personal and company cloud storage accounts, you can get started quickly and easily with Kloudless — we’ll help!

Take a look at developers.kloudless.com as we continue to improve our developer friendly resources (SDKs, API mashups, and example apps)! Have any ideas or questions about the Kloudless API? Leave your questions and comments below, or drop a note to hello@kloudless.com.

Migrating Google Docs to Google Drive

Google’s Data Standardization

Google has been known as the king of (big) data, and Kloudless integrates with Gmail to move data from email to cloud storage. Google’s push for organizing the world’s information and making it universally accessible and useful was ahead of its time. This clearly shows in Google’s design of a data protocol for developers to build products on Google’s platform. When Kloudless integrated Google Docs, it was part of a larger list of “GData” APIs.

The Documents List API was part of a greater set of APIs following the Google Data Protocol. Besides docs, there were:

  • analytics
  • apps
  • blogger
  • books
  • calendar
  • contacts
  • exif
  • finance
  • geo
  • health
  • marketplace
  • photos
  • sites
  • youtube

When Google Drive was introduced in 2012, Kloudless had an opportunity to retool its functionality. More recently, the rise of JSON has led APIs to move away from XML-based standards like the Google Data Protocol.

Migrating from Documents List to Drive SDK

The Google Drive SDK uses the same infrastructure as the Documents List API; however, there are a few key differences beyond the inherent syntactical changes.

Authorization Mechanisms and Scopes

At Kloudless, we wanted to facilitate users’ account creation and management. Part of this process was to help users keep track of their identity with OpenID, OAuth 1.0 and OAuth 2.0. Switching from Docs to Drive meant that beyond just switching scopes, we would be moving to a pure OAuth 2.0 implementation to authorize users. Google overhauled their entire authentication system beyond just Google Docs to promote Google+ sign in. The OAuth protocol is an open standard for authorization and many services moved away from the OAuth 1.0 RFC specification to the OAuth 2.0 RFC. OAuth 2.0 focuses on client developer simplicity while providing specific authorization flows for web applications, desktop applications, mobile phones, and living room devices.

Design, Data Types, and AtomPub XML to JSON

While Kloudless predominantly uses JSON in its API, Google’s Data Protocol allows for differentiating data types more succinctly and for creating resources with both JSON and AtomPub XML. The underlying file store of Google Drive allows for more robust querying based on the numerous metadata attributes in the XML (now also available in JSON). Furthermore, Google Drive allows multiple files with the same name, and a file can have multiple parent folders. The structure of Google Drive is extremely flexible, although Collections are now deprecated.

File uploads and secure file storage!

Google Drive allows uploads of files up to 10GB. While other services, like Dropbox (through chunked uploads or the desktop client) and Bitcasa, allow unlimited file sizes, Google Drive’s free tier has a 15GB limit, with some of the cheapest options to upgrade for more storage.

You also have the most comprehensive access controls for a consumer application of files with Google Drive.

Advantage: Kloudless

Kloudless works hard to have the most current API, so developers can work on user-focused applications without worrying about features that no longer exist. With every major revision to an underlying cloud storage API, Kloudless updates its back-end infrastructure to account for the changes with seamless integration. So, for example, users of the Kloudless product would not have noticed any change when moving from Google Docs to Google Drive. We aim to provide the same seamless upgrade for all of the cloud storage services in our API.

Let us know if you’ve migrated from Google Docs to Google Drive and what you think!

A primer on debugging Native Client code in Chrome

This isn’t your father’s average client-side app.

Native what?

Yesterday, I was faced with an unfamiliar 10k line C program that did custom image manipulation. It takes in two arguments: an input image file and a destination for the resulting output file.

This is straightforward to run on the server-side, but I don’t want to maintain varying compute capacity just for an infrequently run, on-demand image conversion script! Before you mention it, no, I am not about to rewrite the entire program in JavaScript, no matter how fun that sounds.

Enter Chromium Native Client, an “open-source technology for running native compiled code in the browser”. Combined with Pepper.JS, I can run binaries in a browser window! Before you bring out the pitchforks, consider that the C program is almost a perfect fit for this! As long as I write a little code to manage that pesky file I/O, it should be smooth sailing from here, right? Not exactly. Turns out the Pepper.JS docs weren’t kidding about getting your hands dirty.

Automating Development Environments with Vagrant and Puppet

Vagrant and Puppet!

Your Situation

Your situation could be quite varied; you could be:

  1. A solo developer looking for a fast and easy way to have a local dev environment that resembles your production environment (say you develop on OS X, but are deploying to an infrastructure running some distribution of Linux). As a bonus, there is also an easy way to deploy to Amazon’s EC2 if you have a solid setup locally.
  2. A member of a team where everyone has their own development style, and you want to avoid the headaches of cross-platform support.
  3. Someone who normally sets up servers in a third-party hosting environment, but wants to test deployments without paying a bunch of money in wasted servers (this is where I am!)
