Lessons I’ve learned deploying SaaS software on-prem

service_management_integrations

If the best way to learn something is to teach it, then I’ve learned a surprising amount by guiding customers through on-premises deployments of our SaaS application.

Dealing with flaky DB migrations on Windows machines is a far cry from operating our fine-tuned AWS VPC. I’m glad we bit the bullet and deployed on-prem early on—it’s shaped our application and system design decisions in ways I couldn’t have imagined.

Background

We launched our integrations platform, Kloudless, in 2014 as a multi-tenant web app on AWS. As we began to work with our first enterprise customers, we quickly realized that our SaaS application didn’t come close to meeting the various expected regulatory compliance and audit requirements. Several organizations simply couldn’t use Kloudless until we certified compliance with PCI and HIPAA, or underwent a SOC 2 audit.

As CTO, I challenged my team to find a solution. We soon introduced Kloudless Enterprise, the on-premises version of our cloud solution, to allow our customers to run Kloudless as part of their on-prem stack. Today, Kloudless Enterprise Docker containers, AMIs, and OVAs power applications used by Fortune 50 companies and some of the nation’s largest banks. This year, we crossed 500M upstream API requests a month.

Here are some of the major lessons I’ve learned along the way.

Kloudless Enterprise (source)
\==============_=_/ ____.---'---`---._____
             \_ \    \----._________.----/
               \ \   /  /    `-_-'
           __,--`.`-'..'-_
          /____          ||
               `--.____,-'

Ops

Deploying is hard, so make it as easy as possible

Kloudless Enterprise is installed and updated often, so our customers require this process to be completed quickly. Limiting the number of exposed moving parts exposed enables us to provide updates that are as simple as deploying new Docker containers. Developers can start out using local PostgreSQL and Redis services and configure external ones later for scalable production deployments. This structure is similar to GitLab’s Omnibus packages, which are a great example.

We tied licenses to individual release channels for images and containers since a patch release is sometimes required for specific customers or a specific version. Secret: We also provide Kloudless operators the ability to update individual application code packages to the next patch release within a container, immutability be damned. While this is an anti-pattern and is definitely not the recommended option, it’s occasionally proven to be the path of least resistance in dealing with necessary change management review processes.

Preserving all application state outside of the container, like licenses, encryption keys, and meta-configuration, allows updates to be seamless. Using external storage, such as Docker volumes and network-attached storage like EBS, preserves configuration during updates.

You will thank your support tooling

SLAs for Enterprise and on-prem deployments often require the ability to quickly diagnose issues, ideally with self-serve tools like command-line utilities and monitoring integrations. Even so, our engineering team has had to dive into remotely deployed Kloudless development clusters to walk operators through configuration and diagnose hard-to-reproduce issues. Making this as easy as troubleshooting any cloud environment is a major benefit. When things don’t go as planned, support staff need tools available to assist customers with.

We built command line utilities to do everything from reading and sending log data, to managing users, maintenance windows, and configuration updates. Our most frequently used tool authenticates an outbound connection from Kloudless Enterprise instances to a secure remote bastion in our network for engineers to assist via a reverse SSH tunnel. Definitely use autossh for this kind of stuff. This utility lets us access our customers’ instances behind firewalls at their request while also maintaining their compliance requirements.

Scaling is harder when everyone does it their own way

Private Kloudless deployments reach upwards of several machines running 100+ cores combined. You can prevent unnecessary stress down the line by providing canonical scaling guides and items to consider.

Here are some issues we’ve seen while deploying privately hosted instances:

  • Postgres running out of disk due to insufficient capacity planning for activity stream data.
  • 10+ small AWS instances instead of 5 larger ones. Fun fact: AWS network throughput is based on instance vCPU, so adding more instances isn’t the answer to network bottlenecks!
  • Unhandled network partition issues. We quickly switched from a stateful, primary/secondary clustering format using etcd to a masterless model. This was not only simpler for our customers’ Ops teams, but turned out to be more robust and better able to handle large deployments.
  • Diverse deployment environments and tooling.
    Our initial version had an interactive setup process that required less preparation, but turned out to be a stumbling block for large production environments. Non-interactive deployment improves automation as well as integration with existing tools like AWS CloudFormation and Docker Compose.

Documentation makes things official

While on the topic of writing, I’ve noticed an entirely distinct set of documentation we’ve had to provide our on-prem customers. We expected to contribute API docs and information around setup, but we’ve also had to create far more documentation, including:

  • SOPs and security policy docs to provide security and legal teams
  • HA, scalability, and monitoring guides for operations
  • Solution walkthroughs and examples of configuration combinations for engineering
  • Detailed release notes and update paths
  • System troubleshooting guides

It’s important to be upfront about all data transmissions and to ensure tools initiating outbound connections are opt-in by default to not raise the ire of a customer’s security team.

The Application

On-prem vs. Cloud is best as a feature flag

The worst deployment strategy would be to fork the cloud application for on-prem deployments. In addition to increased maintenance costs, this results in bugs that dogfooding would have otherwise caught. Introducing a feature flag that indicates whether the application is run on-prem helps to easily toggle functionality like license restrictions and alternate web application UI.

Cloud to On-prem is an upgrade away

But wait, there’s more: using the same application and data stack makes it easy to transition cloud customers to private deployments or on-prem solutions without a complex data migration. While environment-specific items like encryption keys necessitate a documented upgrade procedure, it is possible to simplify the transition by limiting the deviation in application behavior. Plus, sales teams love the frictionless path to increasing revenue.

Configuration should stay as configuration

Ask the front-end design team to sit this one out. A friend of mine at GitHub contributes to GitHub Enterprise, which inspired a lot of our design choices when building out our own solution. We heeded one valuable bit of advice to reduce the overhead of configuration. We scrapped fancy settings dashboard web UIs and just provided a YAML file to edit. This saved us an inordinate amount of time shipping features. We’ve also:

  • had fewer questions than expected since the point-and-click folks stay away
  • accessed the full suite of YAML data structures to represent state
  • inherited configuration from org-wide, license-specific, and deployment-specific settings into each machine’s configuration
  • been able to access and manage configuration easily via utilities like Salt

Leverage open source

If you’d like a fun conversation with engineering and procurement teams at enterprises, then tell them they need to purchase even more on-prem software to actually view analytics, logs, monitoring, and more. Using third-party SaaS vendors works great for a cloud product, but on-prem software needs to come with batteries included for core functionality.

Fortunately, allowing a variety of open-source tools to be configured helps package functionality and prevent these concerns. For instance, Kloudless provides options to send system and application metrics to StatsD and InfluxDB as well as ship log data externally via standard protocols. This allows Kloudless to integrate with existing monitoring infrastructure with minimal effort.

Open source utilities also help users easily configure application security. For example, take advantage of Let’s Encrypt to provision SSL/TLS certificates for custom domain names quickly.

Application Security is as good as its documentation

Security is like a sine wave that peaks throughout this blog post at periodic intervals.

Application security goes beyond technical solutions that satisfy engineering. For example, summaries of compliance with certifications and items like pen test results are equally valuable. Security posture is relevant at every stage since it is one of the primary reasons for deploying on-prem. Providing a well-written, thorough information security policy covering everything from SDLC processes like code reviews, to information on internal controls, helps satisfy customers’ voracious need for security-related collateral.

“Enterprise” is an anagram for “Re-enter IPs”

We’ve had multiple requests for items like SSO, access controls for team members, activity logs, IP whitelists, and more. Making these easily configurable allows for frictionless deployments. However, we’ve noticed that these haven’t been a deal-breaker in most cases—we released Kloudless Enterprise without several features popular with larger customers. This could be specific to our type of application, but I’m glad we chose to ship our product ASAP instead of building features we assumed would be required. Check out enterpriseready.io for more details on features frequently included in software for enterprise customers.

Side-note: We also adjusted our build pipeline to compile and obfuscate application code from the very beginning to protect a different kind of IP, but in hindsight, it is debatable as to whether this was even necessary.

Technical solutions for licensing save time till they don’t

At the end of the day, a customer is likely purchasing some concept of a license “key” to run the application privately. We’ve found that the minimum effort required here is to build in validation checks and detail what data they transmit.

Avoid technical solutions for payments early on

Let’s say you create a self-hosted version of your product and you have a great billing system running on your cloud version—even using the latest Stripe Billing API—with a clear licensing model. I’ve found it’s still best to invoice customers manually until you’ve identified a repeatable process. We avoided prematurely optimizing payments with click-through solutions that would get in the way of customized pricing schemes, especially since the volume of enterprise deals is relatively low early on. Unlike in SaaS apps, it usually isn’t an option to disable accounts due to a Net 30 invoice delayed a few days. License terms also vary from quarterly and annual subscriptions to multi-year contracts, so I recommend setting aside the monthly cron job or equivalent solution.

We built an internal dashboard to update licenses entirely independently of payment, and we avoided directing customers purchasing on-prem solutions to the billing options provided to cloud customers. It’s hard enough to build in feature flags to deal with varied license requirements; requiring a single payment flow will simply slow things down.

APIs all the way down

Kloudless integrates APIs, but also provides a Meta API to manage the API integration platform. My goal this year is to convince our engineering team to integrate Kloudless with itself. In all seriousness, programmatic manipulation of the application is a massive benefit for engineering teams deploying the solution on-prem. It enables workflows that the customer may not have even bothered bringing up earlier, like requiring key rotation every 90 days.

I highly recommend an API that manages all data in the application, especially provisioning and deprovisioning workflows.

Would we do it again?

Yes, but I’d probably charge more right off the bat. 🙂

Ultimately, our GTM strategy incorporated deploying our application where the customer felt comfortable using it; either our multi-tenant cloud, a private deployment managed by us, or entirely self-hosted VMs and containers. A key decision to make for vendors considering on-prem deployments is simply whether the overhead described above is worth the benefits it brings.

Kloudless provides unified APIs to integrate applications with several software services in one go. We deploy where software services run, either in the cloud or on-prem. Sound interesting? We’re hiring!

 

Office 365 PowerShell queries via REST: Maximizing the Kloudless Pass-through API

In our previous post, we announced the availability of the Kloudless Pass-through API. The Pass-through API enables your application to make API requests directly to third-party services, while still using Kloudless’s unified APIs. In this blog post, we’ll discuss how to access the Office 365 PowerShell via the Kloudless REST API to perform administrative tasks in Office 365.

Building with our easy-to-use REST API offers many benefits. We handle the complexities of integrating with each service behind the scenes so you  don’t have to. This speeds up your integration time and decreases future maintenance. We’ve extended the same principle to our Pass-through API by introducing special capabilities such as enabling PowerShell queries for Office 365 admin accounts.

Office 365 PowerShell provides several remote management commands that can be used to administer your Office 365 tenant, similar to how you would via the Office 365 admin center web application. For example, a broad set of security and compliance features can be accessed via theSecurity & Compliance Center cmdlets by connecting to Office 365 using remote PowerShell. Normally, you would need access to a PowerShell prompt in order to access this functionality. The Kloudless REST API handles the heavy-lifting and enables you to access this functionality via our REST API.

Invoking Office 365 Security Center cmdlets via the Kloudless Pass-through API

To begin, connect a SharePoint Online admin account to your Kloudless application. The easiest way to do this is by logging into your Kloudless account and then navigating to the Interactive Docs. Click the “Add Account” button and then click on SharePoint Online under the “Admin accounts” section towards the bottom of the pop-up that opens.

idjg03n

Once you’ve connected your account, you will receive a Kloudless Account ID that can be used for API requests to the Kloudless API. You are now ready to make Pass-through API requests to this admin Office 365 account!

While SharePoint Online REST API requests can be performed without any additional configuration, the PowerShell queries described in this blog post are available in Kloudless Enterprise and also require special permission to access by Kloudless Enterprise developers. Please contact us at support@kloudless.com to learn how to enable this capability for your Kloudless Enterprise instances.

Request Format

The format of PowerShell pass-through API requests is as follows:

URL: https://api.kloudless.com/v1/accounts/{account_id}/raw
  • {account_id} is the Kloudless account ID of the SharePoint Online admin account connected.
Headers (described in the Pass-through API docs):
  • X-Kloudless-Raw-URI: http://powershell/ This special value indicates the request should be translated to a PowerShell query.
  • X-Kloudless-Raw-Method: POST
  • Authorization: Bearer {account_bearer_token} OR Authorization: APIKey {application_api_key} See our Authentication Docs for more information on authorizing API requests.
Body
JSON data in the format below:

    {
      "category": "o365-security",
      "command": {cmdlet_name},
      "options": {
        ... option name: value mappings if required ...
      }
    }
At the current time, only Office 365 Security and Compliance Center cmdlets ("category": "o365-security") and Exchange Online cmdlets ("category": "exchange") are available via the Kloudless API. If you would like access to other remote PowerShell cmdlets, please contact us at support@kloudless.com.

Examples of Requests

An example of a curl request with the format described above would be:

curl -H "Authorization: APIKey {api_key}" \
     -H "X-Kloudless-Raw-URI: http://powershell/ \
    https://api.kloudless.com/v1/accounts/{account_id}/raw \
    --data '{body}'

Please replace the {api_key}, {account_id} and {body} values with your API Key, connected account’s ID and JSON data for PowerShell respectively.

Here are some examples of Body data to use in {body} for specific cmdlets:

Get-ComplianceCase
Obtaining a list of compliance cases.

{
  "category": "o365-security",
  "command": "Get-ComplianceCase"
}
An example of a curl request for this would be:

curl -H "Authorization: APIKey 123ABC" \
     -H "X-Kloudless-Raw-URI: http://powershell/" \
     https://api.kloudless.com/v1/accounts/123/raw \
     --data '{"category": "o365-security", "command": "Get-ComplianceCase"}'
New-ComplianceCase
Create a new compliance case.

{
  "category": "o365-security",
  "command": "New-ComplianceCase",
  "options": {
    "Name": "test new case 2",
    "Description": "This case is created via curl"
  }
}
An example curl request would be identical to the one used for Get-ComplianceCase but with the new value above for --data instead.
New-CaseHoldPolicy
Create a new hold policy for a case.

{
  "category": "o365-security",
  "command": "New-CaseHoldPolicy",
  "options": {
    "Case": "3b4de8d5-13cb-4291-bdd0-b6e2bb82a08e",
    "Name": "New Hold",
    "SharePointLocation": "https://kloudless.sharepoint.com/test subsite/"
  }
}
where "3b4de8d5-13cb-4291-bdd0-b6e2bb82a08e" is the Identity GUID of the Case to add the legal hold policy to. This corresponds to the Locations section of a Hold when editing a Case’s Holds at https://protection.office.com/#/ediscovery.
New-CaseHoldRule
Create a rule to add to a hold policy.

{
  "category": "o365-security",
  "command": "New-CaseHoldRule",
  "options": {
    "Policy": "New Hold",
    "Name": "New Rule",
    "ContentMatchQuery": "SSN"
  }
}
where "New Hold" is the name of the hold I created previously. This corresponds to the Conditions section of a Hold.

Similarly, other policies such as retention policies can also be created, and existing objects can be deleted:

New-RetentionCompliancePolicy
Create a new preservation policy.

{
  "category": "o365-security",
  "command": "New-RetentionCompliancePolicy",
  "options": {
    "Name": "Test new policy",
    "SharePointLocation": "https://kloudless.sharepoint.com/"
  }
}
This creates a policy but has not yet created a rule to add to it, which can be done with .
Remove-ComplianceCase
Deleting a compliance case.

{
  "category": "o365-security",
  "command": "Remove-ComplianceCase",
  "options": {
    "Identity": "94b99324-5574-4220-b081-1b689cb386af",
    "Confirm:$false": null
  }
}
where "94b99324-5574-4220-b081-1b689cb386af" is the Identity GUID of the Case to remove.

Future capabilities of the Pass-through API

The Pass-through API provides provides a powerful new way to access third-party features via Kloudless as shown above. We’re excited to make this available to all developers on our platform and would love to hear any feedback or suggestions in our developer forum.

Announcing the Kloudless Pass-through API

We are excited to announce the general availability of the new Kloudless Pass-through API endpoint. The Pass-through API enables your application to access the full functionality of the third-party software service by performing API requests directly to the service. Kloudless continues to ensure any relevant authentication information like refreshed access tokens are included in the API request.

While Kloudless provides unified APIs, there may be times when your application needs to access unique functionality in a third-party service that is not available in Kloudless’s unified APIs. The Pass-through API solves this problem by enabling your application to send raw data to a third-party API’s endpoint and receive the raw response streamed directly from the service through Kloudless.

Check out our API docs for more information on how to use the Pass-through API.

A primer on debugging Native Client code in Chrome

This isn’t your father’s your average client-side app.

Chromium NaCl

Native what?

Yesterday, I was faced with an unfamiliar 10k line C program that did custom image manipulation. It takes in two arguments: an input image file and a destination for the resulting output file.

This is straightforward to run on the server-side, but I don’t want to maintain varying compute capacity just for an infrequently run, on-demand image conversion script! Before you mention it, no, I am not about to rewrite the entire program in JavaScript, no matter how fun that sounds.

Enter Chromium Native Client, an “open-source technology for running native compiled code in the browser”. Combined with Pepper.JS, I can run binaries in a browser window! Before you bring out the pitchforks, consider that the C program is almost a perfect fit for this! As long as I write a little code to manage that pesky file I/O, it should be smooth sailing from here, right? Not exactly. Turns out the Pepper.JS docs weren’t kidding about getting your hands dirty.

Continue reading