Security & Design

Our approach, architecture, and safeguards

Overview

SkyPlug is a cloud infrastructure automation system provided to customers through a software-as-a-service model. As such, its management, maintenance, and security are delegated to the service by the customer. Like any platform responsible for delegated automation or management, SkyPlug is granted a degree of trust to perform its functions reliably and securely.

Customers rightly want to understand the measures SkyPlug has in place to secure the system in every respect, and thereby to minimize the risk associated with its use. This document is intended to provide that context.

Cloud Integration Model – Microsoft Azure

SkyPlug integrates with Microsoft Azure to perform automated infrastructure changes such as changing the power state of a virtual machine in a customer Azure subscription (see Figure 1). To achieve this, SkyPlug requires a "Connection" to the customer Azure cloud environment. The Connection represents the information used by SkyPlug to access Azure and in particular the resources in the customer's subscription.

A Connection involves:

Subscription ID: identifies the subscription containing the resources to be managed
App Registration: an App Registration is created to represent SkyPlug in the customer's Azure Active Directory instance. This registers an identity for the SkyPlug service within the customer environment and is the principal associated with all operations as recorded in Azure's activity and audit logs.
Authentication certificate: an X.509 certificate and its private key serve as the authentication credential from SkyPlug to Azure. The public certificate is added to the App Registration and becomes the sole credential by which SkyPlug is authenticated (using the private key) to perform operations against customer resources in Azure.
Subscription access: permission is granted to SkyPlug within the customer subscription to perform the necessary operations, such as discovering resources and modifying their power states. Permissions are assigned via Azure's role-based access control (RBAC) features.

Because the Connection created during SkyPlug integration represents the access surface to the customer environment, we employ several precautions to protect it from abuse.

Figure 1: High-level integration model (SkyPlug connection architecture)

No exposed secrets

SkyPlug follows the best practice of certificate authentication, which provides strong cryptographic protection without exposing secrets. During Connection creation, the SkyPlug service generates a public/private key pair. The private key is immediately encrypted and stored securely at rest in our database. A public certificate is then generated from this key pair and provided to Azure. Because the exported certificate contains only public information, it poses no disclosure risk and has no value to bad actors.

The private key never leaves the SkyPlug service for any reason and is always stored in a form decryptable only by the service itself for ephemeral use during authentication requests. In the unlikely event of a database exposure, an attacker cannot use the persisted Connection data to connect to the customer environment.

Least privilege access

The App Registration created for SkyPlug provides it with a security principal within the customer subscription. This principal is then granted specific access to resources via RBAC to perform the required operations. We recommend the built-in "Virtual Machine Contributor" role, which has the permissions SkyPlug requires. More granular permissions can be granted to reduce the access scope even further, for example by limiting it to specific resource groups within a subscription. This second-level mitigation ensures that, in the unlikely event a bad actor leveraged SkyPlug's access maliciously, they would possess only limited privileges.

SkyPlug access can be revoked at any time by removing permissions from the associated App Registration or by deleting the App Registration itself.
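For illustration, granting or revoking the recommended role at a resource-group scope might look like the following minimal Python sketch, which assembles Azure CLI `az role assignment create` and `az role assignment delete` invocations. The helper names are hypothetical, and the subscription ID, resource group, and app ID are placeholders; SkyPlug's own onboarding steps may differ.

```python
import shlex
from typing import Optional

# Recommended built-in role from the text above.
ROLE = "Virtual Machine Contributor"


def rbac_scope(subscription_id: str, resource_group: Optional[str] = None) -> str:
    """Return an Azure RBAC scope; a resource group scope is narrower than a subscription."""
    scope = f"/subscriptions/{subscription_id}"
    if resource_group:
        scope += f"/resourceGroups/{resource_group}"
    return scope


def role_assignment_cmd(action: str, app_id: str, scope: str) -> list[str]:
    """Build an `az role assignment` command: "create" grants access, "delete" revokes it."""
    if action not in ("create", "delete"):
        raise ValueError("action must be 'create' or 'delete'")
    return ["az", "role", "assignment", action,
            "--assignee", app_id, "--role", ROLE, "--scope", scope]


if __name__ == "__main__":
    # Placeholder values: substitute the App Registration's application ID and your own names.
    scope = rbac_scope("<subscription-id>", "<resource-group>")
    print(shlex.join(role_assignment_cmd("create", "<app-id>", scope)))
```

Deleting the App Registration itself (for example with `az ad app delete --id <app-id>`) removes all of SkyPlug's access at once.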

Strong cryptography

The authentication certificate is a self-signed X.509 v3 certificate generated internally by SkyPlug during Connection creation, with a 2048-bit key and the SHA-512 signature algorithm. It is protected at rest using the ASP.NET Core Data Protection API, which provides authenticated encryption using the AES-256-CBC cipher with HMACSHA256 authentication. The private key is additionally protected by a per-connection password, and the password-protected key is then itself wrapped in the native platform encryption.

Private key operations are delegated to the Linux operating system on which the service runs.

Fully audited

SkyPlug interacts with the customer Azure environment using a dedicated service principal. All its operations are therefore audited by Microsoft's native logging mechanisms. Azure security tools can be used to monitor the service's activity and alert on any discrepancies.

Architecture

This section describes the SkyPlug system and software architecture, focusing on aspects relevant to security.

Hosting Environment

The core functionality of SkyPlug is hosted as a collection of web services in Microsoft Azure datacenters. It therefore inherits the base security posture of Azure, including physical security, network security, infrastructure maintenance, and general management processes. Azure also represents the bulk of the security perimeter for SkyPlug because its core operations and data reside within Azure boundaries.

SkyPlug is hosted within Azure App Service Linux containers. The underlying infrastructure is fully managed by Microsoft, including maintenance, hardening, and vulnerability patching of the host server and container operating systems. All such updates and configuration changes are applied automatically, are transparent to the customer, and do not rely on SkyPlug administrator intervention.

Components

The overall architecture is broken out into several distinct components that are deployed and operated as independent services contributing to the whole. This approach is primarily intended to organize functionality and allow for more targeted deployment, maintenance, and monitoring. It also has the security benefit of facilitating granular security configuration and posture for each service.

Figure 2 (System Architecture) shows the component breakdown:

Traffic Manager: a DNS-based routing mechanism; it does not intercept user requests, which terminate at our proxy.
Proxy: the front line for requests; it filters and rate-limits public traffic, ensuring only valid requests pass through to service instances.
API: handles all API requests, with strict enforcement of authorization and request validity.
Auth: handles only authentication-related operations, including login; carefully secured.
Web: serves public web assets; effectively read-only with respect to system data.
Worker: background processing and job execution; not public facing.
Cosmos DB: the consolidated, single location for system data.

Resilience & Scale

The architecture is designed to eliminate single points of failure to the greatest extent possible. We implement a multi-region strategy in which fully independent instances of SkyPlug services operate simultaneously in different geographic areas; currently there are two, East US and West US. Because the instances are independent, either region can operate without the other, providing high availability during regional outages or maintenance. Customers may also be physically closer to one SkyPlug instance and experience lower latency as a result.

As shown in Figure 3, the key components for achieving this distribution are:

Azure Cosmos DB: the fully distributed, cloud-native database that houses all SkyPlug data. Replication is ongoing and automatic, maintaining a separate, synchronized copy of all data in every SkyPlug region. The loss of an Azure region poses no risk to stored data, because the entire database is continuously replicated multiple times within each region and between regions.
Azure Traffic Manager: a distributed DNS-based routing mechanism that directs customers to the nearest healthy SkyPlug region. In the event of a regional outage, automatic failover will direct users to a separate healthy region.
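The routing decision can be modeled as a small function. This is a simplified Python sketch of latency-based routing combined with health checks, not Traffic Manager's actual implementation; the `Region` type and the latency figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    healthy: bool      # result of the most recent health probe
    latency_ms: float  # latency observed from the client's network


def route(regions: list[Region]) -> Region:
    """Return the lowest-latency healthy region, skipping any that fail health checks."""
    healthy = [r for r in regions if r.healthy]
    if not healthy:
        raise RuntimeError("no healthy region available")
    return min(healthy, key=lambda r: r.latency_ms)


if __name__ == "__main__":
    regions = [Region("East US", True, 20.0), Region("West US", True, 70.0)]
    print(route(regions).name)   # -> East US
    regions[0].healthy = False   # simulate a regional outage
    print(route(regions).name)   # -> West US
```

Because the real mechanism operates at the DNS level, clients follow the new answer once cached records expire rather than instantly.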

In addition to regional redundancy, the individual services SkyPlug runs on have their own built-in redundancy mechanisms, from which SkyPlug inherits additional robustness.

This redundancy also benefits scalability, because the deployment is active-active: all regions share the overall workload at any given time. Increased demand can be met either by adding regions or by increasing capacity in existing regions, and all capacity parameters are tunable at any time.

Network Communications

SkyPlug is a cloud-native system, so its primary interface is HTTP requests over the public internet from users to SkyPlug endpoints. These take the form of either web browser access to SkyPlug web assets or calls to the SkyPlug API. In either case the same security considerations apply: confidentiality, authentication, and authorization.

All communications to SkyPlug endpoints must occur over secure HTTPS/TLS connections using modern ciphers. These connections are secured by SkyPlug's public TLS certificate, ensuring all communications between users and SkyPlug are encrypted. The same holds for communications between SkyPlug and Azure, which occur whenever SkyPlug must read or modify Azure data.

Moreover, all requests are authenticated by default, using either a SameSite, HTTP-only cookie (for browser-based web and API requests) or a JWT access token (for machine-to-machine API requests). Both must be obtained via an OAuth 2.0 flow that requires successful assertion of valid SkyPlug credentials. Unauthenticated requests are rejected with a 401 response.

Finally, all authenticated requests are further authorized against the specific privileges allowed for the user within SkyPlug. SkyPlug's role-based access control system ensures all operations are checked against the user's assigned role. Unauthorized requests are rejected with a 403 response.
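The two checks can be sketched as a single request gate. In this Python sketch, `verify` stands in for cookie or JWT validation, and the `(data type, action)` permission pairs are a hypothetical representation of the user's role; it shows only the order of the checks and the status codes they produce.

```python
from dataclasses import dataclass
from http import HTTPStatus
from typing import Callable, Optional


@dataclass(frozen=True)
class Session:
    user_id: str
    tenant_id: str
    permissions: frozenset[tuple[str, str]]  # (data type, action) pairs granted by the user's role


def gate(credential: Optional[str],
         verify: Callable[[str], Optional[Session]],
         required: tuple[str, str]) -> HTTPStatus:
    """Authenticate first, then authorize; return the status the request would receive."""
    session = verify(credential) if credential else None
    if session is None:
        return HTTPStatus.UNAUTHORIZED      # 401: missing or invalid cookie/token
    if required not in session.permissions:
        return HTTPStatus.FORBIDDEN         # 403: authenticated but not permitted
    return HTTPStatus.OK                    # proceed to the handler
```

Checking authentication before authorization means an anonymous caller learns nothing about which permissions exist.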

SkyPlug enforces request rate limiting to prevent various kinds of abuse including spam and password guessing. Throttled requests are rejected with a 429 response.
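The document does not specify the rate-limiting algorithm. A token bucket is one common choice and is sketched here under that assumption; the capacity and refill rate are illustrative, and the injectable clock exists only to make the behavior easy to test.

```python
import time
from typing import Callable


class TokenBucket:
    """Permit bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # the caller responds with 429 Too Many Requests


if __name__ == "__main__":
    bucket = TokenBucket(capacity=5, rate=1.0)
    # A rapid burst: the first 5 requests pass, the remainder are throttled.
    print([200 if bucket.allow() else 429 for _ in range(7)])
```

Keeping one bucket per client (for example, per source address, or per account on the login endpoint) lets short bursts through while capping sustained rates such as repeated password guesses.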

Multitenancy

SkyPlug is a multitenant system hosting many customers within converged, efficient infrastructure. A paramount security property of any multitenant system is the mechanism enforcing tenant boundaries, ensuring that customers are isolated from one another. In SkyPlug, this division is enforced throughout the software stack, starting with the data itself and moving up through the application layer.

At the database level, SkyPlug uses Cosmos DB, which, among its other features, is a partitioned database. Each record is assigned a key that identifies its partition, and each query specifies the partition to which it is scoped. SkyPlug leverages this property by partitioning all customer data with a unique tenant-specific key and issuing every query with the key securely derived from the user's tenant association. It can therefore guarantee that data returned for a user's query is always and only data associated with the correct tenant; under this mechanism, users from one tenant cannot reach another tenant's data.
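To make the partition scoping concrete, here is a minimal Python sketch of a tenant-scoped, parameterized query. The `VerifiedSession` type and the document properties `type` and `name` are hypothetical; only the pattern (fixed query text, user input passed as parameters, partition key taken from the verified session) reflects the text above.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class VerifiedSession:
    """Built only from a verified cookie or JWT, never from request parameters."""
    user_id: str
    tenant_id: str


def tenant_query(session: VerifiedSession, doc_type: str, name: str) -> dict[str, Any]:
    """Return keyword arguments for a single-partition, parameterized Cosmos DB query."""
    return {
        # Fixed query text: user input never becomes part of the SQL string.
        "query": "SELECT * FROM c WHERE c.type = @type AND c.name = @name",
        "parameters": [
            {"name": "@type", "value": doc_type},
            {"name": "@name", "value": name},
        ],
        # Scope the query to the caller's tenant partition, taken from the session.
        "partition_key": session.tenant_id,
    }
```

With the `azure-cosmos` Python SDK, these keys correspond to the `query`, `parameters`, and `partition_key` arguments of `ContainerProxy.query_items`, so the result could be passed as `container.query_items(**tenant_query(session, "job", name))`. Because the tenant ID never comes from request input, a caller has no way to name another tenant's partition.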

In the application layer, every request is identified and scoped to a particular tenant immediately upon being received. The user session is extracted from a verified cookie or JWT access token and the associated tenant identifier is bound to the subsequent operations. As such, all code paths from API endpoints to database queries are tenant-scoped and the boundaries are thereby maintained.

A unique feature of SkyPlug is that users can participate in multiple tenants (“Teams”). User 1 might be a member of both Team A and Team B and can switch freely between the two using the SkyPlug app's team switching feature. In this scenario, the currently selected team determines the context of the data returned from the SkyPlug API and displayed in the client application. There is no concurrent commingling of data from different teams within a user context.
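Team switching can be thought of as resolving exactly one tenant for each request. The following Python sketch, with hypothetical names, binds a request to the selected team only after checking that the user, identified from the verified session, belongs to it.

```python
from dataclasses import dataclass


class NotATeamMember(Exception):
    """Raised when a user selects a team they do not belong to."""


@dataclass(frozen=True)
class RequestContext:
    user_id: str
    tenant_id: str  # the one team whose data this request may read or change


def resolve_context(user_id: str, memberships: dict[str, str], selected_team: str) -> RequestContext:
    """Bind a request to the selected team after checking membership.

    `memberships` maps team ID -> role name for this user (a hypothetical shape),
    loaded server-side for the authenticated user, not supplied by the client.
    """
    if selected_team not in memberships:
        raise NotATeamMember(f"{user_id} is not a member of {selected_team}")
    return RequestContext(user_id=user_id, tenant_id=selected_team)
```

Every downstream call then receives the single `tenant_id` from the context, which is what keeps one team's data from appearing alongside another's.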

API-centric Approach

SkyPlug is API-centric by design: a comprehensive, documented API represents the public interface to all SkyPlug data and operations. The SkyPlug web application accomplishes all tasks through the SkyPlug API. Moreover, customers can use the API directly to implement their own custom processes.

From a security perspective, this makes the system's attack surface narrower and more legible. There are no undocumented interfaces or back channels that could become forgotten or neglected points of vulnerability. The public API is tightly defined and scrutinized for correctness and security.

Authentication & Authorization

All user operations within SkyPlug require that a user demonstrate their identity and possess requisite privilege for the action they are taking. We implement standard best practices in these areas.

Authentication

The SkyPlug authentication server is built on the ASP.NET Core authentication and identity libraries, part of a vetted open-source Microsoft framework. A secure cookie is issued upon successful authentication. Users can enable multifactor authentication using time-based one-time passwords (TOTP) for an additional layer of security. No third-party services are involved at any point in the authentication process, which keeps the security boundary for credential storage and processing narrow.
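For readers unfamiliar with TOTP, the following is a generic Python sketch of the RFC 6238 algorithm with the parameters most authenticator apps use (HMAC-SHA1, 30-second steps, six digits). It is not SkyPlug's code, and the one-step drift window in `verify_totp` is an assumption.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret: bytes, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP: HOTP (RFC 4226) over the number of elapsed time steps."""
    counter = int((time.time() if at is None else at) // step)
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation (RFC 4226, section 5.3)
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify_totp(secret: bytes, submitted: str, at: Optional[float] = None,
                window: int = 1, step: int = 30) -> bool:
    """Accept codes from adjacent time steps to tolerate small clock drift."""
    now = time.time() if at is None else at
    return any(hmac.compare_digest(totp(secret, now + i * step, step), submitted)
               for i in range(-window, window + 1))
```

The shared secret is established once (typically via a QR code) and both sides then derive the same short-lived code independently, so no code ever needs to be transmitted in advance.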

User Accounts and Passwords

SkyPlug users are represented by a user account record containing minimal identifying information: an email address (the SkyPlug username) and a hash of the user's password. This record is stored in a SkyPlug database container separate from customer data.

The email address is provided during user registration and verified through an email loop using a unique code before a user record is created. Email addresses are unique in the system, and registrations cannot be made with an existing email address.

Passwords are hashed using Microsoft's open-source password hashing implementation, specifically PBKDF2 with HMAC-SHA512, a 128-bit salt, a 256-bit subkey, and 100,000 iterations. Login takes place on the authentication server, which verifies the submitted password against the stored hash. The plain-text password is never logged or stored.
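The parameters above map directly onto Python's standard `hashlib.pbkdf2_hmac`. This sketch shows the derivation and a constant-time verification only; it does not reproduce the storage format of Microsoft's implementation, which packs a format marker, the parameters, the salt, and the subkey into a single encoded value.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 100_000   # iteration count from the text
SALT_BYTES = 16        # 128-bit salt
SUBKEY_BYTES = 32      # 256-bit subkey


def _derive(password: str, salt: bytes) -> bytes:
    # PBKDF2 with HMAC-SHA512 as the pseudorandom function.
    return hashlib.pbkdf2_hmac("sha512", password.encode("utf-8"), salt,
                               ITERATIONS, dklen=SUBKEY_BYTES)


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, subkey); both are stored, the password itself never is."""
    salt = secrets.token_bytes(SALT_BYTES)  # fresh random salt per password
    return salt, _derive(password, salt)


def verify_password(password: str, salt: bytes, subkey: bytes) -> bool:
    """Re-derive with the stored salt and compare in constant time."""
    return hmac.compare_digest(_derive(password, salt), subkey)
```

The per-password salt means identical passwords produce different hashes, and the iteration count makes each offline guess deliberately expensive.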

Users can perform self-service password resets against their email address. A verification loop confirms ownership of the email account via a random, ephemeral confirmation code.
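A confirmation loop of this kind can be sketched as follows. The six-digit format, 15-minute lifetime, and five-attempt limit are assumptions for illustration, not values stated in this document; the attempt limit is what keeps a short code from being guessed.

```python
import hmac
import secrets
import time
from dataclasses import dataclass
from typing import Optional

CODE_TTL_SECONDS = 15 * 60  # assumed lifetime; not specified in this document
MAX_ATTEMPTS = 5            # assumed limit on guesses per code


@dataclass
class PendingCode:
    email: str
    code: str          # kept server-side only; the user receives it by email
    expires_at: float
    attempts: int = 0
    used: bool = False


def issue_code(email: str, now: Optional[float] = None) -> PendingCode:
    """Create a random six-digit confirmation code for an email address."""
    now = time.time() if now is None else now
    code = f"{secrets.randbelow(10**6):06d}"  # CSPRNG, uniform over 000000-999999
    return PendingCode(email, code, now + CODE_TTL_SECONDS)


def redeem_code(record: PendingCode, submitted: str, now: Optional[float] = None) -> bool:
    """Accept the code once, before expiry, within the attempt limit."""
    now = time.time() if now is None else now
    if record.used or now > record.expires_at or record.attempts >= MAX_ATTEMPTS:
        return False
    record.attempts += 1
    if not hmac.compare_digest(record.code.encode(), submitted.encode()):
        return False
    record.used = True  # single use: a redeemed code cannot be replayed
    return True
```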

Passwords must be at least 10 characters long, and users are encouraged to choose longer passwords with high complexity. In line with current NIST guidance (SP 800-63B), passwords do not expire.

Role-based Access Control

SkyPlug implements a granular access control system based on roles and permissions. A permission specifies the action allowed against a given data type and takes one of these values:

None (default)
Read
List
Create
Update
Delete

A role is a collection of such permissions. For example:

Connections
- Read
Jobs
- Read
- List
Resources
- Read
- List
- Create
- Update
- Delete

A user is granted one or more roles through membership in a tenant (“Team”), and these roles define the user's privileges as a member of that team. A user with memberships in multiple teams has separate role assignments in each team, which can confer entirely different permissions.
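The permission model can be expressed compactly. This hypothetical Python sketch encodes the example role above and checks a request against the roles a user holds in the currently selected team. It assumes that permissions from multiple roles in one team combine as a union, which the text does not state explicitly.

```python
from enum import Enum


class Action(Enum):
    READ = "Read"
    LIST = "List"
    CREATE = "Create"
    UPDATE = "Update"
    DELETE = "Delete"


# A role maps data types to permitted actions; a missing entry means "None".
Role = dict[str, frozenset[Action]]

# The example role from the text above.
EXAMPLE_ROLE: Role = {
    "Connections": frozenset({Action.READ}),
    "Jobs": frozenset({Action.READ, Action.LIST}),
    "Resources": frozenset(Action),  # every action
}


def is_allowed(roles_by_team: dict[str, list[Role]], team: str,
               data_type: str, action: Action) -> bool:
    """Combine the user's roles in the selected team; anything not granted is denied."""
    return any(action in role.get(data_type, frozenset())
               for role in roles_by_team.get(team, []))
```

Because the lookup is keyed by team, the same user can be an administrator in one team and read-only in another without either grant leaking across.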

SkyPlug provides several standard roles and supports defining custom roles to meet a team's needs.

Standard roles:

Administrator: All-powerful wizard permitted all actions and data
Job Manager: Can create and change jobs and resources; can't change connections or team
Read Only: Permitted to see any data; can't make changes

Software Design & Developer Operations

SkyPlug is custom-developed software created by our team. This section provides insight into security-relevant considerations around the software design and components.

Frameworks

SkyPlug combines backend applications implemented with Microsoft .NET 8 and a frontend web application built with Vue.js. The backend applications are all developed on the ASP.NET Core web framework. These are widely used open-source frameworks with strong track records and active developer communities. Recent .NET versions have had very few known vulnerabilities, and those discovered have been fixed quickly.

All code is strongly typed and verified by compilation prior to deployment. The backend API and worker are written in C#, and the frontend client is written in strict TypeScript.

Dependencies

We rely on a minimal set of well-vetted open-source libraries for common functionality, selected based on their history, quality, and development activity. We avoid obsolete or abandoned libraries so that we can expect timely updates if a vulnerability is discovered in a library or its dependencies.

A list of dependencies is available upon request for the curious.

No Installation

SkyPlug users interact with the system using a browser-based web application. There is nothing to install and thus no risk to user devices.

Secret Isolation

SkyPlug requires a limited set of secrets to secure various aspects of functionality. Such secrets are strong cryptographic values generated randomly and stored in Azure Key Vault, where they can be accessed only by specifically allowed app instances and users.

No secret keys or sensitive values are stored as plain-text configuration values at any time.

Common Threats & Mitigations

There are well-understood threats for a web-based system like SkyPlug. These have been considered and mitigations implemented throughout the development process. These include:

Denial of service: SkyPlug runs in Azure datacenters which provide baseline and automatic protection against denial-of-service flood attacks. We also enforce rate limiting and have multiple proxies between the internet and the backend API endpoints.
Malformed input: the SkyPlug API validates all requests for properly formatted parameter values. Malformed requests are rejected with a 400 response.
Broken access control: SkyPlug is careful with access control at multiple points in the software stack. We implement a least privilege model that denies by default. All requests are authenticated and authorized. Cross-Origin Resource Sharing policy is configured to prevent interaction from unauthorized domains. Sessions are time-restricted and invalidated after expiration.
Weak cryptography: we leverage modern, strong cryptographic algorithms and techniques throughout, including for encryption ciphers, secure hashing, random number generation, signature verification, and certificates.
Injection: SkyPlug database queries are always constructed by the service, never by the user. All user input is strictly interpreted and supplied to queries only as parameters, which prevents injection-style manipulation.
Misconfiguration: all system configuration and values are well-defined in code. Infrastructure configuration changes are performed through validated definition deployments rather than manually.
Request forgery: wherever the system processes a user-supplied URL, the URL is sanitized and validated against expected parameters.
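As a sketch of the request-forgery mitigation, the validator below accepts only HTTPS URLs to an explicit host allowlist, with no embedded credentials and no unexpected port. The allowlist entry is hypothetical, and a production check would typically also account for redirects and DNS resolution, which this sketch does not.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: the only hosts the service is expected to call.
ALLOWED_HOSTS = frozenset({"management.azure.com"})


def validate_url(raw: str) -> str:
    """Return the URL if it is safe to fetch; raise ValueError otherwise."""
    parts = urlsplit(raw.strip())
    if parts.scheme != "https":
        raise ValueError("only https URLs are accepted")
    if parts.username is not None or parts.password is not None:
        raise ValueError("embedded credentials are not accepted")
    # hostname is lowercased by urlsplit; an exact match rejects look-alike
    # suffixes such as management.azure.com.evil.example and raw IP addresses.
    if parts.hostname not in ALLOWED_HOSTS:
        raise ValueError(f"host {parts.hostname!r} is not allowed")
    if parts.port not in (None, 443):  # .port itself raises ValueError if malformed
        raise ValueError("unexpected port")
    return parts.geturl()
```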

Data Auditing

Changes to SkyPlug-specific customer data are audited to capture an immutable history. Each audit entry includes a summary of the change (including previous and new values), the user who made it, and a timestamp of when it occurred.

Audit records are generated and stored inline during data modification operations.
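An audit entry of this shape can be derived by comparing the record before and after the change. This Python sketch is illustrative and its names are hypothetical; the frozen dataclass and read-only mapping only prevent edits in memory, while durable immutability depends on the storage layer.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from types import MappingProxyType
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class AuditRecord:
    entity_id: str
    user_id: str          # who made the change
    timestamp: datetime   # when it happened (UTC)
    changes: Mapping[str, tuple[Any, Any]]  # field -> (previous value, new value)


def audit_change(entity_id: str, user_id: str, before: dict[str, Any], after: dict[str, Any],
                 now: Optional[datetime] = None) -> AuditRecord:
    """Summarize the fields that differ between two versions of a record."""
    changes = {
        key: (before.get(key), after.get(key))
        for key in sorted(before.keys() | after.keys())
        if before.get(key) != after.get(key)
    }
    # Read-only mapping inside a frozen dataclass: not editable through its public interface.
    return AuditRecord(entity_id, user_id, now or datetime.now(timezone.utc),
                       MappingProxyType(changes))
```

In Cosmos DB, writing the updated item and its audit record in one transactional batch within the tenant's partition would be one way to keep the two in step; that detail is an assumption rather than something this document specifies.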

Management & Operations

This section covers aspects of the security-relevant processes involved in the operation of the SkyPlug service.

Access Management

As a cloud-hosted system, SkyPlug relies on various services for hosting and operation. In all cases, user accounts for these services have strong passwords and two-factor authentication enabled. Least privilege applies here as well: accounts are given only the minimal permissions needed for the required management tasks.

Deployment & Updates

Changes to the codebase are deployed using an automated pipeline. There are two stages: test and production. The test stage provides a safe first stop to fully evaluate the changes in a live environment that matches production. Once the test stage deployment and evaluation are complete, the production deployment can be triggered manually at an opportune time selected by us. Deployments follow a staggered regional rollout to ensure that only one region is being updated at a time and other regions remain functional. This approach aims to prevent downtime and obviate the need for maintenance windows.

In the case of a problematic deployment, the system can be rolled back to the previous release version using the same process.
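The staggered rollout and rollback can be sketched as a loop. This is a hypothetical Python illustration of the one-region-at-a-time idea with an automatic health gate; in SkyPlug's process the production stage is triggered manually, and `deploy` and `healthy` stand in for pipeline steps rather than real APIs.

```python
from typing import Callable, Sequence


def staggered_rollout(regions: Sequence[str], new_version: str, old_version: str,
                      deploy: Callable[[str, str], None],
                      healthy: Callable[[str], bool]) -> bool:
    """Deploy to one region at a time; on a failed health check, roll back every updated region."""
    updated: list[str] = []
    for region in regions:
        deploy(region, new_version)  # other regions keep serving traffic meanwhile
        updated.append(region)
        if not healthy(region):
            for r in reversed(updated):
                deploy(r, old_version)  # same pipeline, previous release
            return False
    return True
```

Updating regions sequentially means a faulty release is caught while at least one region still runs the previous version and can absorb traffic.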

Given this setup, updating the system with security patches is straightforward and can be accomplished in less than 10 minutes if circumstances warrant. We keep current aggressively and typically deploy the latest versions of dependencies within a day or two of their release, so on any given day the system is typically running the latest available version of everything.

Builds and deployments occur automatically on a secure, self-hosted build agent. We never build or deploy manually outside of this process.

As with access to other systems, two-factor authentication protects login to the DevOps infrastructure, ensuring that developer credentials alone cannot be used to inject malicious changes via deployment or release hijacking.

Monitoring & Logging

A variety of logs are recorded at several levels of the system, including by the underlying infrastructure and within the SkyPlug software stack itself. These logs are consolidated by Azure Monitor to provide a single query scope for anomaly monitoring and alerting.

The online services are monitored for uptime and health. Alerts are sent if an unhealthy state or outage is detected.

No sensitive user information is tracked or logged. We care only that the service is doing its job correctly and reliably.

Access Reviews

We regularly review access to our systems to ensure only necessary and valid permissions are granted to as few accounts as possible. Access is revoked immediately after valid use is no longer required.

Further Information & Contact

Please contact us with any questions or concerns. We're happy to provide further information and clarification around any of the topics described above or otherwise.

Reporting

If you discover a security concern related to any aspect of our service, please report it to us. We actively monitor communications and will respond promptly to investigate and resolve any vulnerability or incident.

Email: security@skyplug.io