Security & Design
Our approach, architecture, and safeguards
SkyPlug is a cloud infrastructure automation system provided to customers through a software-as-a-service model. As such, its management, maintenance, and security are delegated to the service by the customer. Like any platform responsible for delegated automation or management, SkyPlug is granted a degree of trust to perform its functions reliably and securely.
Customers rightly seek to understand what measures SkyPlug has in place to secure the system in every respect, and thereby to minimize the risk associated with its use. This document is intended to provide that context.
Cloud Integration Model – Microsoft Azure
SkyPlug integrates with Microsoft Azure to perform automated infrastructure changes, such as changing the power state of a virtual machine in a customer's Azure subscription (see Figure 1). To achieve this, SkyPlug requires a "Connection" to the customer's Azure cloud environment. The Connection represents the information SkyPlug uses to access Azure, and in particular the resources in the customer's subscription.
A Connection involves:
- Subscription ID: identifies the subscription containing the resources to be managed
- App Registration: an App Registration is created to represent SkyPlug in the customer's Azure Active Directory instance. This registers an identity for the SkyPlug service within the customer environment and is the principal associated with all operations as recorded in Azure's activity and audit logs.
- Authentication certificate: an X.509 certificate and its private key serve as SkyPlug's authentication credential to Azure. The public certificate is added to the App Registration and becomes the sole credential by which SkyPlug authenticates (by proving possession of the private key) to perform operations against customer resources in Azure.
- Subscription access: permission is granted to SkyPlug within the customer subscription to perform the necessary operations, such as discovering resources and modifying their power states. Permissions are assigned via Azure's role-based access control (RBAC) features.
Because the Connection created during SkyPlug integration defines the surface through which the customer environment can be accessed, we employ several precautions to protect it from abuse.
No exposed secrets
SkyPlug follows the best practice of certificate-based authentication, which provides strong cryptographic protection without exposing secrets. During Connection creation, the SkyPlug service generates a public/private key pair. The private key is immediately encrypted and stored securely at rest within our database. A public certificate containing the matching public key is then generated from this key pair and provided to Azure. Because the exported certificate is public, it poses no disclosure risk and is of no value to bad actors.
The private key never leaves the SkyPlug service for any reason and is always stored in a form decryptable only by the service itself for ephemeral use during authentication requests. In the unlikely event of a database exposure, an attacker cannot use the persisted Connection data to connect to the customer environment.
Least privilege access
The App Registration created for SkyPlug provides it with a security principal within the customer subscription. This principal is then granted specific access to resources via RBAC to perform the required operations. We recommend the built-in "Virtual Machine Contributor" role, which has the permissions SkyPlug requires. Access can be scoped more narrowly to reduce exposure even further, for example to specific resource groups within a subscription. This second-level mitigation ensures that, in the unlikely event a bad actor somehow leveraged SkyPlug access maliciously, they would possess only limited privileges.
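Azure RBAC assignments apply to the scope they are assigned at and to every resource beneath it, which is why narrowing the scope from a subscription to a resource group shrinks what SkyPlug can touch. The sketch below models that inheritance rule only; the subscription ID, resource group names, and the `in_scope` helper are hypothetical placeholders, not part of SkyPlug or an Azure SDK.

```python
# Hypothetical model of Azure RBAC scope inheritance: an assignment at a scope
# covers that scope and everything nested under it.

def _segments(resource_path: str) -> list[str]:
    # Azure resource IDs are case-insensitive, so compare lower-cased segments.
    return [s for s in resource_path.lower().split("/") if s]

def in_scope(resource_id: str, assignment_scope: str) -> bool:
    """True if resource_id is the assignment scope itself or lies beneath it."""
    scope = _segments(assignment_scope)
    resource = _segments(resource_id)
    # Whole-segment comparison, so "skyplug-managed2" does not match "skyplug-managed".
    return resource[: len(scope)] == scope

SUB = "/subscriptions/00000000-0000-0000-0000-000000000000"  # placeholder ID
vm_in_rg = f"{SUB}/resourceGroups/skyplug-managed/providers/Microsoft.Compute/virtualMachines/vm1"
vm_elsewhere = f"{SUB}/resourceGroups/finance/providers/Microsoft.Compute/virtualMachines/db1"

subscription_scope = SUB
resource_group_scope = f"{SUB}/resourceGroups/skyplug-managed"

print(in_scope(vm_in_rg, subscription_scope))        # True
print(in_scope(vm_elsewhere, subscription_scope))    # True
print(in_scope(vm_in_rg, resource_group_scope))      # True
print(in_scope(vm_elsewhere, resource_group_scope))  # False
```

With a subscription-wide assignment both virtual machines are reachable; with a resource-group assignment only the VM in `skyplug-managed` is.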
SkyPlug access can be revoked at any time by removing permissions from the associated App Registration or by deleting the App Registration itself.
The authentication certificate is a self-signed X.509 v3 certificate generated internally by SkyPlug during Connection creation, with a 2048-bit key length and the SHA-512 signature algorithm. It is protected at rest using the ASP.NET Core Data Protection API, which provides authenticated encryption using the AES-256-CBC cipher with HMACSHA256 authentication. A per-connection password protects the certificate's private key, which is then itself wrapped in the native platform encryption.
Private key operations for the certificate are delegated to the Linux operating system on which the service runs.
SkyPlug interacts with the customer Azure environment using a dedicated service principal. All its operations are therefore audited by Microsoft's native logging mechanisms. Azure security tools can be used to monitor the service's activity and alert on any discrepancies.
This section describes the SkyPlug system and software architecture, focusing on aspects relevant to security.
The core functionality of SkyPlug is hosted as a collection of web services in Microsoft Azure datacenters. It therefore inherits the base security posture of Azure, including physical security, network security, infrastructure maintenance, and general management processes. Azure also represents the bulk of the security perimeter for SkyPlug because its core operations and data reside within Azure boundaries.
SkyPlug is hosted within Azure App Service Linux containers. The underlying infrastructure is fully managed by Microsoft, including maintaining, hardening, and patching the host server and container operating systems. All such updates and configuration changes are applied automatically, are transparent to the customer, and do not rely on SkyPlug administrator intervention.
The overall architecture is broken out into several distinct components that are deployed and operated as independent services contributing to the whole. This approach is primarily intended to organize functionality and allow for more targeted deployment, maintenance, and monitoring. It also has the security benefit of facilitating granular security configuration and posture for each service.
Figure 2: System Architecture shows the component breakdown:
- Traffic Manager: a DNS-based routing mechanism; it does not intercept user requests, which terminate at our proxy.
- Proxy: as the front line for requests, it filters and rate-limits public traffic, ensuring only valid requests pass through to service instances.
- API: handles all API requests, with strict enforcement of authorization and request validity.
- Auth: handles only authentication-related operations, including login; carefully secured.
- Web: serves public web assets, effectively read-only for system data
- Worker: background processing and job execution, not public facing
- Cosmos DB: consolidated, single location for system data
Resilience & Scale
The architecture eliminates single points of failure to the greatest extent possible. We implement a multi-region strategy in which fully independent instances of SkyPlug services operate simultaneously in different geographic areas. Currently, there are two: East US and West US. Because the instances are independent, either region can operate without the other, providing high availability during regional outages or maintenance. Customers also benefit from the opportunity to be physically closer to a SkyPlug instance and may experience lower latency as a result.
As shown in Figure 3, the key components for achieving this distribution are:
- Azure Cosmos DB: the fully distributed, cloud-native database that houses all SkyPlug data. Replication is continuous and automatic, maintaining a separate, synchronized copy of all data in every SkyPlug region. Loss of an Azure region does not put data at risk, because the entire database is continuously replicated multiple times within each region and between regions.
- Azure Traffic Manager: a distributed DNS-based routing mechanism that directs customers to the nearest healthy SkyPlug region. In the event of a regional outage, automatic failover will direct users to a separate healthy region.
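Conceptually, "nearest healthy region" with automatic failover reduces to filtering out unhealthy endpoints and picking the lowest-latency one. The sketch below is a simplified model in the spirit of Traffic Manager's Performance routing method, not its actual implementation (which works through DNS responses and endpoint health probes); the latency figures and the `Region`/`choose_region` names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Region:
    name: str
    healthy: bool       # result of the most recent health probe (hypothetical)
    latency_ms: float   # measured latency from the client's location (hypothetical)

def choose_region(regions: list[Region]) -> Region:
    """Pick the lowest-latency healthy region; fail if none is healthy."""
    healthy = [r for r in regions if r.healthy]
    if not healthy:
        raise RuntimeError("no healthy region available")
    return min(healthy, key=lambda r: r.latency_ms)

regions = [Region("East US", True, 18.0), Region("West US", True, 64.0)]
print(choose_region(regions).name)  # East US

# Simulated East US outage: the same client is now directed to West US.
regions[0] = Region("East US", False, 18.0)
print(choose_region(regions).name)  # West US
```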
In addition to regional redundancy, the individual services SkyPlug is built on have built-in redundancy mechanisms of their own, from which SkyPlug inherits additional robustness.
The system's scalability also benefits from this redundancy because the deployment is active-active: all regions divide and share the overall workload at any given time. Increased demand can be met either by adding regions or by increasing capacity in existing regions, and all capacity parameters can be tuned at any time.
SkyPlug is a cloud-native system. As such, its primary interface is HTTP requests over the public internet from users to SkyPlug endpoints. These occur either in the form of web browser access to SkyPlug web assets or the SkyPlug API. In either case the same security considerations apply, i.e., confidentiality, authentication, and authorization.
All communications to SkyPlug endpoints are required to occur over secure HTTPS/TLS connections using modern ciphers. These are secured by the public SkyPlug certificate, ensuring all communications between users and SkyPlug are encrypted. This is also true for communications between SkyPlug and Azure, which occur during operations where SkyPlug must read or modify Azure data.
Moreover, all requests are authenticated by default by either a same-site HTTP-only cookie (for browser-based web and API requests) or by JWT tokens (for machine-machine API requests). These must be obtained via an OAuth 2.0 flow involving successful assertion of valid SkyPlug credentials. Unauthenticated requests are rejected with a 401 response.
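For browser sessions, the protective properties live in the cookie attributes: `HttpOnly` keeps page scripts from reading the cookie, `Secure` restricts it to HTTPS, and `SameSite` stops it from being sent on cross-site requests. The sketch below uses Python's standard `http.cookies` module to show such a `Set-Cookie` value; the cookie name, the `Strict` SameSite mode, and the lifetime are assumptions, since the document does not specify them.

```python
from http.cookies import SimpleCookie
import secrets

def session_cookie(value: str, max_age_seconds: int) -> str:
    """Render a Set-Cookie value with the attributes of a hardened session cookie."""
    cookie = SimpleCookie()
    cookie["session"] = value           # hypothetical cookie name
    morsel = cookie["session"]
    morsel["httponly"] = True           # not readable from page JavaScript
    morsel["secure"] = True             # only sent over HTTPS
    morsel["samesite"] = "Strict"       # assumption: Strict rather than Lax
    morsel["max-age"] = max_age_seconds # browser discards the cookie after this time
    morsel["path"] = "/"
    return morsel.OutputString()

# An opaque, random session identifier (illustrative only).
print(session_cookie(secrets.token_urlsafe(32), 3600))
```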
Finally, all authenticated requests are further authorized against the specific privileges allowed for the user within SkyPlug. SkyPlug's role-based access control system ensures all operations are checked against the user's assigned role. Unauthorized requests are rejected with a 403 response.
SkyPlug enforces request rate limiting to prevent various kinds of abuse including spam and password guessing. Throttled requests are rejected with a 429 response.
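The document does not say which rate-limiting algorithm SkyPlug uses; a token bucket is one common choice and is sketched here to show how throttling maps to a 429 response. Each client gets a bucket that refills at a steady rate and allows short bursts up to its capacity; the rate, capacity, and injectable clock are hypothetical parameters, and a real deployment would keep one bucket per client key (for example per IP or per user).

```python
import time

class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

def status_for(bucket: TokenBucket) -> int:
    # 429 Too Many Requests when the caller's bucket is empty.
    return 200 if bucket.allow() else 429

# Usage with a fake clock so the behavior is deterministic.
now = [0.0]
bucket = TokenBucket(rate=1.0, capacity=3, clock=lambda: now[0])
print([status_for(bucket) for _ in range(4)])  # [200, 200, 200, 429]
now[0] += 1.0  # one second later, one token has been refilled
print(status_for(bucket))  # 200
```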
SkyPlug is a multitenant system hosting many customers within converged, efficient infrastructure. A paramount security property of any multitenant system is the mechanism enforcing tenant boundaries, ensuring that customers are isolated from one another. In SkyPlug, this division is enforced throughout the software stack, starting with the data itself and moving up through the application layer.
At the database level, SkyPlug utilizes Cosmos DB, which among its other features is a partitioned database. Each record is assigned a key that identifies its partition, and queries specify the partition within which they are scoped. SkyPlug leverages this property to partition all customer data with a unique tenant-specific key and issues all queries using the key securely derived from a user's tenant association. Thus, it can guarantee that data returned for a user's query is always and only data associated with the correct tenant. Users from one tenant cannot reach data from another tenant given this mechanism.
In the application layer, every request is identified and scoped to a particular tenant immediately upon being received. The user session is extracted from a verified cookie or JWT access token and the associated tenant identifier is bound to the subsequent operations. As such, all code paths from API endpoints to database queries are tenant-scoped and the boundaries are thereby maintained.
A distinctive feature of SkyPlug is that users can participate in multiple tenants (“Teams”). User 1 might be a member of both Team A and Team B and can switch freely between the two using the SkyPlug app's team switching feature. In this scenario, the currently selected team determines the context of data returned from the SkyPlug API and displayed in the client application. There is no concurrent commingling of data from different teams in a user context.
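The tenant-scoping rules above can be sketched as a query builder whose partition key comes only from the verified session, never from request input. The query body uses the `{"query": ..., "parameters": [{"name", "value"}]}` shape that Cosmos DB accepts for parameterized SQL queries; the `Session` type, the `tenantId` and `status` field names, and the `jobs_query` helper are hypothetical, not SkyPlug's actual code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Session:
    """Identity extracted from a verified cookie or JWT (hypothetical shape)."""
    user_id: str
    active_team: str              # the team currently selected in the app
    memberships: frozenset[str]   # all teams the user belongs to

class Forbidden(Exception):
    pass

def tenant_key(session: Session) -> str:
    # The partition key is derived from the verified session only.
    if session.active_team not in session.memberships:
        raise Forbidden("user is not a member of the selected team")
    return session.active_team

def jobs_query(session: Session, status: str) -> tuple[dict, str]:
    """Return a parameterized query body plus the partition key it must run under."""
    key = tenant_key(session)
    spec = {
        "query": "SELECT * FROM c WHERE c.tenantId = @tenantId AND c.status = @status",
        "parameters": [
            {"name": "@tenantId", "value": key},
            {"name": "@status", "value": status},  # user input stays a parameter value
        ],
    }
    return spec, key

s = Session("user-1", "team-a", frozenset({"team-a", "team-b"}))
spec, partition_key = jobs_query(s, "' OR 1=1 --")
print(partition_key)  # team-a
```

Switching teams changes only `active_team`; every query is then re-scoped to the new partition, and the hostile `status` string is treated as an ordinary literal rather than SQL.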
SkyPlug is API-centric by design: a comprehensive, documented API represents the public interface to all SkyPlug data and operations. The SkyPlug web application accomplishes all tasks through the SkyPlug API. Moreover, customers can use the API directly to implement their own custom processes.
From a security perspective, this makes the system surface narrower and more legible. There are no undocumented interfaces or back channels that could become forgotten or neglected points of vulnerability. The public API is tightly defined and scrutinized for correctness and security.
Authentication & Authorization
All user operations within SkyPlug require that a user demonstrate their identity and possess requisite privilege for the action they are taking. We implement standard best practices in these areas.
Authentication is accomplished using an identity provider model and standard OAuth 2.0 flows. The SkyPlug authentication server is built on the OpenIddict library, a vetted, open-source OpenID Connect server framework. This dedicated service handles the user login process, in which a username/password credential pair is exchanged for a secure cookie representing the user session. For machine-to-machine API use, the authentication server provides a token endpoint that supports the client credentials flow, which issues a JWT access token.
The authentication server is an isolated app instance designed to require a minimal set of dependencies and minimize public surface area.
The authentication server exposes the standard OpenID Connect discovery endpoint (/.well-known/openid-configuration) for inspecting its configuration.
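For API clients, the client credentials flow is a single form-encoded POST to the token endpoint (RFC 6749, section 4.4), whose URL a client would normally read from the `token_endpoint` field of the discovery document. The sketch below only builds the requests with the standard library and does not send them; the host, the `connect/token` path, the client ID, secret, and scope are hypothetical, and sending credentials in the body (rather than HTTP Basic) is one of the two client authentication options the RFC allows.

```python
from urllib.parse import urlencode, urljoin
from urllib.request import Request

AUTH_SERVER = "https://auth.example.com/"  # hypothetical; not SkyPlug's real host

# OpenID Connect Discovery publishes the server configuration at this well-known path.
discovery_url = urljoin(AUTH_SERVER, ".well-known/openid-configuration")

def client_credentials_request(token_endpoint: str, client_id: str,
                               client_secret: str, scope: str) -> Request:
    """Build (but do not send) an OAuth 2.0 client credentials token request."""
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": scope,
    }).encode("ascii")
    return Request(
        token_endpoint,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )

req = client_credentials_request(urljoin(AUTH_SERVER, "connect/token"),
                                 "my-client", "s3cret", "api")
print(discovery_url)  # https://auth.example.com/.well-known/openid-configuration
print(req.get_method(), req.full_url)  # POST https://auth.example.com/connect/token
```

A successful response carries the JWT in its `access_token` field, which the client then sends as a bearer token on API requests.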
User Accounts and Passwords
SkyPlug users are represented by a user account record containing the minimal information needed to identify and authenticate the user: an email address (the SkyPlug username) and a hash of the user's password. This record is stored in a SkyPlug database container separate from customer data.
The email address is provided during user registration and verified through a unique code sent by email before a user record can be created. Email addresses are unique in the system, and registrations cannot be made with an existing email address.
Passwords are hashed using Microsoft's open-source hashing implementation, specifically PBKDF2 with HMAC-SHA512, a 128-bit salt, a 256-bit subkey, and 100,000 iterations. The user login process occurs on the authentication server, which verifies the submitted password against the stored hash. The plain-text password value is never logged or stored.
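The stated parameters can be reproduced with Python's standard `hashlib.pbkdf2_hmac`. The sketch below is an illustration of the scheme, not Microsoft's implementation: it is not byte-compatible with Microsoft's stored hash format, and storing the salt and subkey as a separate pair is an assumption.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 100_000
SALT_BYTES = 16     # 128-bit salt
SUBKEY_BYTES = 32   # 256-bit subkey

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, subkey) derived with PBKDF2-HMAC-SHA512."""
    salt = secrets.token_bytes(SALT_BYTES)  # fresh random salt per password
    subkey = hashlib.pbkdf2_hmac("sha512", password.encode("utf-8"), salt,
                                 ITERATIONS, dklen=SUBKEY_BYTES)
    return salt, subkey

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha512", password.encode("utf-8"), salt,
                                    ITERATIONS, dklen=len(expected))
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(candidate, expected)

salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))  # True
print(verify_password("wrong guess", salt, stored))                   # False
```

Only `salt` and `stored` would be persisted; the plain-text password exists only in memory for the duration of the check.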
Users can perform self-service password resets against their email address. A verification loop confirms ownership of the email account via a random, ephemeral confirmation code.
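A random, ephemeral confirmation code needs three properties: it is drawn from a cryptographically secure generator, it expires, and it can be used only once. The sketch below shows those properties with the standard `secrets` and `hmac` modules; the 6-digit length, 15-minute lifetime, and `PendingCode` structure are hypothetical choices, since the document does not state them.

```python
import hmac
import secrets
import time
from dataclasses import dataclass

CODE_TTL_SECONDS = 15 * 60  # hypothetical expiry window
CODE_DIGITS = 6             # hypothetical code length

@dataclass
class PendingCode:
    code: str
    expires_at: float
    used: bool = False

def issue_code(now: float) -> PendingCode:
    # secrets.randbelow draws from the OS CSPRNG, unlike the random module.
    code = f"{secrets.randbelow(10 ** CODE_DIGITS):0{CODE_DIGITS}d}"
    return PendingCode(code, now + CODE_TTL_SECONDS)

def redeem(pending: PendingCode, submitted: str, now: float) -> bool:
    if pending.used or now > pending.expires_at:
        return False
    # Compare as bytes in constant time.
    if not hmac.compare_digest(pending.code.encode(), submitted.encode()):
        return False
    pending.used = True  # single use: the code cannot be replayed
    return True

t = time.time()
p = issue_code(t)
print(redeem(p, p.code, t + 60))   # True
print(redeem(p, p.code, t + 120))  # False (already used)
```

Guessing attempts against a short numeric code are additionally bounded by the request rate limiting described earlier.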
Passwords must be at least 10 characters long, and users are encouraged to use longer passwords with high complexity. In line with current NIST guidance (SP 800-63B), passwords do not expire.
Role-based Access Control
SkyPlug implements a granular access control system based on roles and permissions. A permission is an action against a given data type:
- None (default)
A role is a collection of such permissions. For example:
A user is granted one or more roles via membership in a tenant (“Team”), which defines their privileges as a member of that team. A user with memberships in multiple teams holds separate role assignments in each team, which can confer altogether different permissions.
SkyPlug provides several standard roles and supports defining custom roles to meet a team's needs.
- Administrator: All-powerful wizard permitted all actions and data
- Job Manager: Can create and change jobs and resources; can't change connections or team
- Read Only: Permitted to see any data; can't make changes
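The role model can be sketched as roles being sets of (action, data type) permissions, looked up per team and denied by default. The action names, the data type names, and the assumption that a Job Manager may still read connections and team settings are hypothetical readings of the role descriptions above, not SkyPlug's actual permission catalog.

```python
from enum import Enum

class Action(Enum):
    READ = "read"
    WRITE = "write"  # create, change, or delete (hypothetical grouping)

DATA_TYPES = ("jobs", "resources", "connections", "team")  # hypothetical names

Permission = tuple[Action, str]

ROLES: dict[str, frozenset[Permission]] = {
    "Administrator": frozenset((a, d) for a in Action for d in DATA_TYPES),
    "Job Manager": frozenset(
        [(Action.READ, d) for d in DATA_TYPES]
        + [(Action.WRITE, d) for d in ("jobs", "resources")]
    ),
    "Read Only": frozenset((Action.READ, d) for d in DATA_TYPES),
}

def authorize(team_roles: dict[str, list[str]], team: str,
              action: Action, data_type: str) -> int:
    """Return 200 if any of the user's roles in `team` grants the permission, else 403."""
    roles = team_roles.get(team, [])  # no membership means no roles: deny by default
    granted = any((action, data_type) in ROLES.get(r, frozenset()) for r in roles)
    return 200 if granted else 403

memberships = {"team-a": ["Job Manager"], "team-b": ["Read Only"]}
print(authorize(memberships, "team-a", Action.WRITE, "jobs"))         # 200
print(authorize(memberships, "team-a", Action.WRITE, "connections"))  # 403
print(authorize(memberships, "team-b", Action.WRITE, "jobs"))         # 403
print(authorize(memberships, "team-c", Action.READ, "jobs"))          # 403
```

The same user gets different answers in `team-a` and `team-b`, reflecting that roles are assigned per team.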
Software Design & Developer Operations
SkyPlug is custom-developed software created by our team. This section provides insight into security-relevant considerations around the software design and components.
SkyPlug combines backend applications implemented with Microsoft .NET 7 and a frontend web application built with Vue.js. The backend applications are all developed on the ASP.NET Core web framework. These are widely used open-source frameworks with strong track records and active developer communities. Recent .NET versions have had very few known vulnerabilities, and those discovered have been fixed quickly.
All code is strongly typed and verified by compilation prior to deployment. The backend API and worker are written in C#, and the frontend client is written in strict TypeScript.
We leverage a minimal set of well-vetted open-source libraries to provide common functionality, selected based on their history, quality, and development activity. We specifically avoid obsolete or abandoned libraries so that updates can be expected if a vulnerability is discovered in a library or its dependencies.
A list of dependencies is available upon request for the curious.
SkyPlug users interact with the system using a browser-based web application. There is nothing to install and thus no risk to user devices.
SkyPlug requires a limited set of secrets to secure various aspects of functionality. Such secrets are strong cryptographic values generated randomly and stored in Azure Key Vault, where they can be accessed only by specifically allowed app instances and users.
No secret keys or sensitive values are stored as plain-text configuration values at any time.
Common Threats & Mitigations
There are well-understood threats for a web-based system like SkyPlug. These have been considered and mitigations implemented throughout the development process. These include:
- Denial of service: SkyPlug runs in Azure datacenters which provide baseline and automatic protection against denial-of-service flood attacks. We also enforce rate limiting and have multiple proxies between the internet and the backend API endpoints.
- Malformed input: the SkyPlug API validates all requests for properly formatted parameter values. Malformed requests are rejected with a 400 response.
- Broken access control: SkyPlug is careful with access control at multiple points in the software stack. We implement a least privilege model that denies by default. All requests are authenticated and authorized. Cross-Origin Resource Sharing policy is configured to prevent interaction from unauthorized domains. Sessions are time-restricted and invalidated after expiration.
- Weak cryptography: we leverage modern, strong cryptographic algorithms and techniques throughout, including for encryption ciphers, secure hashing, random number generation, signature verification, and certificates.
- Injection: SkyPlug database queries are always constructed by the service, never by the user. All user input is strictly interpreted and passed only as query parameters, limiting its use in data queries and preventing injection-style manipulation.
- Misconfiguration: all system configuration and values are well-defined in code. Infrastructure configuration changes are performed through validated definition deployments rather than manually.
- Request forgery: wherever the system processes a user-supplied URL, the URL is sanitized and validated against expected parameters.
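Validating a user-supplied URL "against expected parameters" typically means checking the scheme, host, port, and embedded credentials against an allowlist before the server ever connects to it, which blocks classic server-side request forgery targets such as the cloud metadata address `169.254.169.254`. The sketch below uses the standard `urllib.parse.urlsplit`; the allowlisted host and the `validate_outbound_url`/`BadRequest` names are hypothetical.

```python
from urllib.parse import urlsplit

ALLOWED_HOSTS = {"management.azure.com"}  # hypothetical allowlist

class BadRequest(ValueError):
    """Maps to a 400 response."""

def validate_outbound_url(url: str) -> str:
    parts = urlsplit(url)
    if parts.scheme != "https":
        raise BadRequest("only https URLs are accepted")
    # "https://trusted@evil.example/" parses with host evil.example, so reject userinfo.
    if parts.username or parts.password:
        raise BadRequest("credentials in URLs are not accepted")
    host = (parts.hostname or "").lower()
    if host not in ALLOWED_HOSTS:
        raise BadRequest(f"host {host!r} is not allowed")
    try:
        port = parts.port  # raises ValueError for malformed ports
    except ValueError:
        raise BadRequest("malformed port") from None
    if port not in (None, 443):
        raise BadRequest("unexpected port")
    return parts.geturl()

for candidate in ("https://management.azure.com/subscriptions",
                  "http://management.azure.com/",
                  "https://management.azure.com@evil.example/",
                  "https://169.254.169.254/metadata/instance"):
    try:
        print("ok:", validate_outbound_url(candidate))
    except BadRequest as err:
        print("400:", err)
```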
Changes to customer data within SkyPlug are audited to capture an immutable history. Each audit record includes a summary of the change, including previous and new values, the user who made the change, and a timestamp of when it occurred.
Audit records are generated and stored inline during data modification operations.
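An audit record of the kind described, with previous and new values, the acting user, and a timestamp, can be produced by diffing an entity before and after the change within the same operation that performs it. In the sketch below an in-memory list stands in for the database, and immutability is approximated with a frozen dataclass and a read-only mapping; the names `AuditRecord`, `diff`, and `update_with_audit` are hypothetical, and real immutability would depend on storage-level controls.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from types import MappingProxyType
from typing import Any, Mapping

@dataclass(frozen=True)
class AuditRecord:
    entity_id: str
    user_id: str
    timestamp: datetime
    changes: Mapping[str, tuple[Any, Any]]  # field -> (previous, new)

def diff(before: dict, after: dict) -> dict[str, tuple[Any, Any]]:
    """Fields whose values differ, including added or removed fields."""
    keys = before.keys() | after.keys()
    return {k: (before.get(k), after.get(k))
            for k in sorted(keys) if before.get(k) != after.get(k)}

def update_with_audit(entity_id: str, before: dict, after: dict,
                      user_id: str, audit_log: list) -> dict:
    changes = diff(before, after)
    if changes:
        # The audit record is written as part of the same modification operation.
        audit_log.append(AuditRecord(entity_id, user_id,
                                     datetime.now(timezone.utc),
                                     MappingProxyType(changes)))
    return after

log: list = []
update_with_audit("job-42", {"schedule": "08:00", "enabled": True},
                  {"schedule": "09:00", "enabled": True}, "user-1", log)
print(dict(log[0].changes))  # {'schedule': ('08:00', '09:00')}
```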
Management & Operations
This section covers aspects of the security-relevant processes involved in the operation of the SkyPlug service.
As a cloud-hosted service, SkyPlug relies on various third-party services for hosting and operations. In all cases, our user accounts for these services have strong passwords and two-factor authentication enabled. Least privilege is followed in this context as well: accounts are given the minimal permissions needed to accomplish the required management tasks.
Deployment & Updates
Changes to the codebase are deployed using an automated pipeline. There are two stages: test and production. The test stage provides a safe first stop to fully evaluate the changes in a live environment that matches production. Once the test stage deployment and evaluation are complete, the production deployment can be triggered manually at an opportune time selected by us. Deployments follow a staggered regional rollout to ensure that only one region is being updated at a time and other regions remain functional. This approach aims to prevent downtime and obviate the need for maintenance windows.
In the case of a problematic deployment, the system can be rolled back to the previous release version using the same process.
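The staggered rollout can be sketched as a loop that updates one region at a time and checks its health before moving on, so the other region keeps serving throughout. The `deploy` and `healthy` callbacks are hypothetical hooks standing in for the real pipeline steps, and rolling back automatically on a failed health check is an assumption made for illustration; the document says only that rollback uses the same process.

```python
from typing import Callable

REGIONS = ["East US", "West US"]  # the two regions named in this document

def staggered_rollout(release: str, previous: str,
                      deploy: Callable[[str, str], None],
                      healthy: Callable[[str], bool]) -> bool:
    """Deploy one region at a time; on a failed health check, roll back and stop."""
    done: list[str] = []
    for region in REGIONS:
        deploy(region, release)  # only this region is updating; the others keep serving
        if not healthy(region):
            for r in [*done, region]:  # restore every region touched so far
                deploy(r, previous)
            return False
        done.append(region)
    return True

state: dict[str, str] = {}
ok = staggered_rollout("v2", "v1",
                       deploy=lambda r, v: state.__setitem__(r, v),
                       healthy=lambda r: True)
print(ok, state)  # True {'East US': 'v2', 'West US': 'v2'}
```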
Given this setup, updating the system with security patches is straightforward and can be accomplished in less than 10 minutes if circumstances warrant. We keep current aggressively and typically deploy the latest versions of dependencies within a day or two of their release, so the system typically runs the latest available version of each dependency on any given day.
Builds and deployments occur automatically on a secure, self-hosted build agent. We never build or deploy manually outside of this process.
As with access to other systems, two-factor authentication protects login to the DevOps infrastructure, helping ensure that developer credentials cannot be misused to inject malicious changes via deployment or release hijacking.
Monitoring & Logging
A variety of logs are recorded at several levels of the system, including by the underlying infrastructure and within the SkyPlug software stack itself. These logs are consolidated by Azure Monitor to provide a single query scope for anomaly monitoring and alerting.
The online services are monitored for uptime and health. Alerts are sent if an unhealthy state or outage is detected.
No sensitive user information is tracked or logged. We care only that the service is doing its job correctly and reliably.
We regularly review access to our systems to ensure only necessary and valid permissions are granted to as few accounts as possible. Access is revoked immediately after valid use is no longer required.
Further Information & Contact
Please contact us with any questions or concerns. We're happy to provide further information and clarification around any of the topics described above or otherwise.
If you discover a security concern related to any aspect of our service, please report it to us. We actively monitor communications and will respond promptly to investigate and resolve any vulnerability or incident.