Easily and Securely Setting Up Future Wireless Devices

Have you ever paired your smartphone or computer with a Bluetooth device, added a ZigBee device to your home ZigBee network, or used a mobile app to set up an IoT device? Have you ever wondered if your connection is secure or if there is an attacker intercepting all the traffic? Are you scared that you might be pairing with a malicious device that will inject keystrokes, rather than with the device you actually intended to pair with? Unless one has a Faraday cage (and sometimes even if one does), it’s almost impossible to know the answers to these for certain. If you are interested in a solution, read on!

What’s The Problem?

Bluetooth’s pairing process guarantees that you are securely connected to some device. However, it does not guarantee that you are securely connected to the correct device! If one pairs with a malicious or compromised device, the device might be able to act as a keyboard and inject keystrokes. Other devices have even worse provisioning protocols, some of which require vendor-provided apps (eww!). If you are lucky, you only have to hope that nobody is performing an active monster-in-the-middle (MITM) attack on the pairing process. If you aren’t, you have to hope that nobody is sniffing your wireless communication, either during setup (for the somewhat-bad devices) or at all (for the really bad ones).

What Went Wrong?

The reason that secure setup is so hard, and so rarely implemented, is that many devices have incredibly poor I/O capabilities. You can’t reasonably enter a passcode on a device with no keyboard, and a device without a display can’t display a code.

How Should Things Work?

The device comes with a QR code. You scan it with your phone, and your phone tells you what kind of device it is. You press the “Connect” button, and your phone tells you to push a button on the device. Afterwards, your phone and the device are connected. You can then control the device via standardized Bluetooth protocols or via a web interface.

Fixing This

This proposal is to fix the problem once and (hopefully) for all, by using a QR code on the device containing the device’s public key. It requires only that the device have two buttons and (unlike SmartStart) has no timing dependencies at all. This means that you can take a break and come back, and the device will always be in the exact state you left it as far as the protocol is concerned. Furthermore, it is secure against an attacker who can perform monster-in-the-middle attacks on all traffic except for scanning a QR code. Since visual MITM is easy to exclude with the human eye, that’s enough.

How does this work? The key idea is for the controlling device, or controller (such as a laptop or smartphone; called the host in the sections below), to establish a secure connection to the device being controlled. Since the controlled device accepts only one controller at a time, a controller that has connected knows that no other controller is connected. Therefore, the controller can tell the user to press a button labeled “Confirm Connection,” which tells the device that it can trust the controller and accept commands from it.

The Formal State Machine

1. The device starts in the “ready to provision” state.
2. The user uses the host’s camera to scan a QR code on the device. On error, go to step 1.
3. The host makes a secure connection with the device. This fails if a different host’s key is stored in non-volatile memory. In this case, the user must use reset to return the device to step 1.
4. The device stores the host’s long-term key in non-volatile memory.
5. The host displays that it is securely connected and directs the user to press the “add controller” button on the device.
6. The user presses the “add controller” button on the device to confirm.
7. The device marks the host’s long-term key as trusted.
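
The steps above can be sketched as a small table-driven state machine. This is only an illustration: the names State, Event, TRANSITIONS, and step are made up for this sketch and are not part of the proposal. Note that there is no timer event anywhere in the table, which is what lets the user walk away and resume later.

```python
from enum import Enum, auto


class State(Enum):
    READY = auto()      # no host key stored ("ready to provision")
    UNTRUSTED = auto()  # host key stored, user has not confirmed yet
    TRUSTED = auto()    # host key stored and marked as trusted


class Event(Enum):
    HOST_CONNECTED = auto()   # an acceptable host completed a secure connection
    CONFIRM_PRESSED = auto()  # user pressed the "add controller" button
    RESET_PRESSED = auto()    # user pressed the reset button


# (state, event) -> next state. Hypothetical encoding of the steps above.
TRANSITIONS = {
    (State.READY, Event.HOST_CONNECTED): State.UNTRUSTED,
    (State.UNTRUSTED, Event.CONFIRM_PRESSED): State.TRUSTED,
    (State.UNTRUSTED, Event.RESET_PRESSED): State.READY,
    (State.TRUSTED, Event.RESET_PRESSED): State.READY,
}


def step(state, event):
    """Return the next state; pairs not in the table leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)
```

Pressing the confirm button before any host has connected does nothing in this sketch, since (READY, CONFIRM_PRESSED) is not in the table.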

Device Side

The device has either zero or one long-term host keys stored in its non-volatile memory. The host key may either be trusted (able to send commands to the device) or untrusted (not able to send commands). A device with no host key stored will store the host key of the first host that connects, but MUST NOT mark it as trusted until the user presses a button on the device. Otherwise, any host could take control of the device. If the device has a stored host key, it MUST NOT accept connections from any host with a different key. Otherwise, another host could connect to the device before the user presses the “Confirm Connection” button. However, it MUST accept connections from a host with the same key, ensuring that an interrupted connection can always be resumed. A device MUST NOT accept commands from a host with a key that is not marked as trusted, as any host can connect to a device if no other host has connected already.

Devices MUST allow the user to clear a host key. This SHOULD take the form of a button on the device. This MUST NOT require that the host be present or available, as the host might not exist anymore, and MUST require physical or otherwise privileged access to the device. The only exception is if allowing one to reuse the device with a different host without authorization would create a specific, exploitable security vulnerability. For instance, an alarm control panel might not consider anyone with physical access to be trusted.

If the key provided by the host is different from the one the device has stored, the device MUST indicate this in its reply to the host. This allows the host to display a useful error message, such as “Device is already paired to a different host. Please press button ABC to reset the device and connect anyway.” For security reasons, resetting the device MUST erase any sensitive information that is accessible to the host.
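
The device-side rules can be sketched roughly as follows. Everything here is hypothetical: the Device class, the Reply values, and the sensitive dictionary are invented for illustration, and the sketch assumes the secure channel has already proven that the host really holds the key it presents. Wire formats and algorithms are left out, since they are still open questions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Reply(Enum):
    OK = "ok"
    KEY_MISMATCH = "key_mismatch"  # presented key is not the stored host key
    NOT_TRUSTED = "not_trusted"    # user has not confirmed this host yet


@dataclass
class Device:
    host_key: Optional[bytes] = None  # zero or one stored host key
    trusted: bool = False
    sensitive: dict = field(default_factory=dict)  # stand-in for host-visible data

    def connect(self, key: bytes) -> Reply:
        if self.host_key is None:
            self.host_key = key        # store the first host, but do not trust it
            return Reply.OK
        if key != self.host_key:
            return Reply.KEY_MISMATCH  # lets the host show a useful error
        return Reply.OK                # the same host may always reconnect

    def press_confirm(self) -> None:
        # Only meaningful once some host key has been stored.
        if self.host_key is not None:
            self.trusted = True

    def press_reset(self) -> None:
        # Needs no host; also erases data the old host could access.
        self.host_key = None
        self.trusted = False
        self.sensitive.clear()

    def command(self, key: bytes, name: str) -> Reply:
        if self.host_key is None or key != self.host_key:
            return Reply.KEY_MISMATCH
        if not self.trusted:
            return Reply.NOT_TRUSTED
        return Reply.OK  # a real device would execute `name` here
```

A second host that calls connect() while the first host’s key is stored gets KEY_MISMATCH back, which is the hook for the “already paired” error message described above.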

Host Side

The host side is simpler. The host obtains the device key and some other information (such as the connection type) by scanning a QR code on the device itself. Hosts MUST prompt the user with the type of connection being used and the type of device they will connect to before they continue with a connection attempt, unless the host will only be checking if the device has a stored host key.
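
As a rough sketch of the host side, the code below parses a QR payload and builds the prompt shown to the user. The payload format is purely an assumption made for this example (JSON with hypothetical fields "key", "connection", and "type", and a base64-encoded key); the actual format is still undecided.

```python
import base64
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class QrInfo:
    device_key: bytes      # the device's public key, as raw bytes
    connection_type: str   # e.g. "Bluetooth LE"
    device_type: str       # e.g. "light bulb"


def parse_qr(payload: str) -> QrInfo:
    """Parse the hypothetical JSON payload; raises ValueError or KeyError on bad input."""
    data = json.loads(payload)
    return QrInfo(
        device_key=base64.b64decode(data["key"], validate=True),
        connection_type=data["connection"],
        device_type=data["type"],
    )


def confirmation_prompt(info: QrInfo) -> str:
    """Text the host shows before it attempts a connection."""
    return f"Connect to {info.device_type} over {info.connection_type}?"
```

The host would only start the connection attempt after the user accepts the prompt.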

Open Questions

What should the format of the public key be? What should the format of the protocol messages be? This is just a high-level description, and doesn’t provide any of the details needed for a concrete implementation.
Should devices be required to expose their status via LEDs, or is it sufficient for them to expose their status to any host that asks them?
Should there be mandatory rotation of host and device keys?
What algorithms should be supported?
Should the long-term device key be required to be in a secure hardware module, such as a secure element? This is logical as it is a long-term key, but might create obstacles to adoption.