AI-generated documentation (curated)
WebSocket Protocol
What is a WebSocket?
A WebSocket is a persistent, two-way connection between a client (like a web browser) and a server. Unlike regular HTTP requests (which are one-shot: you ask, you get a response, the connection closes), a WebSocket stays open so the server can push data to the client continuously — perfect for streaming live camera frames. SkellyCam uses WebSocket for real-time video streaming and also for sending log messages and status updates.
Learn more: WebSocket (Wikipedia)
The WebSocket endpoint at /skellycam/websocket/connect carries three types of traffic:
- Binary frame payloads (server -> client) — Multi-camera JPEG frames
- JSON messages (server -> client) — Log records, framerate updates, application state
- Text messages (client -> server) — Frame acknowledgments and ping/pong
Connection Lifecycle
- Client opens a WebSocket connection to /skellycam/websocket/connect.
- Server accepts the connection and starts four concurrent tasks: image relay, log relay, state sender, and client message handler.
- Client begins receiving messages immediately.
- When the client disconnects (or the server shuts down), all tasks are cancelled and the connection is closed.
Binary Frame Payload Format
When cameras are active, the server sends binary (bytes) messages containing JPEG-compressed frames from all cameras in the active group. Each binary message is a self-contained multi-frame payload with the following structure:
+--------------------------------------------+
| Payload Header (24 bytes) |
+--------------------------------------------+
| Frame Header Camera 0 (56 bytes) |
+--------------------------------------------+
| JPEG Data Camera 0 (variable) |
+--------------------------------------------+
| Frame Header Camera 1 (56 bytes) |
+--------------------------------------------+
| JPEG Data Camera 1 (variable) |
+--------------------------------------------+
| ... |
+--------------------------------------------+
| Frame Header Camera N (56 bytes) |
+--------------------------------------------+
| JPEG Data Camera N (variable) |
+--------------------------------------------+
| Payload Footer (24 bytes) |
+--------------------------------------------+
All multi-byte integers are little-endian. Structures use aligned layout (numpy align=True), which introduces padding bytes for natural alignment.
Payload Header (24 bytes)
| Offset | Size | Type | Field | Description |
|---|---|---|---|---|
| 0 | 1 | uint8 | message_type | Always 0 (PAYLOAD_HEADER) |
| 1-7 | 7 | — | (padding) | Alignment padding for 8-byte frame_number |
| 8 | 8 | int64 | frame_number | Monotonically increasing frame counter |
| 16 | 4 | int32 | number_of_cameras | Number of camera frames in this payload |
| 20-23 | 4 | — | (padding) | Struct alignment padding |
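Given the offsets in the table, the payload header can be unpacked with Python's `struct` module. This is a minimal parsing sketch based on the documented layout, not SkellyCam's actual implementation (which uses numpy structured arrays):

```python
import struct

# Little-endian: uint8 message_type, 7 pad bytes, int64 frame_number,
# int32 number_of_cameras, 4 trailing pad bytes — 24 bytes total.
PAYLOAD_HEADER_FMT = "<B7xqi4x"
assert struct.calcsize(PAYLOAD_HEADER_FMT) == 24

def parse_payload_header(buf: bytes) -> tuple[int, int]:
    """Return (frame_number, number_of_cameras) from a 24-byte payload header."""
    message_type, frame_number, number_of_cameras = struct.unpack(
        PAYLOAD_HEADER_FMT, buf[:24])
    if message_type != 0:  # 0 == PAYLOAD_HEADER
        raise ValueError(f"expected PAYLOAD_HEADER (0), got {message_type}")
    return frame_number, number_of_cameras
```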
Frame Header (56 bytes, one per camera)
| Offset | Size | Type | Field | Description |
|---|---|---|---|---|
| 0 | 1 | uint8 | message_type | Always 1 (FRAME_HEADER) |
| 1-7 | 7 | — | (padding) | Alignment padding |
| 8 | 8 | int64 | frame_number | Same frame number as the payload header |
| 16 | 16 | ascii | camera_id | Null-terminated ASCII string, zero-padded to 16 bytes |
| 32 | 4 | int32 | camera_index | Integer index of the camera |
| 36 | 4 | int32 | image_width | Width of the JPEG image in pixels |
| 40 | 4 | int32 | image_height | Height of the JPEG image in pixels |
| 44 | 4 | int32 | color_channels | Number of color channels (typically 3) |
| 48 | 4 | int32 | jpeg_string_length | Length of the following JPEG data in bytes |
| 52-55 | 4 | — | (padding) | Struct alignment padding |
Immediately following each frame header is the raw JPEG data (jpeg_string_length bytes). There is no padding between the JPEG data and the next frame header.
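The 56-byte frame header follows the same pattern; a sketch under the documented offsets (the `FrameHeader` helper type is illustrative, not part of SkellyCam):

```python
import struct
from typing import NamedTuple

# uint8 + 7 pad, int64, 16-byte ASCII id, five int32s, 4 trailing pad = 56 bytes.
FRAME_HEADER_FMT = "<B7xq16s5i4x"
assert struct.calcsize(FRAME_HEADER_FMT) == 56

class FrameHeader(NamedTuple):
    frame_number: int
    camera_id: str
    camera_index: int
    image_width: int
    image_height: int
    color_channels: int
    jpeg_string_length: int

def parse_frame_header(buf: bytes) -> FrameHeader:
    (message_type, frame_number, camera_id_raw, camera_index,
     width, height, channels, jpeg_len) = struct.unpack(FRAME_HEADER_FMT, buf[:56])
    if message_type != 1:  # 1 == FRAME_HEADER
        raise ValueError(f"expected FRAME_HEADER (1), got {message_type}")
    # camera_id is null-terminated ASCII, zero-padded to 16 bytes
    camera_id = camera_id_raw.rstrip(b"\x00").decode("ascii")
    return FrameHeader(frame_number, camera_id, camera_index,
                       width, height, channels, jpeg_len)
```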
Payload Footer (24 bytes)
| Offset | Size | Type | Field | Description |
|---|---|---|---|---|
| 0 | 1 | uint8 | message_type | Always 2 (PAYLOAD_FOOTER) |
| 1-7 | 7 | — | (padding) | Alignment padding |
| 8 | 8 | int64 | frame_number | Must match the payload header's frame number |
| 16 | 4 | int32 | number_of_cameras | Must match the payload header's camera count |
| 20-23 | 4 | — | (padding) | Struct alignment padding |
The footer serves as a consistency check — parsers can verify that frame_number and number_of_cameras match the header.
Message Type Constants
| Value | Name | Description |
|---|---|---|
| 0 | PAYLOAD_HEADER | Start of a multi-frame payload |
| 1 | FRAME_HEADER | Per-camera frame metadata (followed by JPEG data) |
| 2 | PAYLOAD_FOOTER | End of a multi-frame payload |
Parsing a Payload
To parse a binary payload:
- Read 24 bytes -> payload header. Verify `message_type == 0`. Extract `frame_number` and `number_of_cameras`.
- For each camera (`number_of_cameras` times):
  - Read 56 bytes -> frame header. Verify `message_type == 1`. Extract `jpeg_string_length`.
  - Read `jpeg_string_length` bytes -> raw JPEG image data.
- Read 24 bytes -> payload footer. Verify `message_type == 2` and that `frame_number`/`number_of_cameras` match the header.
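The steps above can be sketched end-to-end with Python's `struct` module. This is a parsing sketch derived from the documented layout, not SkellyCam's actual implementation (the server builds payloads with numpy structured arrays; the client parses them in TypeScript):

```python
import struct

PAYLOAD_HEADER_FMT = "<B7xqi4x"    # 24 bytes
FRAME_HEADER_FMT = "<B7xq16s5i4x"  # 56 bytes
PAYLOAD_FOOTER_FMT = "<B7xqi4x"    # 24 bytes, same layout as the header

def parse_multi_frame_payload(data: bytes) -> dict[str, bytes]:
    """Split one binary WebSocket message into {camera_id: jpeg_bytes}."""
    offset = 0
    msg_type, frame_number, n_cameras = struct.unpack_from(
        PAYLOAD_HEADER_FMT, data, offset)
    assert msg_type == 0, "expected PAYLOAD_HEADER"
    offset += 24

    frames: dict[str, bytes] = {}
    for _ in range(n_cameras):
        (msg_type, _fh_frame, cam_id_raw, _cam_index,
         _w, _h, _ch, jpeg_len) = struct.unpack_from(FRAME_HEADER_FMT, data, offset)
        assert msg_type == 1, "expected FRAME_HEADER"
        offset += 56
        cam_id = cam_id_raw.rstrip(b"\x00").decode("ascii")
        frames[cam_id] = data[offset:offset + jpeg_len]  # raw JPEG bytes
        offset += jpeg_len  # no padding between JPEG data and the next header

    msg_type, f_frame, f_cams = struct.unpack_from(PAYLOAD_FOOTER_FMT, data, offset)
    assert msg_type == 2, "expected PAYLOAD_FOOTER"
    # Footer consistency check against the header values
    assert f_frame == frame_number and f_cams == n_cameras
    return frames
```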
Implementation References
- Python (server): `skellycam/core/types/frontend_payload_bytearray.py` — `create_frontend_payload()` builds the binary payload using numpy structured arrays.
- TypeScript (client): `skellycam-ui/src/services/server/server-helpers/frame-processor/binary-protocol.ts` — struct definitions and field offsets. `binary-frame-parser.ts` — `parseMultiFramePayload()` parses the binary data and creates `ImageBitmap` objects.
Image Processing Notes
The server resizes each camera frame before JPEG encoding. If the client has sent displayImageSizes in a frame acknowledgment, the server resizes to match the client's display dimensions. Otherwise, images are scaled to 50% of their native resolution. JPEG encoding uses quality level 80 by default.
JSON Messages (Server -> Client)
Log Records
{
"message_type": "log_record",
"levelname": "INFO",
"levelno": 20,
"message": "Camera group started",
"name": "skellycam.core.camera_group",
"filename": "camera_group.py",
"lineno": 42,
"funcName": "start",
"created": 1700000000.123,
"formatted_message": "2024-01-01 12:00:00 [INFO] Camera group started",
"delta_t": "0.123ms"
}
Log records at TRACE level (level 5) and above are forwarded to the WebSocket. The frontend displays these in the log terminal panel. Log records are produced by skellylogs.
Framerate Updates
{
"message_type": "framerate_update",
"camera_group_id": "group-0",
"backend_framerate": {
"mean_frame_duration_ms": 33.3,
"mean_frames_per_second": 30.0,
"frame_duration_max": 40.1,
"frame_duration_min": 28.5,
"frame_duration_mean": 33.3,
"frame_duration_stddev": 2.1,
"frame_duration_median": 33.2,
"frame_duration_coefficient_of_variation": 0.063,
"calculation_window_size": 100,
"framerate_source": "Server"
},
"frontend_framerate": {
"mean_frame_duration_ms": 34.1,
"mean_frames_per_second": 29.3,
"framerate_source": "Display"
}
}
Sent approximately every 250ms when cameras are active. The backend_framerate represents the true camera capture rate (computed from frame numbers and capture timestamps, accurate even when the WebSocket skips frames due to backpressure). The frontend_framerate represents the WebSocket delivery rate (what the UI actually receives).
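The statistics fields in the message can be computed from a window of frame durations roughly as follows (a sketch using Python's `statistics` module; SkellyCam's actual calculation may differ in detail):

```python
import statistics

def framerate_stats(frame_durations_ms: list[float]) -> dict:
    """Summary statistics over a window of frame durations, mirroring
    the numeric fields of the framerate_update message."""
    mean = statistics.mean(frame_durations_ms)
    stddev = statistics.stdev(frame_durations_ms) if len(frame_durations_ms) > 1 else 0.0
    return {
        "mean_frame_duration_ms": mean,
        "mean_frames_per_second": 1000.0 / mean,
        "frame_duration_max": max(frame_durations_ms),
        "frame_duration_min": min(frame_durations_ms),
        "frame_duration_mean": mean,
        "frame_duration_stddev": stddev,
        "frame_duration_median": statistics.median(frame_durations_ms),
        "frame_duration_coefficient_of_variation": stddev / mean,
        "calculation_window_size": len(frame_durations_ms),
    }
```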
Application State
{
"message_type": "app_state",
"state": {
"camera_groups": {
"group-0": {
"id": "group-0",
"camera_ids": ["0", "1"],
"is_recording": false,
"is_paused": false
}
}
}
}
Sent approximately every 1 second and whenever the application state changes (e.g., recording starts/stops).
Client -> Server Messages
Frame Acknowledgment
After processing a binary frame payload, the client sends an acknowledgment:
{
"frameNumber": 42,
"displayImageSizes": {
"group-0": {
"0": { "width": 640, "height": 480 },
"1": { "width": 640, "height": 480 }
}
}
}
The frameNumber field tells the server which frame has been rendered. The server uses this for backpressure management — it will not send new frames until the previous frame is acknowledged. If the frontend falls behind, the server skips frames to prevent buffer bloat.
The displayImageSizes field (optional) tells the server the current display dimensions for each camera, allowing it to resize JPEG frames to match, reducing bandwidth.
Ping/Pong
If the client sends the text "ping", the server replies with "pong". This can be used for connection health checks.
Backpressure Management
The server tracks the last sent frame number and the last acknowledged frame number. If the frontend has not acknowledged the most recent frame:
- The server skips sending new frames.
- After acknowledgment arrives, the server skips one additional frame to let the frontend catch up.
- If the gap exceeds 1000 frames, a trace-level warning is logged.
This ensures the WebSocket buffer does not grow unbounded even when the frontend rendering is slower than the camera capture rate.
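The core send/skip decision can be sketched as a small state tracker. This is a simplified model of the scheme described above (class and method names are hypothetical, and the one-extra-frame skip after acknowledgment is omitted for brevity):

```python
class BackpressureGate:
    """Tracks last-sent vs. last-acknowledged frame numbers."""

    def __init__(self) -> None:
        self.last_sent = -1
        self.last_acked = -1

    def should_send(self) -> bool:
        # Only send a new frame once the previously sent frame is acknowledged;
        # otherwise the frame is skipped to avoid buffer bloat.
        return self.last_acked >= self.last_sent

    def on_sent(self, frame_number: int) -> None:
        self.last_sent = frame_number

    def on_ack(self, frame_number: int) -> bool:
        """Record an acknowledgment; return True if the gap warrants a
        trace-level warning (more than 1000 frames behind)."""
        self.last_acked = frame_number
        return self.last_sent - self.last_acked > 1000
```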