openvocs

Reference Architecture

The openvocs reference architecture complies with the High Level Architecture description above and goes beyond it with a complete system definition. The system definition is as lightweight as possible and allows adaptations on both the client and the server side. A vendor may use the openvocs backend and develop only dedicated clients, or the other way around.

Architecture Description

To build the most lightweight client possible, openvocs uses a web client based on HTML5, CSS and JavaScript, with WebRTC-based media distribution.

Our reference implementation focuses on the most high-level implementation possible for building a VoCS system. The client is virtual: it is instantiated within the web browser. A client is always up to date, as it is basically a website loaded from an HTTPS server.

User interactions within the system, e.g. selecting participation states for Voiceloops or pressing PTT (push-to-talk) to transmit audio, are converted into events. These events are transmitted to the server over the signaling channel, for which the websocket protocol is used. Each event is framed as a JSON structure, so events travel as JSON over websockets.
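As a minimal sketch of the JSON framing, the Python snippet below builds and parses one event message. The field names `event`, `uuid` and `parameter`, the event name `switch_loop_state` and the helpers `make_event` and `parse_event` are hypothetical placeholders, not the documented openvocs message format; the websocket transport itself is omitted.

```python
import json
import uuid

def make_event(event: str, parameter: dict) -> str:
    """Frame a client event as a JSON text message for the websocket.

    The field names used here are illustrative assumptions only.
    """
    message = {
        "event": event,
        "uuid": str(uuid.uuid4()),  # lets the client match the server's reply
        "parameter": parameter,
    }
    return json.dumps(message)

def parse_event(text: str) -> dict:
    """Decode an incoming websocket text frame and check its basic shape."""
    message = json.loads(text)
    if "event" not in message:
        raise ValueError("missing 'event' field")
    return message

# A user switches the Voiceloop "DEV1" to talk (hypothetical event name).
frame = make_event("switch_loop_state", {"loop": "DEV1", "state": "talk"})
print(parse_event(frame)["parameter"])  # {'loop': 'DEV1', 'state': 'talk'}
```

Keeping every interaction as a self-describing JSON object is what lets a dedicated, non-browser client speak the same protocol.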

For media transmission SRTP is selected. This is the secure version of the RTP protocol and is widely used within the telecommunication industry. On the client side, WebRTC is used to transmit the audio from the microphone to the backend over the SRTP channel.
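SRTP encrypts the RTP payload but leaves the RTP header in the clear, so every media packet still begins with the fixed RTP header defined in RFC 3550. The sketch below packs and parses that 12-byte header with the Python standard library. It does not perform SRTP encryption or authentication, and the helper names and field values are examples chosen for illustration.

```python
import struct

RTP_VERSION = 2

def pack_rtp_header(payload_type: int, seq: int, timestamp: int, ssrc: int,
                    marker: bool = False) -> bytes:
    """Build the 12-byte fixed RTP header (no CSRCs, no extension)."""
    byte0 = RTP_VERSION << 6  # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)

def unpack_rtp_header(data: bytes) -> dict:
    """Parse the fixed RTP header fields from the start of a packet."""
    if len(data) < 12:
        raise ValueError("packet shorter than the RTP fixed header")
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", data[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }

# 111 is used here as an example dynamic payload type (range 96-127).
header = pack_rtp_header(payload_type=111, seq=1, timestamp=960, ssrc=0x1234ABCD)
print(len(header), unpack_rtp_header(header)["payload_type"])  # 12 111
```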

The selected architecture is based on web technologies to support the high-level implementation of the reference architecture. Nonetheless, the protocol suite is quite simple and allows dedicated client implementations based on JSON over websockets together with WebRTC-based communication channels.

Breaking the architecture down one step further, the backend needs to be defined in terms of functionality. The High Level description above gives a good indication of the client-server architecture that connects clients and backend, but to provide VoCS services a backend must also support the VoCS-specific building blocks.

The VoCS-specific building blocks are an authentication and authorization backend and a mixing backend.

For authentication and authorization, the openvocs reference implementation uses an in-memory database based on JSON values. This database supports multi-domain usage and is multitenant. It allows different projects within a domain and is multi-mission ready.
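To illustrate how a multi-domain, multi-project JSON store can answer authorization questions, the sketch below assumes a hypothetical layout (domain, then project, then user, then Voiceloop mapped to the highest permitted state) and a hypothetical helper `is_authorized`. Authentication (credential checking) is left out, and the real openvocs database schema may differ.

```python
import json

# Hypothetical layout of the in-memory JSON database:
# domain -> project -> user -> Voiceloop -> highest permitted state.
DB = json.loads("""
{
  "example.org": {
    "mission-a": {"alice": {"DEV1": "talk", "OPS": "monitor"}},
    "mission-b": {"alice": {"OPS": "talk"}}
  }
}
""")

# Ordering of participation states: talk includes the right to monitor.
STATE_RANK = {"none": 0, "monitor": 1, "talk": 2}

def is_authorized(db: dict, domain: str, project: str, user: str,
                  loop: str, state: str) -> bool:
    """Return True if the user may put the loop into the requested state."""
    granted = (db.get(domain, {})
                 .get(project, {})
                 .get(user, {})
                 .get(loop, "none"))
    return STATE_RANK[state] <= STATE_RANK[granted]

print(is_authorized(DB, "example.org", "mission-a", "alice", "DEV1", "talk"))  # True
print(is_authorized(DB, "example.org", "mission-a", "alice", "OPS", "talk"))   # False
print(is_authorized(DB, "example.org", "mission-b", "alice", "OPS", "talk"))   # True
```

The same user can hold different rights per project, which is one way to read "different projects within a domain".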

The mixing backend is built from microservice-based mixer instances. Each proxy connection uses a dedicated mixer to mix the audio streams the user selected over its interface.

The microservice cloud uses multicast-based mixing of Voiceloops.

Media within the system is transmitted over multicast Voiceloops. Each Voiceloop uses a dedicated multicast IP address, and all traffic for a specific Voiceloop is forwarded to that address. Forwarding is implemented within the media proxy and is transparent to clients. Clients communicate with the proxy, and the proxy forwards incoming and outgoing media to and from the client. When a Voiceloop is selected for talk, the media proxy forwards the client's audio to that Voiceloop's multicast group.
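The one-multicast-IP-per-Voiceloop rule can be expressed as a validated lookup table. In the sketch below, the loop names, the addresses, the UDP port and the helpers `build_loop_table` and `talk_destination` are assumptions for illustration; the table only checks that each address is a multicast address and is not shared between loops.

```python
import ipaddress

# Hypothetical Voiceloop table: loop name -> (multicast IP, UDP port).
# The example addresses come from the administratively scoped 239.0.0.0/8 range.
LOOPS = {
    "DEV1": ("239.1.0.2", 5004),
    "OPS": ("239.1.0.3", 5004),
}

def build_loop_table(loops: dict) -> dict:
    """Validate that every Voiceloop has its own multicast IP address."""
    table = {}
    used = {}  # multicast IP -> loop that owns it
    for name, (addr, port) in loops.items():
        ip = ipaddress.ip_address(addr)
        if not ip.is_multicast:
            raise ValueError(f"{name}: {addr} is not a multicast address")
        if ip in used:
            raise ValueError(f"{name}: {addr} already used by {used[ip]}")
        used[ip] = name
        table[name] = (str(ip), port)
    return table

def talk_destination(table: dict, loop: str) -> tuple:
    """UDP destination the media proxy sends a talker's audio to."""
    return table[loop]

table = build_loop_table(LOOPS)
print(talk_destination(table, "DEV1"))  # ('239.1.0.2', 5004)
```

The returned `(address, port)` tuple has the form that Python's `socket.sendto` expects for a UDP socket, so a proxy written this way could pass it straight to the send call.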

During login each client is associated with a dedicated mixing service. The mixing service is basically a multicast mixing node. Every Voiceloop a user selects is mixed, and a single audio stream is transmitted to the proxy, which forwards the audio back to the client.
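One common way to combine several Voiceloops into a single stream is to add the decoded PCM samples and clip the result to the sample range. The sketch below does this for 16-bit frames of equal length. It illustrates the principle only, is not the openvocs mixer, and leaves out decoding, jitter buffering and resampling; `mix_frames` is a hypothetical name.

```python
from array import array

INT16_MIN, INT16_MAX = -32768, 32767

def mix_frames(frames: list) -> array:
    """Sum equally long 16-bit PCM frames sample by sample, with clipping."""
    if not frames:
        return array("h")
    length = len(frames[0])
    if any(len(frame) != length for frame in frames):
        raise ValueError("all frames must contain the same number of samples")
    mixed = []
    for samples in zip(*frames):
        total = sum(samples)
        # Clip to the 16-bit range instead of letting the value wrap around.
        mixed.append(max(INT16_MIN, min(INT16_MAX, total)))
    return array("h", mixed)

loop_a = array("h", [1000, -2000, 30000])
loop_c = array("h", [500, -500, 10000])
print(list(mix_frames([loop_a, loop_c])))  # [1500, -2500, 32767]
```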

The above image shows the selection of Voiceloop DEV1, which is mapped to Multicast Group B. In addition, loops A, C and D are selected for monitoring and are mixed within the user's mixer instance. The mixer instance forwards the stream to the proxy, which in turn forwards it to the client. Switching is implemented over an internal API, which is not shown here for simplicity. This mixing functionality implements the core of a VoCS system: multiparty multiconferencing. Because a media proxy forwards streams to and from the backend, client implementations stay simple. A client connection is basically a (voice) call to the system, but instead of calling into a conference room, the call is connected to a custom multiconferencing backend.

Switching within the system is quite simple. The signaling proxy receives a command from the client, checks whether the user is allowed to perform the switch, and switches the media proxy or the media mixer depending on the loop state. A monitor switch turns monitoring of a multicast group, and therefore reception of that Voiceloop, on or off. Switching a Voiceloop to talk means switching the media proxy's outgoing stream to the multicast group of that Voiceloop.
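The switching rules can be sketched as a small state update performed by the signaling proxy. In this hypothetical model, `Session.monitored` stands in for the multicast groups the user's mixer receives and `Session.talk_loop` for the group the media proxy sends the outgoing stream to; real calls to the mixer and media proxy are replaced by these fields. The sketch also assumes, without confirmation from the text, that talk implies monitoring the same loop and that a client talks on at most one loop at a time.

```python
from dataclasses import dataclass, field
from typing import Optional

# Ordering of participation states; a higher state includes the lower ones.
RANK = {"none": 0, "monitor": 1, "talk": 2}

@dataclass
class Session:
    """Hypothetical per-client state held by the signaling proxy."""
    allowed: dict                                # loop -> highest permitted state
    monitored: set = field(default_factory=set)  # groups the mixer receives
    talk_loop: Optional[str] = None              # group the media proxy sends to

def switch_loop(session: Session, loop: str, state: str) -> None:
    """Check the request against the user's rights, then switch."""
    if RANK[state] > RANK[session.allowed.get(loop, "none")]:
        raise PermissionError(f"{state!r} not permitted on loop {loop!r}")
    if state == "none":
        # Stop receiving the group and stop sending to it.
        session.monitored.discard(loop)
        if session.talk_loop == loop:
            session.talk_loop = None
        return
    # Mixer switch: receive the loop's multicast group (assumed for talk, too).
    session.monitored.add(loop)
    if state == "talk":
        # Media proxy switch: redirect the outgoing stream to this group.
        session.talk_loop = loop
    elif session.talk_loop == loop:
        # Downgrading talk to monitor stops the outgoing stream.
        session.talk_loop = None

# The scenario described above: talk on DEV1, monitor loops A, C and D.
session = Session(allowed={"DEV1": "talk", "A": "monitor",
                           "C": "monitor", "D": "monitor"})
for loop in ("A", "C", "D"):
    switch_loop(session, loop, "monitor")
switch_loop(session, "DEV1", "talk")
print(session.talk_loop, sorted(session.monitored))  # DEV1 ['A', 'C', 'D', 'DEV1']
```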

Reference Architecture

The reference architecture contains an HTTPS-capable server with a signaling proxy implementation enabled, combined with a Media Proxy server, a multicast-based backend network and a Mixer Cloud. The Mixer Cloud is a set of mixer instances that register at the signaling proxy. Each mixer serves exactly one client, so the system scales with the number of mixer services: if a system needs to provide 100 positions in parallel, the cloud must be configured with 100 mixers. Signaling proxies are web servers with HTTPS and websocket support. Within the web server a VoCS implementation instance is loaded, which provides all signaling event handling as well as user authentication and authorization.
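Because each mixer serves exactly one client, capacity planning reduces to counting mixers. The hypothetical `MixerPool` below models mixers registering at the signaling proxy and being assigned to a client at login; the class, its methods and the identifiers are assumptions for illustration. With 100 registered mixers, the 101st position is refused.

```python
from collections import deque

class MixerPool:
    """Hypothetical mixer registry kept by the signaling proxy."""

    def __init__(self):
        self._free = deque()   # registered mixers not yet assigned
        self._assigned = {}    # client id -> mixer id

    def register(self, mixer_id: str) -> None:
        """Called when a mixer instance from the cloud registers."""
        self._free.append(mixer_id)

    def acquire(self, client_id: str) -> str:
        """Associate a client with a dedicated mixer at login."""
        if client_id in self._assigned:
            return self._assigned[client_id]
        if not self._free:
            raise RuntimeError("no free mixer: scale out the mixer cloud")
        mixer_id = self._free.popleft()
        self._assigned[client_id] = mixer_id
        return mixer_id

    def release(self, client_id: str) -> None:
        """Return the client's mixer to the pool at logout."""
        mixer_id = self._assigned.pop(client_id, None)
        if mixer_id is not None:
            self._free.append(mixer_id)

pool = MixerPool()
for n in range(100):
    pool.register(f"mixer-{n}")
for n in range(100):
    pool.acquire(f"position-{n}")
try:
    pool.acquire("position-100")
except RuntimeError as err:
    print(err)  # no free mixer: scale out the mixer cloud
```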

This setup is highly flexible and adaptable. For operational use cases we deploy two instances in parallel, and each client connects to both instances. In this way every service is built redundantly.
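One possible client-side view of this redundancy is to send every signaling event over both connections and treat the event as delivered if at least one instance accepted it. The sketch below is an assumption about how such a client could behave, not a description of the openvocs client; `send_redundant` is a hypothetical helper, and each sender callable stands in for one websocket connection that raises `OSError` when its instance is unreachable.

```python
from typing import Callable, Sequence

def send_redundant(event: str, senders: Sequence[Callable[[str], None]]) -> int:
    """Send one event over every connection; return how many accepted it."""
    delivered = 0
    for send in senders:
        try:
            send(event)
            delivered += 1
        except OSError:
            continue  # the other instance can still carry the event
    if delivered == 0:
        raise ConnectionError("no signaling instance reachable")
    return delivered

sent = []

def instance_a(message: str) -> None:
    sent.append(("a", message))

def instance_b(message: str) -> None:
    raise OSError("instance b unreachable")

print(send_redundant('{"event": "login"}', [instance_a, instance_b]))  # 1
```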

Our setup is as flexible as possible for providing VoCS services. It scales with the number of mixer instances deployed and can provide redundancy through the client interfaces.
