
WaveSync-shad


Purpose
Installation
Usage
      master/sender
      slave/receiver
            raspi "disp"
            android/termux
            windows 10
Options
      additional
      original
Code
      Terms
      Timing
      Code structure
            encountered locations
Termux issues
      time syncing
            python ntplib
            non-termux environments
      callback-based playback
            issues
            callback mode issue
FILES
TODO

Purpose

Sometimes the same audio has to be played from several computers at once. Bringing sound to several rooms is a common application.

For this purpose, there is a Python3 script called WaveSync. It is available from GitHub (https://github.com/blaa/WaveSync) or, in most distributions, via pip.

The script takes audio in via a unix socket from a PulseAudio server, chops it into packets whose size is usually bound by the local network's MTU, and attaches a header with the time at which the sample is supposed to be played on the client. It then sends each packet to every configured client via unicast, and via multicast to anyone who wants to listen.
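The chopping step can be sketched roughly as below. This is only an illustration: the header layout (a single double holding the play time) and the names packetize, HEADER and latency_s are assumptions, not WaveSync's actual wire format. The 1468-byte payload is the packet size quoted later on this page.

```python
import struct

PAYLOAD_BYTES = 1468          # audio bytes per packet (value quoted on this page)
BYTES_PER_FRAME = 2 * 2       # 2 channels, 16-bit samples
RATE = 44100                  # frames per second
HEADER = struct.Struct("!d")  # hypothetical header: play time as a double


def packetize(pcm, start_time, latency_s):
    """Split raw PCM into packets, each stamped with its play time."""
    frames_per_packet = PAYLOAD_BYTES // BYTES_PER_FRAME
    packets = []
    for index, pos in enumerate(range(0, len(pcm), PAYLOAD_BYTES)):
        chunk = pcm[pos:pos + PAYLOAD_BYTES]
        # Each packet is due one packet duration after the previous one.
        play_at = start_time + latency_s + index * frames_per_packet / RATE
        packets.append(HEADER.pack(play_at) + chunk)
    return packets
```

The receiver would unpack the first HEADER.size bytes to learn when to play the rest of the packet.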

It requires tight clock synchronization between master and slave(s). This is usually done by NTP, but that needs a third-party daemon running as root, which is not always possible.

It uses PortAudio for audio output, via the PyAudio library.


Installation

Usually sufficient: pip3 install wavesync

WaveSync depends on PyAudio, which is a wrapper for the PortAudio library. If it complains about a missing library (.so or .dll), PortAudio is the one.


Usage

assuming master and NTP server are at 10.0.0.100

master/sender

in pulseaudio server, in /etc/pulse/default.pa, set the unix socket sink:

 load-module module-null-sink sink_name=wsync
 load-module module-simple-protocol-unix rate=44100 format=s16le channels=2 record=true source=wsync.monitor socket=/tmp/music.source

restart pulseaudio

 pulseaudio -k
 pulseaudio -D

start the master

 wavesync --tx /tmp/music.source --channel laptop:1111 --channel disp:1111 --channel phone:1111 --latency 1000

slave/receiver

raspi "disp"

ordinary

 wavesync --rx --channel 0.0.0.0:1111 --tolerance 200 --ntp=10.0.0.100

callback (test)

 wavesync --rx --channel 0.0.0.0:1111 --tolerance 200 --ntp=10.0.0.100 --callback --device-index 0

android/termux

REQUIRES --callback. NTP jitter is weirdly large on Android, so the NTP tolerance --ntptol is set higher; --sink-latency is tuned so that playback is simultaneous with the other devices.

 wavesync --rx --channel 0.0.0.0:1111 --buffer-size 2000 --tolerance=200 --sink-latency=90 --ntp=10.0.0.100 --ntptol 15 --callback

windows 10

 python C:\Users\user\AppData\Local\Programs\Python\Python38\Scripts\wavesync --rx --channel 0.0.0.0:1111 --tolerance=20 --buffer-size 2000 --ntp=10.0.0.100

Options

additional

All additional options are primarily for --rx mode. The NTP server should preferably run on the --tx machine, so that the sender is intrinsically synchronized with it.

original


Code

Terms

Timing

Intrinsic delays (at 44.1 kHz, 2 channels, 16 bits):

Code structure

WaveSync is a Python program built on the libwavesync package, which usually lives in the /usr/lib/python<version>/site-packages/libwavesync/ directory.

It consists of several files (modified files in bold):

encountered locations


Termux issues

The software was never intended to run on Android phones. It can be run there via the Python interpreter in Termux, but this has several issues.

time syncing

The packet timestamps refer to absolute time from the local clock. The time is read in the time_machine.py module by calling datetime.utcnow().timestamp() (which returns a float).

The local clock has to be synchronized externally, usually by NTP or PTP.

This introduces a dependency on write access to the local clock, which is not always feasible, e.g. on unrooted Android.

On Android there is no direct write access to the system clock. Without root, an NTP query cannot change the clock; it can only report the offset. Fortunately, that is all that is needed: a known offset can be added to the local clock reading in the time_machine.now() call.
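A minimal sketch of the idea, not the actual time_machine.py code: the class name OffsetClock and the method set_offset are made up for illustration, and time.time() is used here in place of the datetime call named above.

```python
import time


class OffsetClock:
    """Local clock plus a correction learned from NTP queries."""

    def __init__(self):
        self.offset = 0.0  # seconds to add to the local clock reading

    def set_offset(self, offset):
        self.offset = offset

    def now(self):
        # The system clock is never written; only the reading is shifted.
        return time.time() + self.offset
```

With this, every timestamp comparison in the receiver goes through now(), so no root access is needed.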

As a workaround, the process runs its own NTP client in its own thread. An NTP query can be done in userspace, e.g. via the Python ntplib library. Ideally the query goes to a server on the local LAN.

Due to the appalling drift of the clocks and the jitter of the queries, fairly frequent queries are needed. Once per 10 seconds was chosen, to make the drift easy to observe and to resynchronize rapidly when another clock-syncing mechanism steps the clock by too much in too short a time.

The jitter of the queries is addressed by keeping the last ten query results and returning their average.

A step change caused by ntpd kicking in desynchronizes the client. This is detected by the next ntplib query: a difference from the average that exceeds a limit is taken as such a step, and the stored results are discarded.
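The averaging and the step detection could look something like this. It is a sketch under assumptions: the class name OffsetFilter and the default step_limit are invented here, and the real code may organize this differently.

```python
from collections import deque


class OffsetFilter:
    """Average of the last few NTP offsets, reset on a clock step."""

    def __init__(self, size=10, step_limit=0.015):
        self.samples = deque(maxlen=size)  # oldest sample falls out by itself
        self.step_limit = step_limit       # seconds; hypothetical threshold

    def average(self):
        return sum(self.samples) / len(self.samples)

    def add(self, offset):
        """Store a new offset and return the current average."""
        if self.samples and abs(offset - self.average()) > self.step_limit:
            # Too far from the running average: treat it as a clock step
            # and forget the history.
            self.samples.clear()
        self.samples.append(offset)
        return self.average()
```

After a detected step, the average restarts from the single new sample, so the client resynchronizes on the next query instead of slowly averaging its way there.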

If an NTP server is specified on the command line, the queries run in a separate thread, using a JavaScript-like recursive threading.Timer call.
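The recursive Timer pattern, in the style of a JavaScript setTimeout that re-arms itself, can be sketched as follows. The class name PeriodicQuery and its methods are hypothetical; func stands for whatever performs the NTP query.

```python
import threading


class PeriodicQuery:
    """Run func every interval seconds by re-arming a threading.Timer."""

    def __init__(self, interval, func):
        self.interval = interval
        self.func = func
        self.timer = None
        self.stopped = False

    def _tick(self):
        if self.stopped:
            return
        try:
            self.func()
        finally:
            # Re-arm even if the query raised, so that one failed
            # query does not end the polling.
            self.start()

    def start(self):
        if self.stopped:
            return
        self.timer = threading.Timer(self.interval, self._tick)
        self.timer.daemon = True  # do not keep the process alive
        self.timer.start()

    def stop(self):
        self.stopped = True
        if self.timer is not None:
            self.timer.cancel()
```

Each Timer runs its function in a fresh thread, so a slow query never blocks the playback loop.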

python ntplib

https://pypi.org/project/ntplib/ - userspace process for ntp queries

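 import ntplib

 # public pool server used as an example; ideally query a server on the LAN
 c = ntplib.NTPClient()
 resp = c.request('0.pool.ntp.org', version=3)
 # offset between the server clock and the local clock, in seconds
 print(resp.offset)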

non-termux environments

The NTP queries work on other platforms too. They were tested successfully on Windows 10 and on a Raspberry Pi running Raspbian, both over wifi.

callback-based playback

To facilitate precisely synced playback, a constant output delay has to be maintained. This is achieved by managing the size of the output queue/buffer.

The original software uses blocking writes, with pulseaudio as a backend.

The playback loop that sends data to the output device relies on stream.get_write_available(). This call returns the space in the device buffer that can be written to immediately, without blocking. This allows a manageable delay of the output, and dropping of chunks that arrive faster than the device can play them. However, in the Termux variant of the PortAudio library this call always returns, drumroll please, zero. So the loop was stuck, dropping all the packets and complaining, quote, "Hey, the output is STUCK!".

The first attempt was simply removing the condition. Voila, the playback started. Aaaaaand, the now non-dropped packets caused the stream to lag behind the rest of the devices. Ooooops. Back to the drawing board.

The next attempt was rewriting the code from blocking writes to callbacks, with its own queue whose depth can be monitored easily.

The first sub-attempt failed. The callback happened once, then it died. Adding a conditional stream.start_stream() did not work. Adding stream.stop_stream() before the start made it work, but AWFULLY choppy.

After the first callback, both stream.is_active() and stream.is_stopped() gave False. Hint hint...

A web search turned up nothing, and the documentation was silent about this symptom. It turned out that the length of the data block returned from the callback has to match the size given as frames_per_buffer when the stream is opened. OOPS. The constant was changed from self.buffer_size (8192), used for the blocking-writes buffer, to 367 (the number of frames per packet: 2 channels, 2 bytes per sample, 1468 data bytes per packet).
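The arithmetic behind the 367, plus one way to guarantee that a block always has the expected length, as a small sketch. The helper fit_block is hypothetical and not part of WaveSync; it only illustrates the "block length must match frames_per_buffer" rule.

```python
RATE = 44100          # frames per second
CHANNELS = 2
SAMPLE_BYTES = 2      # 16-bit samples
PAYLOAD_BYTES = 1468  # audio bytes per packet

BYTES_PER_FRAME = CHANNELS * SAMPLE_BYTES
FRAMES_PER_PACKET = PAYLOAD_BYTES // BYTES_PER_FRAME  # 1468 / 4 = 367
PACKET_MS = 1000 * FRAMES_PER_PACKET / RATE           # a bit over 8 ms


def fit_block(data, frame_count):
    """Return exactly frame_count frames: truncate, or pad with silence."""
    wanted = frame_count * BYTES_PER_FRAME
    return data[:wanted].ljust(wanted, b"\x00")
```

A callback would pass its frame_count argument to fit_block before returning the data, so a short final packet cannot kill the stream.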

A FIFO queue had to be added; the raw sending was jittery and the callback sometimes picked up the same block twice. As each packet is about 8 milliseconds of sound, even a few levels of queue quickly add up to a noticeable delay. The Python "queue" library was chosen for this task. The level was set to 3 packets in the queue, with aggressive probabilistic dropping beyond that, and the play callback throws out every other chunk if the queue length exceeds a limit. This brought the playback to manageable performance.
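The "throw out every other chunk" part can be sketched like this. The class PlayQueue and its method names are assumptions made for illustration, and the probabilistic dropping on the input side is left out.

```python
import queue

SILENCE = bytes(1468)  # one packet worth of silence, played on underrun


class PlayQueue:
    """FIFO between the network receiver and the audio callback."""

    def __init__(self, limit=3):
        self.q = queue.Queue()
        self.limit = limit

    def put(self, chunk):
        self.q.put(chunk)

    def get(self):
        """Called from the audio callback; never blocks."""
        try:
            chunk = self.q.get_nowait()
            if self.q.qsize() > self.limit:
                # Over the limit: discard this chunk and play the next
                # one, so only every other chunk reaches the output.
                chunk = self.q.get_nowait()
        except queue.Empty:
            return SILENCE
        return chunk
```

While the queue is too long, each callback consumes two chunks and plays one, which shortens the queue by one chunk (about 8 ms) per callback.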

Frequent short drop-outs persist. Uneven packet delivery and some out-of-order packets are suspected. TODO: sort the queue by timestamp.

Over longer playback, a slight delay tends to accumulate. A restart helps. The suspect is something within PortAudio or Termux's PulseAudio. A possible workaround involves taking code from play-audio or (better) writing a minimal streamer with callbacks.

Disconnecting and reconnecting the stream at the start of a detected silence period seems to help.
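How the silence is detected is not described here. One simple possibility, given purely as an assumption, is to check whether every s16le sample in a chunk stays below a small threshold; the function name is_silent and the threshold value are invented for this sketch.

```python
import array
import sys


def is_silent(chunk, threshold=16):
    """True if all 16-bit little-endian samples are within the threshold."""
    samples = array.array("h")
    # Ignore a trailing odd byte, if any.
    samples.frombytes(chunk[:len(chunk) - len(chunk) % 2])
    if sys.byteorder == "big":
        samples.byteswap()  # the stream is little-endian (s16le)
    return all(abs(s) <= threshold for s in samples)
```

A reconnect would then be triggered only after several consecutive silent chunks, to avoid reacting to a single quiet packet.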

The callback output can be selected with the --callback argument, instead of the default blocking-write output.

issues

The data sometimes come in bursts, and are sometimes consumed in bursts as well. This requires aggressive management of the local queue length, even at the cost of some data loss and artefacts; at the usual settings every packet/chunk takes over 8 milliseconds.

The write-to-local-queue call that replaces the write-to-buffer call increases the queue length.

As the data are consumed at roughly the rate at which they come in, an increase in queue length tends to stick for a long time. If an over-the-limit length is detected, every other chunk gets discarded instead of being output.

Sometimes packets arrive out of order. Such packets in an unsorted queue would cause drop-outs. For now they are detected and discarded.
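Detecting and discarding late packets can be as simple as remembering the last accepted timestamp. The class OrderGuard below is a hypothetical sketch of that idea, not the code actually used.

```python
class OrderGuard:
    """Accept only packets newer than the last accepted one."""

    def __init__(self):
        self.last_ts = None

    def accept(self, timestamp):
        if self.last_ts is not None and timestamp <= self.last_ts:
            return False  # late or duplicate packet: discard it
        self.last_ts = timestamp
        return True
```

Only packets for which accept() returns True would be put into the play queue.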

On termux, the blocking call doesn't work at all. The callback was written to address exactly this.

Callback mode can suffer from underruns.

callback mode issue

The playback can be a little choppy/uneven. This happens on both Android and the raspi (less on the latter).

On the raspi, with the jack connector (no HDMI), callback playback is uselessly choppy with the default output device. --device-index 0 helps greatly.

On Windows 10, the callback mode behaves surprisingly well. This may be due to good wifi on the test machine.


FILES

original version: https://github.com/blaa/WaveSync

for review:

for case:


TODO

