
Polycam-ustreamer


Why
Streaming background
      USB streaming quirks
      stream formats
      stream formats, sensor-to-host
      stream formats, server-to-clients
            transport layer
            choices
            formats/codecs overview
            container formats
How
      code architecture
      important functions
      gotchas
      dirty/todo
options added to ustreamer
      added makefile options
URLs
related tricks
      remote framebuffer screenshot
      v4l2loopback, linux
      IP camera to virtual webcam, Windows
            shenyaocn camera bridge
            OBS Studio
Files
Todo
Images
OpenCV/python tricks
      read grayscale from YUYV shared memory, then read barcode
      read BGR from YUYV shared memory and save as jpeg
      convert BGR to YUYV bytearray (colors from even bytes only, straight drop of colors of odd bytes)
      pixel formats

Why

Many IP cameras are built around variants of Raspberry Pi. Especially with its CSI/MIPI camera it is a powerful and relatively inexpensive system, much more flexible than anything else on the market. Features can be easily added on demand.

USB UVC webcams are another flexible kind of optics. They come in all sorts of shapes and capabilities. The most interesting ones are:

These cams can be used for many sorts of optical supervision/surveillance, alignments, and even machine vision.

An example can be a CNC machine, with one cam surveying the overall workspace, another on the toolhead looking at the tool-workpiece interaction, yet another on the same toolhead but looking straight down (and possibly being a microscope) for optical alignments and workpiece measurements/coordinate acquisition, one more on the side of the machine to inspect the tool for dimensions and wear...


Streaming background

Streaming video means delivering image frames from the camera sensor to the user's display (or to machine vision software). The path has two parts: the sensor-encoder pathway and the encoder/server-client pathway.

USB streaming quirks

USB cams are operated through their /dev/videoX device, via the Video4Linux API.

The camera gets opened and set up via a series of ioctl calls. Then the data are acquired either by requesting a single frame, or by switching on frame streaming.
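
A minimal sketch of the first of those ioctl calls, VIDIOC_QUERYCAP, done from Python instead of C. The helper names (ioc_read, parse_capability, query_capability) are made up for illustration. The request number encoding assumes the common Linux ioctl bit layout (x86/ARM), and the struct layout follows struct v4l2_capability in videodev2.h.

```python
import fcntl
import struct

# struct v4l2_capability: driver[16], card[32], bus_info[32],
# version, capabilities, device_caps, reserved[3] (skipped as 12 pad bytes)
CAP_FORMAT = "=16s32s32sIII12x"
CAP_SIZE = struct.calcsize(CAP_FORMAT)

V4L2_CAP_VIDEO_CAPTURE = 0x00000001
V4L2_CAP_STREAMING = 0x04000000


def ioc_read(type_char, number, size):
    """Build an _IOR() request number (assumption: x86/ARM ioctl bit layout)."""
    return (2 << 30) | (size << 16) | (ord(type_char) << 8) | number


VIDIOC_QUERYCAP = ioc_read("V", 0, CAP_SIZE)


def parse_capability(buf):
    """Decode the raw struct v4l2_capability bytes into a dict."""
    driver, card, bus, version, caps, device_caps = struct.unpack(CAP_FORMAT, bytes(buf))

    def text(raw):
        # C strings are NUL-terminated inside fixed-size fields
        return raw.split(b"\0", 1)[0].decode("ascii", "replace")

    return {
        "driver": text(driver),
        "card": text(card),
        "bus_info": text(bus),
        "can_capture": bool(caps & V4L2_CAP_VIDEO_CAPTURE),
        "can_stream": bool(caps & V4L2_CAP_STREAMING),
    }


def query_capability(path="/dev/video0"):
    """Open the device and ask the driver what it can do."""
    buf = bytearray(CAP_SIZE)
    with open(path, "rb+", buffering=0) as dev:
        fcntl.ioctl(dev, VIDIOC_QUERYCAP, buf)  # driver fills buf in place
    return parse_capability(buf)
```

On a machine with a camera attached, query_capability("/dev/video0") should return the driver and card names; the format setup and streaming ioctls follow the same pack-call-unpack pattern.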

When a streaming server is started, it opens the assigned camera and tells it to stream. It receives the images typically in a rotating set of hardware buffers - even if no client is attached to the server.

When streaming, the camera reserves a fair amount of bandwidth on the USB bus, for isochronous transfers. Several cams running on individual streaming servers can therefore quickly run out of available USB bandwidth. This is especially annoying with smaller computers with weaker USB interfaces, not built for this kind of load. Raspberry Pi up to v3 is an example - two streams of non-negligible resolution and the whole USB subsystem starts failing.
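
A rough back-of-envelope calculation of how fast the bus fills up, as a sketch. The function names are hypothetical. The 480 Mbit/s figure is the nominal USB 2.0 high-speed rate; the bandwidth actually usable for isochronous transfers is lower, and cameras may reserve more than the raw frame data needs.

```python
USB2_NOMINAL_BITS_PER_SECOND = 480_000_000


def raw_stream_bytes_per_second(width, height, bytes_per_pixel, fps):
    """Payload of an uncompressed stream, ignoring all protocol overhead."""
    return int(width * height * bytes_per_pixel * fps)


def usb2_share(width, height, bytes_per_pixel, fps):
    """Fraction of the nominal USB 2.0 rate taken by one raw stream."""
    bits = raw_stream_bytes_per_second(width, height, bytes_per_pixel, fps) * 8
    return bits / USB2_NOMINAL_BITS_PER_SECOND


# 640x480 YUYV (2 bytes/pixel) at 30 fps
print(raw_stream_bytes_per_second(640, 480, 2, 30))
print(round(usb2_share(640, 480, 2, 30), 4))
```

Even at this modest resolution a single raw stream takes close to a third of the nominal rate, so two or three such cams leave little headroom.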

Streaming of multiple cams therefore brings rather heavy issues - especially as the cams are often not needed all at once; in most cases selecting just one does the job well enough.

No software however appears to hot-select/hot-swap cams on external demand, without interrupting the stream.

stream formats

Data formats are commonly identified by FourCC codes, four-character codes.
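
A small sketch of how a FourCC maps to the 32-bit value used in the V4L2 structures: the four ASCII characters are packed little-endian, first character in the lowest byte (same as the v4l2_fourcc() macro in videodev2.h). The Python function names are made up for illustration.

```python
def fourcc_to_int(code):
    """Pack a four-character code into its 32-bit integer form."""
    if len(code) != 4:
        raise ValueError("a FourCC has exactly four characters")
    a, b, c, d = code.encode("ascii")
    return a | (b << 8) | (c << 16) | (d << 24)


def int_to_fourcc(value):
    """Unpack a 32-bit pixel format value back into its four characters."""
    return bytes((value >> shift) & 0xFF for shift in (0, 8, 16, 24)).decode("ascii")


print(hex(fourcc_to_int("YUYV")))
print(int_to_fourcc(0x47504A4D))
```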

stream formats, sensor-to-host

The best known encoding, used with virtually all framebuffers, is RGB. Each pixel has a triplet of values for red/green/blue brightness.

Sensors however often use YUV, where, instead of red/green/blue, each pixel has a triplet of intensity (Y) and two chroma values, describing the color (U/V, or Cb/Cr). Here the human eye's lower color resolution compared to brightness resolution can be leveraged to reduce the amount of data by chroma subsampling.
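
The RGB to YUV relation can be sketched as below. The coefficients are an assumption: full-range BT.601, as used in JFIF/JPEG. Cameras and libraries may use a different matrix or a limited value range, so treat this as an illustration of the principle and not as the conversion any particular sensor performs.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to Y/Cb/Cr (full-range BT.601, assumed)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b

    def clamp(value):
        # round to nearest integer and keep within the 8-bit range
        return max(0, min(255, int(round(value))))

    return clamp(y), clamp(cb), clamp(cr)


print(rgb_to_ycbcr(128, 128, 128))  # any gray has neutral chroma
```

Grays end up with both chroma values at the midpoint (128), which is why dropping or sharing chroma between neighbors costs so little visually.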

With camera sensors, 4:2:2 subsampling is often used. Each pair of pixels then contains two values for intensity (one each), and one shared value for color (as U/V pair). The data volume is instantly reduced by a third.

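The frame formats come in two families: planar, where the components are stored in separate arrays, and packed, where the components are interleaved pixel by pixel.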

A common planar format is NV21, used for Android camera previews. Here the 4:2:0 image is sent as an array of Y values, followed by an array of interleaved V/U values (strictly speaking this makes it semi-planar; NV12 is the same layout with U/V order).
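
A sketch of taking such a buffer apart, using only byte slicing. It assumes NV21 chroma order (V first, then U) and even width and height; the function name split_nv21 is hypothetical.

```python
def split_nv21(buf, width, height):
    """Split an NV21 frame into Y, U and V byte strings."""
    y_size = width * height
    expected = y_size + y_size // 2  # 4:2:0 -> chroma adds half of the Y size
    if len(buf) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(buf)}")
    y = buf[:y_size]
    vu = buf[y_size:]
    v = vu[0::2]  # V comes first in NV21
    u = vu[1::2]
    return y, u, v


# tiny 4x2 frame: 8 luma bytes, then V0 U0 V1 U1
frame = bytes(range(8)) + bytes([200, 100, 201, 101])
y, u, v = split_nv21(frame, 4, 2)
print(list(u), list(v))
```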

Most cameras however provide data as packed, either in individual pixels containing all data (RGB888, RGB565, YUV444...) or in "macropixels", groups of pixels with some data (typ. color) shared across 2, 4, 8, or more pixels.

The most important pixel formats are:

format   subsampling  bytes/pixel  pixels per color info  notes
RGB888   4:4:4        3            1                      every pixel has full RGB triplet, full 8bit color depth; also called RGB24
RGB565   4:4:4        2            1                      every pixel has full RGB triplet, reduced color depth; also called RGB16
YUV444   4:4:4        3            1                      every pixel has full YUV triplet
YUV422   4:2:2        2            2                      color shared across even/odd horizontal pixel pairs
YUV411   4:1:1        1.5          4                      color shared across horizontal quads of pixels
YUV420   4:2:0        1.5          4                      color shared across 2x2 pixel squares
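
The bytes/pixel column translates directly into frame sizes. A small sketch, with a hypothetical frame_bytes helper; Fraction is used so that the 1.5 bytes/pixel formats stay exact.

```python
from fractions import Fraction

# bytes per pixel, taken from the table above
BYTES_PER_PIXEL = {
    "RGB888": Fraction(3),
    "RGB565": Fraction(2),
    "YUV444": Fraction(3),
    "YUV422": Fraction(2),
    "YUV411": Fraction(3, 2),
    "YUV420": Fraction(3, 2),
}


def frame_bytes(fmt, width, height):
    """Size of one uncompressed frame in bytes."""
    size = BYTES_PER_PIXEL[fmt] * width * height
    if size.denominator != 1:
        raise ValueError("resolution does not fit the macropixel size")
    return int(size)


for name in BYTES_PER_PIXEL:
    print(name, frame_bytes(name, 640, 480))
```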

Another family of pixel formats is derived from the RGB model, using Bayer filter arrays. Here the "macropixels" are 2x2 arrays of single-color pixels, in one of these arrangements (the FourCCs ending in 10 are the 10-bit variants):

fourcc   BA81 GBRG GRBG RGGB BG10 GB10 BA10 RG10
pixels    BG   GB   GR   RG   BG   GB   GR   RG
pixels    GR   RG   BG   GB   GR   RG   BG   GB
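
The table above can be turned into a lookup of which color a given sensor pixel carries. A sketch; BAYER_TILES and bayer_color are names made up here, the tile contents are copied from the table.

```python
# fourcc -> (top row, bottom row) of the repeating 2x2 tile
BAYER_TILES = {
    "BA81": ("BG", "GR"),
    "GBRG": ("GB", "RG"),
    "GRBG": ("GR", "BG"),
    "RGGB": ("RG", "GB"),
    "BG10": ("BG", "GR"),
    "GB10": ("GB", "RG"),
    "BA10": ("GR", "BG"),
    "RG10": ("RG", "GB"),
}


def bayer_color(fourcc, x, y):
    """Return 'R', 'G' or 'B' for the pixel at column x, row y."""
    return BAYER_TILES[fourcc][y % 2][x % 2]


print("".join(bayer_color("BA81", x, 0) for x in range(6)))
print("".join(bayer_color("BA81", x, 1) for x in range(6)))
```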

Sensor data come from the camera usually packed, in several possible formats:

There is a plethora of frame and pixel formats. The best list at hand is the V4L2 header, /usr/include/linux/videodev2.h.

stream formats, server-to-clients

Video streaming comes in several different formats, each suitable for different use cases.

transport layer

The data can be transferred by several different means:

choices

camera side:

stream side:

For lower framerates and smaller resolutions, the MJPEG is often the best choice.
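
MJPEG over HTTP is commonly served as a multipart/x-mixed-replace response, where each part is one complete JPEG image. A sketch of the framing only; the function names and the boundary string are arbitrary choices for illustration, not taken from any particular server.

```python
def mjpeg_response_header(boundary="frameboundary"):
    """HTTP response header that announces an endless multipart stream."""
    return (
        "HTTP/1.0 200 OK\r\n"
        f"Content-Type: multipart/x-mixed-replace;boundary={boundary}\r\n"
        "\r\n"
    ).encode("ascii")


def mjpeg_part(jpeg, boundary="frameboundary"):
    """Wrap one JPEG frame as a single part of the multipart stream."""
    head = (
        f"--{boundary}\r\n"
        "Content-Type: image/jpeg\r\n"
        f"Content-Length: {len(jpeg)}\r\n"
        "\r\n"
    ).encode("ascii")
    return head + jpeg + b"\r\n"


# a server would send the header once, then one part per frame
stream = mjpeg_response_header() + mjpeg_part(b"\xff\xd8\xff\xd9")
print(len(stream))
```

Every frame is independently decodable, which is what keeps the latency low and makes the format forgiving to dropped frames, at the price of a higher bitrate than inter-frame codecs.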

formats/codecs overview

Formats:

container formats

Comparison of video container formats

Video stream is often present together with audio stream or other video streams, metadata, subtitles, program guide, etc., sometimes standalone as the only component in the container. The different data types get multiplexed together in a container format.

Container formats are used for combining several different chunks of information together; e.g. a .jpg image is a JFIF container with JPEG data as a chunk inside.
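
The JFIF example can be checked by looking at the first bytes of a file: the JPEG start-of-image marker is followed by an APP0 segment carrying the "JFIF" identifier. A minimal sketch with a hypothetical is_jfif helper; files from cameras often carry Exif instead and will return False.

```python
import struct


def is_jfif(data):
    """True if the data starts with a JPEG SOI marker and a JFIF APP0 segment."""
    if len(data) < 11 or data[:2] != b"\xff\xd8":
        return False
    marker, length = struct.unpack(">HH", data[2:6])  # big-endian marker + segment length
    return marker == 0xFFE0 and length >= 16 and data[6:11] == b"JFIF\0"


header = b"\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x00\x00\x01\x00\x01\x00\x00"
print(is_jfif(header))
```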

In video, most common formats are:

Common "naked" streams are eg. WMA/WMV and MP3.

To play a file, the player has to understand both the container format and the codecs used inside. If the codec of one stream is not supported, only the video or only the audio gets played, with the other missing.


How

Raspberry Pi was chosen as a platform for its availability and cost.

An open-source streaming software was stumbled upon, offering low latency and hardware acceleration.

The compression is accelerated via OpenMAX (OMX) API of the Raspberry Pi's GPU.

The µstreamer (ustreamer) was found to support hotplugging of the cameras. In case the /dev/videoX it is connected to cannot be opened, a blank "NO SIGNAL" image is streamed instead. Meanwhile the main stream thread is attempting to reconnect.

This behavior was leveraged for multiple camera streams selection. The request rewrites the device name, then the frame acquisition loop is exited and reconnection is triggered.
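
The control flow can be illustrated with a toy simulation. This is not ustreamer's actual C code; all names are hypothetical, frames are plain strings, and switch requests are scripted by frame count instead of arriving over HTTP.

```python
NO_SIGNAL = "NO SIGNAL"


def run_stream(device, available, switch_at, total_frames):
    """Simulate the stream loop.

    device       - device name to start with
    available    - set of device names that can be opened
    switch_at    - {frame count: device name} scripted switch requests
    total_frames - stop after this many output frames
    Returns (list of frames, number of successful device opens).
    """
    frames = []
    opens = 0
    while len(frames) < total_frames:
        # (re)connection phase: blank frame while the device cannot be opened
        if device not in available:
            frames.append(NO_SIGNAL)
            requested = switch_at.get(len(frames))
            if requested is not None:
                device = requested
            continue
        opens += 1
        # frame acquisition loop
        while len(frames) < total_frames:
            frames.append(f"frame from {device}")
            requested = switch_at.get(len(frames))
            if requested is not None and requested != device:
                device = requested  # the request rewrites the device name...
                break               # ...and the loop exit triggers reconnection
    return frames, opens


frames, opens = run_stream(
    "/dev/video0",
    {"/dev/video0", "/dev/video2"},
    {2: "/dev/video1", 4: "/dev/video2"},
    6,
)
print(frames, opens)
```

The clients never see the switch as a disconnect; they only get a different picture (or the blank one) in the same stream.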

The reconnection takes roughly 500 milliseconds. Most of the time is between the IOCTL call for streaming enable and the moment when select() starts actually getting some data.

code architecture

ustreamer is a multithreaded software. The important threads are:

The individual source files handle different subsystems:

  config.h
! device.c/h          open/close/setting/ioctls of /dev/video
  encoder.c/h
  gpio.h              raise/lower a GPIO to profile code
  logging.c/h         output logs
  main.c
  options.c/h         commandline options
  picture.c/h         image buffer copy and compare (for identical frame drop)
  process.h           comm to parent process, process name
! stream.c/h          main loop for the read-compress-output
  threading.h
  tools.h
  xioctl.h            defines for ioctls

  http:
    base64.c/h        base64 enc/dec
    blank.c/h         read JPEG image to the "no signal" static frame
    mime.c/h          guess MIME type from extension
    path.c/h          simplification of URL paths with doubledots etc.
!   server.c/h        HTTP server connections, request handling, stream handling; leverages libevent2
    static.c/h        static page definitions - blank image, main page
    unix.c/h          unix sockets
    uri.c/h           parameter extraction from parsed HTTP requests; bool and strings

  http/data:

  encoders/cpu:
    encoder.c/h       basic nonaccelerated compression - simple to understand
  encoders/hw:
    encoder.c/h       pass-through for already-compressed data from cam - simplest but useless for overlays
    huffman.h
  encoders/omx:
    component.c/h
    encoder.c/h       OpenMAX accelerated compression
    formatters.c/h    state/error codes to strings

The threads are communicating through shared data structures.

Some structs come in pairs: a static one (where setups happen) and a runtime one ("run") holding live data related to the connection. The runtime one is usually a child element of the static one.

The bulk of modifications is in stream.c and the http/server.c.

important functions

gotchas

The makefile does not take into account the dependencies on header files. Change a struct in e.g. device.h/device.c, recompile, and the binary will segfault somewhere where the ol' compiled .o files expect the old struct data positions. Oopsies. make clean and a little time solves this.

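The assert() call is positive on evbuffer_add_printf and negative on evbuffer_add: the former returns the number of bytes added, the latter returns 0 on success.

Note the "!" in front of evbuffer_add!!!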

dirty/todo

Many of the modifications rely on global variables instead of their data being nicely packed into structs. This will be cleaned up Sometime Later.


options added to ustreamer

added makefile options

make <option>=1 <anotheroption>=0 ... (all listed options enabled by default)


URLs


related tricks

remote framebuffer screenshot

 ssh $SERVER fbgrab -d /dev/fb1 /tmp/x.png
 scp $SERVER:/tmp/x.png $TARGETFILE

v4l2loopback, linux

https://github.com/umlaeute/v4l2loopback

A kernel module that creates virtual video devices, /dev/videoX loopback/loop device interfaces.

IP camera to virtual webcam, Windows

Several choices exist; they usually rely on the DirectShow or Media Foundation (Microsoft Media Foundation, MMF) framework. [ref]

Virtual camera sources are created, usually by installing a DLL file.

Virtual cams are usually DirectShow devices. These work with DS-using software, but MMF-based software won't see them.

Not all programs support the desired camera interface type. E.g. Skype (app form) does not see them, while Skype (desktop form) does. IrfanView in its "select scan/TWAIN source" menu offers local USB webcams but not the virtual ones. Windows 10 "Camera" app does not see them either.

AMCap has no problems.

Beware of 32/64 bit programs, they seem to require different versions of the running DLLs. Both are usually installed simultaneously.

shenyaocn camera bridge

https://github.com/shenyaocn/IP-Camera-Bridge

shows as "IP Camera Bridge Plus", behaves nicely, does not support HTTP redirect responses

OBS Studio

Open Broadcaster Software, with a virtual cam DirectShow plugin


Files

CAUTION: preliminary, not cleaned up, not release-ready, raw code for raw nerves


Todo


Images




OpenCV/python tricks

OpenCV uses NumPy arrays, usually of uint8 type, for image handling. Pixels are stored in arrays, and can have one discrete value (for grayscale), two (for eg. interleaved YUYV), three (for RGB/BGR/YUV/HSV/HLS/LAB/LUV...), or four (RGBA/BGRA, when alpha channel is used).

A color image is typically a 3-dimensional array, in [y][x][color] format; OpenCV's default color order is BGR, or blue-green-red.

A grayscale image is a 2-dimensional array only.

[y][x][RGB] array "a":
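
A few common indexing operations on such an array, as a sketch with a synthetic all-black 640x480 image (the variable names other than "a" are arbitrary):

```python
import numpy as np

a = np.zeros((480, 640, 3), dtype=np.uint8)  # [y][x][color], BGR order
a[100, 200] = (255, 0, 0)                    # pixel at x=200, y=100 set to pure blue

height, width, channels = a.shape
pixel = a[100, 200]         # one pixel, as a 3-element array (B, G, R)
blue = a[:, :, 0]           # blue channel as a 2-dimensional array
roi = a[50:150, 150:250]    # 100x100 region of interest (a view, not a copy)
rgb = a[:, :, ::-1]         # same image with the channel order reversed to RGB

print(height, width, channels)
print(pixel.tolist(), rgb[100, 200].tolist())
```

Note that the first index is the row (y), not the column (x); mixing these up is the most common mistake.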

read grayscale from YUYV shared memory, then read barcode

imports needed for this snippet

 import numpy as np
 from pyzbar import pyzbar

read each even byte of binary frame (Y only)

 raw=open('/dev/shm/imgtest.bin','rb').read(640*480*2)[::2]
convert bytes to numpy array of uint8, reshape to [y][x]
 grayscaleframe=np.frombuffer(raw,'uint8').reshape(480,640)
decode barcode
 print(pyzbar.decode(grayscaleframe))

read BGR from YUYV shared memory and save as jpeg

imports needed for this snippet

 import cv2
 import numpy as np

read entire binary frame

 raw=open('/dev/shm/imgtest.bin','rb').read(640*480*2)
convert bytes to numpy array of uint8, then reshape to [y][x][depth], then convert to BGR
 bgrframe=cv2.cvtColor( np.frombuffer(raw,'uint8').reshape(480,640,2), cv2.COLOR_YUV2BGR_YUY2)
save file
 cv2.imwrite('/tmp/x.jpg',bgrframe)

convert BGR to YUYV bytearray (colors from even bytes only, straight drop of colors of odd bytes)

 yuvframe=cv2.cvtColor( bgrframe, cv2.COLOR_BGR2YUV)

convert to linear array

 yuvframe=yuvframe.reshape(640*480*3)
create target output, with 8bit unsigned
 yuyv=np.zeros(640*480*2,dtype='uint8')
populate each 2nd byte of target (Y) with each 3rd byte of source
 yuyv[::2]=yuvframe[::3]
populate each 4th+1 byte of target (U) with each 6th+1 byte of source
 yuyv[1::4]=yuvframe[1::6]
populate each 4th+3 byte of target (V) with each 6th+2 byte of source
 yuyv[3::4]=yuvframe[2::6]
convert to a mutable bytearray
 barr=bytearray(yuyv)
...alternatively, to immutable bytes...
 barr=yuyv.tobytes()
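
A variant of the above that averages the chroma of each pixel pair instead of dropping the color of the odd pixel. This is a swapped-in alternative, not the method used above; the function name is hypothetical and the input is assumed to be a [y][x][YUV] uint8 array with even width, such as the output of cv2.cvtColor with cv2.COLOR_BGR2YUV.

```python
import numpy as np


def yuv_to_yuyv_averaged(yuv):
    """Pack an (h, w, 3) uint8 YUV array into YUYV bytes, averaging chroma per pixel pair."""
    h, w, _ = yuv.shape
    if w % 2:
        raise ValueError("width must be even")
    # group horizontal neighbors: [y][pair][pixel in pair][component];
    # widen to uint16 so the sums below cannot overflow
    pairs = yuv.reshape(h, w // 2, 2, 3).astype(np.uint16)
    out = np.empty((h, w // 2, 4), dtype=np.uint8)
    out[:, :, 0] = pairs[:, :, 0, 0].astype(np.uint8)  # Y of even pixel
    out[:, :, 2] = pairs[:, :, 1, 0].astype(np.uint8)  # Y of odd pixel
    out[:, :, 1] = ((pairs[:, :, 0, 1] + pairs[:, :, 1, 1] + 1) // 2).astype(np.uint8)  # shared U
    out[:, :, 3] = ((pairs[:, :, 0, 2] + pairs[:, :, 1, 2] + 1) // 2).astype(np.uint8)  # shared V
    return out.tobytes()


# one row, two pixels: (Y, U, V) = (10, 100, 200) and (20, 102, 204)
tiny = np.array([[[10, 100, 200], [20, 102, 204]]], dtype=np.uint8)
print(list(yuv_to_yuyv_averaged(tiny)))
```

Averaging costs a little more CPU but avoids color fringes on sharp vertical edges.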

pixel formats

There are also planar formats, with brightnesses and colors in separate arrays.

Usual number formats:

