On generating PNG wave-form images from audio-files…

…using the commandline.


Quote from the LAU email list:

Erik de Castro Lopo writes:
Robin Gareus wrote:

I was actually surprised that there's actually no sndfile-waveform
tool, yet! – Yeah right. It's a gimmick :)

You write it and I'll be happy to add it to sndfile-tools.


Once upon a time, there was an Oscilloscope. She was quite a cutie, with a big green display and two long probes (wires, that is). But her actual beauty was only revealed when she displayed audio signals. As a matter of fact, she was so fancied that these days every major audio website features a wave-form, regardless of its actual use.

But before there were websites, there were desktop applications; in fact, there are still quite a lot of GUIs that display wave-forms [insert 50 frames of application screenshots at 25 fps here]. And even before there were GUIs, people already visualized digitally sampled data... BUT HOW? There certainly must have been data visualizations sometime between the punch-card age and Windows. Remnants, long forgotten.

These days there is a market for displaying audio waveforms: on websites, inside apps, on posters or flyers, etc. Rendering those images automatically en masse without user interaction (no clicking and grunting at a GUI) seems to be a normal thing, right? So there must be applications available that run head-less and spit out waveform images without end.

Go look for yourself: you may find gstreamer plugins for audio visualization, but those require a bit of pipeline tinkering. Besides, gstreamer is quite a heavy tool-chain for the simple task of creating a wave-form image. Then there's the Python script from freesound (wav2png), which works OOTB (if one has a Python interpreter and scikits/scipy from the “audiolab” package installed). And that's basically the end of the story when it comes to command-line audio visualizers. Oh wait: there's moodbar, a program to compute 'mood bars' of audio files, which runs perfectly fine without a GUI.

The actual problem sits deeper: there are a gazillion GUIs and each of them rolls its own wave-form. Sure, the basics are simple, but a lot of resources are wasted on re-implementing the rest: peak-files, blocking, zoom to individual samples, annotations, markers, etc. Tim Orford is currently working on a dedicated 'libwaveform', aiming at a good generic codebase to address the situation.

Anyway, it is really surprising that there are no classic Unix-style free-software command-line apps after over 30 years of GNU and over 13 years of libsndfile. The latter even features a spectrogram rendering application which already provides the necessary building-blocks - but no waveform output.

Now, one should step back and ponder the situation. Surely there must be a good reason for the current state, and lessons to be learned. About 10 minutes into the history of Linux-Audio, one will notice that more time has already been spent thinking about the problem than it would have taken to implement a solution. Another ten minutes, and one may come to the conclusion that a lesson can potentially be learned by just doing it. The process may reveal why such a simple tool is so scarce.

Regarding the lesson, I must disappoint you. The remaining story is simply told as --help text.

The source-code is available from the

and has been merged

have fun!

sndfile-waveform - waveform image generator

Creates a PNG image depicting the wave-form of an audio file.
Peak-signal and RMS values can be displayed in the same plot,
where the horizontal axis always represents time.

The vertical axis can be plotted logarithmically, and the signal
can optionally be rectified.

The Time-axis annotation unit is either seconds or timecode
using broadcast-wave time reference meta-data.

The tool can plot individual channels, reduce the file to mono,
or plot all channels in a vertical arrangement.

Colours (ARGB) and image- or waveform geometry can be freely specified.

Usage : sndfile-waveform [OPTION]  <sound-file> <png-file>

Options :
  -A, --textcolour <COL>    specify text and border colour ; default 0xffffffff
                            all colours as hexadecimal AA RR GG BB values
  -b, --border              display a border with annotations
  -B, --background <COL>    specify background colour ; default 0x8099999f
  -c, --channel             choose channel (s) to plot, 0 : merge to mono ;
                            < 0 : render all channels vertically separated ;
                            > 0 : render only specified channel. (default : 0)
  -C, --centerline <COL>    set colour of zero/center line (default 0x4cffffff)
  -F, --foreground <COL>    specify foreground colour ; default 0xff333333
  -g <w>x<h>, --geometry <w>x<h>
                            specify the size of the image to create
                            default : 800x192
  -G, --borderbg <COL>      specify border/annotation background colour ;
                            default 0xb3ffffff
  -h, --help                display this help and exit
  -l, --logscale            use logarithmic scale
  --no-peak                 only draw RMS signal using foreground colour
  --no-rms                  only draw signal peaks (exclusive with --no-peak).
  -r, --rectified           rectify waveform
  -R, --rmscolour  <COL>    specify RMS colour ; default 0xffb3b3b3
  -s, --gainscale           zoom into y-axis, map max signal to height.
  -S, --separator <px>      vertically separate channels by N pixels
                            (default : 12) - only used with -c -1
  -t <NUM>[/<DEN>], --timecode <NUM>[/<DEN>]
                            use timecode instead of seconds for x-axis ;
                            The numerator must be set, the denominator
                            defaults to 1 if omitted.
  -T <offset>               override the BWF time-reference (if any) ;
                            the offset is specified in audio-frames
                            and only used with timecode (-t) annotation.
  -V, --version             output version information and exit
  -W, --wavesize            given geometry applies to the plain wave-form.
                            image height depends on number of channels.
                            border-sizes are added to width and height.

Report bugs to <>.
Website and manual: <>
Example images: <>

Example Images

sndfile-waveform  /tmp/ywbs.wav example0.png

sndfile-waveform --border -g 800x400 --logscale --timecode 10 --channel -1 /tmp/ywbs.wav example1.png

sndfile-waveform -b -g 800x200 --rectified -t 25 -T 172800000 /tmp/ywbs.wav example2.png 
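
The -T value in the example above is an offset in audio frames. As a rough sketch of where such a number can come from: assuming a 48 kHz file (the sample rate of /tmp/ywbs.wav is not stated here), 172800000 frames correspond to exactly one hour. The helper name tc_to_frames is made up for this illustration and is not part of sndfile-tools.

```shell
#!/bin/sh
# Hypothetical helper (not part of sndfile-tools): convert a start time
# into an offset in audio frames, suitable as an argument for -T.
#   $1: sample rate in Hz (assumed; check the actual file)
#   $2: hours, $3: minutes, $4: seconds (plain integers, no leading zeros)
tc_to_frames() {
  echo $(( ($2 * 3600 + $3 * 60 + $4) * $1 ))
}

# One hour at 48000 frames per second:
tc_to_frames 48000 1 0 0    # prints 172800000
```

The result could then be passed along the lines of -T "$(tc_to_frames 48000 1 0 0)" together with -t 25.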

sndfile-waveform -b -g 800x200 --channel 1 --logscale --rectified \
  -F 0xFFFFFFFF -R 0xFF000000 -B 0xFF000000  -A 0xFF000000 -G 0xFFFFFFFF \
  /tmp/ywbs.wav example3.png
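
The colour options used above take hexadecimal AA RR GG BB values, as the help text says. A small sketch for composing such a value from decimal components may help when scripting colour schemes. The function name argb is an assumption of this example, and so is the reading that an alpha of ff means fully opaque (which the opaque-looking values in these examples suggest).

```shell
#!/bin/sh
# Hypothetical helper: build an AA RR GG BB colour string from four
# decimal components in the range 0..255 (alpha, red, green, blue).
argb() {
  printf '0x%02x%02x%02x%02x\n' "$1" "$2" "$3" "$4"
}

argb 255 255 255 255   # prints 0xffffffff (white, alpha ff)
argb 128 0 0 0         # prints 0x80000000 (black, alpha 80)
```

A call such as -B "$(argb 255 0 0 0)" would then be equivalent to writing -B 0xff000000 by hand.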

sndfile-waveform --channel -1 --separator 0 --logscale --geometry 800x240 \
 -F 0xFF30FF00 -B 0xFF000000 -R 0x99000000 \
 /tmp/ywbs.wav example4.png
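
Since the whole point was rendering images en masse without a GUI, here is a minimal sketch of a batch run over a directory. It only uses the -b and -g options listed in the help text; the function name render_all and the WAVEFORM_CMD variable (which lets the tool be swapped out, e.g. for a dry run) are assumptions of this example.

```shell
#!/bin/sh
# Sketch: render one PNG next to every .wav file in a directory.
# WAVEFORM_CMD is a made-up override; it defaults to sndfile-waveform.
render_all() {
  dir=$1
  for f in "$dir"/*.wav; do
    # If nothing matches, the pattern is left unexpanded; skip it.
    [ -e "$f" ] || continue
    "${WAVEFORM_CMD:-sndfile-waveform}" -b -g 800x200 "$f" "${f%.wav}.png" \
      || echo "failed: $f" >&2
  done
}

# Usage (not executed here):
#   render_all /path/to/audio
#   WAVEFORM_CMD=echo render_all /path/to/audio   # dry run, prints arguments
```

Failures are reported on stderr and the loop carries on, so one unreadable file does not stop the rest of the batch.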

The End

ToDo Ideas:

  • render meta-data text (artist, title, BWF headers) on Image
  • allow to specify font-family and size(s).
  • FFT analyze, colour highlight peak freq.
  • use different colour for high levels and/or clipped values.
  • allow drawing of only the background outside the peaks and have the peak data be transparent (renders a silhouette ~ how soundcloud does it)

The last one can be achieved by using ImageMagick post-processing to make the peak data transparent:

sndfile-waveform --logscale --no-rms -B 0xffffffff -F 0xFF000000 -C 0xFF000000 INFILE.wav i1.png
convert i1.png -fuzz 50% -transparent black i2.png
display i2.png

I'll take Erik's stance here: you write it, we'll be happy to merge it :)

wiki/sndfile-waveform.txt · Last modified: 30.04.2012 13:01 by