Anomaly Detection for Cities and Airports

Cities and airports often stream their radio traffic to the internet. You can listen to police or fire trucks being dispatched, or airplanes being given headings and altitudes.

I wanted a way to sense when something is “up”. Of course, there are various definitions of “up”. A few examples: a structure fire, a civil insurrection, or an unplanned closure at an airport. Waiting for local media to announce these events can incur delays, sometimes to the point where the announcement is no longer relevant. A system that alerts me that something is going on, even if it can’t tell me what it is, would be useful as a cue to seek additional information from other sources.

By sending information from these radio transmissions to Microprediction.com, I can obtain a crowdsourced prediction of the activity level of transmissions on the frequency. I can then compare the actual amount of activity to the predictions that were made a fixed time period earlier. If those predictions turn out to be very inaccurate, it may signal that something is “up”.
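As a rough illustration of that comparison, here is a minimal sketch. It assumes the forecast is available as a plain list of predicted values (for example, predicted percent-silent figures). The function names and the 5% tail cutoff are hypothetical choices for this example, not part of Microprediction's API.

```javascript
// Where does the observed value fall within the forecast samples?
// Returns a number in [0, 1]; ties count as half (mid-rank).
function empiricalPercentile(forecastSamples, actual) {
  if (forecastSamples.length === 0) {
    throw new Error('need at least one forecast sample');
  }
  let less = 0;
  let equal = 0;
  for (const s of forecastSamples) {
    if (s < actual) less++;
    else if (s === actual) equal++;
  }
  return (less + 0.5 * equal) / forecastSamples.length;
}

// Flag the observation when it lands in either tail of the forecast.
// The default tail size is an arbitrary assumption for illustration.
function looksAnomalous(forecastSamples, actual, tail = 0.05) {
  const p = empiricalPercentile(forecastSamples, actual);
  return p < tail || p > 1 - tail;
}
```

An observation that sits in the middle of the forecast samples is unremarkable, while one that falls outside nearly all of them is a hint that something unusual is happening.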

Getting the Streaming Audio

Obtaining the audio data is quite simple: it is streamed using Icecast via HTTPS. To obtain the data:

  1. Make the HTTPS GET request.
  2. Accumulate the returned data. There won’t be a fixed content length, as the data is streamed.
  3. After a fixed amount of time, send the received data to the analysis phase.

See the streaming code.
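The three steps above can be sketched roughly as follows. This is not the linked streaming code, only a minimal illustration using Node's built-in https module. The names makeAccumulator and captureWindows, and the idea of handing each window to a callback, are assumptions made for this example.

```javascript
const https = require('https');

// Collects streamed chunks and hands them back as one Buffer when drained.
function makeAccumulator() {
  let chunks = [];
  return {
    add(chunk) { chunks.push(chunk); },
    drain() {
      const out = Buffer.concat(chunks);
      chunks = []; // start a fresh window
      return out;
    },
  };
}

// Opens the stream and calls onWindow(buffer) every windowMs milliseconds
// with whatever audio bytes arrived during that window.
function captureWindows(url, windowMs, onWindow) {
  const acc = makeAccumulator();
  const req = https.get(url, (res) => {
    res.on('data', (chunk) => acc.add(chunk));
    const timer = setInterval(() => onWindow(acc.drain()), windowMs);
    res.on('close', () => clearInterval(timer));
  });
  req.on('error', (err) => {
    console.error('stream request failed:', err.message);
  });
  return req;
}

// Hypothetical usage (the URL is a placeholder, not a real stream):
// captureWindows('https://example.invalid/stream', 5 * 60 * 1000, (mp3) => {
//   console.log('captured', mp3.length, 'bytes');
// });
```

Cutting the stream on a timer means a window can begin or end partway through an MP3 frame, so the decoder has to tolerate a partial frame at either edge.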

Analyzing the Streaming Audio

Before the audio can be analyzed, it must be decompressed from the streaming format (MP3). If you save the audio to a file and run the file command on it, you will see this output.

MPEG ADTS, layer III, v2, 16 kbps, 22.05 kHz, Monaural

Analyzing a compressed MP3 stream directly isn’t easy, so I use ffmpeg to convert the audio to uncompressed PCM samples.

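// Builds the ffmpeg command line that decodes the captured MP3 into raw PCM.
// `shell` is a helper from the project code that is not shown here.
const makePCM = shell([
  'ffmpeg', '-i', audio_filename,        // input: the captured MP3 audio
  '-f', 's16le', '-acodec', 'pcm_s16le', // output: raw signed 16-bit little-endian PCM
  '-ar', sample_rate,                    // output sample rate in Hz
  pcm_output,                            // output file path
])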

The PCM samples will be 16-bit signed little-endian integers. Each sample represents the amplitude of the audio at a particular point in time. Mapping a sample back to wall clock time isn’t important here (I don’t care when a transmission happened). I’m interested in the utilization of a radio frequency over a fixed interval, so what matters is the ordering of the samples.

Graphing a few seconds of audio in Julia leads to the standard waveform visualizations (that you may have seen in apps like Logic).

using Plots
f = open("/Users/rusty/example.pcm")
y = Vector{Int16}(undef, convert(Int64, stat(f).size / sizeof(Int16)))
read!(f, y)
# Just played around to find an interesting range.
plot(y[80500:84000], xlabel="Sample Time", ylabel="Amplitude", label="PCM Sample Value", dpi=150, size=(600,300))

Amplitude Graph

When the graph shows a value near zero, there is near silence on the radio frequency. Looking at a histogram of values shows this distribution.

Amplitude Histogram

This is expected: most PCM sample values should be close to zero because the frequency doesn’t carry a continuous broadcast.

Detecting Blocks of Silence

Loading and analyzing the array of samples is easy in Node.js.
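As a rough sketch of the loading step, assuming the PCM file is small enough to read into memory in one go, the raw bytes can be decoded into signed 16-bit samples like this. The names decodePCM and loadPCM are made up for this example.

```javascript
const fs = require('fs');

// Interprets a Buffer of raw s16le PCM as an array of signed 16-bit samples.
function decodePCM(buffer) {
  const count = Math.floor(buffer.length / 2); // ignore a trailing odd byte
  const samples = new Int16Array(count);
  for (let i = 0; i < count; i++) {
    samples[i] = buffer.readInt16LE(i * 2); // explicit little-endian read
  }
  return samples;
}

// Reads a whole PCM file from disk and decodes it.
function loadPCM(path) {
  return decodePCM(fs.readFileSync(path));
}
```

Reading each value with readInt16LE keeps the byte order explicit, so the result does not depend on the endianness of the machine running the code.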

The strategy is to build an array of the indexes of samples that are louder than some threshold. The sample values are compared by their absolute value rather than their signed value.

Once the indexes of the loud samples are determined, iterate through that list and measure the distance from one loud sample to the next. Dividing that distance (in samples) by the sample rate converts it into seconds.

See the audio analysis code.
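The strategy described above might look roughly like this. It is a sketch, not the linked analysis code. The function names are invented, and the minGapSeconds parameter (which ignores the tiny gaps between loud samples within a single transmission) is an assumption added for this example.

```javascript
// Indexes of samples whose absolute value exceeds the threshold.
function loudSampleIndexes(samples, threshold) {
  const loud = [];
  for (let i = 0; i < samples.length; i++) {
    if (Math.abs(samples[i]) > threshold) loud.push(i);
  }
  return loud;
}

// Durations, in seconds, of the quiet stretches between consecutive loud
// samples. Gaps shorter than minGapSeconds are skipped (an assumption:
// speech naturally crosses zero many times within one transmission).
function silentGaps(loud, sampleRate, minGapSeconds) {
  const gaps = [];
  for (let i = 1; i < loud.length; i++) {
    const seconds = (loud[i] - loud[i - 1]) / sampleRate;
    if (seconds >= minGapSeconds) gaps.push(seconds);
  }
  return gaps;
}
```

Both the amplitude threshold and the minimum gap would need tuning per stream, since background noise levels differ between feeds.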

Over a fixed interval of time, the percentage of time that the frequency was silent can now be calculated.
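One way to turn the loud-sample indexes into that percentage is sketched below. Treating the start and end of the window as gap boundaries, and ignoring gaps shorter than a minimum duration, are both assumptions of this example rather than details taken from the analysis code.

```javascript
// Percentage of the window that was silent.
// loud: ascending indexes of loud samples within the window
// totalSamples: number of samples in the window
// minGapSeconds: gaps shorter than this are not counted as silence
function percentSilent(loud, totalSamples, sampleRate, minGapSeconds) {
  if (totalSamples <= 0) {
    throw new Error('window must contain at least one sample');
  }
  let silentSamples = 0;
  let previous = 0; // the window start acts as the first boundary
  for (const index of [...loud, totalSamples]) { // window end is the last
    const gap = index - previous;
    if (gap / sampleRate >= minGapSeconds) silentSamples += gap;
    previous = index;
  }
  return (100 * silentSamples) / totalSamples;
}
```

With no loud samples at all, the whole window counts as a single gap and the result is 100.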

Interfacing with Microprediction

The final step is to publish the percentage of a five-minute window that was silent to Microprediction.com. You can see examples of these streams.

Savannah, GA

This is the activity for Savannah, GA’s police department. It appears to be correlated with the waking hours of the population.

Savannah, GA Police

See the stream on Microprediction.org.

Orlando, FL Air Traffic Control

This is air traffic control for the airport in Orlando, Florida. It seems busy during the day and less busy overnight.

Orlando, FL Air Traffic Control

See the stream on Microprediction.org.

Fire and Rescue Albemarle County, Virginia

This is the activity for fire and rescue services in Albemarle County, which contains Charlottesville, VA. The activity is not easily explainable yet. Since fires and medical assistance calls may be rare, it may be hard to see an underlying pattern.

Albemarle County Fire and Rescue Radio Activity

See the stream on Microprediction.org.

Detecting Anomalies

Stay tuned for more details, but anomaly detection will likely use some of the techniques from these links:

If you’re interested in digging into this yourself, you should look into Microprediction’s list of popular time series packages.

If there is a lot of interest, I may even live code a predictive model for this data with you. Reach out to me.