Data streams are an abstraction for keeping outputs up to date with inputs. A stream yields a sequence of data points. With a stream, you do not know how often data points will arrive, only their type.
Most inputs and outputs naturally form data streams. Your mouse and keyboard don't hum continually to the motherboard; they send messages when something happens. Similarly, although your monitor has a refresh rate, the image it displays is whatever happens to be in its buffer at the time. Your computer only actually modifies this hardware buffer when it has a reason to - usually in response to an input.
It's a common misconception that when you run a program, it just starts chugging, and keeps chugging until you press the close button. That is, it's a misconception that code has a single entry point. A real application has many entry points. Only one of them is the main function, which runs when you start the application. The other entry points run when there are inputs to your application.
Multiple difficulties arise because inputs can happen at pretty much any time and any frequency.
- Often, code B is entered before code A has finished. If you set up a common scratch area where code A and code B both write down notes to communicate with each other, it can get surprisingly complicated to keep that scratch area in a correct state.
- An easy (too easy) program to write is one that allocates 1 MB of memory each time you press a key. This is a memory leak, and will eventually crash the application. After handling an event, it is always important to free any resources that you allocated.
Streams address both of these. First, forget the common scratch area - streams give you safe and explicit ways to combine multiple inputs into a single output. Second, the data that passes through streams is ephemeral. Some streams allocate some memory, but this is for the stream as a whole, not event by event. Memory leaks are largely prevented. The memory leaks that remain are A) easier to debug because all persistent objects belong to a data stream that you declared, and B) caused by your program logic, not by a typo (i.e. forgetting to free memory). Generally, entry points are imported into a data stream abstraction as early as possible. Streams show up in the C++ standard library (do not take this as an endorsement of C++), as well as in Unix for standard input and output.
A stream combinator is any function that takes one or more streams, and produces an output stream. According to FRP (functional reactive programming), to write a computer program you simply define a single output stream as the application of various combinators to input streams.
Here are some stream combinators:
Map

The most common way to use a stream is probably to map over it, i.e. to transform each value into a new value.
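As a minimal sketch, a Python generator can stand in for a stream; `map_stream` is a hypothetical name for illustration, not a standard API.

```python
def map_stream(f, stream):
    """Yield f(value) for each value the input stream yields."""
    for value in stream:
        yield f(value)

# A stand-in input stream of key codes from a keyboard.
key_codes = iter([104, 105, 33])
characters = map_stream(chr, key_codes)
print(list(characters))  # ['h', 'i', '!']
```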
Reduce

Reducing a data stream is like mapping over it, except that you keep a running total. First, the output stream is initialized with a starting total. Then, for each value yielded by the input stream, a function is applied to that value and the previous total to produce a new total.
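In the same generator sketch (`reduce_stream` is again an illustrative name), the output stream yields the starting total first, then one updated total per input value:

```python
def reduce_stream(f, initial, stream):
    """Yield a running total: the starting total first, then one new
    total for each value the input stream yields."""
    total = initial
    yield total
    for value in stream:
        total = f(total, value)
        yield total

# Running character count over a stream of typed words.
words = iter(["data", "streams"])
counts = reduce_stream(lambda total, word: total + len(word), 0, words)
print(list(counts))  # [0, 4, 11]
```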
You may have heard of map reduce algorithms in big data contexts. Mapping a large data set can be spread out over many processors, but to reduce a large data set you have to bring the data to the same place. This can still be done on many processors, as long as the reduce operation is associative. Here though, we are working with streams, not data sets, and values show up one by one, so it doesn't matter if the reduce operation is associative.
Debounce

When your users keep hammering the button, you might choose to debounce the stream before you process it: only pass a value along once a quiet period has gone by with no newer value.
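A real debounce needs a clock; as a sketch, we can simulate time by working over `(timestamp, value)` pairs. This is a trailing-edge debounce, and the names are invented for illustration:

```python
def debounce(quiet, timed_stream):
    """Trailing-edge debounce over (timestamp, value) pairs: a value is
    yielded only once `quiet` seconds pass with no newer value."""
    pending = None
    for ts, value in timed_stream:
        if pending is not None and ts - pending[0] >= quiet:
            yield pending[1]
        pending = (ts, value)
    if pending is not None:
        yield pending[1]

# Three rapid clicks, a pause, then one more click.
clicks = iter([(0.00, "click 1"), (0.05, "click 2"),
               (0.10, "click 3"), (1.00, "click 4")])
print(list(debounce(0.5, clicks)))  # ['click 3', 'click 4']
```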
Combine

There are multiple ways to combine multiple data streams into a single multi-value data stream. The easiest way is to pretty much yield a value whenever any input stream yields a value.
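A live system would do this with callbacks or an event loop; this sketch simulates it by merging timestamped streams so that each value comes out as soon as it occurs. `combine` is a hypothetical name:

```python
import heapq

def combine(*timed_streams):
    """Merge streams of (timestamp, value) pairs, yielding values in
    timestamp order across all inputs."""
    merged = heapq.merge(*timed_streams, key=lambda pair: pair[0])
    return (value for ts, value in merged)

keys = [(0.1, "key a"), (0.4, "key b")]
clicks = [(0.2, "click"), (0.3, "click")]
print(list(combine(keys, clicks)))  # ['key a', 'click', 'click', 'key b']
```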
Zip

If you know that multiple streams are already somehow in sync with one another, then you can zip them together: pair the first value of each, then the second of each, and so on.
Zip is less common than Combine. First, when you use the data stream abstraction you aren't supposed to know how often data flows through each stream. When you use the Zip method, presumably you do know. Second, if one stream is faster than another and they do get out of sync, your program can end up buffering a large amount of data.
Again, a great benefit of using streams is that they let you handle events that occur over long spans of time without leaking memory. This is not true of the Zip combinator - it leaks memory if the streams go out of sync. The one pictured below has already buffered a square and a circle. I considered leaving Zip out of this post entirely, but decided to include it as an example of what you usually should not do.
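A push-based sketch makes the leak concrete: a value that arrives on one side waits in a buffer until the other side catches up, and if the sides stay out of sync that buffer grows without bound. The class and its names are invented for illustration:

```python
from collections import deque

class ZipStreams:
    """Push-based zip of two streams. Values wait in per-side buffers
    until the other side catches up."""
    def __init__(self, downstream):
        self.buffers = (deque(), deque())
        self.downstream = downstream  # called with each zipped pair

    def push(self, side, value):
        self.buffers[side].append(value)
        if self.buffers[0] and self.buffers[1]:
            pair = (self.buffers[0].popleft(), self.buffers[1].popleft())
            self.downstream(pair)

pairs = []
z = ZipStreams(pairs.append)
z.push(0, "square")
z.push(0, "circle")  # side 1 hasn't caught up; this value sits in a buffer
z.push(1, "red")
print(pairs)              # [('square', 'red')]
print(len(z.buffers[0]))  # 1 -- 'circle' is still buffered
```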
Filter

Filtering a stream tests each value and yields only the ones that pass the test.
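In the same generator sketch (`filter_stream` is an illustrative name), here we keep only the clicks that land inside a hypothetical 200x200 region:

```python
def filter_stream(predicate, stream):
    """Yield only the values for which predicate(value) is true."""
    for value in stream:
        if predicate(value):
            yield value

clicks = iter([(12, 40), (250, 90), (300, 310)])
inside = filter_stream(lambda p: p[0] < 200 and p[1] < 200, clicks)
print(list(inside))  # [(12, 40)]
```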
Once, and Delay
If you have a single value, you can trivially represent it as a stream: one that yields the value immediately, or one that yields the value after a delay. These are most useful for testing purposes.
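Both are one-liners in the generator sketch; note that the pause in `delay` happens lazily, when the stream is first pulled:

```python
import time

def once(value):
    """A stream that yields a single value immediately."""
    yield value

def delay(seconds, value):
    """A stream that yields a single value after a pause.
    (The sleep runs lazily, when the stream is first pulled.)"""
    def stream():
        time.sleep(seconds)
        yield value
    return stream()

print(list(once(42)))         # [42]
print(list(delay(0.01, 42)))  # [42], after roughly 10 ms
```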
Putting it Together
Here's what a program written using FRP might look like. As you can see, we let the inputs to the program be data streams. These streams are mapped, reduced, and combined together into a stream of outputs. Finally, the hardware acts on this output stream.
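A toy end-to-end version in the same generator style; the timestamped input data and every name here are invented for illustration:

```python
import heapq

# Hypothetical timestamped input streams: key presses and a mouse click.
keys = [(0.1, ("key", "h")), (0.3, ("key", "i"))]
clicks = [(0.2, ("click", (40, 25)))]

# Combine the inputs into one event stream, in time order.
events = heapq.merge(keys, clicks, key=lambda pair: pair[0])

# Map each event to a short description.
descriptions = (kind + " event" for ts, (kind, detail) in events)

# Reduce to a running event count.
def number(stream):
    count = 0
    for desc in stream:
        count += 1
        yield "%d: %s" % (count, desc)

# The "hardware" acts on the final output stream.
for line in number(descriptions):
    print(line)
# 1: key event
# 2: click event
# 3: key event
```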