Dominic / pull-stream



Commit 5110c2317317d6ff14a895af98bf5accd29d2612

add examples

Dominic Tarr committed on 11/28/2015, 7:48:07 AM
Parent: 97a87376f0a454d027861fe7e24e0c211260d06e

Files changed

examples.md (added)
@@ -1,0 +1,88 @@
1+
2+This document describes where various features
3+of pull streams are used in simple real-world examples.
4+
5+Much of the focus here is handling the error cases. Indeed,
6+distributed systems are _all about_ handling the error cases.
7+
8+# simple source that ends correctly. (read, end)
9+
10+A normal file (source) is read, and sent to a sink stream
11+that computes some aggregation upon that input,
12+such as the number of bytes, or the number of occurrences of the `\n`
13+character (i.e. the number of lines).
14+
15+The source reads a chunk of the file each time it's called.
16+There is some optimum chunk size, depending on your operating system,
17+file system, physical hardware,
18+and how many other files are being read concurrently.
19+
20+When the sink gets a chunk, it iterates over the characters in it,
21+counting the `\n` characters. When the source returns `end` to the
22+sink, the sink calls a user-provided callback.
23+
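A minimal sketch of the line-counting sink described above, written against the bare `read(abort, cb)` signature rather than the pull-stream module's helpers. `chunkSource` and `countLines` are hypothetical names, and an in-memory array stands in for the file so that the example runs on its own.

```javascript
// Hypothetical source: hands out pre-split chunks in place of real file reads.
function chunkSource (chunks) {
  let i = 0
  return function read (abort, cb) {
    if (abort) return cb(abort)
    if (i >= chunks.length) return cb(true) // normal end
    cb(null, chunks[i++])
  }
}

// Sink: counts '\n' characters, then calls the user-provided callback.
function countLines (done) {
  return function sink (read) {
    let count = 0
    read(null, function next (end, chunk) {
      if (end === true) return done(null, count) // source ended correctly
      if (end) return done(end)                  // source errored
      for (const ch of chunk) if (ch === '\n') count++
      read(null, next) // ask for the next chunk
    })
  }
}

let lines
countLines(function (err, n) { lines = n })(chunkSource(['a\nb', '\nc\n']))
```

The sink recurses on every chunk, which is fine for a short synchronous source like this one; a sink meant for long synchronous sources would need a loop to keep the stack from growing.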
24+# source that may fail. (read, err, end)
25+
26+Download a file over HTTP and write it to a file.
27+The network should always be considered to be unreliable,
28+and you must design your system to recover from failures.
29+Therefore, the download may fail (wifi cuts out or something).
30+
31+The read stream is just the http download, and the sink
32+writes it to a tempfile. If the source ends normally,
33+the tempfile is moved to the correct location.
34+If the source errors, the tempfile is deleted.
35+
36+(You could also write the file to the correct location,
37+and delete it if it errors, but the tempfile method has the advantage
38+that if the computer or process crashes, it leaves only a tempfile
39+and not a file that appears valid. Stray tempfiles can be cleaned up
40+or resumed when the process restarts.)
41+
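The tempfile approach could be sketched roughly as follows. `fakeDownload` is a hypothetical stand-in for the http source, `writeViaTempfile` and the `.tmp` suffix are assumptions made for illustration, and synchronous `fs` calls are used only to keep the example short.

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')

// Hypothetical source standing in for an http download: emits the given
// chunks, then either ends normally or fails with `err`.
function fakeDownload (chunks, err) {
  let i = 0
  return function read (abort, cb) {
    if (abort) return cb(abort)
    if (i < chunks.length) return cb(null, chunks[i++])
    cb(err || true)
  }
}

// Sink: appends chunks to a tempfile, moves it into place on a normal end,
// and deletes it if the source errors.
function writeViaTempfile (dest, done) {
  const tmp = dest + '.tmp'
  fs.writeFileSync(tmp, '')
  return function sink (read) {
    read(null, function next (end, chunk) {
      if (end === true) { fs.renameSync(tmp, dest); return done(null) }
      if (end) { fs.unlinkSync(tmp); return done(end) }
      fs.appendFileSync(tmp, chunk)
      read(null, next)
    })
  }
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'pull-example-'))
const okFile = path.join(dir, 'ok.txt')
const badFile = path.join(dir, 'bad.txt')
let okErr, badErr
writeViaTempfile(okFile, function (err) { okErr = err })(
  fakeDownload(['hello ', 'world']))
writeViaTempfile(badFile, function (err) { badErr = err })(
  fakeDownload(['partial'], new Error('wifi cut out')))
```

After both runs, only `ok.txt` exists in the directory: the failed download leaves neither a destination file nor a tempfile behind.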
42+# sink that may fail
43+
44+If we read a file from disk, and upload it,
45+then it is the sink that may error.
46+The file system is probably faster than the upload,
47+so the source will mostly be waiting for the sink to ask for more.
48+Usually, the sink calls read, and the source gets more from the file
49+until the file ends. If the sink errors, it calls `read(true, cb)`
50+and the source closes the file descriptor and stops reading.
51+In this case the whole file is never loaded into memory.
52+
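A sketch of the failing sink, under the assumption that the upload is a synchronous `send` function returning `false` on failure. `fileLikeSource`, `uploadSink` and the `state` object are hypothetical; `state.closed` stands in for closing the file descriptor.

```javascript
// Hypothetical source standing in for a file reader. `state.closed` records
// whether the (imaginary) file descriptor has been released.
function fileLikeSource (chunks, state) {
  let i = 0
  return function read (abort, cb) {
    if (abort) { state.closed = true; return cb(abort) }
    if (i >= chunks.length) { state.closed = true; return cb(true) }
    state.reads++
    cb(null, chunks[i++])
  }
}

// Hypothetical upload sink: when `send` fails, it aborts the source with
// read(true, cb) and reports the error once the source has cleaned up.
function uploadSink (send, done) {
  return function sink (read) {
    read(null, function next (end, chunk) {
      if (end === true) return done(null)
      if (end) return done(end)
      if (!send(chunk)) {
        const err = new Error('upload failed')
        return read(true, function () { done(err) })
      }
      read(null, next)
    })
  }
}

const state = { closed: false, reads: 0 }
const sent = []
let result
uploadSink(
  function send (chunk) { sent.push(chunk); return sent.length < 2 }, // fails on the 2nd chunk
  function (err) { result = err }
)(fileLikeSource(['a', 'b', 'c', 'd'], state))
```

Only two of the four chunks are ever read: once the sink aborts, the source stops and the rest of the "file" is never touched.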
53+# sink that may fail out of turn.
54+
55+An HTTP client connects to a log server and tails a log in realtime.
56+(another process writes to the log file,
57+but we don't need to think about that)
58+
59+The source is the server log stream, and the sink is the client.
60+First the source outputs the old data. This will always be a fast
61+response, because that data is already at hand. When that is all
62+written, the output rate may drop significantly because the source will
63+wait for new data to be added to the file. This makes it
64+much more likely that the sink errors (the network connection
65+drops) while the source is waiting for new data. So
66+it's necessary to be able to abort the stream while a read is pending (after you called
67+read, but before it called back). If it was not possible to abort
68+out of turn, you'd have to wait for that read to call back before you could abort,
69+but, depending on the source of the stream, that may never come.
70+
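One way a source could support aborting out of turn is to park the callback of an unanswered read and release it when an abort arrives. This is only a sketch: `tailSource` and its `push` method are hypothetical, with `push` standing in for new lines being appended to the log.

```javascript
// Hypothetical live source: serves a backlog first, then waits for push().
function tailSource (backlog) {
  const queue = backlog.slice()
  let waiting = null // callback of a read that has not been answered yet
  let ended = null

  function read (abort, cb) {
    if (abort) {
      ended = abort
      // out-of-turn abort: release the read that is still waiting
      if (waiting) { const w = waiting; waiting = null; w(abort) }
      return cb(abort)
    }
    if (ended) return cb(ended)
    if (queue.length) return cb(null, queue.shift())
    waiting = cb // nothing at hand: park the callback until push() or abort
  }

  // Stand-in for another process appending a line to the log.
  read.push = function (line) {
    if (ended) return
    if (waiting) { const w = waiting; waiting = null; w(null, line) }
    else queue.push(line)
  }
  return read
}

const events = []
const read = tailSource(['old 1', 'old 2'])
function onData (end, line) {
  if (end) return events.push('ended')
  events.push(line)
  read(null, onData)
}
read(null, onData) // drains the backlog, then the third read is left pending
read(true, function () { events.push('aborted') }) // client went away mid-read
```

Here the abort arrives while the third read is still pending, so the parked callback is ended first and the abort callback fires afterwards, without waiting for a new log line that might never come.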
71+# a through stream that needs to abort.
72+
73+Say we read from a file (source), JSON parse each line (through),
74+and then output to another file (sink).
75+Because a line may be invalid JSON, the parse could error.
76+If a parse error is fatal, then we are aborting the pipeline
77+from the middle. Here the source is normal, but then the through fails.
78+When the through finds an invalid line, it should abort the source,
79+and then call back to the sink with an error. This way,
80+by the time the sink receives the error, the entire stream has been cleaned up.
81+
82+(You could abort the source and error back to the sink in parallel.
83+But if something happened to the source while aborting, then for the user
84+to know, they'd have to give another callback to the source. This would
85+get called very rarely, so users would be inclined to not handle it.
86+Better to have one callback at the sink.)
87+
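The ordering described above (abort the source first, then pass the error to the sink) might look like this in a through stream. `parseJsonLines` and `lineSource` are hypothetical names, and an array of strings stands in for the lines of the input file.

```javascript
// Hypothetical source: one string per line. `state.aborted` records cleanup.
function lineSource (lines, state) {
  let i = 0
  return function read (abort, cb) {
    if (abort) { state.aborted = true; return cb(abort) }
    if (i >= lines.length) return cb(true)
    cb(null, lines[i++])
  }
}

// Through: parses each line as JSON. On a bad line it aborts the source
// first, and only then passes the error on to the sink.
function parseJsonLines () {
  return function through (read) {
    return function readParsed (abort, cb) {
      read(abort, function (end, line) {
        if (end) return cb(end)
        let value
        try {
          value = JSON.parse(line)
        } catch (err) {
          return read(err, function () { cb(err) })
        }
        cb(null, value)
      })
    }
  }
}

const state = { aborted: false }
const out = []
let finalErr = null
let cleanedUpFirst = false
const parsed = parseJsonLines()(lineSource(['{"a":1}', 'not json', '{"b":2}'], state))
parsed(null, function next (end, value) {
  if (end) {
    cleanedUpFirst = state.aborted // was the source already aborted?
    finalErr = end === true ? null : end
    return
  }
  out.push(value)
  parsed(null, next)
})
```

The sink has a single place to learn about failure, and `cleanedUpFirst` shows that the source was already aborted by the time the error reached it.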
88+
