Commit 5110c2317317d6ff14a895af98bf5accd29d2612
add examples
Dominic Tarr committed on 11/28/2015, 7:48:07 AM
Parent: 97a87376f0a454d027861fe7e24e0c211260d06e
Files changed
examples.md | added
@@ -1,0 +1,88 @@

This document describes some examples of where various features
of pull streams are used in simple, real-world situations.

Much of the focus here is on handling the error cases. Indeed,
distributed systems are _all about_ handling the error cases.

# simple source that ends correctly. (read, end)

A normal file (the source) is read and sent to a sink stream
that computes some aggregate over that input,
such as the number of bytes, or the number of occurrences of the `\n`
character (i.e. the number of lines).

The source reads one chunk of the file each time it's called.
The optimum chunk size depends on your operating system,
file system, physical hardware,
and how many other files are being read concurrently.

When the sink gets a chunk, it iterates over the characters in it,
counting the `\n` characters. When the source returns `end` to the
sink, the sink calls a user-provided callback.
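
A minimal sketch of this case in TypeScript, written as plain functions rather than with the pull-stream module itself. It assumes the convention that a source is called as `read(abort, cb)` and answers with `cb(end, data)`, where `end` is `null` while data flows, `true` at a normal end, or an `Error`. The in-memory chunk list standing in for the file, and the names `fileSource` and `countLines`, are hypothetical.

```typescript
// Assumed pull-stream convention: end is null while data flows,
// true at a normal end, or an Error on failure.
type End = null | true | Error;
type Cb<T> = (end: End, data?: T) => void;
// A source is called as read(abort, cb); abort is null to ask for more data,
// or true/Error to ask the source to stop.
type Source<T> = (abort: End, cb: Cb<T>) => void;

// Hypothetical stand-in for a file source: each read returns the next
// in-memory chunk (a real one would read a fixed-size chunk from disk).
function fileSource(chunks: string[]): Source<string> {
  let i = 0;
  return (abort, cb) => {
    if (abort) return cb(abort); // asked to stop: acknowledge
    if (i >= chunks.length) return cb(true); // no more data: normal end
    cb(null, chunks[i++]);
  };
}

// Hypothetical sink: counts "\n" characters, then calls done(err, count) once.
function countLines(done: (err: Error | null, count: number) => void) {
  return (read: Source<string>): void => {
    let count = 0;
    const next = (): void => {
      read(null, (end, chunk) => {
        if (end === true) return done(null, count); // source ended normally
        if (end) return done(end, count); // source failed
        count += (chunk as string).split("\n").length - 1;
        next(); // ask for the next chunk
      });
    };
    next();
  };
}

countLines((err, n) => console.log(err, n))(fileSource(["a\nb", "\nc\n"]));
// prints: null 3
```

The sink drives the loop. The source only produces a chunk when asked, so nothing is read ahead of what the sink can handle.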

# source that may fail. (read, err, end)

Download a file over HTTP and write it to a file.
The network should always be considered unreliable,
and you must design your system to recover from failures,
so the download may fail (the wifi cuts out, or something similar).

The read stream is just the HTTP download, and the sink
writes it to a tempfile. If the source ends normally,
the tempfile is moved to the correct location.
If the source errors, the tempfile is deleted.

(You could also write the file directly to the correct location
and delete it if the download errors, but the tempfile method has the advantage
that if the computer or process crashes, it leaves only a tempfile
and not a file that appears valid. Stray tempfiles can be cleaned up
or resumed when the process restarts.)
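
A sketch of the tempfile pattern, under the same assumed `read(abort, cb)` convention. To stay self-contained it uses a hypothetical in-memory "disk" (a plain object mapping paths to contents) instead of real file and HTTP APIs. `download`, its `failAt` parameter, and `saveTo` are illustrative names, not part of any library.

```typescript
type End = null | true | Error;
type Cb<T> = (end: End, data?: T) => void;
type Source<T> = (abort: End, cb: Cb<T>) => void;

// Hypothetical in-memory "disk": path -> file contents.
type Disk = Record<string, string>;

// Hypothetical HTTP download: emits chunks, but if failAt is given it fails
// with an Error at that read (e.g. the wifi cut out mid-download).
function download(chunks: string[], failAt?: number): Source<string> {
  let i = 0;
  return (abort, cb) => {
    if (abort) return cb(abort);
    if (i === failAt) return cb(new Error("connection lost"));
    if (i >= chunks.length) return cb(true);
    cb(null, chunks[i++]);
  };
}

// Hypothetical sink: writes to `${path}.tmp`; on a normal end it "renames" the
// tempfile to path, on an error it deletes the tempfile. Either way, path
// never holds a partial download.
function saveTo(disk: Disk, path: string, done: (err: Error | null) => void) {
  const tmp = path + ".tmp";
  return (read: Source<string>): void => {
    disk[tmp] = "";
    const next = (): void => {
      read(null, (end, chunk) => {
        if (end === true) {
          disk[path] = disk[tmp]; // move into place
          delete disk[tmp];
          return done(null);
        }
        if (end) {
          delete disk[tmp]; // discard the partial file
          return done(end);
        }
        disk[tmp] += chunk as string;
        next();
      });
    };
    next();
  };
}

const disk: Disk = {};
const outcomes: string[] = [];
const record = (err: Error | null) => outcomes.push(err ? err.message : "ok");
saveTo(disk, "a.txt", record)(download(["he", "llo"], 1)); // fails mid-way
saveTo(disk, "b.txt", record)(download(["wor", "ld"])); // succeeds
// outcomes: ["connection lost", "ok"]; disk: { "b.txt": "world" }
```

All cleanup lives in the sink. The download source only has to report `true` or an `Error` and needs no knowledge of tempfiles.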

# sink that may fail

If we read a file from disk and upload it,
then it is the sink that may error.
The file system is probably faster than the upload,
so the source will mostly be waiting for the sink to ask for more.
Usually, the sink calls `read`, and the source reads more from the file,
until the file ends. If the sink errors, it calls `read(true, cb)`,
and the source closes the file descriptor and stops reading.
In this case the whole file is never loaded into memory.
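
A sketch of the sink aborting its source, again with hypothetical stand-ins. `fileSource` only pretends to open and close a descriptor, recording this in a `stats` object. `upload` simulates a network failure at a chosen chunk. The key step is the `read(true, cb)` call: the sink waits for the source to acknowledge the abort before reporting its own error.

```typescript
type End = null | true | Error;
type Cb<T> = (end: End, data?: T) => void;
type Source<T> = (abort: End, cb: Cb<T>) => void;

// Hypothetical file source: "opens" on creation and "closes" on end or abort,
// recording both in stats so the cleanup can be observed.
function fileSource(chunks: string[], stats: { open: boolean; reads: number }): Source<string> {
  let i = 0;
  stats.open = true; // stands in for opening the file descriptor
  return (abort, cb) => {
    if (abort || i >= chunks.length) {
      stats.open = false; // stands in for closing the file descriptor
      return cb(abort || true); // acknowledge the abort, or end normally
    }
    stats.reads++;
    cb(null, chunks[i++]);
  };
}

// Hypothetical upload sink whose "network" fails when it reaches chunk failAt.
function upload(failAt: number, done: (err: Error | null, sent: string[]) => void) {
  return (read: Source<string>): void => {
    const sent: string[] = [];
    const next = (): void => {
      read(null, (end, chunk) => {
        if (end === true) return done(null, sent);
        if (end) return done(end, sent);
        if (sent.length === failAt) {
          const err = new Error("upload failed");
          // Abort the source so it closes the file, and report the error
          // only after the source has acknowledged the abort.
          return read(true, () => done(err, sent));
        }
        sent.push(chunk as string);
        next();
      });
    };
    next();
  };
}

const stats = { open: false, reads: 0 };
const report: string[] = [];
upload(2, (err, sent) => report.push((err ? err.message : "ok") + ": " + sent.join("")))(
  fileSource(["a", "b", "c", "d"], stats),
);
// report: ["upload failed: ab"]; stats: { open: false, reads: 3 }, "d" is never read
```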

# sink that may fail out of turn.

An HTTP client connects to a log server and tails a log in real time.
(Another process writes to the log file,
but we don't need to think about that.)

The source is the server's log stream, and the sink is the client.
First the source outputs the old data. This will always be a fast
response, because that data is already at hand. Once all of it has been
written, the output rate may drop significantly, because the source has to
wait for new data to be added to the file. This makes it
much more likely that the sink errors (the network connection
drops) while the source is waiting for new data. So
it's necessary to be able to abort the stream out of turn: after the sink has
called `read`, but before the source has called back. If it were not possible
to abort out of turn, you'd have to wait for the next read to return before you
could abort, but, depending on the source of the stream, that may never come.
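
A sketch of an out-of-turn abort under the same assumed convention. The hypothetical `tailLog` source serves the lines already in the log, then holds on to the sink's read callback until `appendLine` supplies more data. The hypothetical `client` sink's `drop()` calls `read(err, cb)` while that read is still outstanding. The order in which the source answers (end the parked read first, then acknowledge the abort) is an assumption of this sketch.

```typescript
type End = null | true | Error;
type Cb<T> = (end: End, data?: T) => void;
type Source<T> = (abort: End, cb: Cb<T>) => void;

// Hypothetical "tail a log" source: serves the lines already in the log, then
// parks the sink's read callback until appendLine() supplies a new line.
function tailLog(existing: string[]) {
  const buffer = existing.slice();
  let pending: Cb<string> | null = null;
  let ended: End = null;

  const read: Source<string> = (abort, cb) => {
    if (abort) {
      ended = abort;
      // Out-of-turn abort: a read may still be waiting for data.
      // Assumed ordering: end the parked read first, then acknowledge the abort.
      const parked = pending;
      pending = null;
      if (parked) parked(abort);
      return cb(abort);
    }
    if (ended) return cb(ended);
    if (buffer.length > 0) return cb(null, buffer.shift());
    pending = cb; // caught up: wait for new data, possibly forever
  };

  const appendLine = (line: string): void => {
    if (ended) return; // the stream has been aborted: drop the line
    const parked = pending;
    pending = null;
    if (parked) parked(null, line);
    else buffer.push(line);
  };

  return { read, appendLine };
}

// Hypothetical client sink; drop() simulates the network connection dying.
function client(read: Source<string>, received: string[], done: (err: Error | null) => void) {
  let dropped = false;
  const next = (): void => {
    read(null, (end, line) => {
      if (dropped) return; // this read was ended by our own abort; drop() reports
      if (end) return done(end === true ? null : end);
      received.push(line as string);
      next();
    });
  };
  next();
  return {
    drop(): void {
      dropped = true;
      const err = new Error("connection dropped");
      // Abort out of turn, while a read is still outstanding.
      read(err, () => done(err));
    },
  };
}

const tail = tailLog(["old 1", "old 2"]);
const received: string[] = [];
const outcome: string[] = [];
const conn = client(tail.read, received, (err) => outcome.push(err ? err.message : "ended"));
tail.appendLine("new 1"); // delivered at once to the parked read
conn.drop(); // aborts the read that is now parked, instead of waiting forever
tail.appendLine("new 2"); // ignored: the source has been aborted
// received: ["old 1", "old 2", "new 1"]; outcome: ["connection dropped"]
```

The `dropped` flag keeps the sink from reporting twice. In this sketch both the parked read and the abort callback get called.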

# a through stream that needs to abort.

Say we read from a file (source), JSON-parse each line (through),
and then output to another file (sink).
Because a line may be valid or invalid JSON, the parse could error.
If a parse error is fatal, then we are aborting the pipeline
from the middle: the source is fine, but the through fails.
When the through finds an invalid line, it should abort the source,
and then call back to the sink with an error. This way,
by the time the sink receives the error, the entire stream has been cleaned up.

(You could abort the source and error back to the sink in parallel,
but then, if something went wrong in the source while aborting, the user
would have to pass another callback to the source to find out. That callback
would be called very rarely, so users would be inclined not to handle it.
It's better to have one callback, at the sink.)
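
A sketch of the sequence described above: the through aborts its source first, and only errors to the sink once the source has called back. `lines`, `parseJson`, and `collect` are hypothetical names. Arrays stand in for the input and output files, and the same `read(abort, cb)` convention is assumed.

```typescript
type End = null | true | Error;
type Cb<T> = (end: End, data?: T) => void;
type Source<T> = (abort: End, cb: Cb<T>) => void;

// Hypothetical line source standing in for the input file; state.closed
// records when it has released the (pretend) file.
function lines(input: string[], state: { closed: boolean }): Source<string> {
  let i = 0;
  return (abort, cb) => {
    if (abort || i >= input.length) {
      state.closed = true;
      return cb(abort || true);
    }
    cb(null, input[i++]);
  };
}

// Through: parses each line as JSON. On a parse error it aborts its source
// first, and only errors to the sink once the source has called back.
function parseJson(read: Source<string>): Source<unknown> {
  return (abort, cb) => {
    if (abort) return read(abort, cb); // pass a sink abort upstream
    read(null, (end, line) => {
      if (end) return cb(end);
      let value: unknown;
      try {
        value = JSON.parse(line as string);
      } catch (e) {
        const err = e instanceof Error ? e : new Error(String(e));
        return read(err, () => cb(err));
      }
      cb(null, value);
    });
  };
}

// Hypothetical sink standing in for the output file.
function collect(out: unknown[], done: (err: Error | null) => void) {
  return (read: Source<unknown>): void => {
    const next = (): void => {
      read(null, (end, value) => {
        if (end) return done(end === true ? null : end);
        out.push(value);
        next();
      });
    };
    next();
  };
}

const state = { closed: false };
const out: unknown[] = [];
const seen: string[] = [];
collect(out, (err) => seen.push("source closed: " + state.closed + ", error: " + (err !== null)))(
  parseJson(lines(['{"a":1}', "not json", '{"b":2}'], state)),
);
// out: [{ a: 1 }]; seen: ["source closed: true, error: true"]
```

Because the sink's `done` runs inside the source's abort callback, the sink is guaranteed to observe a closed source. It needs only the one callback.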