git ssb - add examples · Dominic/pull-stream@5110c231

Commit 5110c2317317d6ff14a895af98bf5accd29d2612

add examples

Dominic Tarr committed on 11/28/2015, 7:48:07 AM
Parent: 97a87376f0a454d027861fe7e24e0c211260d06e

Files changed

added

examples.md
		@@ -1,0 +1,88 @@
	1	+
	2	+This document describes simple real-world examples of where
	3	+various features of pull streams are used.
	4	+
	5	+Much of the focus here is handling the error cases. Indeed,
	6	+distributed systems are _all about_ handling the error cases.
	7	+
	8	+# simple source that ends correctly. (read, end)
	9	+
	10	+A normal file (source) is read and sent to a sink stream
	11	+that computes some aggregation over that input,
	12	+such as the number of bytes, or the number of occurrences of the `\n`
	13	+character (i.e. the number of lines).
	14	+
	15	+The source reads one chunk of the file each time it's called.
	16	+There is some optimum chunk size, depending on your operating system,
	17	+file system, physical hardware,
	18	+and how many other files are being read concurrently.
	19	+
	20	+When the sink gets a chunk, it iterates over the characters in it,
	21	+counting the `\n` characters. When the source returns `end` to the
	22	+sink, the sink calls a user-provided callback.
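
The flow above can be sketched by hand, without the pull-stream module itself. This is a minimal sketch under assumptions: `chunkSource` and `countLines` are hypothetical names, the "file" is an in-memory string, and the chunk size of 4 is arbitrary. It only relies on the convention that a source is `read(abort, cb)` and answers with `cb(end, data)`.

```javascript
// Source: hands out one chunk of `text` per call, then signals end with `true`.
// A stand-in for a real file reader (assumption: the data is already in memory).
function chunkSource(text, chunkSize) {
  let offset = 0;
  return function read(abort, cb) {
    if (abort) return cb(abort);
    if (offset >= text.length) return cb(true); // normal end
    const chunk = text.slice(offset, offset + chunkSize);
    offset += chunkSize;
    cb(null, chunk);
  };
}

// Sink: counts `\n` characters and calls `done` once the source ends.
function countLines(done) {
  return function sink(read) {
    let lines = 0;
    read(null, function next(end, chunk) {
      if (end === true) return done(null, lines); // source ended normally
      if (end) return done(end);                  // source errored
      for (let i = 0; i < chunk.length; i++) {
        if (chunk[i] === '\n') lines++;
      }
      read(null, next); // ask for the next chunk
    });
  };
}

let lineCount = null;
countLines(function (err, n) { lineCount = n; })(chunkSource('a\nbb\nccc\n', 4));
```

Note that the sink drives everything: the source does no work until `read` is called.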
	23	+
	24	+# source that may fail. (read, err, end)
	25	+
	26	+Download a file over HTTP and write it to a file.
	27	+The network should always be considered unreliable,
	28	+and you must design your system to recover from failures.
	29	+The download may therefore fail (the wifi cuts out, for example).
	30	+
	31	+The read stream is just the http download, and the sink
	32	+writes it to a tempfile. If the source ends normally,
	33	+the tempfile is moved to the correct location.
	34	+If the source errors, the tempfile is deleted.
	35	+
	36	+(You could also write the file directly to the correct location
	37	+and delete it if the source errors, but the tempfile method has the advantage
	38	+that if the computer or process crashes, it leaves only a tempfile
	39	+and not a file that appears valid. Stray tempfiles can be cleaned up
	40	+or resumed when the process restarts.)
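
A rough sketch of the tempfile approach, assuming Node.js and its built-in `fs` module. The `download` source is a hypothetical stand-in for an HTTP response (it just replays an array and optionally errors), and `saveTo` and the `.tmp` suffix are illustrative names, not part of any library.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical stand-in for an HTTP download: emits `chunks` in order and,
// if `failAfter` is given, errors after that many chunks instead of ending.
function download(chunks, failAfter) {
  let i = 0;
  return function read(abort, cb) {
    if (abort) return cb(abort);
    if (i === failAfter) return cb(new Error('connection lost'));
    if (i >= chunks.length) return cb(true);
    cb(null, chunks[i++]);
  };
}

// Sink: writes to a tempfile, renames it on a normal end, deletes it on error.
function saveTo(dest, done) {
  const tmp = dest + '.tmp';
  const fd = fs.openSync(tmp, 'w');
  return function sink(read) {
    read(null, function next(end, chunk) {
      if (end) {
        fs.closeSync(fd);
        if (end === true) { fs.renameSync(tmp, dest); return done(null); }
        fs.unlinkSync(tmp); // the source errored: discard the partial file
        return done(end);
      }
      fs.writeSync(fd, chunk);
      read(null, next);
    });
  };
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'pull-example-'));
const okFile = path.join(dir, 'ok.txt');
const badFile = path.join(dir, 'bad.txt');
let okErr, badErr;
saveTo(okFile, (err) => { okErr = err; })(download(['hello ', 'world']));
saveTo(badFile, (err) => { badErr = err; })(download(['hello ', 'world'], 1));
```

The synchronous `fs` calls keep the sketch short; a real sink would more likely use the asynchronous versions.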
	41	+
	42	+# sink that may fail
	43	+
	44	+If we read a file from disk and upload it,
	45	+then it is the sink that may error.
	46	+The file system is probably faster than the upload,
	47	+so the source will mostly be waiting for the sink to ask for more.
	48	+Usually, the sink calls read, and the source gets more from the file
	49	+until the file ends. If the sink errors, it calls `read(true, cb)`
	50	+and the source closes the file descriptor and stops reading.
	51	+In this case the whole file is never loaded into memory.
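
As a sketch of the failing sink, assuming hypothetical `fileSource` and `uploadSink` helpers: the source replays an array instead of reading a real file, and the "upload" simply fails once it has sent `limit` chunks. The `state` object exists only so the cleanup is observable.

```javascript
// Hypothetical file-like source; `state.closed` records that cleanup ran.
function fileSource(chunks, state) {
  let i = 0;
  return function read(abort, cb) {
    if (abort) {
      state.closed = true; // a real source would close its file descriptor here
      return cb(abort);
    }
    if (i >= chunks.length) { state.closed = true; return cb(true); }
    state.reads++;
    cb(null, chunks[i++]);
  };
}

// Hypothetical upload sink that fails once it has sent `limit` chunks.
function uploadSink(limit, done) {
  return function sink(read) {
    let sent = 0;
    read(null, function next(end, chunk) {
      if (end === true) return done(null, sent);
      if (end) return done(end, sent);
      if (sent === limit) {
        const err = new Error('upload failed');
        // Tell the source to stop, and report the error once it has cleaned up.
        return read(true, function () { done(err, sent); });
      }
      sent++; // pretend the chunk was uploaded
      read(null, next);
    });
  };
}

const state = { closed: false, reads: 0 };
let result;
uploadSink(2, (err, sent) => { result = { err, sent }; })(
  fileSource(['a', 'b', 'c', 'd', 'e'], state)
);
```

Only three of the five chunks are ever read here, because the source does nothing unless the sink asks.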
	52	+
	53	+# sink that may fail out of turn.
	54	+
	55	+An HTTP client connects to a log server and tails a log in real time.
	56	+(Another process writes to the log file,
	57	+but we don't need to think about that.)
	58	+
	59	+The source is the server log stream, and the sink is the client.
	60	+First the source outputs the old data. This will always be a fast
	61	+response, because that data is already at hand. Once that is all
	62	+written, the output rate may drop significantly, because the source
	63	+must wait for new data to be added to the file. As a result,
	64	+it becomes much more likely that the sink errors (the network connection
	65	+drops) while the source is waiting for new data. It is therefore
	66	+necessary to be able to abort the stream while a read is pending (after you
	67	+have called read, but before it has called back). If it were not possible to abort
	68	+out of turn, you'd have to wait for the next read before you could abort,
	69	+but, depending on the source of the stream, that may never come.
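
One way an out-of-turn abort might look, sketched with a hypothetical `tail` source that replays old lines and then holds on to the read callback forever (no new log data ever arrives in this sketch). The assumption made here is that an aborted source first ends the outstanding read, then answers the abort call.

```javascript
// Hypothetical tailing source: replays `old` lines, then waits for new data.
function tail(old, state) {
  let i = 0;
  let waiting = null; // callback of a read that has not been answered yet
  return function read(abort, cb) {
    if (abort) {
      state.closed = true; // a real source would stop watching the file here
      if (waiting) {
        const pending = waiting;
        waiting = null;
        pending(abort); // end the outstanding read first
      }
      return cb(abort);
    }
    if (i < old.length) return cb(null, old[i++]);
    waiting = cb; // no new data yet: hold the callback
  };
}

const state = { closed: false };
const seen = [];
let ended = null;
const read = tail(['line 1', 'line 2'], state);

read(null, function next(end, line) {
  if (end) { ended = end; return; }
  seen.push(line);
  read(null, next);
});

// The third read is still pending at this point. Simulate the connection
// dropping by aborting without waiting for that read to call back.
let abortAcked = false;
read(true, function () { abortAcked = true; });
```

Without the `waiting` bookkeeping, the pending read would never call back and the sink could not finish.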
	70	+
	71	+# a through stream that needs to abort.
	72	+
	73	+Say we read from a file (source), JSON parse each line (through),
	74	+and then output to another file (sink).
	75	+Because a line may contain invalid JSON, the parse could error.
	76	+If a parse error is treated as fatal, then we are aborting the pipeline
	77	+from the middle. Here the source is normal, but the through fails.
	78	+When the through finds an invalid line, it should abort the source,
	79	+and then call back to the sink with an error. This way,
	80	+by the time the sink receives the error, the entire stream has been cleaned up.
	81	+
	82	+(You could abort the source and error back to the sink in parallel,
	83	+but if something went wrong in the source while aborting, the user
	84	+would have to give the source another callback to find out. That callback
	85	+would be called very rarely, so users would be inclined not to handle it.
	86	+Better to have one callback at the sink.)
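
The abort-then-error ordering can be sketched as follows. `lines`, `parseJson` and `collect` are hypothetical helpers over in-memory data, not the files described above; the through is written as a function that takes a `read` and returns a new `read`.

```javascript
// Source over an array of lines; `state.aborted` records that it was told to stop.
function lines(list, state) {
  let i = 0;
  return function read(abort, cb) {
    if (abort) { state.aborted = true; return cb(abort); }
    if (i >= list.length) return cb(true);
    cb(null, list[i++]);
  };
}

// Through: wraps a read function and yields parsed values instead of raw lines.
function parseJson(read) {
  return function parsedRead(abort, cb) {
    read(abort, function (end, line) {
      if (end) return cb(end);
      let value;
      try {
        value = JSON.parse(line);
      } catch (err) {
        // Abort upstream first, and only then pass the error downstream.
        return read(err, function () { cb(err); });
      }
      cb(null, value);
    });
  };
}

// Sink: collects values and reports how the stream ended.
function collect(done) {
  return function sink(read) {
    const out = [];
    read(null, function next(end, value) {
      if (end === true) return done(null, out);
      if (end) return done(end, out);
      out.push(value);
      read(null, next);
    });
  };
}

const state = { aborted: false };
let outcome;
collect((err, out) => { outcome = { err, out, abortedFirst: state.aborted }; })(
  parseJson(lines(['{"a":1}', '{"b":2}', 'not json', '{"c":3}'], state))
);
```

By the time `collect` sees the error, `state.aborted` is already `true`, so the single callback at the sink is enough.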
	87	+
	88	+
