git ssb

0+

Dominic / flumeview-bloom



Tree: f854b637eba6ce4b8832080a3c9f86771a161db7

Files: f854b637eba6ce4b8832080a3c9f86771a161db7 / README.md

2275 bytesRaw

flumeview-reduce

A flumeview into a reduce function. Stream append-only log data into a reduce function to calculate a state.

Example

var FlumeLog = require('flumelog-offset')
var codec = require('flumecodec')
var Flume = require('flumedb')
var Bloom = require('flumeview-bloom')

//initialize a flumelog with a codec.
//this example uses flumelog-offset, but any flumelog is valid.
var log = FlumeLog(file, 1024*16, codec.json) //use any flume log

//attach the reduce function.
var db = Flume(log).use('bloom', Bloom(1, 'key')

db.append({key: 1, value: 1}, function (err) {
  db.bloom.ready(function (err, stats) {
    console.log(db.bloom.has(1)) // ==> true
    conosle.log(db.bloom.has(2)) // ==> false
  })
})

FlumeViewBloom(version, map, opts) => FlumeView

construct a flumeview from this reduce function. version should be a number, and must be provided. If you change map
then increment version and the view will be rebuilt. Also, if any options change, the view will be rebuilt.

opts provides options to the bloom filter items and probability. default settings are 100,000 items and 0.001 probability of a collision.

map is the key that is used to id each item. it can be a function that returns a string, or if it is a string then that property is taken from the item.

db[name].has(key) => boolean

check if an item with key is in the log. If the result is false, then the item is not in the database, but if the result is true, then the item might be in the database.

Uses

cheaply enforce uniqueness

Before adding something, check if you already have it. If the bloom filter does not have it, then we can add it without any other checks. But since bloom filters can give false positives, if it says yes, we need to check if it really is there. This will be a more expensive check, but we only need to do it if the bloom check fails.

estimating the number of unique values

By measuring the probability of a bloom filter match, we can get an estimate of the number of unique values added to the bloom filter. For example, unique visits to your website. This could also be used to track the how many possible values a field might have.

License

MIT

Built with git-ssb-web