Grouping by pattern discovery
Given: D, a set of domains
Repeat for i = 1, 2, ..., starting with D1 = D:
find a “good” pattern Pi for an acceptable subset Si of the examples Di,
D[i+1] := Di - Si %remove Si from Di to give D[i+1]
Output: K, the set of (Pattern, DomainSet) pairs (Pi, Si)
Note that it is not guaranteed that any Pi exclusively matches domains from Di and no other Dj (j ≠ i). That is, the grouping is not a partition, and Pi is therefore characteristic of Di, not a classifier function.
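The procedure above can be sketched as a greedy loop. This is a minimal illustration, not the original implementation: the names group_by_pattern and toy_finder are hypothetical, the pattern search is passed in as a callback because the notes do not say how a pattern is found, and the stopping rule (stop when no pattern is found or no examples remain) is an assumption.

```python
from collections import Counter
from typing import Callable, FrozenSet, Iterable, List, Optional, Set, Tuple

# Hypothetical type: a finder returns (pattern, matched subset) or None.
Finder = Callable[[FrozenSet[str]], Optional[Tuple[str, Set[str]]]]


def group_by_pattern(domains: Iterable[str], find_pattern: Finder):
    """Greedy grouping: repeatedly find (Pi, Si) in Di, then set D[i+1] = Di - Si."""
    remaining = set(domains)              # D1 = D
    groups: List[Tuple[str, Set[str]]] = []   # K, the (Pattern, DomainSet) pairs
    while remaining:
        found = find_pattern(frozenset(remaining))
        if found is None:                 # assumed stopping rule: no good pattern left
            break
        pattern, subset = found
        subset = set(subset) & remaining  # Si must be a subset of Di
        if not subset:                    # guard against a finder that makes no progress
            break
        groups.append((pattern, subset))
        remaining -= subset               # D[i+1] := Di - Si
    return groups, remaining


def toy_finder(domains: FrozenSet[str]) -> Optional[Tuple[str, Set[str]]]:
    """Stand-in pattern search (an assumption): group by the most common last label."""
    counts = Counter(d.rsplit(".", 1)[-1] for d in domains)
    # Most frequent suffix first; ties broken alphabetically for determinism.
    suffix, n = min(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    if n < 2:                             # treat a single-domain group as not "acceptable"
        return None
    return "*." + suffix, {d for d in domains if d.rsplit(".", 1)[-1] == suffix}


if __name__ == "__main__":
    sample = ["a.com", "b.com", "c.org", "d.org", "e.com", "f.net"]
    groups, rest = group_by_pattern(sample, toy_finder)
    for pattern, subset in groups:
        print(pattern, sorted(subset))
    print("ungrouped:", sorted(rest))
```

The toy finder happens to produce disjoint groups. Nothing in the loop itself enforces that a pattern found at step i fails to match domains removed at other steps, which is the point of the note above.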
- A “good” pattern P matching a subset S of the examples D is one for which the function F(G(P), C(S,D)) is below some given value (“pruneval”).
- At present, F(G(P), C(S,D)) = log(G(P)) * C(S,D).
- G(P) is the goodness of pattern P, where “goodness” is given by a measure of compression.
- C(S,D) is the cover value |S|/|D|, where |X| is the number of items in set X.
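The definitions above translate directly into a small scoring routine. This is a sketch under stated assumptions: the function names are hypothetical, the logarithm is taken to be the natural log (the notes do not give a base), and G(P) is passed in as a positive number because the compression measure itself is not specified here.

```python
import math


def cover(subset, examples):
    """C(S,D) = |S| / |D|, the fraction of the examples that S accounts for."""
    if not examples:
        raise ValueError("D must be non-empty")
    return len(subset) / len(examples)


def score(goodness, subset, examples):
    """F(G(P), C(S,D)) = log(G(P)) * C(S,D); natural log is an assumption."""
    if goodness <= 0:
        raise ValueError("G(P) must be positive for the logarithm to be defined")
    return math.log(goodness) * cover(subset, examples)


def is_good(goodness, subset, examples, pruneval):
    """A pattern counts as "good" when its score is below pruneval."""
    return score(goodness, subset, examples) < pruneval


if __name__ == "__main__":
    D = {"a.com", "b.com", "c.org", "d.org"}
    S = {"a.com", "b.com"}
    # Illustrative goodness values only; they are not taken from the notes.
    print(cover(S, D))
    print(is_good(0.5, S, D, pruneval=-0.1))
    print(is_good(0.9, S, D, pruneval=-0.1))
```

With this form, a goodness value below 1 gives a negative score, and a larger cover pushes that score further from zero, so whether a pattern passes depends on both how it compresses and how much of D it covers.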