Classification problem C1
Given a set S+ of sequences believed to be members of family F+, and a set S- of sequences believed not to be members, i.e.
S+ ⊆ F+ and S- ⊆ F-
F+ ∩ F- = ∅ and F+ ∪ F- = Σ*
Find compact string functions that return
- TRUE for all s ∈ S+ and FALSE for all s ∈ S-, and
- have a high likelihood of returning TRUE for s ∈ F+ and FALSE for s ∈ F-
C1a: find compact “explanations” of known sequences
C1b: try to predict the family relationship of yet unknown sequences
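
As a minimal sketch of problem C1, the Python below treats a regular-expression motif as one possible kind of compact string function and checks it against S+ and S-. The names motif_function and is_consistent, the motif CGTG, and all sequences are hypothetical and invented for illustration; they are not part of the original problem statement.

```python
import re
from typing import Callable, Iterable

# A "string function" maps a sequence to TRUE or FALSE.
StringFunction = Callable[[str], bool]


def motif_function(pattern: str) -> StringFunction:
    """Wrap a regular expression as a string function (an assumed representation)."""
    compiled = re.compile(pattern)
    return lambda s: compiled.search(s) is not None


def is_consistent(f: StringFunction,
                  s_pos: Iterable[str],
                  s_neg: Iterable[str]) -> bool:
    """TRUE for every s in S+ and FALSE for every s in S- (the strict C1 condition)."""
    return all(f(s) for s in s_pos) and not any(f(s) for s in s_neg)


# Toy data, invented for illustration only.
S_POS = ["ACGTGCA", "TTCGTGA", "CGTGGG"]
S_NEG = ["ACGAGCA", "TTTTTTT", "GGCCAA"]

f = motif_function("CGTG")

# C1a: the short motif "explains" the known sequences if it is consistent.
consistent = is_consistent(f, S_POS, S_NEG)

# C1b: apply the same function to a sequence of unknown family membership.
prediction = f("AACGTGTT")
```

The length of the pattern serves here as a crude stand-in for compactness. Consistency on S+ and S- says nothing by itself about the second requirement (behaviour on the rest of F+ and F-); that would have to be estimated separately, for example on held-out sequences.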
N1: suppose F+ ∩ F- = ∅ and F+ ∪ F- = Σ*, and S+ ∩ F- and S- ∩ F+ are small; find compact string functions that return
- TRUE for most s ∈ S+ and FALSE for most s ∈ S-, and
- have a high likelihood of returning TRUE for s ∈ F+ and FALSE for s ∈ F-
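
The relaxed condition in N1 can be sketched in the same way by replacing "all" with an error tolerance. The code below is an illustration under assumptions: the names error_rates and accepts_most, the tolerance of 0.25, the motif CGTG, and the toy sequences (each set containing one deliberately mislabeled entry) are all hypothetical.

```python
from typing import Callable, Sequence, Tuple

StringFunction = Callable[[str], bool]


def error_rates(f: StringFunction,
                s_pos: Sequence[str],
                s_neg: Sequence[str]) -> Tuple[float, float]:
    """Return (fraction of S+ rejected, fraction of S- accepted).

    Assumes both sets are non-empty.
    """
    false_neg = sum(1 for s in s_pos if not f(s)) / len(s_pos)
    false_pos = sum(1 for s in s_neg if f(s)) / len(s_neg)
    return false_neg, false_pos


def accepts_most(f: StringFunction,
                 s_pos: Sequence[str],
                 s_neg: Sequence[str],
                 tolerance: float = 0.25) -> bool:
    """TRUE for most of S+ and FALSE for most of S-, up to the given tolerance."""
    false_neg, false_pos = error_rates(f, s_pos, s_neg)
    return false_neg <= tolerance and false_pos <= tolerance


# Toy data with label noise: the last entry of each set is deliberately mislabeled.
S_POS_NOISY = ["ACGTGCA", "TTCGTGA", "CGTGGG", "ACGAGCA"]
S_NEG_NOISY = ["TTTTTTT", "GGCCAA", "AAGGTT", "CCCGTGC"]


def contains_motif(s: str) -> bool:
    """Hypothetical string function: does the sequence contain CGTG?"""
    return "CGTG" in s


rates = error_rates(contains_motif, S_POS_NOISY, S_NEG_NOISY)
tolerated = accepts_most(contains_motif, S_POS_NOISY, S_NEG_NOISY)
strict = accepts_most(contains_motif, S_POS_NOISY, S_NEG_NOISY, tolerance=0.0)
```

With the tolerance set to 0 the check reduces to the strict C1 condition, so the noisy variant can be read as a generalization of the original problem.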