1.org.apache.spark.util.sketch.BloomFilter
public abstract class BloomFilter {}
Further information
* Combines this bloom filter with another bloom filter by performing a bitwise OR of the underlying data.
Callers must ensure the bloom filters are appropriately sized to avoid saturating them.
filter1.mergeInPlace(filter2)
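To make the "bitwise OR of the underlying data" concrete, here is a minimal sketch of a toy Bloom filter in plain Java. It is not Spark's implementation: the class name TinyBloomFilter, the hash mixing, and the use of IllegalArgumentException for mismatched filters are all assumptions made for illustration. Only the method names putLong, mightContainLong and mergeInPlace are borrowed from the notes above.

```java
import java.util.BitSet;

// Hypothetical toy Bloom filter for illustration only; NOT Spark's BloomFilter.
class TinyBloomFilter {
    private final int numBits;
    private final int numHashFunctions;
    private final BitSet bits;

    public TinyBloomFilter(int numBits, int numHashFunctions) {
        if (numBits <= 0 || numHashFunctions <= 0) {
            throw new IllegalArgumentException("numBits and numHashFunctions must be positive");
        }
        this.numBits = numBits;
        this.numHashFunctions = numHashFunctions;
        this.bits = new BitSet(numBits);
    }

    // 64-bit mixing step in the style of the MurmurHash3 finalizer (assumed choice).
    private static long mix(long x) {
        x ^= x >>> 33;
        x *= 0xff51afd7ed558ccdL;
        x ^= x >>> 33;
        x *= 0xc4ceb9fe1a85ec53L;
        x ^= x >>> 33;
        return x;
    }

    // Derives the i-th bit position from two 32-bit halves of one hash (double hashing).
    private int index(long item, int i) {
        long h = mix(item);
        int h1 = (int) h;
        int h2 = (int) (h >>> 32);
        return Math.floorMod(h1 + i * h2, numBits);
    }

    // Sets the bits for the item; returns true if any bit changed.
    public boolean putLong(long item) {
        boolean changed = false;
        for (int i = 1; i <= numHashFunctions; i++) {
            int idx = index(item, i);
            if (!bits.get(idx)) {
                bits.set(idx);
                changed = true;
            }
        }
        return changed;
    }

    // False means "definitely absent"; true means "possibly present".
    public boolean mightContainLong(long item) {
        for (int i = 1; i <= numHashFunctions; i++) {
            if (!bits.get(index(item, i))) {
                return false;
            }
        }
        return true;
    }

    // Merges by OR-ing the other filter's bits into this one; mutates this instance.
    public TinyBloomFilter mergeInPlace(TinyBloomFilter other) {
        if (other.numBits != numBits || other.numHashFunctions != numHashFunctions) {
            throw new IllegalArgumentException("filters must share size and hash count");
        }
        bits.or(other.bits);
        return this;
    }

    public static void main(String[] args) {
        TinyBloomFilter filter1 = new TinyBloomFilter(1024, 3);
        TinyBloomFilter filter2 = new TinyBloomFilter(1024, 3);
        filter1.putLong(1L);
        filter2.putLong(2L);
        filter1.mergeInPlace(filter2);
        System.out.println(filter1.mightContainLong(1L) && filter1.mightContainLong(2L));
    }
}
```

After the merge, filter1 answers "possibly present" for every item that was put into either filter, since OR never clears a bit. The merged filter also carries more set bits than either input, which is why both filters should be sized for the combined item count to avoid saturation.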
02. put(), putString(), putLong(), putBinary()
Reference: test.org.apache.spark.sql.JavaDataFrameSuite
Currently supported data types include:
Byte, Short, Integer, Long, String
* The implementation is largely based on the {@code BloomFilter} class from Guava.
* @since 2.0.0
2.org.apache.spark.sql.DataFrameStatFunctions
final class DataFrameStatFunctions private[sql](df: DataFrame) {
  // Builds a Bloom filter over a specified column.
  def bloomFilter(colName: String, expectedNumItems: Long, fpp: Double): BloomFilter = {
    buildBloomFilter(Column(colName), BloomFilter.create(expectedNumItems, fpp))
  }
}
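The two arguments passed to BloomFilter.create(expectedNumItems, fpp) fix the size of the filter. The sketch below applies the standard textbook sizing formulas, m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. The class BloomSizing and its method names are hypothetical, and the exact rounding used inside Spark may differ, so treat the numbers as estimates.

```java
// Hypothetical helper applying the textbook Bloom filter sizing formulas.
// Not taken from Spark; Spark's internal rounding may differ.
class BloomSizing {

    // Estimated number of bits m for n expected items and false positive probability p.
    public static long estimateNumBits(long expectedNumItems, double fpp) {
        if (expectedNumItems <= 0) {
            throw new IllegalArgumentException("expectedNumItems must be positive");
        }
        if (!(fpp > 0.0 && fpp < 1.0)) {
            throw new IllegalArgumentException("fpp must be in (0, 1)");
        }
        double ln2 = Math.log(2);
        return (long) Math.ceil(-expectedNumItems * Math.log(fpp) / (ln2 * ln2));
    }

    // Estimated number of hash functions k for n expected items and m bits (at least 1).
    public static int estimateNumHashFunctions(long expectedNumItems, long numBits) {
        double k = (double) numBits / expectedNumItems * Math.log(2);
        return Math.max(1, (int) Math.round(k));
    }

    public static void main(String[] args) {
        long n = 1_000_000L;   // expected number of distinct items
        double p = 0.03;       // target false positive probability
        long m = estimateNumBits(n, p);
        int k = estimateNumHashFunctions(n, m);
        System.out.println("bits=" + m + " (~" + (m / 8 / 1024) + " KiB), hashFunctions=" + k);
    }
}
```

For one million items at a 3% false positive rate this comes to roughly 7.3 million bits (under 1 MiB) and 5 hash functions. Lowering fpp or underestimating expectedNumItems changes the memory cost and the real false positive rate accordingly.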
3.org.apache.spark.sql.Dataset
class Dataset[T] private[sql]{
  def stat: DataFrameStatFunctions = new DataFrameStatFunctions(toDF())
}