pub fn tokenize(value: &str) -> Vec<String>
Break text into tokens
Currently replaces é and ë with -e, splits on hyphens, and removes non-alphabetic characters.
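The behaviour described above can be illustrated with a minimal sketch. This is not the crate's actual source: the name tokenize_sketch is hypothetical, and the sketch assumes that whitespace also separates tokens, that empty pieces are dropped, and that "alphabetic" means Unicode alphabetic.

```rust
/// Hypothetical re-implementation of the documented behaviour (not the real source).
fn tokenize_sketch(value: &str) -> Vec<String> {
    // Replace é and ë with "-e" so the hyphen split below breaks the word there.
    let normalized = value.replace('é', "-e").replace('ë', "-e");

    normalized
        // Assumption: whitespace separates tokens in addition to hyphens.
        .split(|c: char| c == '-' || c.is_whitespace())
        // Remove every non-alphabetic character from each piece.
        .map(|piece| {
            piece
                .chars()
                .filter(|c| c.is_alphabetic())
                .collect::<String>()
        })
        // Assumption: pieces that end up empty are discarded.
        .filter(|token| !token.is_empty())
        .collect()
}

fn main() {
    let tokens = tokenize_sketch("well-known reëlect, 3 times");
    println!("{:?}", tokens);
}
```

Under these assumptions, "reëlect" first becomes "re-elect" and is then split into "re" and "elect", while the comma and the digit are stripped.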
This function is a good entry point for adding support for the nuances of "scientific" texts.