libcudf
24.04.00
Vocabulary object to be used with nvtext::tokenize_with_vocabulary.
#include <tokenize.hpp>
Public Member Functions
tokenize_vocabulary (cudf::strings_column_view const &input, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
Vocabulary object constructor.
Vocabulary object to be used with nvtext::tokenize_with_vocabulary.
Use nvtext::load_vocabulary to create this object.
Definition at line 235 of file tokenize.hpp.
nvtext::tokenize_vocabulary::tokenize_vocabulary (cudf::strings_column_view const & input, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::mr::device_memory_resource * mr = rmm::mr::get_current_device_resource())
Vocabulary object constructor.
Token ids are the row indices within the vocabulary column. Each vocabulary entry is expected to be unique; otherwise, the behavior is undefined.
Exceptions
cudf::logic_error: if the vocabulary contains nulls or is empty
Parameters
input: Strings for the vocabulary
stream: CUDA stream used for device memory operations and kernel launches
mr: Device memory resource used to allocate the returned column's device memory
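The rule that token ids are the row indices within the vocabulary column can be illustrated with a small host-only sketch. This is not libcudf code and not how the library builds its device-side vocabulary: `host_vocabulary` and `id_of` are hypothetical names introduced only for illustration, the `-1` default id is an assumption, and nulls are not modeled because `std::string` has no null state. Where libcudf leaves duplicate entries undefined, this sketch chooses to reject them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical host-side stand-in for a vocabulary object; not part of libcudf.
class host_vocabulary {
 public:
  explicit host_vocabulary(std::vector<std::string> const& entries)
  {
    // Mirror the documented precondition: an empty vocabulary is rejected.
    if (entries.empty()) { throw std::logic_error("vocabulary must not be empty"); }

    for (std::size_t row = 0; row < entries.size(); ++row) {
      // The token id of an entry is its row index in the vocabulary.
      bool const inserted =
        ids_.emplace(entries[row], static_cast<std::int32_t>(row)).second;
      // Assumption of this sketch: duplicates are an error rather than undefined.
      if (!inserted) { throw std::logic_error("duplicate vocabulary entry: " + entries[row]); }
    }

    // Every entry was unique, so there is exactly one id per row.
    assert(ids_.size() == entries.size());
  }

  // Returns the row index of `token`, or `default_id` if the token is not present.
  std::int32_t id_of(std::string const& token, std::int32_t default_id = -1) const
  {
    auto const it = ids_.find(token);
    return it == ids_.end() ? default_id : it->second;
  }

 private:
  std::unordered_map<std::string, std::int32_t> ids_;
};
```

With the entries "the", "quick", "fox" in that order, `id_of("the")` yields 0 and `id_of("fox")` yields 2, while a token that is absent from the vocabulary yields the default id.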