libcudf  24.04.00
nvtext::tokenize_vocabulary Struct Reference

Vocabulary object to be used with nvtext::tokenize_with_vocabulary.

#include <tokenize.hpp>

Public Member Functions

 tokenize_vocabulary (cudf::strings_column_view const &input, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::mr::device_memory_resource *mr=rmm::mr::get_current_device_resource())
 Vocabulary object constructor.

Detailed Description

Vocabulary object to be used with nvtext::tokenize_with_vocabulary.

Use nvtext::load_vocabulary to create this object.

Definition at line 235 of file tokenize.hpp.

Constructor & Destructor Documentation

◆ tokenize_vocabulary()

nvtext::tokenize_vocabulary::tokenize_vocabulary(
    cudf::strings_column_view const& input,
    rmm::cuda_stream_view stream = cudf::get_default_stream(),
    rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource())

Vocabulary object constructor.

Token ids are the row indices within the vocabulary column. Each vocabulary entry is expected to be unique; otherwise, the behavior is undefined.

Exceptions
    cudf::logic_error  if vocabulary contains nulls or is empty
Parameters
    input   Strings for the vocabulary
    stream  CUDA stream used for device memory operations and kernel launches
    mr      Device memory resource used to allocate the returned column's device memory
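
The rules above (token ids are row indices, and null or empty vocabularies are rejected) can be illustrated with a small host-side model. This is only a sketch in standard C++ and does not use libcudf: host_vocabulary and id_of are hypothetical names introduced for illustration, std::logic_error stands in for cudf::logic_error, and std::nullopt stands in for a null row. Because libcudf leaves duplicate entries undefined, the sketch makes the assumption of rejecting them.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical host-side stand-in for a vocabulary; NOT part of libcudf.
class host_vocabulary {
 public:
  // Each element models one row of the vocabulary column.
  // std::nullopt models a null row.
  explicit host_vocabulary(std::vector<std::optional<std::string>> const& rows)
  {
    if (rows.empty()) { throw std::logic_error("vocabulary must not be empty"); }
    for (std::size_t row = 0; row < rows.size(); ++row) {
      if (!rows[row].has_value()) {
        throw std::logic_error("vocabulary must not contain nulls");
      }
      // The token id is the row index of the entry.
      bool const inserted =
        ids_.emplace(*rows[row], static_cast<std::int32_t>(row)).second;
      // Assumption of this sketch: duplicates are rejected. libcudf itself
      // documents duplicate entries as undefined behavior.
      if (!inserted) { throw std::logic_error("duplicate vocabulary entry"); }
    }
  }

  // Returns the row index of `token`, or `default_id` if it is not present.
  std::int32_t id_of(std::string const& token, std::int32_t default_id = -1) const
  {
    auto const it = ids_.find(token);
    return it == ids_.end() ? default_id : it->second;
  }

 private:
  std::unordered_map<std::string, std::int32_t> ids_;
};
```

With the rows "the", "quick", "fox", this model maps "the" to 0 and "fox" to 2, and returns the caller-supplied default for a token that is not in the vocabulary. The real object keeps its data in device memory and is created through nvtext::load_vocabulary.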

The documentation for this struct was generated from the following file: