Techniques for turning text into dense representations have evolved from character bi-grams to advanced subword vectorizers, in part to cope with out-of-vocabulary (OOV) input caused by typos and adversarial attacks.
Researchers at Google recently developed and unveiled a new resilient and efficient text vectorizer dubbed "RETVec," which will defend Gmail users against malicious emails and spam.
RETVec is an efficient, multilingual, next-gen text vectorizer with built-in adversarial resilience.
It is resilient to character-level manipulations such as insertion, deletion, typos, homoglyphs, and LEET substitution.
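To make such character-level manipulations concrete, the short Python sketch that follows produces a few perturbed variants of a word (LEET substitution, homoglyphs, insertion, deletion, and a swap typo). The helper names and the small substitution tables are hypothetical and exist only for illustration; they are not part of RETVec.

```python
# Hypothetical helpers that illustrate character-level manipulations.
# None of this is RETVec code; the tables are tiny, illustrative samples.

LEET = {"a": "4", "e": "3", "i": "1", "o": "0", "s": "5"}
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e"}  # Cyrillic look-alikes


def substitute(word: str, table: dict) -> str:
    """Replace every character that has an entry in the table."""
    return "".join(table.get(ch, ch) for ch in word)


def insert_char(word: str, index: int, ch: str) -> str:
    """Insert one character at the given position."""
    return word[:index] + ch + word[index:]


def delete_char(word: str, index: int) -> str:
    """Delete the character at the given position."""
    return word[:index] + word[index + 1:]


def swap_adjacent(word: str, index: int) -> str:
    """Swap the characters at index and index + 1, a common typo."""
    chars = list(word)
    chars[index], chars[index + 1] = chars[index + 1], chars[index]
    return "".join(chars)


if __name__ == "__main__":
    word = "password"
    print(substitute(word, LEET))               # p455w0rd
    print(ascii(substitute(word, HOMOGLYPHS)))  # 'p\u0430ssw\u043erd'
    print(insert_char(word, 4, "."))            # pass.word
    print(delete_char(word, 3))                 # pasword
    print(swap_adjacent(word, 4))               # passowrd
```

A vectorizer with a fixed vocabulary would typically see each of these variants as an unknown or unrelated token, which is the weakness a resilient vectorizer aims to close.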
RETVec uses a novel character encoder that handles UTF-8 text efficiently; the encoder is made up of two layers.
Because RETVec is itself implemented as a model layer, it fits into any TensorFlow model without extra pre-processing.
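The idea of a vocabulary-free character encoder can be sketched in a few lines of Python. This is a minimal illustration, not RETVec's implementation: it assumes each Unicode code point is expanded into a fixed-width binary vector, and the 24-bit width, the 16-character word length, the bit order, and the function names are all assumptions chosen for the example.

```python
from typing import List

BITS_PER_CHAR = 24  # assumption: wide enough for any code point (max 0x10FFFF fits in 21 bits)
MAX_WORD_LEN = 16   # assumption: fixed word length; longer words are truncated


def char_to_bits(ch: str, bits: int = BITS_PER_CHAR) -> List[float]:
    """Binary expansion of a character's code point, least significant bit first."""
    code_point = ord(ch)
    return [float((code_point >> i) & 1) for i in range(bits)]


def word_to_bits(word: str, max_len: int = MAX_WORD_LEN,
                 bits: int = BITS_PER_CHAR) -> List[List[float]]:
    """Encode a word as a max_len x bits matrix, truncated or zero-padded."""
    rows = [char_to_bits(ch, bits) for ch in word[:max_len]]
    rows += [[0.0] * bits for _ in range(max_len - len(rows))]
    return rows


if __name__ == "__main__":
    matrix = word_to_bits("héllo")
    print(len(matrix), len(matrix[0]))  # 16 24
```

Because the encoding is computed directly from code points, there is no vocabulary to look up and therefore no out-of-vocabulary case, whatever the language or script.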
On its own, the RETVec Binarizer yields a compact word representation but is not competitive in accuracy.
The researchers therefore pair it with a small embedding model, which raises accuracy enough to outperform other approaches.
A TensorFlow model can use RETVec to vectorize strings with a single line of code.
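A minimal sketch of what that single line can look like inside a Keras model is shown here. It follows the usage pattern documented in the RETVec project README (the RETVecTokenizer layer from retvec.tf with a sequence_length argument), which should be checked against the installed version. The function name build_text_classifier and the LSTM and Dense layers after the tokenizer are illustrative assumptions.

```python
def build_text_classifier(num_classes: int, sequence_length: int = 128):
    """Build a small Keras classifier that takes raw strings as input."""
    # Imports are local so this sketch can be loaded without TensorFlow installed.
    import tensorflow as tf
    from tensorflow.keras import layers
    from retvec.tf import RETVecTokenizer

    # Raw strings go straight into the model; no separate pre-processing step.
    inputs = layers.Input(shape=(1,), name="text", dtype=tf.string)

    # The single line that vectorizes the text with RETVec.
    x = RETVecTokenizer(sequence_length=sequence_length)(inputs)

    # Illustrative downstream layers (an assumption, not prescribed by RETVec).
    x = layers.Bidirectional(layers.LSTM(64))(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)
```

Calling build_text_classifier(num_classes=2) would, under these assumptions, return a model that can be compiled and trained on plain strings like any other Keras model.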
Researchers tested RETVec against adversarial content using a Google spam filter.
Replacing SentencePiece with RETVec improved spam detection by 38% at a 0.80% false positive rate and reduced latency by 30%. This suggests RETVec is competitive on real-world tasks, not just on benchmarks.
A key open question is how best to use RETVec in large language models to improve multilingual capability and robustness and to reduce model size.
In smaller LLMs, the vocabulary layer can account for over 20% of the parameters; RETVec removes that layer entirely.
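A quick back-of-the-envelope calculation shows how a vocabulary layer can reach that share. The figures are hypothetical (a 50,000-token vocabulary, 768-dimensional embeddings, and 125 million parameters in total), and embedding_share is a helper written for this example.

```python
def embedding_share(vocab_size: int, embed_dim: int, total_params: int) -> float:
    """Fraction of a model's parameters held by its token embedding table."""
    embedding_params = vocab_size * embed_dim  # one embed_dim vector per vocabulary entry
    return embedding_params / total_params


if __name__ == "__main__":
    # Hypothetical small model: 50,000 tokens x 768 dims = 38.4M embedding parameters.
    share = embedding_share(vocab_size=50_000, embed_dim=768, total_params=125_000_000)
    print(f"{share:.1%}")  # 30.7%
```

The embedding table grows with the vocabulary size regardless of model depth, so the smaller the rest of the model, the larger the share that a vocabulary-free vectorizer can save.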
Using RETVec in generative models is challenging because its 256-float embedding cannot be mapped directly to a softmax output.
A new training method that is compatible with text generation is needed.
Experiments with character-by-character decoding and with a VQ-VAE model have so far been inconclusive.
Future work will address these limitations and explore using RETVec as a word embedding in place of GloVe and word2vec, as well as training text similarity models with its character encoder.
The latest TensorFlow version of RETVec can be installed with pip (pip install retvec).
RETVec has been tested on TensorFlow 2.6+ and Python 3.8+.
This Cyber News was published on cybersecuritynews.com. Publication date: Sat, 02 Dec 2023 13:20:04 +0000