博客
关于我
强烈建议你试试无所不能的chatGPT,快点击我
trie树查找前缀串_Trie数据结构(前缀树)
阅读量:2519 次
发布时间:2019-05-11

本文共 7976 字,大约阅读时间需要 26 分钟。

trie树查找前缀串

by Julia Geist

Julia·盖斯特(Julia Geist)

A Trie, (also known as a prefix tree) is a special type of tree used to store associative data structures

Trie (也称为前缀树)是一种特殊类型的树,用于存储关联数据结构

A trie (pronounced try) gets its name from retrieval — its structure makes it a stellar matching algorithm.

Trie(发音为try)的名称取自rev val -其结构使其成为出色的匹配算法。

语境 (Context)

Write your own shuffle method to randomly shuffle characters in a string.
Use the words text file, located at /usr/share/dict/words, and your shuffle method to create an anagram generator that only produces real words.
Given a string as a command line argument, print one of its anagrams.

I was presented with this challenge this week at .

本周,我在了这一挑战。

The words in the text file are separated by new lines. Its formatting makes it a lot easier to put the words into a data structure. For now, I’m storing them in a list — each element being a single word from the file.

文本文件中的单词用新行分隔。 它的格式使将单词放入数据结构变得容易得多。 现在,我将它们存储在列表中-每个元素都是文件中的单个单词。

One approach to this challenge is to:

解决这一挑战的一种方法是:

  • randomly shuffle the characters in the string

    随机随机播放字符串中的字符
  • then, check it against all words that were in /usr/share/dict/words to verify that it’s a real word.

    然后,对照/ usr / share / dict / words中的所有单词检查它,以确认它是真实单词。

However, this approach requires that I check that the randomly shuffled characters in the new string matches one of 235,887 words in that file — that means 235,887 operations for each string that I want to verify as a real word.

但是,这种方法要求我检查新字符串中随机混洗的字符是否与该文件中的235887个单词匹配-这意味着要验证为真实单词的每个字符串需要进行235887次操作

This was an unacceptable solution for me. I first looked up libraries that had already been implemented to check if words exist in a language, and found . I first completed the challenge using the library, in a few lines of code.

对我来说,这是不可接受的解决方案。 我首先查找已经实现的库,以检查语言中是否存在单词,然后找到 。 我首先使用库通过几行代码完成了挑战。

def generateAnagram(string, language="en_US"):   languageDict = enchant.Dict(language)    numOfPossibleCombinationsForString = math.factorial(len(string))   for i in range(0, numOfPossibleCombinationsForString):       wordWithShuffledCharacters = shuffleCharactersOf(string)
if languageDict.check(wordWithShuffledCharacters):            return wordWithShuffledCharacters     return "There is no anagram in %s for %s." % (language, string)

Using a couple of library functions in my code was a quick and easy solution. However, I didn’t learn much by finding a library to solve the problem for me.

在我的代码中使用几个库函数是一个快速简便的解决方案。 但是,我没有找到可以为我解决问题的图书馆,因此学不到很多东西。

I was positive that the library wasn’t using the approach I mentioned earlier. I was curious and dug through the source code — I found a trie.

我很肯定图书馆没有使用我前面提到的方法。 我很好奇并仔细研究了源代码,发现了一个特里。

特里 (Trie)

A trie stores data in “steps”. Each step is a node in the trie.

特里树以“步骤”存储数据。 每个步骤都是特里树中的一个节点。

Storing words is a perfect use case for this kind of tree, since there are a finite amount of letters that can be put together to make a string.

对于此类树而言,存储单词是一个完美的用例,因为可以将一定数量的字母放在一起构成一个字符串。

Each step, or node, in a language trie will represent one letter of a word. The steps begin to branch off when the order of the letters diverge from the other words in the trie, or when a word ends.

语言树中的每个步骤或节点将代表一个单词的一个字母。 当字母的顺序与特里中的其他单词不同或单词结束时,步骤开始分支。

I created a trie out of directories on my Desktop to visualize stepping down through nodes. This is a trie that contains two words: apple and app.

我从桌面上的目录中创建了一个Trie,以可视化逐步通过节点。 这是一个trie,包含两个词:apple和app。

Note that the end of a word is denoted with a ‘$’. I’m using ‘$’ because it is a unique character that will not be present in any word in any language.

注意,单词的结尾用“ $”表示。 我使用的是“ $”,因为它是一个唯一字符,不会以任何语言出现在任何单词中。

If I were to add the word ‘aperture’ to this trie, I would loop over the letters in the word ‘aperture’ while simultaneously stepping down the nodes in the trie. If the letter exists as a child of the current node, step down into it. If the letter does not exist as a child of the current node, create it and then step down into it. To visualize these steps using my directories:

如果要在此Trie中添加单词“ aperture”,则将遍历单词“ aperture”中的字母,同时降低Trie中的节点。 如果该字母作为当前节点的子代存在,请下移至该节点。 如果该字母不作为当前节点的子代存在,请创建该字母,然后逐步降低该字母。 要使用我的目录可视化这些步骤:

While stepping down the trie, the first two letters of ‘aperture’ are already present in the trie, so I step down into those nodes.

在下移Trie时,“ aperture”的前两个字母已经存在于该Trie中,因此我下移到这些节点。

The third letter, ‘e’, however, is not a child of the ‘p’ node. A new node is created to represent the letter ‘e’, branching off from the other words in the trie. New nodes for the letters that follow are created as well.

但是,第三个字母“ e”不是“ p”节点的子代。 创建一个新的节点来表示字母“ e”,并从该树中的其他单词分支出来。 还创建了随后字母的新节点。

To generate a trie from a words file, this process will happen for each word, until all combinations for every word are stored.

为了从单词文件生成特里树,将对每个单词进行此过程,直到存储每个单词的所有组合。

You might be thinking: “Wait, won’t it take really long to generate the trie from that text file with 235,887 words in it? What’s the point of looping over every single character in every single word?”

您可能会想:“等等,从包含235,887个单词的文本文件生成trie是否真的需要很长时间? 是什么在每一个字,遍历每一个字符的意义呢?”

Yes, iterating over each character of every word to generate a trie does take some time. However, the time taken to create the trie is well worth it — because to check if a word exists in the text file, it takes at most, as many operations as the length of the word itself. Much better than the 235,887 operations it was going to take before.

是的,遍历每个单词的每个字符以生成特里确实需要一些时间。 但是,创建特里树所花的时间是非常值得的 -因为检查文本文件中是否存在单词,它最多花费与单词本身长度一样多的操作。 比之前要进行的235,887次操作好得多

I wrote the simplest version of a trie, using nested dictionaries. This isn’t the most efficient way to implement one, but it is a good exercise to understand the logic behind a trie.

我使用嵌套词典编写了最简单的trie版本。 这不是实现一个的最有效方法,但是了解一个trie背后的逻辑是一个很好的练习。

endOfWord = "$"
def generateTrieFromWordsArray(words):   root = {}   for word in words:      currentDict = root      for letter in word:         currentDict = currentDict.setdefault(letter, {})      currentDict[endOfWord] = endOfWord   return root
def isWordPresentInTrie(trie, word):   currentDict = trie   for letter in word:      if letter in currentDict:         currentDict = currentDict[letter]      else:          return False   return endOfWord in currentDict

You can see my for the anagram generator on my Github. Since exploring this algorithm, I’ve decided to make this blog post one of many — each post covering one algorithm or data structure. The code is available on my repo — star it to stay updated!

您可以在我的Github上看到我的字谜生成器 。 自探索该算法以来,我决定将该博客文章设为众多文章之一-每个文章都涉及一种算法或数据结构。 该代码在我的库中可用-对其加注星标以保持更新!

下一步 (Next Steps)

I suggest checking out Ray Wenderlich’s . Although written in Swift, it’s a valuable source for explanations of various algorithms.

我建议您查看雷·温德利希(Ray Wenderlich)的 。 尽管是用Swift编写的,但它是解释各种算法的宝贵资源。

Similar to the trie (but more memory efficient) is a suffix tree, or radix. In short, instead of storing single characters at every node, the end of a word, its suffix, is stored and the paths are created relatively.

后缀树或基数与特里树类似(但具有更高的存储效率)。 简而言之,不是在每个节点上都存储单个字符,而是存储单词的结尾及其后缀,并且相对地创建路径。

However, a radix is more complicated to implement than a trie. I suggest taking a look at Ray Wenderlich’s if you’re interested.

但是,基数的实现比trie的实现更为复杂。 如果您有兴趣,我建议您看看Ray Wenderlich的 。

This is the first post of my algorithm and data structures series. In each post, I’ll present a problem that can be better solved with an algorithm or data structure to illustrate the algorithm/data structure itself.

这是我的算法和数据结构系列的第一篇文章。 在每篇文章中,我将介绍一个可以通过算法或数据结构更好地解决的问题,以说明算法/数据结构本身。

Star my on Github and follow me on if you’d like to follow along!

在Github上为我的加注星标,如果您想跟随我,在关注我!

Did you gain value by reading this article? to share it on Twitter! If you’d like to see content like this more often, follow me on Medium and subscribe to my once-a-month newsletter below. Feel free to too.

您通过阅读本文获得了价值吗? 在Twitter上分享! 如果您想经常看到这样的内容,请在Medium上关注我,并订阅下面的每月一次的新闻通讯。 也可以给 。

翻译自:

trie树查找前缀串

转载地址:http://etgwd.baihongyu.com/

你可能感兴趣的文章
图片放大器——wpf
查看>>
SCALA STEP BY STEP
查看>>
cocos2d-x学习笔记
查看>>
MySql中的变量定义
查看>>
Ruby数组的操作
查看>>
hdu1181暴搜
查看>>
解码字符串 Decode String
查看>>
json学习笔记
查看>>
工具:linux 性能监控工具-nmon
查看>>
fatal error C1853
查看>>
Ural 1001 - Reverse Root
查看>>
玩转webpack之webpack的entry output
查看>>
java 操作mongodb查询条件的常用设置
查看>>
黑马程序员_java基础笔记(02)...java语言基础组成
查看>>
关于缓存击穿
查看>>
对innodb 拷贝文件实现数据库的方式(转)
查看>>
python知识点 2014-07-09
查看>>
FloatingActionButton的一点学习感悟
查看>>
ABAP CDS ON HANA-(10)項目結合して一つ項目として表示
查看>>
网站地址信息
查看>>