How to make a word cloud

The image below shows a word cloud.

First, install the wordcloud package. If your text is in Chinese, also install jieba, which handles word segmentation.

```bash
pip install jieba
pip install wordcloud
# the scraping example below also uses these packages
pip install requests beautifulsoup4 lxml
```

The text for the word cloud above comes from this blog post: the page is fetched with a web request and then parsed with the BeautifulSoup package. You can also read the text from a file, or paste it directly into the program.

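```python
import requests
from bs4 import BeautifulSoup

# Fetch the blog post and parse the HTML (the 'lxml' parser needs the lxml package)
url = 'https://blog.tennisatw.com/2023/09/how-to-make-wordcloud.html'
r = requests.get(url=url).text
soup = BeautifulSoup(r, 'lxml')
paragraphs = soup.find_all('p')

# Concatenate the text of every <p> element
blog_text = ''
for text in paragraphs:
    blog_text += text.text
```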

```python
# Alternatively, paste the text directly into the program
blog_text = '文本 text'
```
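
Reading the text from a local file, the other option mentioned above, could look like the following minimal sketch. The helper name `load_text` and the file name `blog_text.txt` are hypothetical, and UTF-8 is an assumed encoding; stating it explicitly avoids platform-dependent defaults, which matters for Chinese text.

```python
from pathlib import Path


def load_text(path, encoding="utf-8"):
    """Return the full contents of a text file as one string."""
    return Path(path).read_text(encoding=encoding)


# 'blog_text.txt' is a hypothetical file name; replace it with your own.
source = Path("blog_text.txt")
if source.exists():
    blog_text = load_text(source)
else:
    blog_text = "文本 text"  # fall back to pasted text when the file is absent
```

Either way, the result is a single string named `blog_text`, which is what the following steps expect.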

Chinese text is written without spaces between words, so it has to be segmented first. Use jieba to split the text into words, then join them with spaces:

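```python
import jieba

ls = jieba.lcut(blog_text)  # split the text into a list of words
text = ' '.join(ls)  # re-join with spaces so the words can be told apart
```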
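
To see why the space-joining step matters, here is a small illustration that does not call jieba itself. The `tokens` list is a hand-written stand-in for what `jieba.lcut` might return (an assumption; the real segmentation can differ), and plain whitespace splitting stands in for any tool that relies on word separators.

```python
# Hand-written stand-in for the output of jieba.lcut (an assumption; real
# jieba output may segment this sentence differently).
tokens = ["我", "喜欢", "制作", "词云"]

unsegmented = "".join(tokens)  # the sentence as normally written, no spaces
segmented = " ".join(tokens)   # the same sentence with words space-separated

# Splitting on whitespace treats the raw sentence as a single "word" ...
print(unsegmented.split())  # ['我喜欢制作词云']
# ... but recovers the individual words once they are space-separated.
print(segmented.split())    # ['我', '喜欢', '制作', '词云']
```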

Below is the complete code:

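```python
import requests
from bs4 import BeautifulSoup
import jieba
import wordcloud

# Fetch the blog post and parse the HTML
url = 'https://blog.tennisatw.com/2023/09/how-to-make-wordcloud.html'
r = requests.get(url=url).text
soup = BeautifulSoup(r, 'lxml')
paragraphs = soup.find_all('p')

# Concatenate the text of every <p> element
blog_text = ''
for text in paragraphs:
    blog_text += text.text

# Segment the Chinese text and join the words with spaces
ls = jieba.lcut(blog_text)
text = ' '.join(ls)

# Exclude common function words in addition to the built-in English stop words
stopwords = wordcloud.STOPWORDS | {"的", "是", "了", "我"}

# msyh.ttc is Microsoft YaHei, which ships with Windows; on other systems,
# point font_path at any font file that contains Chinese glyphs
wc = wordcloud.WordCloud(font_path="msyh.ttc",
                         width=800,
                         height=600,
                         background_color='white',
                         max_words=80,
                         max_font_size=150,
                         stopwords=stopwords)

# Build the word cloud and save it as an image
wc.generate(text)
wc.to_file("wordcloud.png")
```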
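
Words that occur more often are drawn larger, so it can help to preview the counts before generating the image. The sketch below is a simplified approximation using only the standard library, not the tokenizer that wordcloud uses internally; the helper `top_words` and the `sample` string are hypothetical.

```python
from collections import Counter


def top_words(text, stopwords, n=5):
    """Count space-separated words, skip stop words, return the n most common."""
    words = [w for w in text.split() if w not in stopwords]
    return Counter(words).most_common(n)


# Hypothetical segmented text, in the same space-joined form as `text` above.
sample = "词云 的 制作 是 简单 的 词云 制作 词云"
stopwords = {"的", "是", "了", "我"}

print(top_words(sample, stopwords, n=2))  # [('词云', 3), ('制作', 2)]
```

If an unwanted word dominates the list, adding it to the stop-word set is usually the simplest fix.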

Blog of Tennisatw
