Quickly Migrating a Blog Using YingDao

Preface

I looked at many blog migration tools, but most of them require writing code or running scripts that read and dump the HTML, which often garbled some of my blog’s data. I had not found a method well suited to quick migration.

In August, I started using RPA tools such as YingDao (ShadowBot) and n8n. While setting up a new blog, I wanted to move all 170 of my earlier articles over from CSDN in one go. AI coding tools can reduce the need to reinvent the wheel, and I use Cursor, Trae, and Antigravity, but the answers they gave were all about crawling, and the results were mediocre. So I decided to try an RPA tool instead. With some simple debugging, I finished the script in about thirty minutes, and it started reading and saving articles automatically.

YingDao (ShadowBot)

I stumbled upon YingDao by chance and found it very convenient. People with programming experience can write scripts in it easily, and people without programming experience can still automate tasks. I previously used YingDao to write scripts for a freight forwarding company that uploaded to E-Booking or fetched ETAs, and both the speed and the overall experience were very smooth.

Most importantly, the personal version is free, which is enough for individuals or small teams. Of course, once a program becomes complex, pure code is still more convenient. So in my opinion, YingDao suits demos and fixed, repetitive work (I haven’t tried complex web or client-side interactions yet).

Logic

Reason

I wanted to migrate my entire blog. However, since I had never backed up my blog files, I couldn’t find most of the original images and markdown files. CSDN, though, automatically transfers uploaded images to its own cloud CDN and stores them there.

I tested the CDN links and found that external access is not allowed. Embedding them under my blog domain xlx.dev results in loading errors (anti-hotlinking).

Result

After sorting this out, I settled on the following route:

Log in to my backend personal homepage.
Find the list of all articles.
Get the upload time (two cases occur here: CSDN uses two tag styles for the time, which causes YingDao’s capture to fail on one of them, so a try...catch is needed to get the data for both).
Click Edit.
Get the title and content of the edit page.
Analyze all URLs in the content, especially those with the CSDN CDN prefixes, and replace them with local paths or my own CDN paths.
- The CDN prefixes come in two types:
  https://i-blog.csdnimg.cn/direct/*
  https://i-blog.csdnimg.cn/blog_migrate/*
Save images locally.
Save the content, article title, and time to a file.
Store each article and its images in a separate folder named after the title.
Also store images in a unified location for later upload to my own CDN.
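
The storage layout in the last steps of the route (one folder per article plus a shared image folder) might be planned as in the following Python sketch. It is only an illustration, not the YingDao flow itself: the names `safe_name` and `plan_layout`, the `.md` extension, the set of characters treated as illegal, and the underscore replacement are all assumptions.

```python
import re
from pathlib import Path

# Assumption: characters Windows rejects in file names, plus control
# characters such as \r and \n, are replaced with an underscore.
_ILLEGAL = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_name(title: str, fallback: str = "untitled") -> str:
    """Turn an article title into a name usable for a folder or file."""
    cleaned = _ILLEGAL.sub("_", title).strip(" .")
    return cleaned or fallback


def plan_layout(root: Path, title: str) -> dict:
    """Return the planned paths for one article; nothing is created on disk."""
    name = safe_name(title)
    article_dir = root / name
    return {
        "article_dir": article_dir,            # per-article folder
        "markdown": article_dir / f"{name}.md",  # hypothetical file extension
        "shared_images": root / "images",      # unified image folder
    }
```

Keeping the planning step free of disk access makes it easy to check the resulting paths before any file is written.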

Steps

1. Installation

For users in mainland China, I recommend downloading the installation package directly from the official website: https://www.yingdao.com/.

2. Create Application

Create a new Web Automation Application.
Start placing functional modules according to the logic mentioned above. First, complete the Get Article List module.
1. Open the backend webpage. For convenience later, I logged into the account using Google Chrome directly.
2. Select Similar Element List, then click verify element on the right to check that the data is correct.
3. Since there are many articles and YingDao essentially mimics manual operation, we need to scroll to the bottom of the page. Because the list is loaded dynamically, it takes several scrolls before every article is loaded on the page. I set it to scroll ten times here, which ensures that all article data is obtained.
4. Without scrolling, I get only 60 articles; with the scroll step in place, I get all 162 articles directly.
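
As a variation on the fixed ten scrolls used above, the loop could stop as soon as the number of loaded articles stops growing. The Python sketch below shows only that control logic. `scroll` and `count_items` are hypothetical callables standing in for whatever the automation tool provides, and a real page would probably also need a short wait after each scroll so that a slow load is not mistaken for the end of the list.

```python
from typing import Callable


def scroll_until_stable(scroll: Callable[[], None],
                        count_items: Callable[[], int],
                        max_scrolls: int = 10) -> int:
    """Scroll until the item count stops growing or max_scrolls is reached.

    Returns the number of items visible at the end.
    """
    seen = count_items()
    for _ in range(max_scrolls):
        scroll()
        now = count_items()
        if now == seen:
            # No new items appeared after this scroll, so stop early.
            break
        seen = now
    return seen
```

The `max_scrolls` cap keeps the behavior of the original fixed count as an upper bound.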
Traverse articles and get article data.
1. Get the hyperlink of each article, open it in a new web page, and get data from it. The browser needs to be set to Google Chrome.
2. Get time data.
3. Since the captured data is not a pure time string but contains other text, a regular expression is needed to extract the time.
  \d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}
4. Then click Edit to enter the edit state. This completes the module that gets the time from the page and clicks Edit.
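
The time handling above can be sketched in Python as follows. This is a sketch under assumptions, not the YingDao node itself: `extract_time` is a hypothetical helper, and it takes the text captured from each of the two tag styles, with `None` standing in for a capture that failed, which plays the role of the try...catch fallback.

```python
import re
from datetime import datetime
from typing import Iterable, Optional

# Same pattern as in the step above.
TIME_RE = re.compile(r"\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}")


def extract_time(candidates: Iterable[Optional[str]]) -> Optional[datetime]:
    """Return the first timestamp found in any of the captured texts."""
    for text in candidates:
        if not text:
            # This tag style was not present on the page; try the next one.
            continue
        match = TIME_RE.search(text)
        if match:
            return datetime.strptime(match.group(0), "%Y-%m-%d %H:%M:%S")
    return None
```

For example, `extract_time([None, "Published 2023-08-15 10:30:00"])` skips the failed capture and parses the timestamp from the second text. The sample text here is made up for illustration.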
Get the article title and body content.
1. After getting the title, create a folder and a file with the same name, based on the title, as the local storage location for the post.
  1. Since titles often contain characters that are illegal in filenames, such as < > , /r , \r etc., a regular expression is also needed to filter the title into a valid filename.
  2. Combine the parts into the full path.
2. After getting the body, extract the URLs in it; the goal is to collect the CDN image URLs. Note that you need to uncheck Find only the first match in the node, otherwise the node returns only the first item.
  https://i-blog.csdnimg.cn/direct/*
  https://i-blog.csdnimg.cn/blog_migrate/*
  Regex to extract text:
  https://i-blog\.csdnimg\.cn/(?:direct|blog_migrate)/.+?\.[a-zA-Z]{3,4}
3. Create a loop. Because CSDN’s CDN allows requests with an empty Referer, this can be used to download the images.
  1. Similarly, get each image’s file name through a regular expression.
    https://i-blog\.csdnimg\.cn/(?:direct|blog_migrate)/(.+?\.(?:png|jpg|jpeg|gif|webp))
  2. Save each image twice; the save paths are folder // images and folder // title.
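
The image steps above (find the CDN URLs, derive file names, save each image twice) could look roughly like this in plain Python. It is a sketch, not the YingDao flow: all function names, the `images/` prefix used when rewriting the body, and the `User-Agent` value are assumptions. `urllib` sends no Referer header unless one is added, which matches the empty-Referer condition mentioned above.

```python
import re
import urllib.request
from pathlib import Path

# Same pattern as in the step above; group 1 is the image file name.
IMG_RE = re.compile(
    r"https://i-blog\.csdnimg\.cn/(?:direct|blog_migrate)/"
    r"(.+?\.(?:png|jpg|jpeg|gif|webp))"
)


def find_images(body: str) -> dict:
    """Map each distinct CDN image URL in the body to a local file name."""
    return {m.group(0): m.group(1).replace("/", "_")
            for m in IMG_RE.finditer(body)}


def rewrite_body(body: str, images: dict, prefix: str = "images/") -> str:
    """Replace CDN URLs with local paths; anything after the extension stays."""
    for url, name in images.items():
        body = body.replace(url, prefix + name)
    return body


def save_targets(root: Path, article_folder: str, name: str) -> list:
    """The two save paths: the shared image folder and the article folder."""
    return [root / "images" / name, root / article_folder / name]


def download(url: str, targets: list) -> None:
    """Fetch one image without a Referer header and write it to every target."""
    # Assumption: a browser-like User-Agent; the CDN may or may not require it.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=30) as response:
        data = response.read()
    for target in targets:
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)
```

A loop over `find_images(body).items()` that calls `download(url, save_targets(root, folder, name))` would then mirror the loop in step 3.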
After each loop iteration, close the webpage and check the “Ignore confirmation on leave” checkbox, because the way YingDao operates the page triggers CSDN’s save confirmation prompt.

Final Effect