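Fetching a Douyin User's Post List: Advanced
# Overview
Following up on the previous article, which used puppeteer to fetch a user's post list, this one describes a few optimizations.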
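# Problems
- The approach in the previous article has a weakness in its date ranges: each batch of 20 records spans a wide stretch of time, so the dates are imprecise and need improvement.
- Simulating a browsing session is somewhat slow, but it has no lingering side effects.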
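# Optimizations
For the date range problem: the API response includes min_cursor and max_cursor, the minimum and maximum cursors. Taking advantage of this, each request fetches only 2 records, so that each record gets one cursor. That pins down the date of each post exactly.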
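To make the cursor idea concrete, here is a minimal sketch. It assumes the cursors are Unix timestamps in milliseconds and that the first item of a two-item page pairs with min_cursor, as in the code later in this article. `CursorPage`, `stampPage` and `cursorToIsoDate` are hypothetical names used only for illustration, and the cursor values are made up.

```typescript
// Hypothetical shape of one page of the post-list response.
interface CursorPage {
  min_cursor: number;
  max_cursor: number;
  aweme_list: { aweme_id: string; timestamp?: number }[];
}

// With two items per page, each item takes exactly one cursor:
// the first item gets min_cursor, the second gets max_cursor.
function stampPage(page: CursorPage): { aweme_id: string; timestamp?: number }[] {
  const [first, second] = page.aweme_list;
  if (first) first.timestamp = page.min_cursor;
  if (second) second.timestamp = page.max_cursor;
  return page.aweme_list;
}

// Assumption: a cursor is a millisecond timestamp.
function cursorToIsoDate(cursor: number): string {
  return new Date(cursor).toISOString();
}

// Made-up sample page, for illustration only.
const stamped = stampPage({
  min_cursor: 1600000000000,
  max_cursor: 1599000000000,
  aweme_list: [{ aweme_id: 'a' }, { aweme_id: 'b' }],
});
console.log(stamped.map(item => cursorToIsoDate(item.timestamp ?? 0)));
```

If the real cursors turn out to be in seconds rather than milliseconds, only `cursorToIsoDate` would need to change.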
To obtain the signature and other key parameters (which are hard to forge), the request URL is intercepted:
const page = await browser.newPage();
await page.setRequestInterception(true);
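Once interception is enabled, the request handler only needs to pull a handful of query parameters out of the matching URL. The sketch below shows that step as a pure function, so it can be tried without launching a browser. `extractSignedParams` and `SIGNED_KEYS` are hypothetical names, and the sample values are made up; in the real handler the input would be `interceptedRequest.url()`.

```typescript
// Query parameters that the article captures from the intercepted request.
const SIGNED_KEYS = ['user_id', 'sec_uid', 'count', 'max_cursor', 'aid', '_signature', 'dytk'] as const;
type SignedKey = typeof SIGNED_KEYS[number];
type SignedParams = Partial<Record<SignedKey, string>>;

const API_URL = 'https://www.iesdouyin.com/web/api/v2/aweme/post';

// Returns the captured parameters, or undefined when the URL is not the post-list API.
function extractSignedParams(requestUrl: string): SignedParams | undefined {
  if (!requestUrl.includes(API_URL)) return undefined;
  const query = new URL(requestUrl).searchParams;
  const params: SignedParams = {};
  for (const key of SIGNED_KEYS) {
    const value = query.get(key);
    if (value !== null) params[key] = value;
  }
  return params;
}

// Made-up values, for illustration only.
const sample = extractSignedParams(
  `${API_URL}?sec_uid=abc&count=21&max_cursor=0&aid=1128&_signature=sig123&dytk=tok`
);
console.log(sample);
```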
- Once the parameters are captured, assemble the request parameters and query recursively
const loopList = async (param: IList): Promise<any> => {
  await sleep(500);
  const res = await getList(param);
  if (res.aweme_list && res.aweme_list.length) {
    const list = res.aweme_list;
    list[0].timestamp = res.min_cursor;
    if (list[1]) list[1].timestamp = res.max_cursor;
    result.push(...list);
    // The return is required so the final result propagates back to the caller
    return loopList({ ...param, max_cursor: res.max_cursor });
  } else {
    // An empty page means every post has been collected
    return result;
  }
};
- Once you have the results, you can write them to a file or save them by calling an API
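The paging step above can also be written as a plain loop. The following is a minimal sketch under stated assumptions, not the article's implementation: `collectAllPosts` and `FetchPage` are hypothetical names, the fetcher is injected so that an in-memory fake can stand in for `getList`, and the accumulator is local instead of a module-level array.

```typescript
interface PostPage {
  aweme_list?: { aweme_id: string }[];
  max_cursor: number;
}

// Hypothetical fetcher type: takes a cursor and resolves to one page.
type FetchPage = (maxCursor: number) => Promise<PostPage>;

// Keep requesting with the returned max_cursor until a page comes back empty.
async function collectAllPosts(
  fetchPage: FetchPage,
  delayMs = 500,
  maxPages = 1000
): Promise<{ aweme_id: string }[]> {
  const all: { aweme_id: string }[] = [];
  let cursor = 0;
  for (let i = 0; i < maxPages; i++) {
    // Pause between requests, as the recursive version does
    await new Promise<void>(resolve => setTimeout(resolve, delayMs));
    const page = await fetchPage(cursor);
    if (!page.aweme_list || page.aweme_list.length === 0) break;
    all.push(...page.aweme_list);
    cursor = page.max_cursor;
  }
  return all;
}

// Fake in-memory API with three posts, two per page, standing in for getList.
const fakePosts = [{ aweme_id: 'p1' }, { aweme_id: 'p2' }, { aweme_id: 'p3' }];
const fakeFetch: FetchPage = async cursor => ({
  aweme_list: fakePosts.slice(cursor, cursor + 2),
  max_cursor: cursor + 2,
});

collectAllPosts(fakeFetch, 0).then(posts => console.log(posts.map(p => p.aweme_id)));
```

The `maxPages` cap is an added safety net so that a response which never comes back empty cannot loop forever.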
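# Pitfalls
When I made the API request with the signature captured in step two, the response came back empty. My first guess was the user-agent, so I added an arbitrary User-Agent header, but that did not help. Thinking further, I wondered whether the signature might be tied to the browser that generated it, so I passed the headless browser's User-Agent to the API request. It returned data normally! Douyin's API encryption is really quite well done!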
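As an illustration of the fix, the sketch below builds the API request so that its User-Agent header matches the browser that produced the signature. How `getList` actually sends its headers is not shown in this article, so `buildSignedRequest` and `ApiRequest` are hypothetical, and the parameter values are made up. With puppeteer, the User-Agent string itself can be read with `await browser.userAgent()`.

```typescript
interface ApiRequest {
  url: string;
  headers: Record<string, string>;
}

// Builds the request from the captured parameters and attaches the
// User-Agent of the browser that generated the signature.
function buildSignedRequest(
  apiUrl: string,
  params: Record<string, string>,
  browserUserAgent: string
): ApiRequest {
  const query = new URLSearchParams(params).toString();
  return {
    url: `${apiUrl}?${query}`,
    headers: { 'user-agent': browserUserAgent },
  };
}

// Made-up values, for illustration only.
const request = buildSignedRequest(
  'https://www.iesdouyin.com/web/api/v2/aweme/post',
  { sec_uid: 'abc', count: '2', max_cursor: '0', _signature: 'sig123' },
  'Mozilla/5.0 (example UA read from the browser)'
);
console.log(request.url, request.headers);
```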
# Code
// Fetch a Douyin user's post list
import { getList, IList, saveAuthorVideo } from '../services/douyin-share';
import { sleep } from '../utils/star';
const path = require('path');
const fs = require('fs');
const puppeteer = require('puppeteer');
const querystring = require('querystring');
const apiUrl = 'https://www.iesdouyin.com/web/api/v2/aweme/post';
// Accumulates the posts returned by loopList
const result: Record<string, any>[] = [];
let index = 0; // Index into the array of user ids
// Capture the request parameters, including _signature and dytk
const getRequestParam = async (id: string): Promise<unknown> => {
  const browser = await puppeteer.launch({
    headless: false,
    args: ['--no-sandbox', '--disable-setuid-sandbox'],
  });
  try {
    const page = await browser.newPage();
    await page.setRequestInterception(true);
    let captured: unknown;
    // Intercept requests and read the parameters from the post-list API URL
    page.on('request', (interceptedRequest: any) => {
      const requestUrl = interceptedRequest.url();
      if (requestUrl.indexOf(apiUrl) > -1) {
        // The matched request is not continued; only its query string is needed
        const qs = querystring.parse(requestUrl.split('?')[1]);
        const { user_id, sec_uid, count, max_cursor, aid, _signature, dytk } = qs;
        captured = { user_id, sec_uid, count, max_cursor, aid, _signature, dytk };
      } else {
        interceptedRequest.continue();
      }
    });
    // Open the user's share page
    await page.goto(`https://www.iesdouyin.com/share/user/${id}`);
    const title = await page.$eval('.nickname', (el: any) => el.innerHTML);
    console.log(title);
    // Give the page time to issue the API request
    await sleep(5000);
    // Stays undefined if no matching request was seen
    return captured;
  } finally {
    // Close the browser even if navigation or the selector lookup fails
    await browser.close();
  }
};
const loopList = async (param: IList): Promise<any> => {
  await sleep(500);
  const res = await getList(param);
  if (res.aweme_list && res.aweme_list.length) {
    const list = res.aweme_list;
    list[0].timestamp = res.min_cursor;
    if (list[1]) list[1].timestamp = res.max_cursor;
    result.push(...list);
    // The return is required so the final result propagates back to the caller
    return loopList({ ...param, max_cursor: res.max_cursor });
  } else {
    // An empty page means every post has been collected
    return result;
  }
};
// Recursively run the save step for each user; resolves when all are done
export const doSaveVideoTask = async (arr: string[]): Promise<any> => {
  const param = (await getRequestParam(arr[index])) as IList;
  if (!param) {
    throw new Error(`No post-list request was captured for user ${arr[index]}`);
  }
  // Start each user with an empty list so earlier users' posts are not carried over
  result.length = 0;
  // Note: with count = '20', loopList stamps only the first two items of each page;
  // the write-up above describes fetching 2 records per request
  param.count = '20';
  param.max_cursor = '0';
  const res = await loopList(param);
  const data = res.map((item: Record<string, any>) => {
    const video_src = `https://www.iesdouyin.com/share/video/${item.aweme_id}/?region=CN&mid=&u_code=&titleType=title&utm_source=copy_link&utm_campaign=client_share&utm_medium=android&app=aweme`;
    return {
      awemeId: item.aweme_id,
      awemeType: item.aweme_type,
      cover: item.video.cover && item.video.cover.url_list[0],
      desc: item.desc,
      duration: item.video.duration,
      dynamicCover:
        item.video.dynamic_cover && item.video.dynamic_cover.url_list[0],
      height: item.video.height,
      statistics: JSON.stringify(item.statistics),
      timestamp: item.timestamp,
      vid: item.video.vid,
      videoSrc: video_src,
      width: item.video.width,
    };
  });
  // Write the post data to a file
  const writerStream = fs.createWriteStream(
    path.join(__dirname, `../../douyin-data/douyin-${arr[index]}.json`)
  );
  writerStream.write(JSON.stringify(data, undefined, 2), 'utf8');
  writerStream.end();
  // Also save through the API; a failure here is logged and does not stop the task
  try {
    await saveAuthorVideo(arr[index], data);
  } catch (e) {
    console.log(e);
  }
  if (index < arr.length - 1) {
    index++;
    return doSaveVideoTask(arr);
  }
  console.log('task is done');
  return `${arr.length} records have been saved`;
};
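The task above walks through the users by recursing on a module-level index. A plain loop is one possible alternative; the sketch below is a hedged variant, not the code used in this article. `runSequentially` and `ProcessUser` are hypothetical names, and the per-user work (capture parameters, page through the list, save) is injected as a function so the example runs on its own. Unlike the original, it records a failed user and moves on to the next one.

```typescript
// Hypothetical per-user job: fetch and save one user's posts.
type ProcessUser = (userId: string) => Promise<void>;

// Processes users one at a time; a failure for one user is recorded
// and does not stop the remaining users.
async function runSequentially(
  userIds: string[],
  processUser: ProcessUser
): Promise<{ done: string[]; failed: string[] }> {
  const done: string[] = [];
  const failed: string[] = [];
  for (const id of userIds) {
    try {
      await processUser(id);
      done.push(id);
    } catch (e) {
      console.log(e);
      failed.push(id);
    }
  }
  return { done, failed };
}

// Fake job that fails for one made-up user id.
const fakeJob: ProcessUser = async id => {
  if (id === 'bad') throw new Error(`could not process ${id}`);
};

runSequentially(['u1', 'bad', 'u2'], fakeJob).then(summary => console.log(summary));
```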
Last updated: 2021/12/19, 18:05:42