
Threat Watch: ByteDance Spider

If you run a web server, chances are your site's traffic has been invaded by an unwanted visitor known as ByteDance. If you look at your access log, there's a good chance you have seen traffic like this:

[Access log sample showing requests from the ByteDance spider]

Who is ByteDance?

ByteDance is a Chinese company that may or may not have ties to the CCP. They own the following products: CapCut, Douyin, Lark, Toutiao, Xigua Video, Nuverse, and, most infamously, TikTok. Yes, the same TikTok that has been all the rage over the last few years. A few years ago the U.S. government tried to outright ban the platform over concerns about its ties to the CCP, which only seemed to make it more popular. As a result of that growing popularity, ByteDance now sits at the same table as the rest of big tech and is no different in its motives: spying, collecting data, and selling that data to whoever wants it. And yes, they also track people who don't even use TikTok, through tracking pixels. Needless to say, this is not an ethical company.

The ByteDance spider

Most web crawlers, or spiders as they are often called, belong to search engines, which place your site wherever they see fit on their scale of relevance. ByteDance, however, is not a search engine in the sense that it points you to actual websites; it points you to a few seconds of video in an attempt to capitalize on as much revenue as possible. With that in mind, what exactly is happening?

To truly understand how much of a pest this spider really is, we first need to understand how bandwidth works on a server. Every visitor your site draws in generates a certain amount of bandwidth each month. As that bandwidth builds up, it isn't uncommon to see the website slow down, the monthly cost of running the site go up, and eventually a denial-of-service situation where no one can use the site at all.

Now with that in mind, let's look at what ByteDance is doing. Essentially, their crawlers run around hitting whatever servers they can find, even if the admin has never used TikTok nor enabled any ads or tracking pixels, and hunt down random pages for unknown reasons; if ByteDance's profile is anything to go off of, it's probably not something good. With most web crawlers you can set limits on which pages they can and can't access through something called robots.txt. The problem is that ByteDance does not respect these limits at all and will crawl through those pages anyway. And then there's the bandwidth problem: the best way to describe the crawlers ByteDance uses is as freeloaders who not only provide nothing of value but actively hurt those affected by their actions. It's one thing to eat up resources that could have gone to legitimate users, but if you actually go on VirusTotal and look up the IP addresses these crawlers are coming from, you will find addresses flagged as malicious whose intentions are unknown.
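If you want a rough idea of how much of your bandwidth the spider is eating, one approach is to scan your access log for its user agent and add up the response sizes. The ByteDance crawler is commonly reported to identify itself as "Bytespider", but verify that against your own logs. Below is a minimal Python sketch that assumes a combined-format log at /var/log/nginx/access.log; the path, log format, and user-agent string are all assumptions to adjust for your own setup.

```python
import re

# A minimal sketch, assuming a combined-format access log and that the ByteDance
# crawler identifies itself with a user agent containing "Bytespider".
LOG_PATH = "/var/log/nginx/access.log"  # assumed path; adjust for your server

# Combined format: ip ident user [time] "request" status size "referer" "user-agent"
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" '
    r'\d{3} (?P<size>\d+|-) "[^"]*" "(?P<agent>[^"]*)"'
)

hits = 0
total_bytes = 0
ips = set()

with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        match = LINE_RE.match(line)
        if not match or "Bytespider" not in match.group("agent"):
            continue
        hits += 1
        ips.add(match.group("ip"))
        size = match.group("size")
        if size != "-":
            total_bytes += int(size)

print(f"Bytespider requests: {hits}")
print(f"Distinct IPs seen:   {len(ips)}")
print(f"Bandwidth consumed:  {total_bytes / 1_048_576:.1f} MiB")
```

Run it over your current or rotated logs every so often to see whether the numbers justify blocking the crawler outright.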

Best course of action

If you are a user of a site, unfortunately there really isn't a whole lot you can do to fight this beast, as it remains completely hidden from the end user; even tools like uBlock Origin don't seem to be able to find it. The only thing you can do as an end user is stop using products made by ByteDance, such as TikTok, and avoid installing them in the first place, which is becoming harder to do since many phones these days, and even Windows 11, come with TikTok preinstalled. Not to mention that unless you have a way of blocking all of ByteDance's tracking pixels, they still have your information and are still profiting off of you regardless of whether you ever used their products or not.

If you are a server owner, however, you do have a few options. Please note that they may or may not work for you, so do your own research on the matter to find what works best for your services.

Option 1 is to disallow the offending crawler in your robots.txt file (see the sketch below). The downside to this approach is that ByteDance could set up another spider under a different name to bypass the rules in the robots.txt file, which they already do not respect to begin with.

Option 2 is to block any associated IP addresses at the firewall level (see the example below). The downside to this approach is that you would have to know every IP address these crawlers use in order to get rid of the spider, and if you know anything about IP addresses, you know they can change at any given point, meaning this is only a temporary solution to the problem.
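For Option 1, the robots.txt entry looks something like the sketch below. The "Bytespider" user-agent name is the one the ByteDance crawler is commonly reported to use; treat it as an assumption, check it against the user agents in your own access log, and remember the crawler may simply ignore the file.

```
# robots.txt at the web root (sketch; verify the user-agent name against your own logs)
User-agent: Bytespider
Disallow: /
```

For Option 2, the exact commands depend on your firewall. As a sketch, with iptables or ufw you can drop traffic from an offending address or range. The 203.0.113.0/24 range below is a documentation placeholder, not a real ByteDance range; substitute the addresses you actually see in your logs.

```
# iptables: drop all traffic from the offending range (placeholder range shown)
iptables -A INPUT -s 203.0.113.0/24 -j DROP

# or, with ufw
ufw deny from 203.0.113.0/24
```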