Monday, December 16, 2024

Scraping eBay so I (and you) can research toys properly

Can't say I've met many people who like researching old toys. As for me, I'm a sucker for a weird old toy and love figuring out where they came from. I've made a few pennies from doing that too. It's fun.

But damn is it hard! One of the biggest issues with toy research is that people don't document things. You're forced to toil through newspapers.com to find a few measly ads for a toy, and even if you do, half the time you're not going to find the company that made the thing. So then you have to keep looking. Sometimes you can find a copyright registration that might point you in the right direction, but most of the time when it comes to researching novelties you are at the mercy of eBay.

Most people don't care about old toys. They see them as something to sell or throw away. Because of this, nearly every toy that I and others research, unless it's your mainstream Transformer or Barbie, is documented only through eBay listings. This can be a listing of the toy itself or even just an advertisement for the toy. Yes, even a toy as famous as the Jo-Bo, the toy you squeeze and its eyes pop out, never had anything written about it until I had to figure it out for myself. One of the biggest leads I came across, outside of newspaper ads, was trade ads listed on eBay.

March, 1954 advertisement for Blake Industries' "Jo-Bo" line

The ad above did not come from a publicly accessible archive. It did not come from a magazine I own physically. The only way you can find this ad anywhere online is from one singular eBay listing. It is the only ad I've seen show the entire line of toys the Jo-Bo came from.

This advertisement came from a trade magazine called Playthings. Playthings ran for a century, detailing the struggles of the toy industry, what new products were coming out and what companies were getting into the madness. Many toys you'll see in Playthings were either never successful or have since fallen into deep obscurity. Playthings, along with Toys & Novelties, another trade magazine, is an invaluable resource for toy research.

The Strong Museum of Play was donated a collection of every single issue of Playthings in 2010. Their libraries are accessible to the public and theoretically one could come in and view them. The problem is that The Strong is located in Rochester and I am located in Dallas. I am not taking a plane to Rochester just to read some 70-year-old magazines about toys.

There are many, many pages of Playthings on eBay being sold individually. Along with them are pages from boating magazines, Sears catalogs, aviation magazines, you name it. It doesn't matter how rare the magazine is; these sellers have bought the collections at a high price as an investment. Their process is to disassemble the magazines and sell them page-by-page as ephemera or research materials. Some go as far as to cover the issue name and date to "protect their data." There is a finite number of these magazines available on the planet, and that number is slowly dwindling. Until a digitally accessible archive is made, many of these eBay listings are the only way the majority of people can see these articles and advertisements.

I attempted to contact the main seller multiple times, asking if he kept a backlog of scans and if he'd be willing to share them so that these magazines could be available for research purposes. He ignored my messages repeatedly until he finally responded with a simple "No, I won't help you." Other sellers were just as unhelpful. Their reasoning for not sharing these ads outside of eBay listings, as well as sometimes even covering portions of them, was "to protect [their] hard won (and expensively done at that) data. Knowing where what when etc is the name of the game."

You may have your own thoughts on whether or not this is ethical as a matter of historical preservation, given that these people have made an investment to make money to support themselves. Some may argue it's no different than having an antique booth. However, the difference is that 90% of the time an antique booth is selling things that aren't exceptionally rare and can be duplicated with ease.

Thankfully, something can be done about it. At least a little.

This is an issue I discovered and pondered maybe two years ago. My idea was to scrape all of the current listing images, which are decent JPEG scans, and upload them to the Internet Archive so they could be permanently accessible as well as searchable with OCR. I am very bad at coding, so an acquaintance was very generous and scraped the account for me. Soon, I was left with 70GB of listing data including images and descriptions. My next step was to OCR these and organize them by the magazines that they came from (often listed at the bottom border of each page). I found another person to help with this, but the process was slow and ultimately the project was canceled.

Quite recently, I revisited the project. The timing was appropriate given that new pages had been uploaded about a year prior that I had not yet scraped. I am still bad at coding, but I understand it better now. My acquaintance had since lost the script that they used to scrape the listings but said that it was so bad I wouldn't have wanted it anyway. So, after studying how eBay's web layout worked, I instructed AI to write me a Python script to scrape all the listings it could.

This sounds easy, but surprisingly it wasn't. For some reason, eBay's search query caps off at approximately 10080 listings. This account had over 34000. To get around this, I had the script make an individual search for each year, given that every listing includes the year the ad came from. This came with its own issues. A search could come up with, say, 48 results, but continue to show more results after those first 48 that were not relevant. I have no idea why eBay works this way. So, I had the script only scrape the number of results that were actually there. This may seem like a no-brainer when reading this, but you shouldn't have to specify exactly how many links it's supposed to scrape.
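For anyone who wants to try the same workaround, here is a rough sketch of the logic in Python. It is not the script I actually used, and it leaves out the part that fetches pages from eBay. The function names and the year range in the example are made up for illustration; the only number taken from above is the approximate cap of 10080.

```python
SEARCH_CAP = 10_080  # approximate cap mentioned above, not an official eBay figure


def yearly_queries(first_year: int, last_year: int) -> list[str]:
    """Build one search term per year so each search stays well under the cap."""
    return [str(year) for year in range(first_year, last_year + 1)]


def trim_to_reported(links: list[str], reported_count: int) -> list[str]:
    """Keep only the first `reported_count` links.

    Anything after that is assumed to be the unrelated filler results
    that show up once the real results run out.
    """
    return list(links)[:max(reported_count, 0)]


def queries_over_cap(counts: dict[str, int], cap: int = SEARCH_CAP) -> list[str]:
    """List the searches that are still too big and need splitting further."""
    return [query for query, count in counts.items() if count > cap]


if __name__ == "__main__":
    # Hypothetical example: a search that reports 2 results but lists 4 links.
    found = ["listing-a", "listing-b", "filler-c", "filler-d"]
    print(yearly_queries(1953, 1955))
    print(trim_to_reported(found, 2))
    print(queries_over_cap({"1954": 48, "1960": 20_000}))
```

The trimming step assumes the real results always come first on the page, which matched what I saw but is not something eBay promises.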

Anyway, the script then compiled every link it found into a TXT file. I then used another script to scrape every image from every listing, naming each image after the listing title. By the end of it I had about 61K images. I did my best to check for duplicates between this and the previous scrape (though I missed a few) and combined the scrapes together.
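Two parts of that step are easy to get wrong: turning a listing title into a filename that won't break, and spotting duplicate images. The sketch below shows one way to handle both in Python. It is an illustration with made-up function names, not my actual script, and the duplicate check is an assumption on my part: it only catches files that are byte-for-byte identical, so a rescanned or recompressed copy of the same page would slip through.

```python
import hashlib
import re


def safe_filename(title: str, index: int, ext: str = ".jpg") -> str:
    """Turn a listing title into a filename, numbering each image in the listing."""
    # Replace anything that isn't a letter, digit, dot, underscore, space or hyphen.
    cleaned = re.sub(r"[^A-Za-z0-9._ -]+", "_", title).strip(" ._")
    # Collapse repeated spaces and keep the name to a reasonable length.
    cleaned = re.sub(r"\s+", " ", cleaned)[:150] or "untitled"
    return f"{cleaned}_{index:02d}{ext}"


def drop_duplicate_images(images: dict[str, bytes]) -> dict[str, bytes]:
    """Keep the first copy of each image, comparing files by SHA-256 hash."""
    seen = set()
    unique = {}
    for name, data in images.items():
        digest = hashlib.sha256(data).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique[name] = data
    return unique


if __name__ == "__main__":
    print(safe_filename('1954 Blake "Jo-Bo" Ad: Toys/Novelties', 1))
    sample = {"a.jpg": b"same bytes", "b.jpg": b"same bytes", "c.jpg": b"other"}
    print(list(drop_duplicate_images(sample)))
```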

The images were then zipped into individual zip files by year. The collection can be accessed here. By the end of it, 79,986 pages were scraped. Eventually, other accounts that are doing the same thing will get the same treatment.
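If you want to do the zipping yourself, something like the following Python sketch would work. It assumes each filename contains the four-digit year of the ad (since the files are named after listing titles, which include the year) and puts anything without a recognizable year in an "unknown" group. The function names and folder layout are hypothetical.

```python
import re
import zipfile
from collections import defaultdict
from pathlib import Path

# A standalone four-digit year from 1800 to 2099 (assumed range).
YEAR_RE = re.compile(r"(?<!\d)(18|19|20)\d{2}(?!\d)")


def group_by_year(filenames: list[str]) -> dict[str, list[str]]:
    """Group filenames by the first four-digit year found in each name."""
    groups = defaultdict(list)
    for name in filenames:
        match = YEAR_RE.search(name)
        groups[match.group(0) if match else "unknown"].append(name)
    return dict(groups)


def zip_by_year(image_dir: str, output_dir: str) -> list[Path]:
    """Write one zip per year into output_dir and return the archive paths."""
    src, out = Path(image_dir), Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    names = sorted(p.name for p in src.iterdir() if p.is_file())
    archives = []
    for year, members in group_by_year(names).items():
        archive = out / f"{year}.zip"
        # JPEGs are already compressed, so store them without recompressing.
        with zipfile.ZipFile(archive, "w", zipfile.ZIP_STORED) as zf:
            for name in members:
                zf.write(src / name, arcname=name)
        archives.append(archive)
    return archives
```

Calling `zip_by_year("images", "zips")` would then produce files like `zips/1954.zip`. A title that mentions more than one year would be filed under the first one, so the groups are worth a quick look before uploading.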

Interestingly, sellers are not willing to help because they think that doing this will decrease the value of what they're selling. I don't understand this, because when they scan the page and make the listing, they're making that page accessible to everyone for as long as the listing is up. They've already "decreased the value" of what they're selling. Some of these listings have been up for years, and I've seen a few of them used for reference by other toy researchers already.

Regardless, I hope this helps people research a bit better or at least gives someone a fun half hour of poring over old ads. There's always something weird in there.
