Monday, December 16, 2024

Scraping eBay so I (and you) can research toys properly

Can't say I've met many people who like researching old toys. As for me, I'm a sucker for a weird old toy and love figuring out where they came from. I've made a few pennies from doing that too. It's fun.

But damn is it hard! One of the biggest issues with toy research is that people don't document things. You're forced to toil through newspapers.com to find a few measly ads for a toy, and even if you do, half the time you're not going to find the company that made the thing. So then you have to keep looking. Sometimes you can find a copyright registration that might point you in the right direction, but most of the time when it comes to researching novelties you are at the mercy of eBay.

Most people don't care about old toys. They see them as something to sell or throw away. Because of this, nearly every toy that I and others research, unless it's your mainstream Transformer or Barbie, is documented only through eBay listings. This can be a listing of the toy itself or even just an advertisement for the toy. Yes, even a toy as famous as the Jo-Bo, the toy you squeeze and its eyes pop out, never had anything written about it until I had to figure it out for myself. One of the biggest leads I came across, outside of newspaper ads, was trade ads listed on eBay.

March, 1954 advertisement for Blake Industries' "Jo-Bo" line

The ad above did not come from a publicly accessible archive. It did not come from a magazine I own physically. The only way you can find this ad anywhere online is from one singular eBay listing. It is the only ad I've seen show the entire line of toys the Jo-Bo came from.

This advertisement came from a trade magazine called Playthings. Playthings ran for a century, detailing the struggles of the toy industry, what new products were coming out and what companies were getting into the madness. Many toys you'll see in Playthings were either never successful or have since fallen into deep obscurity. Playthings, along with Toys & Novelties, another trade magazine, is an invaluable resource for toy research.

The Strong Museum of Play was donated a collection of every single issue of Playthings in 2010. Their libraries are accessible to the public and theoretically one could come in and view them. The problem is that The Strong is located in Rochester and I am located in Dallas. I am not taking a plane to Rochester just to read some 70-year-old magazines about toys.

There are many, many pages of Playthings on eBay being sold individually. Along with them are pages from boating magazines, Sears catalogs, aviation magazines, you name it. It doesn't matter how rare the magazine is, these sellers have bought the collections at a high price as an investment. Their process is to disassemble the magazines and sell them page-by-page as ephemera or research materials. Some go as far as to cover the issue name and date to "protect their data." There is a finite number of these magazines available on the planet, with that number slowly dwindling. Many of these eBay listings are the only way the majority of people can see these articles and advertisements until a digitally-accessible archive is made.

I attempted to contact the main seller multiple times, asking if he kept a backlog of scans and if he'd be willing to share them so that these magazines could be available for research purposes. He repeatedly ignored my messages until I finally got a simple reply: "No, I won't help you." Other sellers were just as unhelpful. Their reasoning for not sharing these ads outside of eBay listings, as well as sometimes even covering portions of them, was "to protect [their] hard won (and expensively done at that) data. Knowing where what when etc is the name of the game."

You may have your own thoughts on whether this is ethical when it comes to historical preservation, given that these people have made an investment to make money to support themselves. Some may argue it's no different than having an antique booth. However, the difference is that 90% of the time an antique booth is selling things that aren't exceptionally rare and can be duplicated with ease.

Thankfully, something can be done about it. At least a little.

This is an issue I discovered and pondered maybe two years ago. My idea was to scrape all of the current listing images, which are decent JPEG scans, and upload them to the Internet Archive so they could be permanently accessible as well as searchable with OCR. I am very bad at coding, so an acquaintance very generously scraped the account for me. Soon, I was left with 70GB of listing data including images and descriptions. My next step was to OCR these and organize them by the magazines that they came from (often listed at the bottom border of each page). I found another person to help with this, but the process was slow and ultimately the project was canceled.

Quite recently, I revisited the project. This was appropriate timing given that new pages had been uploaded about a year prior that I had not yet scraped. I am still bad at coding, but I understand it better now. My acquaintance had since lost the script that they used to scrape the listings but had said that it was so bad I wouldn't have wanted it anyway. So, after studying how eBay's web layout worked, I instructed AI to write me a Python script to scrape all the listings it could.

This sounds easy, but surprisingly it wasn't. For some reason, eBay's search results cap off at approximately 10,080 listings. This account had over 34,000. To get around this, I had the script make an individual search for each year, given that every listing includes the year the ad came from. This came with its own issues. A search could come up with, say, 48 results, but keep showing more results after those first 48 that were not relevant. I have no idea why eBay works this way. So, I had the script only scrape the number of results that were actually there. This may seem like a no-brainer when reading this, but you shouldn't have to specify exactly how many links it's supposed to scrape.
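For anyone who wants to try the same trick, here is a minimal sketch of the two ideas above: one search per year, and trimming each result list down to the count the page actually reports. This is not the script I used. The base URL, the seller name, the parameter names (`q`, `seller`, `page`) and the "N results" heading format are all placeholders I made up for illustration, so they would need to be swapped for whatever the real site uses.

```python
import re
from urllib.parse import urlencode

# Placeholder values for illustration only; not eBay's real URL layout.
SEARCH_BASE = "https://example.com/search"
SELLER = "example_seller"


def year_search_url(year: int, page: int = 1) -> str:
    """Build one search URL restricted to a single year keyword."""
    params = {"q": str(year), "seller": SELLER, "page": page}
    return f"{SEARCH_BASE}?{urlencode(params)}"


def parse_result_count(heading: str) -> int:
    """Pull the number out of a heading such as '1,248 results'."""
    match = re.search(r"(\d[\d,]*)\s+results?", heading)
    if match is None:
        raise ValueError(f"no result count in {heading!r}")
    return int(match.group(1).replace(",", ""))


def keep_real_results(links: list[str], reported_count: int) -> list[str]:
    """Keep only the first `reported_count` unique links, in page order."""
    unique = list(dict.fromkeys(links))  # drops repeats, preserves order
    return unique[:reported_count]


if __name__ == "__main__":
    # One search per year keeps every query under the overall result cap.
    for year in range(1950, 1953):
        print(year_search_url(year))
```

The trimming step assumes the relevant results come first and the unrelated filler comes after them, which is how the behavior looked to me.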

Anyway, the script then compiled every link it found into a TXT file. I then used another script to scrape every image from every listing, with each image named after the listing title. By the end of it I had about 61K images. I did my best to check for duplicates between this and the previous scrape (though I missed a few) and combined the scrapes together.
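Two parts of this step are easy to get wrong: turning a listing title into a file name that won't break on disk, and spotting duplicates between two scrapes. Below is a rough sketch of one way to do both. The function names are hypothetical, and hashing the file contents is my assumption about a reasonable duplicate check, not a description of what I actually ran.

```python
import hashlib
import re


def filename_from_title(title: str, index: int, ext: str = ".jpg") -> str:
    """Turn a listing title into a filesystem-safe image name."""
    # Replace characters that are illegal on common filesystems.
    cleaned = re.sub(r'[<>:"/\\|?*\x00-\x1f]', "_", title)
    # Collapse whitespace and trim so the name stays reasonably short.
    cleaned = re.sub(r"\s+", " ", cleaned).strip()[:150]
    # The index separates multiple images from the same listing.
    return f"{cleaned}_{index:02d}{ext}"


def find_duplicates(images: dict[str, bytes]) -> dict[str, str]:
    """Map each duplicate file name to the first name with identical bytes."""
    seen: dict[str, str] = {}
    duplicates: dict[str, str] = {}
    for name, data in images.items():
        digest = hashlib.sha256(data).hexdigest()
        if digest in seen:
            duplicates[name] = seen[digest]
        else:
            seen[digest] = name
    return duplicates
```

A hash only catches byte-for-byte copies. If the same page was re-saved or re-compressed between scrapes, the bytes differ and it slips through, which may be how a few duplicates get missed.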

The images were then zipped into individual zip files by year. The collection can be accessed here. By the end of it, 79,986 pages were scraped. Eventually, other accounts that are doing the same thing will get the same treatment.
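Grouping by year can be done with nothing but Python's standard library. This is a sketch under a couple of assumptions: every image is a `.jpg` sitting in one folder, and the year appears somewhere in the file name (anything without a year lands in `unknown.zip`). The function names and folder layout are made up for the example.

```python
import re
import zipfile
from collections import defaultdict
from pathlib import Path

# A four-digit year starting with 18, 19 or 20, not embedded in a longer number.
YEAR_PATTERN = re.compile(r"(?<!\d)(?:18|19|20)\d{2}(?!\d)")


def year_of(filename: str) -> str:
    """Return the first four-digit year in a file name, or 'unknown'."""
    match = YEAR_PATTERN.search(filename)
    return match.group(0) if match else "unknown"


def zip_by_year(image_dir: Path, out_dir: Path) -> dict[str, int]:
    """Write one <year>.zip per year and return the image count per year."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(image_dir.glob("*.jpg")):
        groups[year_of(path.name)].append(path)

    out_dir.mkdir(parents=True, exist_ok=True)
    for year, paths in groups.items():
        # JPEGs are already compressed, so store them without recompressing.
        with zipfile.ZipFile(out_dir / f"{year}.zip", "w", zipfile.ZIP_STORED) as archive:
            for path in paths:
                archive.write(path, arcname=path.name)
    return {year: len(paths) for year, paths in groups.items()}
```

Calling `zip_by_year(Path("images"), Path("zips"))` would leave one archive per year in `zips/`, and the returned counts make it easy to sanity-check the totals against the number of scraped pages.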

Interestingly, sellers are not willing to help because they think that doing this will decrease the value of what they're selling. I don't understand this, because when they make the listing and scan the page, they're making that page accessible to everyone for as long as the listing is up. They've already "decreased the value" of what they're selling. Some of these listings have been up for years, and I've seen a few of them used for reference by other toy researchers already.

Regardless, I hope this helps people research a bit better or at least gives someone a fun half hour of poring through old ads. There's always something weird in there.

Thursday, December 12, 2024

The unfinished Pac-Man maze at the National Videogame Museum

In 2015, I was 12 years old. My parents had been restoring arcade games as a hobby in their spare time. We had a Dyn-O-Mite pinball machine that my mother had had since she was a kid, a Ms. Pac-Man, a Tron, and a Gorf. I don't remember if we had all of those at once, but those were our earliest machines if I remember correctly.

Anyway, around that time the National Videogame Museum was being built in the Frisco Discovery Center. I can't remember how they got involved, but near the beginning my parents became the museum's technicians. I remember coming with them to a dinner at the Outback Steakhouse with the museum's founders and some of the team trying to figure out what they should name the museum's arcade. My kid self thought the name they picked, Pixel Dreams, was one of the stupidest ideas I had ever heard. I've grown to like it over time.

I remember watching a woman paint the mural on the wall for the consoles. She spent hours standing on an extremely tall ladder and I would pester her with stupid questions about video games. If you're reading this, sorry 12-year-old me wouldn't shut up!

I nearly grew up in that building. My parents would take me with them and spend hours repairing machines in the half-built arcade while I would lie on the floor. It was interesting, but over time it got very boring. The museum's founder, Joe Santulli, was very kind to me and I loved talking to him. I haven't spoken to him since the museum's opening party in 2016(?), but I always looked forward to seeing him when he was at the museum.

Anyway, I remember when they were building the arcade, cutting foam for the pixel stripes and airbrushing the planets on the wall, they were also considering painting a Pac-Man maze on the ceiling. They got as far as drawing a sketch of it and then they just... didn't go through with it. I don't recall why. Instead, they put one of those party projectors on the floor in the core behind the machines where nobody could see it. It would project those moving neon lights across the ceiling. The kinda thing they put at raves and company parties. I'm not sure if it's still there anymore, but they didn't keep it on for very long.

If you ever visit the museum, look at the ceiling at Pixel Dreams! Part of the maze is still there. I'm not sure if anyone else has really talked about it.

https://cdn.discordapp.com/attachments/474426322526011405/1316955081924415498/20241208_181250.jpg?ex=6762337f&is=6760e1ff&hm=5ac5ced77fdf3b27b5c044780c71a66a516201d48e57094c4d7ba976ff3b6ebb&

I have more stories about being raised at the museum but they're not as fun. Just small stuff like cataloging the boxed Atari games that are at the game crash exhibit or donating a book of mine to be used in the bedroom. I also have an unused Pong console painted gold that was intended to be used in the sculpture in the foyer. I'm not sure where that went. It's probably in the attic.
