https://iremdb.com
I finally managed to get something out ... This has become a much harder effort than I could ever have imagined.
Having to both index magazines (currently at 17k pages) and build that horrible website has become a very hard job. I hope people somehow enjoy it.
I got the basics out: listings of platforms, publications, and magazines, a search feature, and a basic viewer.
Some things you can find there:
All pages that contain Super Mario Bros 3: https://iremdb.com/platforms/nintendo-entertainment-system-nes/super-mario-bros-3
All the famous Sega "Genesis does what Nintendon't" ads: https://iremdb.com/search?mode=ads&tags=123
All John Madden Football ads: https://iremdb.com/search?mode=content&games=38633&contentTypes=Ads
All content published by the GamePro editor "Andromeda": https://iremdb.com/search?mode=editors&editors=43
The entire "Legend of Zelda" comic strip from Nintendo Power, published over 12 issues: https://iremdb.com/search?mode=content&tags=228&contentTypes=Comic
I now need to integrate user management into the site, so people can start contributing if they wish.
Some progress I made in automating the whole thing:
I managed to do some neat tricks, such as automatic ad detection. Given that 50% of those magazines are ads, I used a perceptual hash algorithm to fingerprint each ad page. For each new issue, it simply looks in the database and usually gets 80-90% of all ads indexed (it does miss about 10%, and there are always some new ads in each issue).
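As a rough illustration of the fingerprint-and-lookup idea (not necessarily the algorithm or code used on the site), here is a minimal sketch in Python. It assumes a "difference hash" (dHash), one common kind of perceptual hash, and it works on a plain grid of grayscale values so it needs no image library. The names `dhash`, `match_ad`, and the `max_distance` threshold are hypothetical, picked only for this example.

```python
from typing import Dict, List, Optional, Tuple

Gray = List[List[int]]  # rows of 0-255 grayscale pixel values


def resize_nearest(img: Gray, width: int, height: int) -> Gray:
    """Shrink an image with nearest-neighbour sampling (crude but dependency-free)."""
    src_h, src_w = len(img), len(img[0])
    return [
        [img[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def dhash(img: Gray, hash_size: int = 8) -> int:
    """Difference hash: one bit per pixel pair, set when the left pixel is brighter."""
    small = resize_nearest(img, hash_size + 1, hash_size)
    bits = 0
    for row in small:
        for x in range(hash_size):
            bits = (bits << 1) | (1 if row[x] > row[x + 1] else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def match_ad(page_hash: int, known: Dict[int, str], max_distance: int = 6) -> Optional[str]:
    """Return the id of the closest known ad within max_distance bits, else None.

    The threshold of 6 is an assumption for illustration, not a tuned value.
    """
    best: Optional[Tuple[int, str]] = None
    for known_hash, ad_id in known.items():
        distance = hamming(page_hash, known_hash)
        if distance <= max_distance and (best is None or distance < best[0]):
            best = (distance, ad_id)
    return best[1] if best else None
```

In use, each already-indexed ad page would be hashed once and stored, and every page of a new issue would be hashed and passed to `match_ad`. Pages that return `None` are either not ads or ads that have not been seen before, which fits the point above that every issue still has some new ones. A linear scan like this is fine for a sketch; a real database of many thousands of pages would probably want an index built for nearest-neighbour search on hashes.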
I have also played with custom object detection, and it works fine. For instance, I automated the detection of all GamePro reviews by looking for those silly faces. The problem is that labeling data to train a model costs as much time as, you know, actually labeling the pages.
I had set a goal to release this to the public as soon as the indexed magazines reached March 1993 (exactly 30 years ago).
I'll keep pushing new issues and adding more features. I need to get user integration done, or this is just a waste of my time.
Once again, if you want to participate, ping me again; I had to shut the other site down.
I may also one day have to put the viewer behind a login screen. The site has never been shown to anyone else, but somehow hundreds of bots from some countries are crawling it nonstop. If I want to keep CDN costs at bay, I may need to leave only the thumbnail images public and make the other pages, and future features such as likes, collections, and custom searches, available only to registered users.
Constructive feedback (it does not need to be good, just constructive) is welcome.