
Mission to index everything


retroprime


Hi folks, first, thank you so much for this amazing trip down memory lane. I always wanted to revisit some of my childhood game magazines.

So I wanted to share a weekend/nighttime project I started: cataloging every retro magazine out there. I saw an entry here on the database, but just for fun I put together a magazine manager that I did my best to make really quick for tagging/indexing pages (it uses a lot of keyboard shortcuts, for instance).

Over the last 50 days or so, while building it, I managed to index three main magazines (Nintendo Power, EGM and GamePro) up to April 1991 (so far).

This yields about 6452 pages. Some fun stats about it:

2323 pages were ads

2366 general content

1698 game reviews

65 covers
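
For anyone who wants to double-check those numbers, here is a quick tally in Python. The counts are the four listed above; the dictionary keys and the `share` helper are just illustrative names, not anything from the app.

```python
# Page counts per tag, copied from the stats listed in the post.
tag_counts = {
    "ads": 2323,
    "general content": 2366,
    "game reviews": 1698,
    "covers": 65,
}

total_pages = sum(tag_counts.values())


def share(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to one decimal."""
    return round(100 * count / total, 1)


for tag, count in tag_counts.items():
    print(f"{tag}: {count} pages ({share(count, total_pages)}%)")
print(f"total: {total_pages} pages")
```

The four categories add up to the overall page count, with ads and general content each making up a bit more than a third of what has been indexed.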

So tagging is quite quick, but I won't be able to do it all by myself, so I was hoping others would like to help. Please take 3 minutes to watch a sample video of the app I made, in the hope that it gathers some attention:

 

 

So, what would I like to do with that data? Give it back, of course. It should be free and open to anyone, like TGDB for example (which I used to create the game finder database that finds and matches the games in the magazines).

The app is not really available yet. I'm still setting up permissions, and there are still a few annoying bugs that need fixing (some minor UI glitches; I am NOT a UI developer by any means, so apologies for the raw look).

But with the holidays, I hope to have something ready by the end of January where folks can log in and start contributing, along with a pipeline that dumps the data daily to a CDN for people to download.

At first I thought of creating a front-end site, but I may not have time for that, and it would make more sense for the data to live here.

So, if you like what you saw, please send me a DM. I'd love to get more people involved in curating this data and making it publicly available. In any case, I'll just keep tagging pages and making daily dumps until I'm done one day (it may take a few years :) I'm churning through something like 800 pages a week, and there are 110k pages across those 3 publications).
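
To put "a few years" into numbers, here is a rough back-of-the-envelope sketch in Python. The three constants come from this post; the `contributors` parameter, and the idea that every helper would keep the same weekly pace, are purely assumptions for illustration.

```python
import math

TOTAL_PAGES = 110_000   # rough total across the three publications (from the post)
INDEXED_PAGES = 6_452   # pages tagged so far (from the post)
PAGES_PER_WEEK = 800    # current solo pace (from the post)


def weeks_remaining(total: int, done: int, pace: int, contributors: int = 1) -> int:
    """Weeks left, ASSUMING every contributor sustains the same weekly pace."""
    remaining = max(total - done, 0)
    return math.ceil(remaining / (pace * contributors))


solo = weeks_remaining(TOTAL_PAGES, INDEXED_PAGES, PAGES_PER_WEEK)
team = weeks_remaining(TOTAL_PAGES, INDEXED_PAGES, PAGES_PER_WEEK, contributors=5)

print(f"solo: about {solo} weeks (~{solo / 52:.1f} years)")
print(f"with 5 equally fast taggers: about {team} weeks")
```

Under those assumptions, one person needs roughly two and a half years, while a handful of helpers would bring it down to about half a year.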

Happy Holidays and hope someone joins this :) 

 


  • Retromags Curator

Oh my god. I have wanted this sort of functionality for this site for years. I've wanted users to be able to search for a game and then see all the issues and pages (even ads) where that game is mentioned or featured. Or bring up PDF pages in the search results, in a world where we have PDF versions of all our scans accessible directly on the site.

This is pretty awesome.


  • Retromags Curator

Ho. Lee. Cow! 😮

This is, like, literally what I think everyone who visits here would like to see the site offering. I'm EXTREMELY intrigued, and as one of the database mavens, I could see myself putting some time into an app like this!

Also, I think @Rando1975 would be a prime choice to assist with your project. That guy's an indexing machine. :)

*huggles*
Areala :angel:


Hi folks, really glad you enjoyed it. Gotta get to work on this so it's ready for prime time ;)

So here's a link to the current database export (this uses MongoDB Atlas in the cloud, so everything is one huge JSON):

https://openscans.nyc3.cdn.digitaloceanspaces.com/db/GamaMagazineExport.zip

Here's what I need to finish before opening this up for collaboration:

 

  1. Automate the database export
  2. Fix some login issues
  3. Provide minimal error handling for users (right now, if something goes wrong, I know where to look in the logs, which is not user-friendly)
  4. Get a basic magazine viewer in place (the goal is to use the Internet Archive's amazing book reader, https://openlibrary.org/dev/docs/bookreader, but that will take time to integrate, so I need a stepping-stone solution in between)
  5. Craft the approval workflow (once a user completes a review, an admin needs to approve it)
  6. Get the demo site up and running (I want a demo site for people to try and play with; its data will be purged daily and restored from the latest backup of the main site)
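
The approval workflow in item 5 could be modeled as a small state machine. This is only a sketch under my own assumptions: the state names, the role strings and the `advance` helper are hypothetical, and the "rejected" path is not mentioned in the post at all.

```python
from enum import Enum


class ReviewState(Enum):
    IN_PROGRESS = "in_progress"  # a user is still tagging the issue
    SUBMITTED = "submitted"      # the user finished; waiting for an admin
    APPROVED = "approved"        # an admin accepted the tags
    REJECTED = "rejected"        # ASSUMED extra state: admin sent it back


# Allowed moves: (current state, acting role) -> states that role may move to.
TRANSITIONS = {
    (ReviewState.IN_PROGRESS, "user"): {ReviewState.SUBMITTED},
    (ReviewState.SUBMITTED, "admin"): {ReviewState.APPROVED, ReviewState.REJECTED},
    (ReviewState.REJECTED, "user"): {ReviewState.IN_PROGRESS},
}


def advance(state: ReviewState, role: str, target: ReviewState) -> ReviewState:
    """Return the new state, or raise if this role may not make this move."""
    allowed = TRANSITIONS.get((state, role), set())
    if target not in allowed:
        raise PermissionError(
            f"{role} cannot move a review from {state.value} to {target.value}"
        )
    return target


state = advance(ReviewState.IN_PROGRESS, "user", ReviewState.SUBMITTED)
state = advance(state, "admin", ReviewState.APPROVED)
print(state.value)
```

Keeping the allowed moves in one table means a user can never approve their own review, which is the whole point of having an admin step.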

 

I believe most of those, if not all, should be completed in the next 2-3 weeks. I will keep you posted and then start setting up some users for the demo site; after that we can start working on the main site to curate the data.

Like I said before, what we do with this data is really up for discussion. I'm not sure how much work would be needed to integrate it here, but once we have the data, everything else becomes easy; we just need to present it in a relevant manner.

Looking forward to more collaboration.

 



Oops, sorry: somehow old content was loaded in the editor, so my previous post went out as a duplicate.

Wanted to give some updates: I've made a lot of progress, and I think I'm very close to opening the site.

All major problems have been addressed and I have the workflow ready. I also automated the data export: it's not on cron yet, but the code exports it daily to https://openscans.nyc3.cdn.digitaloceanspaces.com/db/openscans.zip and I just dumped new data there (the old link was broken, sorry).
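
If you want to poke at the export, something like the following Python sketch could give a first overview. Big caveat: the layout of the archive is not described in this thread, so the code only assumes it is a zip containing one or more `.json` files, each being either a single JSON document or one JSON document per line (which is what `mongoexport` writes by default). The function name is made up for illustration.

```python
import io
import json
import zipfile


def summarize_export(zip_bytes: bytes) -> dict[str, int]:
    """Map each .json member of the archive to its number of records."""
    summary = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".json"):
                continue  # skip anything that is not a JSON dump
            text = archive.read(name).decode("utf-8")
            try:
                data = json.loads(text)  # one big JSON document
            except json.JSONDecodeError:
                # Fall back to one JSON document per line.
                data = [json.loads(line) for line in text.splitlines() if line.strip()]
            summary[name] = len(data) if isinstance(data, list) else 1
    return summary


# Usage (downloads the archive from the link in the post):
# from urllib.request import urlopen
# url = "https://openscans.nyc3.cdn.digitaloceanspaces.com/db/openscans.zip"
# with urlopen(url) as response:
#     print(summarize_export(response.read()))
```

It only counts records per file; the actual field names would have to be read off the data itself.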

I'm now working on proper DNS and the demo site (I set up a backup cluster for the data so I can open a demo), and finally on a basic help page listing the many keyboard shortcuts I use to make tagging the magazines as quick as humanly possible ;)

In between coding I have also indexed May/June 1991 :P 

Confident I'll have a website to share with interested folks by early Jan.

Happy Holidays


  • 2 weeks later...

Hi folks, made a ton of progress and finally reached 10k indexed pages :D Site is up and running here: 

https://demo.openscans.org -> Please ping me for a username

That is a demo site and its data is erased frequently, but I thought it would be better if folks could get to grips with the app before they start modifying the real content.

As promised, the data is always exported (every Friday for now) to the CDN link shared before. I'm looking forward to discussing what options we have for publishing that data; it would require a new site, not the one I built for this.

And here is a long help document (sorry about that) on how to use the site.

I'm sorry to upload a PDF, but I wrote a markdown file and did not plan properly how to present it.

 

README (1).pdf


On 1/5/2023 at 2:37 AM, retroprime said:

Hi folks, made a ton of progress and finally reached 10k indexed pages :D Site is up and running here: 

https://demo.openscans.org -> Please ping me for a username

That is a demo site and its data is erased frequently, but I thought it would be better if folks could get to grips with the app before they start modifying the real content.

As promised, the data is always exported (every Friday for now) to the CDN link shared before. I'm looking forward to discussing what options we have for publishing that data; it would require a new site, not the one I built for this.

And here is a long help document (sorry about that) on how to use the site.

I'm sorry to upload a PDF, but I wrote a markdown file and did not plan properly how to present it.

 

README (1).pdf 32.93 kB · 3 downloads

Exciting! Sign me up :)


So someone figured out how to monetize this stuff. I guess it was inevitable as the pool of data grew and the sense of it being copy-left-for-dead gained traction. On the internet someone will always find a way to make a fortune from the tidbits and tatters of other people's individual, seemingly worthless efforts.

It would take a while to inventory the popular games, but eventually the site would have astronomical bandwidth costs: hundreds of thousands of dollars at least. Image galleries aren't cheap to run, and a reviews site like this would be going head-to-head with Metacritic etc. A non-profit like this could never make it in the cloud... it would need a privately hosted server with several dedicated lines, or else the site would die as soon as Kotaku wrote about it.

Better leave NP out of it. Nintendo is very serious about their IP (unless you intend to bribe them). Official PlayStation too. There is also the issue that the very copyrighted content of the mags themselves is now newly viable and presumably valuable.

Contributors of scans should also be consulted individually, because they offered their work under the expectation that the scans would be packaged as whole mags, and in the belief that they wouldn't be heavily monetized.

I always expected that GOG would end up owning these scans before the end. This is probably how that outcome develops, because there's no way in hell a Patreon of casual users could keep this online.


I'm not sure I follow your comments, @tcaud. Right at the start I posted a link where the whole dataset is available for anyone to download, just like I downloaded the database TGDB offers and used it to link the games found by the search to each page.

To start with, I have zero expectation of making any money, let alone a fortune. I think you may be overestimating the number of people who actually care about retro magazines.

As for IP, well, I know Nintendo is a big A*hole when it comes to IP, but the way I see it, this is no different from the Internet Archive, which parses each PDF page, extracts the OCR text and allows search (I've done that too). It is just a way to tag the inventory and allow more meaningful searches, such as "All games by publisher X" or "All reviews by a given editor", which are not currently possible.
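
To make those two example searches concrete, here is a minimal Python sketch of how tagged pages could be grouped for lookup. Everything in it is hypothetical: the `PageTag` fields, the `build_index` helper and the placeholder records are mine, and the real database schema may look quite different.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PageTag:
    """One tagged page (hypothetical shape, not the real schema)."""
    magazine: str
    issue: str
    page: int
    kind: str            # e.g. "review", "ad", "cover"
    game: str = ""
    publisher: str = ""
    editor: str = ""


def build_index(tags, field):
    """Group tagged pages by the value of one field, skipping empty values."""
    index = defaultdict(list)
    for tag in tags:
        key = getattr(tag, field)
        if key:
            index[key.lower()].append(tag)
    return index


# Placeholder records, not real magazine data.
sample = [
    PageTag("Magazine A", "Issue 1", 12, "review", "Game One", "Publisher X", "Editor Y"),
    PageTag("Magazine B", "Issue 7", 3, "ad", "Game Two", "Publisher X"),
    PageTag("Magazine A", "Issue 2", 40, "review", "Game Three", "Publisher Z", "Editor Y"),
]

by_publisher = build_index(sample, "publisher")
by_editor = build_index(sample, "editor")


def reviews_by(editor: str):
    """All pages tagged as reviews that were written by the given editor."""
    return [t for t in by_editor.get(editor.lower(), []) if t.kind == "review"]


print([t.game for t in by_publisher["publisher x"]])
print([t.game for t in reviews_by("Editor Y")])
```

Plain OCR search cannot answer either question, because publisher and editor are facts about the page that only exist once someone has tagged it.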

Initially I downloaded the magazines from other sites such as the Internet Archive or retrocdn.net, but they were PDFs, which required an extra step to break them into individual JPEGs. The point is that, unless the original publishers require otherwise, this material seems to have almost become public domain.

As for costs, again, I think you are really overestimating what this kind of effort costs. I don't believe this will ever be more popular than this website; I actually reached out to partner and make something that can co-exist, not compete.

But sure, there are costs, and I have not figured out how to pay for them, but I would just put some ads on the site. The total cost should never go above 60 bucks a month, and with enough page views, ads could offset some of that. I'm comfortable absorbing some of it myself; I'm not here to make a profit (contrary to what you think).

Right now the total cost is 15 USD a month. It has 250 GB of CDN storage (140 GB in use) and a shared database-as-a-service that costs $10 a month and should probably be OK for a few million requests a month. The app itself runs as a serverless app, which should cost about $5 a month for 10 million requests.

Hope this shows you and others what the true intentions are here.

 


If you're running on one of the big 10 hosts then you might want to check the TOS. They don't like image galleries on their servers.

I think there's a lot of risk in making the data in these mags more accessible. It becomes more valuable, which invites copyright claims, whereas before it is aggregated it's seen as more benign and personal: "preservation". (lol like you don't know that, obviously you do). I'm not faulting you for your app, tho I wonder why you didn't just make a Github page to demo it.

Requests aren't the issue, bandwidth is. I'm not surprised that Retromags has remained mostly obscure, because looking for specific game ads/reviews is something akin to looking for a needle in a haystack (and most of the ads and reviews for the big games have already been extracted by fans and posted to the relevant CDNs), but yeah, I'm not so naive as to think this particular service wouldn't just explode in popularity, because the games in question are still being sold or at least available. Whatever, I didn't make the scans so not my problem. It's just my 2 cents. I was planning to introduce an app to ease translation of the Japanese mags, which will probably get less crowd attention on account of this one.


The images are hosted on a CDN backed by S3, which only really cares about bandwidth. Traditional hosting services will care about image galleries, but not the cloud providers out there. There's a 1 TB transfer allowance for each 250 GB of data stored, which means about 1M pages could be downloaded per month. If I ever get to have that problem, I think it would be a nice problem to have :)
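
Here is the arithmetic behind that "about 1M pages" figure as a small Python sketch. The 250 GB / 1 TB numbers are from this post; the 1 MB average page size is an assumption (it is what the 1M figure implies), and rounding storage up to whole 250 GB blocks is my guess at how the allowance scales.

```python
import math

STORAGE_BLOCK_GB = 250         # storage tier mentioned in the post
TRANSFER_PER_BLOCK_GB = 1_000  # 1 TB of transfer per 250 GB stored (from the post)
AVG_PAGE_MB = 1.0              # ASSUMPTION: average size of one page image


def pages_per_month(stored_gb: float, avg_page_mb: float = AVG_PAGE_MB) -> int:
    """Page downloads per month that fit inside the transfer allowance."""
    # ASSUMPTION: the allowance grows in whole 250 GB blocks.
    blocks = max(1, math.ceil(stored_gb / STORAGE_BLOCK_GB))
    transfer_mb = blocks * TRANSFER_PER_BLOCK_GB * 1_000  # decimal GB -> MB
    return int(transfer_mb // avg_page_mb)


print(pages_per_month(140))        # current usage, 1 MB pages
print(pages_per_month(140, 2.0))   # same storage, heavier 2 MB pages
```

The estimate is very sensitive to the page size: doubling the average image size halves the number of page views the allowance covers.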

 

 


  • Retromags Curator

If worse comes to worst, you can make the data in these magazines accessible without giving the likes of Nintendo a reason to shut it down. If images of the pages tick them off, you can always list what the pages contained without using the exact text on the page.

In any case I think this project is a long way from getting that sort of unwanted attention!


Thanks @E-Day, I also believe that. I honestly think this is more a way to let people find information than anything else. I don't believe it will draw the fury of attorneys out there. And even if it does, there's a case for both preservation and information retrieval here.

Anyway, I just keep doing what I have been doing for the past 60 days: every morning before work I try to index 1 magazine, and that has been the goal so far. I have literally doubled the number of pages since my initial post. And whenever I have some free time, I work on what could be a site to host it.

The demo site is still a few versions behind. I have not automated any CI/CD pipeline to publish changes; I will eventually get there, but for now I just keep indexing those pages :D


The closest precedent we have to this is the case of the ROM managers, which mainstreamed emulation and began drawing the attention of game companies to piracy. History shows that when something becomes "mainstream" it begins disrupting existing macro behavioral patterns (particularly economic ones). This is definitely disruptive technology.

One additional thing I don't understand: why not just add tags to the images already in the Retromags gallery? I mean it's kind of arrogant to just appropriate other people's work by starting your own independent thing when there's already something going and the leadership isn't being particularly abrasive.

You can't play the preservation card because the preservation community is already established and the courts will expect you to obey its norms. That means not turning a profit with these scans.

Make sure you ask Marktrade, KitsunieB, and Kiwi before using any of their scans. Kiwi in particular has his own competing effort which he takes very seriously. Those Videogame Preservation Project scans probably also have their own terms attached. The safest thing you can do is simply go to Patreon and ask for money... few will trust you if you use ads because your income won't be verifiable (among other reasons). You will have to work to earn my trust... I expect you to use Retromags' scans and IA folkscanomy to get things going, then when your Alexa ranking is up and people start to pay attention you make a deal with the publishers to get a license and buy out or replace the scans by people who object to your deal making ...this is exactly what I assess your intentions to be. I don't believe for a second your motives are altruistic. Not for a moment.


Ok, I think I may have had enough. To start with, I couldn't care less what you think of my motives. If you continue with the accusations and just being plain nasty, you don't have to (nor would I like you to) be part of this.

"why not just add tags to the images already in the Retromags gallery" -> That is EXACTLY what I am doing. Maybe you can't understand it, but in order to correlate the data I need the images in the first place. Every single image has a unique name that can be traced back to the original image, and the database refers to it, so it is a no-brainer to get the dump of the data and use it. But I don't have access to the Retromags db, and the app I wrote was made with the clear intention of making indexing really fast with all the shortcuts in place, something that would take way more time if done via this website, which seems to be a community-based platform, not built for this kind of work.

I said this before: all those images are available via retrocdn or the Internet Archive. If someone feels that strongly about their scan, they can reach out to me. But that is a very thin line to walk, because the same can be said about the publisher. That is what you really don't get, do you? You also complain that I'm taking advantage of other people's work. Are you ignoring the time it takes to index one magazine? Or to build the site? Are you really that selfish? I'm done talking to you. There's always one person who simply can't accept that others want to improve upon existing work; they are sour, and I believe you have stated your true motives yourself: "my app will get less crowd attention". Don't be like this. Build your app to improve something, don't start trashing others.

 


  • Retromags Curator

Any scan from this site can be used, regardless of who scanned it. So if a marktrade or kiwi scan is used that was downloaded from this site, it is irrelevant. When you upload something here, you can also upload it anywhere else you want, but you also accept that the copy here is now property of the site and can end up being used here, there and everywhere. If you are scanning and releasing scans for personal recognition, then you are not doing it for the right reasons.

All we ask is to be credited as a site for the scan.


I'm really excited to see what this tagging project turns into.  All of this data being available on a central repository would be amazing.  Even better if the tag database is available for anyone else who wants to build upon it and incorporate it into their own project.


8 hours ago, E-Day said:

Any scan from this site can be used, regardless of who scanned it. So if a marktrade or kiwi scan is used that was downloaded from this site, it is irrelevant. When you upload something here, you can also upload it anywhere else you want, but you also accept that the copy here is now property of the site and can end up being used here, there and everywhere. If you are scanning and releasing scans for personal recognition, then you are not doing it for the right reasons.

All we ask is to be credited as a site for the scan.

Thanks again @E-Day, rest assured that every single page shown will have a source link to this site, even to the specific issue if you desire. I'm planning to let users either search (the main feature) or, say, browse a magazine, but with the ability to filter out pages, since they are tagged (so one could read an issue, filter out all the ads and mail, and just see the content).
For that use case I intend to make the Retromags image (the one present in several magazines) show up as an "ad" for a few seconds before the magazine loads; it is my way of giving credit where it is due.
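
The filtered browsing idea could be as simple as the following Python sketch. It assumes each page carries a single tag; the dictionary layout, the tag names and the `visible_pages` helper are illustrative only, not how the app actually stores things.

```python
def visible_pages(pages, hidden_tags):
    """Keep the original page order, dropping pages whose tag is hidden."""
    hidden = {tag.lower() for tag in hidden_tags}
    return [page for page in pages if page["tag"].lower() not in hidden]


# A made-up six-page issue, one tag per page.
issue = [
    {"number": 1, "tag": "cover"},
    {"number": 2, "tag": "ad"},
    {"number": 3, "tag": "mail"},
    {"number": 4, "tag": "review"},
    {"number": 5, "tag": "ad"},
    {"number": 6, "tag": "content"},
]

content_only = visible_pages(issue, {"ad", "mail"})
print([page["number"] for page in content_only])  # -> [1, 4, 6]
```

Because the filter only hides pages and never reorders them, the reader still flips through the issue in its printed order.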

If you recall, I'm still working on the magazine viewer; the Internet Archive one turned out to be a lot of trouble to work with. But as soon as I get something off the ground, you can test it on the demo site.

 
