GitHub - DanielS684/LemmyFediverseData
github.com

I was recently inspired by this post made by @mookulator@mander.xyz to get the subscription statistics for a list of Lemmy instances.

I did this by scraping the Communities tab of every Lemmy instance listed in awesome-lemmy-instances on GitHub, so all of this data is publicly available.

I separated it as follows:

Local instance: The instance that the data is being scraped from

Community: The name of the community

Community instance: The instance that the community is hosted on

Local subscription count: The number of subscribers to that community from the local instance

If the local instance is the same as the community instance, the subscription count is actually the total number of users subscribed to that community across the Threadiverse.
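This rule can be applied mechanically. Below is a minimal sketch. It assumes the CSV has one row per (local instance, community) pair with the hypothetical headers `local_instance`, `community`, `community_instance` and `subscribers`; the repository's actual headers may differ. Because the home-instance row already counts subscribers from everywhere, the sketch keeps only those rows. Summing the local counts from every instance would count remote subscribers twice.

```python
import csv
import io

def total_subscribers(rows):
    """Return {(community, community_instance): total subscribers}.

    Only rows scraped from a community's home instance are used, because
    that instance's displayed count already includes remote subscribers.
    Column names are hypothetical; adjust them to the real CSV headers.
    """
    totals = {}
    for row in rows:
        if row["local_instance"] == row["community_instance"]:
            key = (row["community"], row["community_instance"])
            totals[key] = int(row["subscribers"])
    return totals

def top_communities(totals, n=10):
    """Rank communities by total subscribers, largest first."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]

if __name__ == "__main__":
    # Toy input with made-up instances and numbers, for illustration only.
    sample = (
        "local_instance,community,community_instance,subscribers\n"
        "a.example,news,a.example,300\n"
        "b.example,news,a.example,120\n"
        "b.example,pics,b.example,80\n"
    )
    rows = csv.DictReader(io.StringIO(sample))
    print(top_communities(total_subscribers(rows)))
```

The `b.example` row for `news` is ignored when computing totals. It only tells you how many of the 300 come from `b.example`.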

Since I was scraping web pages, the data is a bit rough: I had to convert values like 42K into 42000, so it isn’t going to be 100% accurate.
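The 42K-to-42000 conversion can be sketched as follows. It assumes the pages abbreviate counts with a K or M suffix and may use commas as thousands separators. That format is an assumption and has not been checked against every instance's UI. The conversion cannot restore lost precision: depending on how the page rounds, a count shown as 42K could be any value within about a thousand of 42000.

```python
import re
from decimal import Decimal

# Assumed suffixes: "K" for thousands, "M" for millions (e.g. "42K", "1.2M").
_MULTIPLIER = {"": 1, "K": 1_000, "M": 1_000_000}
_COUNT = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([KM]?)\s*$", re.IGNORECASE)

def parse_count(text: str) -> int:
    """Convert an abbreviated count such as '42K' or '1.2M' to an int."""
    match = _COUNT.match(text.replace(",", ""))
    if match is None:
        raise ValueError(f"unrecognised count: {text!r}")
    number, suffix = match.groups()
    # Decimal keeps the digits exact, so int() cannot truncate a float
    # that landed just below a whole number.
    return int(Decimal(number) * _MULTIPLIER[suffix.upper()])

for raw in ("42K", "1.2M", "987", "1,234"):
    print(raw, "->", parse_count(raw))
```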

Also, this scrape doesn’t include instances that weren’t on the list when I pulled the CSV, nor users of Lemmy alternatives like Kbin, or Mastodon users who subscribe to Lemmy communities.

This was gathered today, from 12:00 PM EST to about 7:00 PM EST.

The data could be better if I had used the API or added the total subscriber counts from lemmyverse.net, but I already spent a lot of time on this and don’t know how to use the API.
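Here is a rough sketch of the API route for anyone who wants to try it. It assumes Lemmy's v3 HTTP endpoint `/api/v3/community/list` with the `type_`, `sort`, `limit` and `page` query parameters. It also assumes the response shape of the 0.18/0.19 releases: a `communities` list whose entries carry `community.name`, `community.actor_id` and `counts.subscribers`. Field names may differ in other Lemmy versions, and so may the meaning of `subscribers` for a community hosted elsewhere. Treat this as a starting point, not a drop-in replacement for the scrape. A page size of 50 is assumed to be the maximum the server accepts.

```python
import json
import time
import urllib.parse
import urllib.request

def fetch_page(instance, page, listing_type="All", limit=50):
    """Fetch one page of communities known to `instance` (assumed v3 endpoint)."""
    query = urllib.parse.urlencode(
        {"type_": listing_type, "sort": "TopAll", "limit": limit, "page": page}
    )
    url = f"https://{instance}/api/v3/community/list?{query}"
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)

def extract_rows(instance, payload):
    """Turn one response into (local_instance, community, community_instance, subscribers) tuples."""
    rows = []
    for view in payload.get("communities", []):
        community = view["community"]
        # actor_id looks like https://<home instance>/c/<name>; its host is the home instance.
        home = urllib.parse.urlparse(community["actor_id"]).netloc
        rows.append((instance, community["name"], home, view["counts"]["subscribers"]))
    return rows

def scrape_instance(instance, max_pages=500, delay=1.0):
    """Walk the pages until an empty one comes back, pausing between requests."""
    rows = []
    for page in range(1, max_pages + 1):
        batch = extract_rows(instance, fetch_page(instance, page))
        if not batch:
            break  # an empty page means we have run past the last community
        rows.extend(batch)
        time.sleep(delay)  # be gentle with the instance
    return rows
```

Calling `scrape_instance("lemmy.dbzer0.com")` would, under these assumptions, return rows in the same four-column layout as the CSV, with exact integers instead of rounded "42K" values.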

I hope someone uses this to make a data visualization of subscription patterns for Lemmy because I would really like to see that.

P.S. In the post that inspired this one, there was some discussion about whether Lemmy users would want this done. So if you guys don’t like it, I will delete the data.

I don’t think subscriber count is a useful metric. Look here: the subscriber count keeps growing, but the number of comments has dropped a lot (apparently Reddit was hit hard by the API changes).

I don’t know what you can do with that kind of data, but doing the same thing for active users might be more useful.

Yep. These graphs tell the story.

@Danterious@lemmy.dbzer0.com (creator), 1Y

Yeah, I was thinking of scraping active users as well. From what I observed across different instances, though, active users aren’t counted separately per instance, so the figure would just be all the active users of that community, wherever they are from. That info is already available on lemmyverse.net, so I didn’t want to duplicate it.

I bet there is a way to do this with the Lemmy API, but I don’t have a good understanding of how to use it, so I’m just waiting for someone more knowledgeable than me to try this again with more care.
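If someone does try this with the API, the active-user figures are a likely place to look. The sketch below assumes the `/api/v3/community/list` response shape of the 0.18/0.19 releases. In that shape, each community's `counts` object is assumed to carry `users_active_day`, `users_active_week`, `users_active_month` and `users_active_half_year`. This matches the observation above: these are per-community numbers with no breakdown by the users' home instance. For that reason the sketch keys them by the community's `actor_id` only.

```python
# Assumed field names from Lemmy's community aggregates (0.18/0.19 era).
ACTIVE_FIELDS = (
    "users_active_day",
    "users_active_week",
    "users_active_month",
    "users_active_half_year",
)

def active_users(payload):
    """Map each community's actor_id to its active-user counts.

    `payload` is one decoded /api/v3/community/list response. Missing
    fields come back as None instead of raising, since the schema has
    changed across Lemmy versions.
    """
    result = {}
    for view in payload.get("communities", []):
        counts = view.get("counts", {})
        key = view["community"]["actor_id"]
        result[key] = {field: counts.get(field) for field in ACTIVE_FIELDS}
    return result
```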

Edit: Unrelated, but I went through the subreddit stats for a bunch of subs, and posts and comments for a lot of them seem to have dropped off after the API changes. That seems bad for Reddit and good for us.

kersploosh, 1Y

Edit: Dang, the pictures didn’t come through very readable. Sorry. I’m going to leave it like this for now. It’s late and I need to go to bed.

Edit 2: It looks like the pictures can be enlarged for better viewing in Jerboa but not on desktop. Weird.

Interesting. I pulled your .csv into a spreadsheet to tinker a bit (though I really should be sleeping right now). I’m not entirely sure what to do with it, but here are a few basic charts I threw together for kicks. Maybe someone more code-savvy than me can create an interactive tool to sift through all the data.

Most subscribed communities across Lemmy:

Here’s a comparison of where the subscribers originate for two of the big technology communities. What are all those lemmy.world users doing subscribed to a Beehaw community? Maybe those are lemmy.world users who subscribed before Beehaw defederated from lemmy.world?
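An origin breakdown like this can be approximated from the same rows. This sketch assumes tuples of (local instance, community, community instance, subscribers). It is an illustration, not necessarily how these charts were built. The home-instance row holds the overall total, so whatever the scraped remote instances don't account for goes into one remainder bucket. That bucket covers the home instance's own users plus any instances or platforms the scrape missed.

```python
def origin_breakdown(rows, community, home):
    """Split one community's subscribers by the instance they come from.

    rows: iterable of (local_instance, community, community_instance, subscribers).
    """
    total = None
    remote = {}
    for local, name, community_instance, subs in rows:
        if name != community or community_instance != home:
            continue  # a different community, or a same-named one hosted elsewhere
        if local == home:
            total = subs  # the home row already counts everyone
        else:
            remote[local] = subs
    if total is None:
        raise ValueError("no row scraped from the home instance")
    breakdown = dict(remote)
    # Clamp at zero: rounded ("42K") or stale remote counts can exceed the total.
    breakdown[f"{home} + other"] = max(total - sum(remote.values()), 0)
    return breakdown

def as_percentages(breakdown):
    """Express each bucket as a percentage of the summed buckets."""
    total = sum(breakdown.values())
    if total == 0:
        return {key: 0.0 for key in breakdown}
    return {key: round(100 * value / total, 1) for key, value in breakdown.items()}
```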

Thanks for the graphs

@Danterious@lemmy.dbzer0.com (creator), 1Y

Thank you! The !technology@beehaw.org chart is surprising. I think what you suggested is the most likely explanation. Also, maybe later I’ll put up the code I used to scrape the data so you guys can double-check it for bugs.

Edit: wording

Ada, 1Y

Speaking as an admin, it’s really interesting data! Thank you for that.

freamon, 1Y

“The data could be better if I used the API or added the information from lemmyverse.net on the total subscriber counts but I spent a lot of time on this as is and don’t know how to use the API.”

Your approach may have ended up being for the best. lemmyverse.net can’t index lemmy.world (broken DB => broken API => “well, we didn’t want to be part of the fediverse anyway”), and if you’d tried to use their API yourself, it might have totalled your project the same way it did for lemmyverse.

So there’s a fancy graph to be made here!

I hope so.

“P.S. On the post that inspired this post, there was some discussion about whether Lemmy users would like this to be done. So if you guys don’t like it I will delete the data.”

Appreciate the expressed concern over privacy etc.!!

A community dedicated to fediverse news and discussion.

Fediverse is a portmanteau of “federation” and “universe”.
