@Codeberg@social.anoxinon.de
Actually, since yesterday I've been pondering the idea of building a #federated version of Stack Overflow. Nothing written yet; I'm reading and researching.
Also, right now I'm looking at this Stack Exchange SQLite db (under CC BY-SA 4.0) to check how useful and doable it would be to import this data and use it as a base for the federated version.
I'm also wondering if we could somehow use this data to train our own open-source AI to help the community, but I don't have any knowledge of LLM/AI things. If there is an expert around, I would appreciate an opinion on that.
https://seqlite.puny.engineering
EDIT:
A better place to download the dump content, with more interesting tables, like the one with the Votes; the data dumped at the other link only contains two tables, Users and Posts. Right now I'm downloading all the data related to Stack Overflow, which will take some time due to my humble home internet connection, so I haven't had the chance to look at the data yet, but I guess that's the interesting part.
here:
https://archive.org/download/stackexchange
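For anyone who wants to poke at a SQLite file like this before committing to an import, here is a minimal Python sketch that lists the tables in a database and counts their rows. The function name `table_row_counts` is made up for illustration, and I have not checked the actual schema of either download, so treat the table names mentioned above (Users, Posts, Votes) as what to look for rather than a guarantee.

```python
import sqlite3


def table_row_counts(conn):
    """Return {table_name: row_count} for every user table in a SQLite database."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        )
    ]
    counts = {}
    for name in names:
        # Names come straight from sqlite_master, so double-quoting them is enough here.
        counts[name] = conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
    return counts


if __name__ == "__main__":
    # Hypothetical path: point this at wherever the downloaded file ends up.
    conn = sqlite3.connect("stackexchange.db")
    for table, rows in table_row_counts(conn).items():
        print(f"{table}: {rows} rows")
```

Counting rows on the full Stack Overflow data may be slow, but it gives a quick feel for which tables are worth importing first.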
Codeberg was asking about this. The linked toot by a commenter points to :
@lemmyreader Here’s a starting point for a fediverse StackExchange: Make sure it’s interoperable with Lemmy.
Now, you may not get the full feature set on Lemmy, but you should be able to interact with it from Lemmy as if it’s a group on there.
#StackExchange #Fediverse #Coding
@ajsadauskas @lemmyreader
Yep! It seems a good Threadiverse ecosystem could be on its way with lemmy etc, nodebb and discourse. Hooking a Stack Overflow alternative into that could make a lot of sense for kick-starting it.
Though at some point UI differences could prove problematic(?)
What about Discourse? https://meta.discourse.org/t/activitypub-support-phase-1-rfc/132624 @maegul @ajsadauskas @lemmyreader
@weirdwriter @ajsadauskas @lemmyreader
Yes, forum platforms too (incl #nodebb of course).
I do get the (very vague) impression discourse is focusing on integrating well with masto to a good extent and so might not integrate too well with the other Reddit/forum platforms. If true, that might be a good enough reason to start with another base. OTOH, it’s a familiar platform to many devs, so adapting it for Stack Overflow-like use could go well, right?
Honest question: Why?
IMHO stack exchange is basically reddit/lemmy with handcuffs, because there are no threaded discussions and every other question is closed as off topic. I don’t understand what another stack exchange would buy anybody.
I guess one thing stack exchange does well is “related questions” and tagging, but… I dunno. (shrugs)
With a few more additions, lemmy could serve as a good replacement. We already have a Forum / NewComments sort which is perfect for question / answer type communities. We could add a feature to make default sorts for specific communities, so they would feel less fast, or possibly a sort that brings zero comment posts (IE meaning unanswered), to the top.
The reputation and “accepted answer” features from SO are a lot less important than threaded comments can be, especially since questions often need new answers every year, making the “accepted answer” pointless.
Especially with Lemmy getting support for plugins soon, I don’t see the need for making a new platform
A new sorting method for “unanswered” is a cool idea. I’m not sure it’s quite as simple as just finding posts with 0 comments, because people can put additional questions in the comments while the post is still unanswered. Also, how do you sort posts with the same number of comments/answers? But this is definitely something that a plugin could handle.
I saw someone else suggested we could just put “[unanswered]” in the title and then edit the title to “[answered]”
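To make the “unanswered first” idea concrete, here is a rough Python sketch. It is not Lemmy code and not a real plugin API: the `Post` fields and the `unanswered_first` name are hypothetical. It sorts by comment count and breaks ties by age, oldest first. The tie-break is my own assumption, and as noted above, zero comments is only an approximation of “unanswered”.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Post:
    # Hypothetical fields, chosen only for this illustration.
    title: str
    comment_count: int
    published: datetime


def unanswered_first(posts):
    """Fewest comments first; among equal counts, the oldest post first."""
    return sorted(posts, key=lambda p: (p.comment_count, p.published))


if __name__ == "__main__":
    posts = [
        Post("old, answered", 3, datetime(2024, 1, 1)),
        Post("new, unanswered", 0, datetime(2024, 5, 2)),
        Post("old, unanswered", 0, datetime(2024, 5, 1)),
    ]
    for post in unanswered_first(posts):
        print(post.comment_count, post.title)
```

Putting the oldest zero-comment post on top means questions that have waited longest get seen first; flipping the tie-break to newest first would be a one-line change.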
Default sort would be great. Especially for sports events. But I don’t want Lemmy to become an answer repository, keep it as a link aggregator
Did you miss the StackExchange and AI story this week?
And guess what, it can be done just as easily, if not more easily, on a federated instance. You don’t gain any real additional control over your data (and no, putting “covered under license X” on it is about as realistic as those Facebook posts saying “I don’t give anyone access to my posts”).
I’ve said this before and I’ll say it again, realistically the only way to control your data from AI is a DRM type solution which everyone fundamentally hates.
I don’t think this can be solved with any type of technology. It needs legislation. These AI companies need regulation.
Federated Stack Exchange isn’t harder for AI to eat. If anything it’s easier.
Useful constraints would focus discussion to keep questions/replies brief, relevant, and hopefully helpful, wouldn’t they? I just wonder how up and downvoting would work since that would go very differently from Lemmy.
how so?
I’m sure this has been solved already but I’m just wondering how you ensure people are voting based on the helpfulness and/or merit of the response. That’s the ideal on Lemmy but it’s obviously not always the case here. Presumably, you’d have to be logged in on the other platform to vote but you can just see the discussion from Lemmy, I guess?
Oohhh. Seeding the alternative with all the old data, if possible, could be an awesome move here!
How could anybody stop the AI robbers from stealing content from the fediverse?
Why does that matter? The content is licensed CC BY-SA. The point here is to prevent AI answers.
It seems to matter for the users at Stack Overflow. And why should anybody give anything for free to the crooks in Silicon Valley. All they do is create technology designed to extract value out of people and give as little as possible back.
Because that’s the nature of FOSS. The good news is, if they trained on your data that’s licensed CC BY-SA (as all SO content is), then you can request their source code, and they legally must provide it.
This is a good thing.
robots.txt may help: https://neil-clarke.com/block-the-bots-that-feed-ai-models-by-scraping-your-website. Blocking by IP address is another option.
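As a rough illustration of the robots.txt approach, here is a small Python sketch that checks what a given set of rules would allow, using the standard library's `urllib.robotparser`. The rules, the `allowed` helper and the example.org URL are my own assumptions for the example; GPTBot and CCBot are crawler names used by OpenAI and Common Crawl. Keep in mind robots.txt is purely advisory, so this only stops crawlers that choose to honour it.

```python
from urllib.robotparser import RobotFileParser

# Example rules (an assumption for this sketch): block two AI-related crawlers,
# allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.org/post/1"  # hypothetical instance URL
    for agent in ("GPTBot", "CCBot", "Mozilla/5.0"):
        print(agent, "allowed" if allowed(agent, url) else "blocked")
```

Running a check like this before deploying the file is a cheap way to catch a typo that would block ordinary visitors or let a listed bot through.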
It’s not about privacy. It’s about AI companies stealing other people’s work and knowledge and profiting. Like what they did with artists. And I think that’s bothering a lot of people. It’s kind of sad that we cannot exchange information with each other for free, without some Silicon Valley crooks taking advantage and trying to convert other people’s good will into profit.
These LLMs are also polluting the web with AI junk and slop. The web is absolutely tainted with shitty ChatGPT text and images, making it harder and harder to find authentic information. I think a lot of people don’t want to contribute to that.