- (+24) I think this might be a misunderstanding of CC BY-SA. I don't believe CC BY-SA requires the Stack Exchange company to make all content available to everyone at no charge. – D.W., Jul 26, 2023 at 20:11
- (+5) @D.W. It doesn't require them to, but any attempt to make these companies pay could be easily circumvented by someone downloading the data dump and republishing it. This would be within that person's legal rights. Will SE try to stop that? – Someone, Jul 26, 2023 at 20:12
- (+7) That seems like a different question to me. I would suggest that you ask about that specific situation. Right now the answer seems to contain faulty premises or misleading information. – D.W., Jul 26, 2023 at 20:13
- (+2) I don't believe anyone training an AI model has attempted to argue they are using the content under the terms of CC BY-SA; rather, they think it's all fair use, so the license doesn't matter. – Bryan Krause, Jul 26, 2023 at 20:13
- (+10) What companies can use for training data, what requirements there are for the use and attribution, and all the other legal stuff around any of that is only going to be settled by court cases, or legislative acts (probably then interpreted by more court cases). What SE wants, in terms of guard rails or payments, and what the users want, think or believe, is likely to end up meaning nothing, or everything, once the courts have their final say. Probably in another 6-8. – Chindraba, Jul 26, 2023 at 20:18
- Related: "Does Stack Exchange Inc. intend to sue firms that use language models that are partly trained on Stack Exchange?" and "Is it illegal for a firm to train an AI model on a CC BY-SA 4.0 corpus and make a commercial use of it without distributing the model under CC BY-SA?" – Franck Dernoncourt, Jul 27, 2023 at 14:39
- @D.W. In a sense you're right, but practically speaking that is not correct. They can remove content fully at their discretion, so technically in that sense there is no requirement for them to make all contributed content available for free. – BryKKan, Aug 21, 2023 at 4:04
- @D.W. However, if they share the data with anyone, they do have to share it with everyone on request. They do not have to release it in identical formats, nor specifically share any unique sub-collections. It is sufficient that they publish the full raw dump, in some readily parseable format. A full dump is intrinsically a superset of such private collections, and so they could treat them proprietarily. But they do have to make the data available, for free, if they use it anywhere. So if they don't publish full dumps, then they would be required to offer the other collections for free. – BryKKan, Aug 21, 2023 at 4:14
- @BryKKan No; they aren't obligated to share it with anyone. The fact that they plan to limit who they give the DB to directly is OK; I was just making sure they don't plan to try to stop those who do have it from passing it on to others. If they choose to stop OpenAI from downloading it, that's fine; if they let me download it but then try to stop me from sharing it with OpenAI if I want to, that's not OK. Because doing the "OK" part but not the "not OK" part doesn't accomplish much for them, these plans call into question whether they might consider doing the "not OK" part. – Someone, Aug 21, 2023 at 4:28
- @Someone Have you actually read the CC-BY-SA license? Because there is no practical sense in which that is true. You're entirely right about the restrictions on redistribution, and I explicitly noted that in the previous answer I linked to. I think I see where you're coming from in fear of "muddying the waters", but to my mind these are steps along the same slope. We need never let it get "that far". IFF they use contributions (in any way that gets published), they are obligated by the "SA" part of the license to share the whole thing that results. Only a larger "full dump" substitutes – BryKKan, Aug 21, 2023 at 4:47
- @Someone (continued) This answer gives a detailed explanation of the applicable license (CC-BY-SA 4.0), with references and verbatim quotes of the license text. Most of what you're thinking of would fall under the definition of an "adapted material", so you can skip to that if you like. Also note that this is specific to the "SA" series of CC licenses. Not all Creative Commons licenses require such "resharing", but the one which applies to SE contributions does. – BryKKan, Aug 21, 2023 at 4:55
- @BryKKan If they try to get data dump downloaders to agree not to share the dumps, or if they apply DRM, then yes, that is illegal. If they only limit who can directly access them, that is legal. Do you believe that private beta sites are violating this because only certain users can access them? – Someone, Aug 21, 2023 at 16:34