S. 8331 2
(f) Studies show that news content comprises a disproportionate amount
of generative artificial intelligence training data. News content is
especially valuable to artificial intelligence developers because it is
high-quality, professional writing created by human beings;
(g) After training, generative artificial intelligence systems continue
to access news websites, podcasts, broadcasts and digital platforms
in order to gain access to fact-checked, accurate and up-to-date content
to produce outputs;
(h) The vast majority of generative artificial intelligence developers
do not obtain permission or compensate news publishers or broadcast news
operations for accessing their websites, podcasts, broadcasts and
digital platforms for the purposes of building and operationalizing
their AI tools and services, in violation of copyright law, those sites'
and platforms' terms of service and express prohibitions and prefer-
ences;
(i) Maximizing the potential of generative AI requires ensuring the
sustainability of journalism and the news industry; and
(j) News publishers, broadcast news operations and the public deserve
to know when generative artificial intelligence developers have accessed
news websites and used their work.
§ 3. Article 21-A of the general business law is renumbered article
21-B and a new article 21-A is added to read as follows:
ARTICLE 21-A
ARTIFICIAL INTELLIGENCE SOURCE DATA TRANSPARENCY
SECTION 338. DEFINITIONS.
338-A. ARTIFICIAL INTELLIGENCE SOURCE DATA TRANSPARENCY.
338-B. ENFORCEMENT.
338-C. APPLICABILITY.
338-D. SEVERABILITY.
§ 338. DEFINITIONS. THE FOLLOWING TERMS, WHENEVER USED OR REFERRED TO
IN THIS ARTICLE, SHALL HAVE THE FOLLOWING MEANINGS:
1. "ARTIFICIAL INTELLIGENCE" MEANS A MACHINE-BASED SYSTEM THAT CAN,
FOR A GIVEN SET OF HUMAN-DEFINED OBJECTIVES, MAKE PREDICTIONS, RECOMMEN-
DATIONS, OR DECISIONS INFLUENCING REAL OR VIRTUAL ENVIRONMENTS, AND THAT
USES MACHINE AND HUMAN-BASED INPUTS TO PERCEIVE REAL AND VIRTUAL ENVI-
RONMENTS, ABSTRACT SUCH PERCEPTIONS INTO MODELS THROUGH ANALYSIS IN AN
AUTOMATED MANNER, AND USE MODEL INFERENCE TO FORMULATE OPTIONS FOR
INFORMATION OR ACTION.
2. "ACCESS" MEANS TO OBTAIN, RETRIEVE, ACQUIRE, REPRODUCE, CRAWL,
INDEX, OR REQUEST AND RECEIVE A TRANSMISSION OF CONTENT.
3. "COVERED PUBLICATION" MEANS ANY PRINT, BROADCAST, BROADCAST NETWORK
OR DIGITAL PUBLICATION OR SERVICE WHICH:
A. PERFORMS A PUBLIC-INFORMATION FUNCTION COMPARABLE TO THAT TRADI-
TIONALLY SERVED BY JOURNALISM ORGANIZATIONS, SUCH AS NEWSPAPERS, BROAD-
CAST NEWS OPERATIONS, BROADCAST NETWORK NEWS OPERATIONS, MAGAZINES AND
OTHER PERIODICAL PUBLICATIONS;
B. INVESTS SUBSTANTIAL EXPENDITURE OF LABOR, SKILL, AND MONEY TO
CREATE, EDIT, PRODUCE, AND DISTRIBUTE CONTENT INCLUDING BY ENGAGING
NATURAL PERSONS TO CREATE, EDIT, PRODUCE, AND DISTRIBUTE ORIGINAL TEXT,
AUDIO, PHOTO, ILLUSTRATIVE, OR VIDEO CONTENT CONCERNING MATTERS OR
TOPICS OF INTEREST OR USE TO MEMBERS OF THE PUBLIC THROUGH ACTIVITIES
SUCH AS OBSERVATION, VIDEO RECORDING EVENTS, INTERVIEWS, RESEARCH, TEST-
ING, AND ANALYSIS; AND
C. PUBLISHES NEW CONTENT OR UPDATES ITS CONTENT ON AT LEAST A MONTHLY
BASIS AND HAS A PROCESS FOR ERROR CORRECTION AND CLARIFICATION.
4. "CRAWLER" MEANS SOFTWARE THAT ACCESSES CONTENT FROM A WEBSITE OR
OTHER INTERNET SOURCE, SUCH AS AN ONLINE CRAWLER, SPIDER, FETCHER,
CLIENT, BOT, USER AGENT OR EQUIVALENT TOOL.
5. "DEVELOPER" MEANS A PERSON THAT DESIGNS, CODES, PRODUCES, OR
SUBSTANTIALLY MODIFIES AN ARTIFICIAL INTELLIGENCE SYSTEM OR SERVICE FOR
USE BY MEMBERS OF THE PUBLIC. THE TERM "DEVELOPER" SHALL NOT INCLUDE
ARTIFICIAL INTELLIGENCE SYSTEMS USED, DEVELOPED OR OBTAINED BY A JOUR-
NALISM PROVIDER FOR INTERNAL USE.
6. "GENERATIVE ARTIFICIAL INTELLIGENCE" MEANS A CLASS OF ARTIFICIAL
INTELLIGENCE MODELS THAT EMULATE THE STRUCTURE AND CHARACTERISTICS OF
INPUT DATA TO GENERATE DERIVED SYNTHETIC CONTENT, INCLUDING, BUT NOT
LIMITED TO, IMAGES, VIDEOS, AUDIO, TEXT, AND OTHER DIGITAL CONTENT.
7. "JOURNALISM PROVIDER" MEANS ANY PERSON THAT:
A. BROADCASTS OR PUBLISHES ONE OR MORE COVERED PUBLICATIONS; AND
B. IS COVERED BY MEDIA LIABILITY INSURANCE.
8. "PERSON" MEANS A NATURAL PERSON, CORPORATION, TRUST, ESTATE, PART-
NERSHIP, INCORPORATED OR UNINCORPORATED ASSOCIATION OR ANY OTHER LEGAL
ENTITY.
9. "ARTIFICIAL INTELLIGENCE UTILIZATION" MEANS TO USE DIGITAL CONTENT
AS DATA TO DEVELOP THE CAPABILITIES OF A GENERATIVE ARTIFICIAL INTELLI-
GENCE SYSTEM, INCLUDING THROUGH SETTING OR CHANGING ITS LEARNABLE
WEIGHTS AND OTHER PARAMETERS, AND INCLUDES, IN ADDITION TO THE INITIAL
DATASET TRAINING, FURTHER TESTING, VALIDATING, GROUNDING, OR FINE TUNING
BY THE DEVELOPER OF THE ARTIFICIAL INTELLIGENCE SYSTEM OR SERVICE.
§ 338-A. ARTIFICIAL INTELLIGENCE SOURCE DATA TRANSPARENCY. 1. A. ON OR
BEFORE JANUARY FIRST, TWO THOUSAND TWENTY-SEVEN AND BEFORE EACH TIME
THEREAFTER THAT A GENERATIVE ARTIFICIAL INTELLIGENCE SYSTEM OR SERVICE,
OR A SUBSTANTIAL MODIFICATION TO A GENERATIVE ARTIFICIAL INTELLIGENCE
SYSTEM OR SERVICE RELEASED ON OR AFTER JANUARY FIRST, TWO THOUSAND TWEN-
TY-TWO, IS MADE PUBLICLY AVAILABLE TO NEW YORKERS FOR USE, REGARDLESS OF
WHETHER THE SYSTEM OR SERVICE IS MADE AVAILABLE FOR A FEE, THE DEVELOPER
OF THE SYSTEM OR SERVICE SHALL POST ON THE DEVELOPER'S INTERNET WEBSITE
THE FOLLOWING INFORMATION REGARDING VIDEO, AUDIO, TEXT AND DATA FROM A
COVERED PUBLICATION USED TO TRAIN THE GENERATIVE ARTIFICIAL INTELLIGENCE
SYSTEM OR SERVICE:
(I) THE UNIFORM RESOURCE LOCATORS OR UNIFORM RESOURCE IDENTIFIERS
ACCESSED BY CRAWLERS DEPLOYED BY THE DEVELOPER OR BY THIRD PARTIES ON
THEIR BEHALF OR FROM WHOM THEY HAVE OBTAINED VIDEO, AUDIO, TEXT OR DATA;
(II) A DETAILED DESCRIPTION OF THE VIDEO, AUDIO, TEXT AND DATA FROM A
COVERED PUBLICATION USED FOR ARTIFICIAL INTELLIGENCE UTILIZATION,
INCLUDING THE TYPE AND PROVENANCE OF THE VIDEO, AUDIO, TEXT AND DATA AND
THE MEANS BY WHICH IT WAS OBTAINED, SUFFICIENT TO IDENTIFY INDIVIDUAL
WORKS;
(III) WHETHER ANY SOURCE IDENTIFIERS, TERMS, OR COPYRIGHT NOTICES WERE
REMOVED FROM THE VIDEO, AUDIO, TEXT OR DATA; AND
(IV) THE TIMEFRAME OF DATA COLLECTION.
B. THE INFORMATION REQUIRED TO BE POSTED ON A DEVELOPER'S INTERNET
WEBSITE PURSUANT TO PARAGRAPH A OF THIS SUBDIVISION SHALL NOT BE
REQUIRED WHERE THERE IS AN EXPRESS WRITTEN AGREEMENT AUTHORIZING THE
DEVELOPER TO ACCESS THE JOURNALISM PROVIDER'S CONTENT AND THE PARTIES
AGREE NOT TO POST INFORMATION RELATING TO THE JOURNALISM PROVIDER'S
CONTENT ON THE DEVELOPER'S WEBSITE.
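Illustrative note (not part of the bill text): the four items listed in subparagraphs (i) through (iv) of paragraph a above could be tracked as a simple record. The sketch below is only one possible shape, under the assumption that a developer keeps one record per covered publication. The bill does not prescribe any format, and the names `SourceDisclosure` and `missing_items` and the example URL are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class SourceDisclosure:
    """Hypothetical record of the items in subparagraphs (i)-(iv); not a format set by the bill."""
    urls_accessed: List[str]             # (i) URLs or URIs accessed by crawlers
    description: str                     # (ii) type, provenance and means of obtaining
    identifiers_removed: Optional[bool]  # (iii) None means not yet answered
    collection_start: Optional[date]     # (iv) start of the data collection timeframe
    collection_end: Optional[date]       # (iv) end of the data collection timeframe


def missing_items(d: SourceDisclosure) -> List[str]:
    """Return the subparagraph labels that are still unanswered in this record."""
    missing = []
    if not d.urls_accessed:
        missing.append("(i)")
    if not d.description.strip():
        missing.append("(ii)")
    if d.identifiers_removed is None:
        missing.append("(iii)")
    if d.collection_start is None or d.collection_end is None:
        missing.append("(iv)")
    return missing


# Example with made-up values: (iii) and the end of the timeframe are unanswered.
example = SourceDisclosure(
    urls_accessed=["https://news.example/story-1"],
    description="Article text obtained by a crawler operated by the developer",
    identifiers_removed=None,
    collection_start=date(2024, 1, 1),
    collection_end=None,
)
print(missing_items(example))  # ['(iii)', '(iv)']
```

A check like this only flags empty fields. Whether a description is "sufficient to identify individual works" is a legal question the sketch does not attempt to answer.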
2. A. ON OR BEFORE JANUARY FIRST, TWO THOUSAND TWENTY-SEVEN, THE
DEVELOPER OF A GENERATIVE ARTIFICIAL INTELLIGENCE SYSTEM OR SERVICE WHO
DEPLOYS A CRAWLER, EITHER DIRECTLY OR THROUGH A THIRD PARTY, IN
CONNECTION WITH SUCH SYSTEM OR SERVICE SHALL DISCLOSE INFORMATION
REGARDING THE IDENTITY OF CRAWLERS USED BY THE DEVELOPER OR BY THIRD
PARTIES ON THE DEVELOPER'S BEHALF IN A MANNER CLEARLY ACCESSIBLE BY A
WEBSITE OPERATOR, INCLUDING BUT NOT LIMITED TO:
(I) THE NAME OF THE CRAWLER INCLUDING THE CRAWLER'S IP ADDRESS, AND
SPECIFIC IDENTIFIER ACTUALLY USED BY THE CRAWLER WHEN CONDUCTING THE
CRAWLING ACTIVITY (SUCH AS INCLUDING THE IDENTIFIERS AS PART OF THE USER
AGENT OR OTHER PART OF THE REQUEST HEADERS);
(II) THE LEGAL ENTITY RESPONSIBLE FOR THE CRAWLER;
(III) THE SPECIFIC PURPOSES FOR WHICH EACH CRAWLER IS USED;
(IV) THE LEGAL ENTITIES TO WHICH OPERATORS PROVIDE DATA SCRAPED BY THE
CRAWLERS THEY OPERATE; AND
(V) A SINGLE POINT OF CONTACT TO ENABLE THIRD PARTIES WHOSE WEBSITES
ARE ACCESSED BY SUCH CRAWLERS TO COMMUNICATE WITH THE DEVELOPER AND TO
LODGE COMPLAINTS.
B. THE INFORMATION DISCLOSED PURSUANT TO PARAGRAPH A OF THIS SUBDIVI-
SION SHALL BE AVAILABLE ON AN EASILY ACCESSIBLE PLATFORM AND UPDATED AT
THE SAME TIME AS ANY CHANGE IS MADE TO SUCH INFORMATION.
C. THE EXCLUSION OF A CRAWLER BY A WEBSITE OPERATOR SHALL NOT NEGA-
TIVELY IMPACT THE FINDABILITY OF THE WEBSITE OPERATOR'S CONTENT IN A
SEARCH ENGINE.
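Illustrative note (not part of the bill text): the crawler identity items in subparagraphs (i) through (v) of subdivision 2 could be published as a structured record that a website operator compares against incoming requests. The sketch below assumes the "specific identifier" is a token carried in the User-Agent request header, which the bill gives as one example. The names `CrawlerDisclosure` and `identifies_itself` and all example values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CrawlerDisclosure:
    """Hypothetical record of the items in subparagraphs (i)-(v); not a format set by the bill."""
    name: str                   # (i) name of the crawler
    ip_addresses: List[str]     # (i) IP address(es) the crawler uses
    user_agent_token: str       # (i) identifier actually sent in request headers
    legal_entity: str           # (ii) legal entity responsible for the crawler
    purposes: List[str]         # (iii) specific purposes of the crawler
    data_recipients: List[str]  # (iv) legal entities that receive the scraped data
    contact: str                # (v) single point of contact for complaints


def identifies_itself(d: CrawlerDisclosure, request_headers: Dict[str, str]) -> bool:
    """True if the disclosed identifier appears in the request's User-Agent header."""
    user_agent = ""
    for key, value in request_headers.items():
        if key.lower() == "user-agent":  # header names are case-insensitive
            user_agent = value
    token = d.user_agent_token.lower()
    # An empty token would match every request, so treat it as not identifying.
    return bool(token) and token in user_agent.lower()


# Example with made-up values (192.0.2.10 is a documentation-only address).
bot = CrawlerDisclosure(
    name="ExampleBot",
    ip_addresses=["192.0.2.10"],
    user_agent_token="ExampleBot/1.0",
    legal_entity="Example AI, Inc.",
    purposes=["collecting text for model training"],
    data_recipients=["Example AI, Inc."],
    contact="crawler-complaints@example.com",
)
print(identifies_itself(bot, {"User-Agent": "Mozilla/5.0 (compatible; ExampleBot/1.0)"}))  # True
print(identifies_itself(bot, {"User-Agent": "curl/8.0"}))  # False
```

Matching on a header token alone is easy to spoof. An operator would likely also compare the request's source address against the disclosed IP addresses.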
§ 338-B. ENFORCEMENT. 1. A. A JOURNALISM PROVIDER, OR A PERSON AUTHOR-
IZED TO ACT ON A JOURNALISM PROVIDER'S BEHALF, MAY REQUEST THE CLERK OF
THE SUPREME COURT, OR A JUDGE WHERE THERE IS NO CLERK, TO ISSUE A
SUBPOENA TO A DEVELOPER OF A GENERATIVE ARTIFICIAL INTELLIGENCE SYSTEM
THAT IS MADE AVAILABLE TO NEW YORKERS FOR USE, REGARDLESS OF WHETHER THE
SYSTEM OR SERVICE IS MADE AVAILABLE FOR A FEE, FOR DISCLOSURE OF COPIES
OF, OR RECORDS SUFFICIENT TO IDENTIFY WITH CERTAINTY, THE TEXT AND DATA
USED TO TRAIN THE GENERATIVE ARTIFICIAL INTELLIGENCE SYSTEM OR SERVICE
INSOFAR AS SUCH TEXT AND DATA PERTAINS TO THE JOURNALISM PROVIDER'S
INTERNET WEBSITE, BROADCASTS, PODCASTS OR OTHER DIGITAL PLATFORMS,
INCLUDING BUT NOT LIMITED TO:
(I) THE UNIFORM RESOURCE LOCATORS ACCESSED BY CRAWLERS DEPLOYED BY
DEVELOPERS OR BY THIRD PARTIES ON THEIR BEHALF OR FROM WHOM THEY HAVE
OBTAINED TEXT, VIDEO, AUDIO OR DATA, AND DATES AND TIMES OF COLLECTION;
AND
(II) THE TEXT AND DATA USED FOR ARTIFICIAL INTELLIGENCE UTILIZATION,
INCLUDING THE TYPE AND PROVENANCE OF THE TEXT AND DATA AND THE MEANS BY
WHICH SUCH TEXT AND DATA WAS OBTAINED AND WHEN.
B. A SUBPOENA ISSUED PURSUANT TO PARAGRAPH A OF THIS SUBDIVISION MAY
REQUIRE DISCLOSURE OF THE INFORMATION REQUIRED PURSUANT TO PARAGRAPH A
OF THIS SUBDIVISION IN THE NATIVE FORM IN WHICH SUCH INFORMATION WAS
COPIED AND STORED (INCLUDING ALL ACCOMPANYING KEYS, VALUES, TAGS, AND
THE LIKE, AND ANY OTHER AVAILABLE METADATA), SUBJECT TO ENTRY OF A SUIT-
ABLE PROTECTIVE ORDER IN THE CASE THAT SUCH INFORMATION CONSTITUTES A
TRADE SECRET OF THE GENERATIVE ARTIFICIAL INTELLIGENCE SYSTEM DEVELOPER.
C. THE DEVELOPER SHALL PROVIDE THE SUBPOENAED INFORMATION WITHIN THIR-
TY DAYS OF SERVICE OF THE SUBPOENA OR, IN THE CASE OF TRADE SECRETS,
ENTRY OF A SUITABLE PROTECTIVE ORDER. SUCH SUBPOENA SHALL BE SUBJECT TO
THE PROVISIONS OF ARTICLE TWENTY-THREE OF THE CIVIL PRACTICE LAW AND
RULES. THE COURT MAY IMPOSE A PENALTY FOR FAILURE TO RESPOND TO SUCH
INFORMATION SUBPOENAS PURSUANT TO SECTION TWENTY-THREE HUNDRED EIGHT OF
THE CIVIL PRACTICE LAW AND RULES.
2. A. A JOURNALISM PROVIDER MAY BRING AN ACTION IN THE SUPREME COURT
FOR AN INJUNCTION TO COMPEL A DEVELOPER TO COMPLY WITH SECTION THREE
HUNDRED THIRTY-EIGHT-A OF THIS ARTICLE.
B. IF A DEVELOPER FAILS TO COMPLY WITH A SUBPOENA ISSUED PURSUANT TO
SUBDIVISION ONE OF THIS SECTION, THE JOURNALISM PROVIDER REQUESTING SUCH
SUBPOENA MAY MOVE IN THE SUPREME COURT TO COMPEL COMPLIANCE. IF THE
COURT FINDS THAT THE DEVELOPER DID NOT COMPLY WITH THE SUBPOENA, THE
COURT SHALL ORDER COMPLIANCE AND MAY IMPOSE STATUTORY DAMAGES TO THE
JOURNALISM PROVIDER REQUESTING SUCH SUBPOENA OF UP TO TEN THOUSAND
DOLLARS.
C. IF THE DEVELOPER FAILS TO COMPLY WITH A COURT ORDER ISSUED PURSUANT
TO PARAGRAPH B OF THIS SUBDIVISION, THEN THE JOURNALISM PROVIDER MAY
REQUEST THAT THE ATTORNEY GENERAL BRING AN ACTION ON THEIR BEHALF TO
ENSURE COMPLIANCE WITH THE COURT ORDER AND ANY STATUTORY DAMAGES
ASSESSED.
§ 338-C. APPLICABILITY. THE PROVISIONS OF THIS ARTICLE SHALL NOT BE
CONSTRUED TO MODIFY, IMPAIR, EXPAND, OR IN ANY WAY ALTER RIGHTS PERTAIN-
ING TO TITLE 17 OF THE UNITED STATES CODE OR THE LANHAM ACT (15 U.S.C.
1051 ET SEQ.).
§ 338-D. SEVERABILITY. IF ANY PROVISION OF THIS ARTICLE OR THE APPLI-
CATION THEREOF TO ANY PERSON OR CIRCUMSTANCES IS HELD TO BE INVALID,
SUCH INVALIDITY SHALL NOT AFFECT OTHER PROVISIONS OR APPLICATIONS OF
THIS ARTICLE WHICH CAN BE GIVEN EFFECT WITHOUT THE INVALID PROVISION OR
APPLICATION, AND TO THIS END THE PROVISIONS OF THIS ARTICLE ARE SEVERA-
BLE.
§ 4. This act shall take effect immediately.