S. 8331                             2
 
   (f) Studies show that news content comprises a disproportionate amount
 of  generative  artificial  intelligence  training data. News content is
 especially valuable to artificial intelligence developers because it  is
 high-quality, professional writing created by human beings;
   (g) After training, generative artificial intelligence systems continue
 to access news websites, podcasts, broadcasts and digital platforms in
 order to gain access to fact-checked, accurate and up-to-date content to
 produce outputs;
   (h) The vast majority of generative artificial intelligence developers
 do not obtain permission from or compensate news publishers or broadcast
 news operations for accessing their websites, podcasts, broadcasts and
 digital platforms for the purposes of building and operationalizing
 their AI tools and services, in violation of copyright law, those sites'
 and platforms' terms of service and express prohibitions and preferences;
   (i)  Maximizing  the  potential of generative AI requires ensuring the
 sustainability of journalism and the news industry; and
   (j) News publishers, broadcast news operations and the public  deserve
 to know when generative artificial intelligence developers have accessed
 news websites and used their work.
   §  3.  Article  21-A of the general business law is renumbered article
 21-B and a new article 21-A is added to read as follows:
                               ARTICLE 21-A
             ARTIFICIAL INTELLIGENCE SOURCE DATA TRANSPARENCY
 SECTION 338.   DEFINITIONS.
         338-A. ARTIFICIAL INTELLIGENCE SOURCE DATA TRANSPARENCY.
         338-B. ENFORCEMENT.
         338-C. APPLICABILITY.
         338-D. SEVERABILITY.
   § 338. DEFINITIONS. THE FOLLOWING TERMS, WHENEVER USED OR REFERRED  TO
 IN THIS ARTICLE, SHALL HAVE THE FOLLOWING MEANINGS:
   1.  "ARTIFICIAL  INTELLIGENCE"  MEANS A MACHINE-BASED SYSTEM THAT CAN,
 FOR A GIVEN SET OF HUMAN-DEFINED OBJECTIVES, MAKE PREDICTIONS, RECOMMEN-
 DATIONS, OR DECISIONS INFLUENCING REAL OR VIRTUAL ENVIRONMENTS, AND THAT
 USES MACHINE AND HUMAN-BASED INPUTS TO PERCEIVE REAL AND  VIRTUAL  ENVI-
 RONMENTS,  ABSTRACT  SUCH PERCEPTIONS INTO MODELS THROUGH ANALYSIS IN AN
 AUTOMATED MANNER, AND USE  MODEL  INFERENCE  TO  FORMULATE  OPTIONS  FOR
 INFORMATION OR ACTION.
   2.  "ACCESS"  MEANS  TO  OBTAIN,  RETRIEVE, ACQUIRE, REPRODUCE, CRAWL,
 INDEX, OR REQUEST AND RECEIVE A TRANSMISSION OF CONTENT.
   3. "COVERED PUBLICATION" MEANS ANY PRINT, BROADCAST, BROADCAST NETWORK
 OR DIGITAL PUBLICATION OR SERVICE WHICH:
   A. PERFORMS A PUBLIC-INFORMATION FUNCTION COMPARABLE  TO  THAT  TRADI-
 TIONALLY  SERVED BY JOURNALISM ORGANIZATIONS, SUCH AS NEWSPAPERS, BROAD-
 CAST NEWS OPERATIONS, BROADCAST NETWORK NEWS OPERATIONS,  MAGAZINES  AND
 OTHER PERIODICAL PUBLICATIONS;
   B.  INVESTS  SUBSTANTIAL  EXPENDITURE  OF  LABOR,  SKILL, AND MONEY TO
 CREATE, EDIT, PRODUCE, AND  DISTRIBUTE  CONTENT  INCLUDING  BY  ENGAGING
 NATURAL  PERSONS TO CREATE, EDIT, PRODUCE, AND DISTRIBUTE ORIGINAL TEXT,
 AUDIO, PHOTO, ILLUSTRATIVE,  OR  VIDEO  CONTENT  CONCERNING  MATTERS  OR
 TOPICS  OF  INTEREST  OR USE TO MEMBERS OF THE PUBLIC THROUGH ACTIVITIES
 SUCH AS OBSERVATION, VIDEO RECORDING EVENTS, INTERVIEWS, RESEARCH, TEST-
 ING, AND ANALYSIS; AND
   C. PUBLISHES NEW CONTENT OR UPDATES ITS CONTENT ON AT LEAST A  MONTHLY
 BASIS AND HAS A PROCESS FOR ERROR CORRECTION AND CLARIFICATION.
   4.  "CRAWLER"  MEANS  SOFTWARE THAT ACCESSES CONTENT FROM A WEBSITE OR
 OTHER INTERNET SOURCE, SUCH  AS  AN  ONLINE  CRAWLER,  SPIDER,  FETCHER,
 CLIENT, BOT, USER AGENT OR EQUIVALENT TOOL.
   5.  "DEVELOPER"  MEANS  A  PERSON  THAT  DESIGNS,  CODES, PRODUCES, OR
 SUBSTANTIALLY MODIFIES AN ARTIFICIAL INTELLIGENCE SYSTEM OR SERVICE  FOR
 USE  BY  MEMBERS  OF  THE PUBLIC. THE TERM "DEVELOPER" SHALL NOT INCLUDE
 ARTIFICIAL INTELLIGENCE SYSTEMS USED, DEVELOPED OR OBTAINED BY  A  JOUR-
 NALISM PROVIDER FOR INTERNAL USE.
   6.  "GENERATIVE  ARTIFICIAL  INTELLIGENCE" MEANS A CLASS OF ARTIFICIAL
 INTELLIGENCE MODELS THAT EMULATE THE STRUCTURE  AND  CHARACTERISTICS  OF
 INPUT  DATA  TO  GENERATE  DERIVED SYNTHETIC CONTENT, INCLUDING, BUT NOT
 LIMITED TO, IMAGES, VIDEOS, AUDIO, TEXT, AND OTHER DIGITAL CONTENT.
   7. "JOURNALISM PROVIDER" MEANS ANY PERSON THAT:
   A. BROADCASTS OR PUBLISHES ONE OR MORE COVERED PUBLICATIONS; AND
   B. IS COVERED BY MEDIA LIABILITY INSURANCE.
   8. "PERSON" MEANS A NATURAL PERSON, CORPORATION, TRUST, ESTATE,  PART-
 NERSHIP,  INCORPORATED  OR UNINCORPORATED ASSOCIATION OR ANY OTHER LEGAL
 ENTITY.
   9. "ARTIFICIAL INTELLIGENCE UTILIZATION" MEANS TO USE DIGITAL  CONTENT
 AS  DATA TO DEVELOP THE CAPABILITIES OF A GENERATIVE ARTIFICIAL INTELLI-
 GENCE SYSTEM,  INCLUDING  THROUGH  SETTING  OR  CHANGING  ITS  LEARNABLE
 WEIGHTS  AND  OTHER PARAMETERS, AND INCLUDES, IN ADDITION TO THE INITIAL
 DATASET TRAINING, FURTHER TESTING, VALIDATING, GROUNDING, OR FINE TUNING
 BY THE DEVELOPER OF THE ARTIFICIAL INTELLIGENCE SYSTEM OR SERVICE.
   § 338-A. ARTIFICIAL INTELLIGENCE SOURCE DATA TRANSPARENCY. 1. A. ON OR
 BEFORE JANUARY FIRST, TWO THOUSAND TWENTY-SEVEN  AND  BEFORE  EACH  TIME
 THEREAFTER  THAT A GENERATIVE ARTIFICIAL INTELLIGENCE SYSTEM OR SERVICE,
 OR A SUBSTANTIAL MODIFICATION TO A  GENERATIVE  ARTIFICIAL  INTELLIGENCE
 SYSTEM OR SERVICE RELEASED ON OR AFTER JANUARY FIRST, TWO THOUSAND TWEN-
 TY-TWO, IS MADE PUBLICLY AVAILABLE TO NEW YORKERS FOR USE, REGARDLESS OF
 WHETHER THE SYSTEM OR SERVICE IS MADE AVAILABLE FOR A FEE, THE DEVELOPER
 OF  THE SYSTEM OR SERVICE SHALL POST ON THE DEVELOPER'S INTERNET WEBSITE
 THE FOLLOWING INFORMATION REGARDING VIDEO, AUDIO, TEXT AND DATA  FROM  A
 COVERED PUBLICATION USED TO TRAIN THE GENERATIVE ARTIFICIAL INTELLIGENCE
 SYSTEM OR SERVICE:
   (I)  THE  UNIFORM  RESOURCE  LOCATORS  OR UNIFORM RESOURCE IDENTIFIERS
 ACCESSED BY CRAWLERS DEPLOYED BY THE DEVELOPER OR BY  THIRD  PARTIES  ON
 THEIR BEHALF OR FROM WHOM THEY HAVE OBTAINED VIDEO, AUDIO, TEXT OR DATA;
   (II)  A DETAILED DESCRIPTION OF THE VIDEO, AUDIO, TEXT AND DATA FROM A
 COVERED  PUBLICATION  USED  FOR  ARTIFICIAL  INTELLIGENCE   UTILIZATION,
 INCLUDING THE TYPE AND PROVENANCE OF THE VIDEO, AUDIO, TEXT AND DATA AND
 THE  MEANS  BY  WHICH IT WAS OBTAINED, SUFFICIENT TO IDENTIFY INDIVIDUAL
 WORKS;
   (III) WHETHER ANY SOURCE IDENTIFIERS, TERMS, OR COPYRIGHT NOTICES WERE
 REMOVED FROM THE VIDEO, AUDIO, TEXT OR DATA; AND
   (IV) THE TIMEFRAME OF DATA COLLECTION.
   B. THE INFORMATION REQUIRED TO BE POSTED  ON  A  DEVELOPER'S  INTERNET
 WEBSITE  PURSUANT  TO  PARAGRAPH  A  OF  THIS  SUBDIVISION  SHALL NOT BE
 REQUIRED WHERE THERE IS AN EXPRESS  WRITTEN  AGREEMENT  AUTHORIZING  THE
 DEVELOPER  TO  ACCESS  THE JOURNALISM PROVIDER'S CONTENT AND THE PARTIES
 AGREE NOT TO POST INFORMATION  RELATING  TO  THE  JOURNALISM  PROVIDER'S
 CONTENT ON THE DEVELOPER'S WEBSITE.
   2.  A.  ON  OR  BEFORE  JANUARY  FIRST, TWO THOUSAND TWENTY-SEVEN, THE
 DEVELOPER OF A GENERATIVE ARTIFICIAL INTELLIGENCE SYSTEM OR SERVICE  WHO
 DEPLOYS  A  CRAWLER,  EITHER  DIRECTLY  OR  THROUGH  A  THIRD  PARTY, IN
 CONNECTION WITH  SUCH  SYSTEM  OR  SERVICE  SHALL  DISCLOSE  INFORMATION
 REGARDING  THE  IDENTITY  OF  CRAWLERS USED BY THE DEVELOPER OR BY THIRD
 PARTIES ON THE DEVELOPER'S BEHALF IN A MANNER CLEARLY  ACCESSIBLE  BY  A
 WEBSITE OPERATOR, INCLUDING BUT NOT LIMITED TO:
   (I) THE NAME OF THE CRAWLER, INCLUDING THE CRAWLER'S IP ADDRESS AND THE
 SPECIFIC IDENTIFIER ACTUALLY USED BY THE CRAWLER WHEN CONDUCTING THE
 CRAWLING ACTIVITY (SUCH AS IDENTIFIERS INCLUDED AS PART OF THE USER
 AGENT OR OTHER PART OF THE REQUEST HEADERS);
   (II) THE LEGAL ENTITY RESPONSIBLE FOR THE CRAWLER;
   (III) THE SPECIFIC PURPOSES FOR WHICH EACH CRAWLER IS USED;
   (IV) THE LEGAL ENTITIES TO WHICH OPERATORS PROVIDE DATA SCRAPED BY THE
 CRAWLERS THEY OPERATE; AND
   (V)  A  SINGLE POINT OF CONTACT TO ENABLE THIRD PARTIES WHOSE WEBSITES
 ARE ACCESSED BY SUCH CRAWLERS TO COMMUNICATE WITH THE DEVELOPER  AND  TO
 LODGE COMPLAINTS.
   B.  THE INFORMATION DISCLOSED PURSUANT TO PARAGRAPH A OF THIS SUBDIVI-
 SION SHALL BE AVAILABLE ON AN EASILY ACCESSIBLE PLATFORM AND UPDATED  AT
 THE SAME TIME AS ANY CHANGE IS MADE TO SUCH INFORMATION.
   C.  THE  EXCLUSION  OF A CRAWLER BY A WEBSITE OPERATOR SHALL NOT NEGA-
 TIVELY IMPACT THE FINDABILITY OF THE WEBSITE  OPERATOR'S  CONTENT  IN  A
 SEARCH ENGINE.
   § 338-B. ENFORCEMENT. 1. A. A JOURNALISM PROVIDER, OR A PERSON AUTHOR-
 IZED  TO ACT ON A JOURNALISM PROVIDER'S BEHALF, MAY REQUEST THE CLERK OF
 THE SUPREME COURT, OR A JUDGE WHERE  THERE  IS  NO  CLERK,  TO  ISSUE  A
 SUBPOENA  TO  A DEVELOPER OF A GENERATIVE ARTIFICIAL INTELLIGENCE SYSTEM
 THAT IS MADE AVAILABLE TO NEW YORKERS FOR USE, REGARDLESS OF WHETHER THE
 SYSTEM OR SERVICE IS MADE AVAILABLE FOR A FEE, FOR DISCLOSURE OF  COPIES
 OF,  OR RECORDS SUFFICIENT TO IDENTIFY WITH CERTAINTY, THE TEXT AND DATA
 USED TO TRAIN THE GENERATIVE ARTIFICIAL INTELLIGENCE SYSTEM  OR  SERVICE
 INSOFAR  AS  SUCH  TEXT  AND  DATA PERTAIN TO THE  JOURNALISM PROVIDER'S
 INTERNET WEBSITE,  BROADCASTS,  PODCASTS  OR  OTHER  DIGITAL  PLATFORMS,
 INCLUDING BUT NOT LIMITED TO:
   (I)  THE  UNIFORM  RESOURCE  LOCATORS ACCESSED BY CRAWLERS DEPLOYED BY
 DEVELOPERS OR BY THIRD PARTIES ON THEIR BEHALF OR FROM  WHOM  THEY  HAVE
 OBTAINED  TEXT, VIDEO, AUDIO OR DATA, AND DATES AND TIMES OF COLLECTION;
 AND
   (II) THE TEXT AND DATA USED FOR ARTIFICIAL  INTELLIGENCE  UTILIZATION,
 INCLUDING  THE TYPE AND PROVENANCE OF THE TEXT AND DATA AND THE MEANS BY
 WHICH SUCH TEXT AND DATA WERE OBTAINED AND WHEN.
   B. A SUBPOENA ISSUED PURSUANT TO PARAGRAPH A OF THIS  SUBDIVISION  MAY
 REQUIRE  DISCLOSURE  OF THE INFORMATION REQUIRED PURSUANT TO PARAGRAPH A
 OF THIS SUBDIVISION IN THE NATIVE FORM IN  WHICH  SUCH  INFORMATION  WAS
 COPIED  AND  STORED  (INCLUDING ALL ACCOMPANYING KEYS, VALUES, TAGS, AND
 THE LIKE, AND ANY OTHER AVAILABLE METADATA), SUBJECT TO ENTRY OF A SUIT-
 ABLE PROTECTIVE ORDER IN THE CASE THAT SUCH  INFORMATION  CONSTITUTES  A
 TRADE SECRET OF THE GENERATIVE ARTIFICIAL INTELLIGENCE SYSTEM DEVELOPER.
   C. THE DEVELOPER SHALL PROVIDE THE SUBPOENAED INFORMATION WITHIN THIR-
 TY  DAYS  OF  SERVICE  OF THE SUBPOENA OR, IN THE CASE OF TRADE SECRETS,
 ENTRY OF A SUITABLE PROTECTIVE ORDER. SUCH SUBPOENA SHALL BE SUBJECT  TO
 THE  PROVISIONS  OF  ARTICLE  TWENTY-THREE OF THE CIVIL PRACTICE LAW AND
 RULES.  THE COURT MAY IMPOSE A PENALTY FOR FAILURE TO  RESPOND  TO  SUCH
 INFORMATION  SUBPOENAS PURSUANT TO SECTION TWENTY-THREE HUNDRED EIGHT OF
 THE CIVIL PRACTICE LAW AND RULES.
   2. A. A JOURNALISM PROVIDER MAY BRING AN ACTION IN THE  SUPREME  COURT
 FOR  AN  INJUNCTION  TO  COMPEL A DEVELOPER TO COMPLY WITH SECTION THREE
 HUNDRED THIRTY-EIGHT-A OF THIS ARTICLE.
   B. IF A DEVELOPER FAILS TO COMPLY WITH A SUBPOENA ISSUED  PURSUANT  TO
 SUBDIVISION ONE OF THIS SECTION, THE JOURNALISM PROVIDER REQUESTING SUCH
 SUBPOENA  MAY  MOVE  IN  THE  SUPREME COURT TO COMPEL COMPLIANCE. IF THE
 COURT FINDS THAT THE DEVELOPER DID NOT COMPLY  WITH  THE  SUBPOENA,  THE
 COURT  SHALL  ORDER  COMPLIANCE  AND MAY IMPOSE STATUTORY DAMAGES TO THE
 JOURNALISM PROVIDER REQUESTING SUCH  SUBPOENA  OF  UP  TO  TEN  THOUSAND
 DOLLARS.
   C. IF THE DEVELOPER FAILS TO COMPLY WITH A COURT ORDER ISSUED PURSUANT
 TO  PARAGRAPH  B  OF  THIS SUBDIVISION, THEN THE JOURNALISM PROVIDER MAY
 REQUEST THAT THE ATTORNEY GENERAL BRING AN ACTION  ON  THEIR  BEHALF  TO
 ENSURE  COMPLIANCE  WITH  THE  COURT  ORDER  AND  ANY  STATUTORY DAMAGES
 ASSESSED.
   § 338-C. APPLICABILITY. THE PROVISIONS OF THIS ARTICLE  SHALL  NOT  BE
 CONSTRUED TO MODIFY, IMPAIR, EXPAND, OR IN ANY WAY ALTER RIGHTS PERTAIN-
 ING  TO  TITLE 17 OF THE UNITED STATES CODE OR THE LANHAM ACT (15 U.S.C.
 1051 ET SEQ.).
   § 338-D. SEVERABILITY. IF ANY PROVISION OF THIS ARTICLE OR THE  APPLI-
 CATION  THEREOF  TO  ANY  PERSON OR CIRCUMSTANCES IS HELD TO BE INVALID,
 SUCH INVALIDITY SHALL NOT AFFECT OTHER  PROVISIONS  OR  APPLICATIONS  OF
 THIS  ARTICLE WHICH CAN BE GIVEN EFFECT WITHOUT THE INVALID PROVISION OR
 APPLICATION, AND TO THIS END THE PROVISIONS OF THIS ARTICLE ARE  SEVERA-
 BLE.
   § 4. This act shall take effect immediately.