STATE OF NEW YORK
________________________________________________________________________
6955
2025-2026 Regular Sessions
IN SENATE
March 27, 2025
___________
Introduced by Sen. GOUNARDES -- read twice and ordered printed, and when
printed to be committed to the Committee on Internet and Technology
AN ACT to amend the general business law, in relation to establishing
the artificial intelligence training data transparency act
THE PEOPLE OF THE STATE OF NEW YORK, REPRESENTED IN SENATE AND ASSEM-
BLY, DO ENACT AS FOLLOWS:
Section 1. The general business law is amended by adding a new article
44-B to read as follows:
ARTICLE 44-B
ARTIFICIAL INTELLIGENCE TRAINING DATA TRANSPARENCY ACT
SECTION 1420. SHORT TITLE.
1421. DEFINITIONS.
1422. DATA USED TO TRAIN GENERATIVE ARTIFICIAL INTELLIGENCE
MODELS OR SERVICES.
1423. EMPLOYEE DATA USED TO TRAIN GENERATIVE ARTIFICIAL INTELLI-
GENCE MODELS OR SERVICES.
§ 1420. SHORT TITLE. THIS ACT SHALL BE KNOWN AND MAY BE CITED AS THE
"ARTIFICIAL INTELLIGENCE TRAINING DATA TRANSPARENCY ACT".
§ 1421. DEFINITIONS. FOR THE PURPOSES OF THIS ARTICLE, THE FOLLOWING
TERMS SHALL HAVE THE FOLLOWING MEANINGS:
1. "ARTIFICIAL INTELLIGENCE" OR "ARTIFICIAL INTELLIGENCE TECHNOLOGY"
MEANS A MACHINE-BASED SYSTEM THAT CAN, FOR A GIVEN SET OF HUMAN-DEFINED
OBJECTIVES, MAKE PREDICTIONS, RECOMMENDATIONS, OR DECISIONS INFLUENCING
REAL OR VIRTUAL ENVIRONMENTS, AND THAT USES MACHINE- AND HUMAN-BASED
INPUTS TO PERCEIVE REAL AND VIRTUAL ENVIRONMENTS, ABSTRACT SUCH PERCEP-
TIONS INTO MODELS THROUGH ANALYSIS IN AN AUTOMATED MANNER, AND USE MODEL
INFERENCE TO FORMULATE OPTIONS FOR INFORMATION OR ACTION.
2. "DEVELOPER" MEANS A PERSON, PARTNERSHIP, STATE OR LOCAL GOVERNMENT
AGENCY, OR CORPORATION THAT DESIGNS, CODES, PRODUCES, OR SUBSTANTIALLY
MODIFIES AN ARTIFICIAL INTELLIGENCE MODEL OR SERVICE FOR USE BY MEMBERS
OF THE PUBLIC.
EXPLANATION--Matter in ITALICS (underscored) is new; matter in brackets
[ ] is old law to be omitted.
LBD07975-02-5
3. "GENERATIVE ARTIFICIAL INTELLIGENCE" MEANS A CLASS OF AI MODELS
THAT ARE SELF-SUPERVISED AND EMULATE THE STRUCTURE AND CHARACTERISTICS
OF INPUT DATA TO GENERATE DERIVED SYNTHETIC CONTENT, INCLUDING, BUT NOT
LIMITED TO, IMAGES, VIDEOS, AUDIO, TEXT, AND OTHER DIGITAL CONTENT.
4. "SUBSTANTIALLY MODIFIES" OR "SUBSTANTIAL MODIFICATION" MEANS A NEW
VERSION, NEW RELEASE, OR OTHER UPDATE TO A GENERATIVE ARTIFICIAL INTEL-
LIGENCE MODEL OR SERVICE THAT MATERIALLY CHANGES ITS FUNCTIONALITY OR
PERFORMANCE, INCLUDING THE RESULTS OF RETRAINING OR FINE TUNING.
5. "SYNTHETIC DATA GENERATION" MEANS A PROCESS IN WHICH SEED DATA IS
USED TO CREATE ARTIFICIAL DATA THAT HAVE SOME OF THE STATISTICAL CHARAC-
TERISTICS OF THE SEED DATA.
6. "TRAIN A GENERATIVE ARTIFICIAL INTELLIGENCE MODEL OR SERVICE"
INCLUDES TESTING, VALIDATING, OR FINE TUNING BY THE DEVELOPER OF THE
ARTIFICIAL INTELLIGENCE MODEL OR SERVICE.
7. "AGGREGATE CONSUMER INFORMATION" MEANS INFORMATION THAT RELATES TO
A GROUP OF CONSUMERS, FROM WHICH INDIVIDUAL CONSUMER IDENTITIES HAVE
BEEN REMOVED, THAT IS NOT LINKED OR REASONABLY LINKABLE TO ANY CONSUMER
OR HOUSEHOLD, INCLUDING VIA A DEVICE. AGGREGATE CONSUMER INFORMATION
DOES NOT MEAN ONE OR MORE INDIVIDUAL CONSUMER RECORDS THAT HAVE BEEN
DE-IDENTIFIED.
8. "AI MODEL" MEANS AN INFORMATION SYSTEM OR COMPONENT OF AN INFORMA-
TION SYSTEM THAT IMPLEMENTS ARTIFICIAL INTELLIGENCE TECHNOLOGY AND USES
COMPUTATIONAL, STATISTICAL, OR MACHINE-LEARNING TECHNIQUES TO PRODUCE
OUTPUTS FROM A GIVEN SET OF INPUTS.
§ 1422. DATA USED TO TRAIN GENERATIVE ARTIFICIAL INTELLIGENCE MODELS
OR SERVICES. 1. ON OR BEFORE JANUARY FIRST, TWO THOUSAND TWENTY-SIX, AND
PRIOR TO EACH TIME THEREAFTER THAT A GENERATIVE ARTIFICIAL INTELLIGENCE
MODEL OR SERVICE, OR A SUBSTANTIAL MODIFICATION TO A GENERATIVE ARTIFI-
CIAL INTELLIGENCE MODEL OR SERVICE, RELEASED ON OR AFTER JANUARY FIRST,
TWO THOUSAND TWENTY-TWO, IS MADE PUBLICLY AVAILABLE TO NEW YORKERS FOR
USE, REGARDLESS OF WHETHER THE TERMS OF SUCH USE INCLUDE COMPENSATION,
THE DEVELOPER OF SUCH MODEL OR SERVICE SHALL POST ON THE DEVELOPER'S
WEBSITE DOCUMENTATION REGARDING THE DATA USED BY THE DEVELOPER TO TRAIN
THE GENERATIVE ARTIFICIAL INTELLIGENCE MODEL OR SERVICE, INCLUDING A
HIGH-LEVEL SUMMARY OF THE DATASETS USED IN THE DEVELOPMENT OF THE GENER-
ATIVE ARTIFICIAL INTELLIGENCE MODEL OR SERVICE, INCLUDING, BUT NOT
LIMITED TO:
(A) THE SOURCES OR OWNERS OF THE DATASETS;
(B) A DESCRIPTION OF HOW THE DATASETS FURTHER THE INTENDED PURPOSE OF
THE ARTIFICIAL INTELLIGENCE MODEL OR SERVICE;
(C) THE NUMBER OF DATA POINTS INCLUDED IN THE DATASETS, WHICH MAY BE
IN GENERAL RANGES, AND WITH ESTIMATED FIGURES FOR DYNAMIC DATASETS;
(D) A DESCRIPTION OF THE TYPES OF DATA POINTS WITHIN THE DATASETS. FOR
PURPOSES OF THIS PARAGRAPH, THE FOLLOWING DEFINITIONS APPLY:
(I) AS APPLIED TO DATASETS THAT INCLUDE LABELS, "TYPES OF DATA POINTS"
MEANS THE TYPES OF LABELS USED; AND
(II) AS APPLIED TO DATASETS WITHOUT LABELING, "TYPES OF DATA POINTS"
REFERS TO THE GENERAL CHARACTERISTICS;
(E) WHETHER THE DATASETS INCLUDE ANY DATA PROTECTED BY COPYRIGHT,
TRADEMARK, OR PATENT, OR WHETHER THE DATASETS ARE ENTIRELY IN THE PUBLIC
DOMAIN;
(F) WHETHER THE DATASETS WERE PURCHASED OR LICENSED BY THE DEVELOPER;
(G) WHETHER THE DATASETS INCLUDE PERSONAL INFORMATION OR PERSONAL
IDENTIFYING INFORMATION, AS DEFINED IN SECTION EIGHT HUNDRED NINETY-
NINE-AAA OF THIS CHAPTER;
(H) WHETHER THE DATASETS INCLUDE AGGREGATE CONSUMER INFORMATION;
(I) WHETHER THERE WAS ANY CLEANING, PROCESSING, OR OTHER MODIFICATION
TO THE DATASETS BY THE DEVELOPER, INCLUDING THE INTENDED PURPOSE OF
THOSE EFFORTS IN RELATION TO THE ARTIFICIAL INTELLIGENCE MODEL OR
SERVICE;
(J) THE TIME PERIOD DURING WHICH THE DATA IN THE DATASETS WERE
COLLECTED, INCLUDING A NOTICE IF THE DATA COLLECTION IS ONGOING;
(K) THE DATES THE DATASETS WERE FIRST USED DURING THE DEVELOPMENT OF
THE ARTIFICIAL INTELLIGENCE MODEL OR SERVICE; AND
(L) WHETHER THE GENERATIVE ARTIFICIAL INTELLIGENCE MODEL OR SERVICE
USED OR CONTINUOUSLY USES SYNTHETIC DATA GENERATION IN ITS DEVELOPMENT.
A DEVELOPER MAY INCLUDE A DESCRIPTION OF THE FUNCTIONAL NEED OR DESIRED
PURPOSE OF THE SYNTHETIC DATA IN RELATION TO THE INTENDED PURPOSE OF THE
MODEL OR SERVICE.
2. A DEVELOPER SHALL NOT BE REQUIRED TO POST DOCUMENTATION REGARDING
THE DATA USED TO TRAIN A GENERATIVE ARTIFICIAL INTELLIGENCE MODEL OR
SERVICE FOR ANY OF THE FOLLOWING:
(A) A GENERATIVE ARTIFICIAL INTELLIGENCE MODEL OR SERVICE WHOSE SOLE
PURPOSE IS THE OPERATION OF AIRCRAFT IN THE NATIONAL AIRSPACE; OR
(B) A GENERATIVE ARTIFICIAL INTELLIGENCE MODEL OR SERVICE DEVELOPED
FOR NATIONAL SECURITY, MILITARY, OR DEFENSE PURPOSES THAT IS MADE AVAIL-
ABLE ONLY TO A FEDERAL ENTITY.
§ 1423. EMPLOYEE DATA USED TO TRAIN GENERATIVE ARTIFICIAL INTELLIGENCE
MODELS OR SERVICES. 1. ANY PERSON, PARTNERSHIP, STATE OR LOCAL GOVERN-
MENT AGENCY, OR CORPORATION THAT DESIGNS, CODES, PRODUCES, OR SUBSTAN-
TIALLY MODIFIES A GENERATIVE ARTIFICIAL INTELLIGENCE MODEL OR SERVICE
USING DATA OF WHICH A SUBSTANTIAL PART IS DERIVED FROM INDIVIDUALS
EMPLOYED OR CONTRACTED BY THE ENTITY, REGARDLESS OF WHETHER THE MODEL IS
MADE PUBLICLY AVAILABLE, SHALL ENSURE THAT THE FOLLOWING INFORMATION IS
DISCLOSED TO EACH EMPLOYEE WHOSE DATA IS USED TO TRAIN THE ARTIFICIAL
INTELLIGENCE MODEL:
(A) THE INTENDED PURPOSE OF THE ARTIFICIAL INTELLIGENCE MODEL OR
SERVICE;
(B) A DESCRIPTION OF HOW THE COLLECTED DATASETS FURTHER THE INTENDED
PURPOSE OF THE ARTIFICIAL INTELLIGENCE MODEL OR SERVICE;
(C) A DESCRIPTION OF THE TYPES OF DATA POINTS WITHIN THE DATASETS;
(D) WHETHER THE DATASETS INCLUDE PERSONAL INFORMATION OR PERSONAL
IDENTIFYING INFORMATION, AS DEFINED IN SECTION EIGHT HUNDRED NINETY-
NINE-AAA OF THIS CHAPTER;
(E) THE DATES THE DATASETS WERE FIRST USED DURING THE DEVELOPMENT OF
THE ARTIFICIAL INTELLIGENCE MODEL OR SERVICE; AND
(F) THE TIME PERIOD DURING WHICH THE DATA IN THE DATASETS WERE
COLLECTED, INCLUDING A NOTICE IF THE DATA COLLECTION IS ONGOING.
2. AN ENTITY THAT USES EMPLOYEE OR CONTRACTOR DATA TO DESIGN, CODE,
PRODUCE, OR SUBSTANTIALLY MODIFY A GENERATIVE ARTIFICIAL INTELLIGENCE
MODEL OR SERVICE SHALL NOT BE REQUIRED TO DISCLOSE THE INFORMATION
REQUIRED BY THIS SECTION IF THE MODEL OR SERVICE:
(A) IS SOLELY INTENDED TO BE USED IN THE OPERATION OF AIRCRAFT IN THE
NATIONAL AIRSPACE; OR
(B) IS DEVELOPED FOR NATIONAL SECURITY, MILITARY, OR DEFENSE PURPOSES
AND ONLY MADE AVAILABLE TO A FEDERAL ENTITY.
§ 2. This act shall take effect immediately.
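Illustrative sketch (not part of the bill text). The documentation items listed in § 1422(1)(a) through (l) could be tracked by a developer as a structured record, one per dataset summary. The class name, field names, and the helper missing_items are hypothetical and do not appear in the bill. Representing items (e) through (h) and (l) as simple yes/no values is a simplifying assumption, since the bill does not prescribe any format.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class DatasetSummary:
    """Hypothetical record mirroring § 1422(1)(a)-(l); names are illustrative only."""
    sources_or_owners: Optional[str] = None                 # (a)
    purpose_description: Optional[str] = None               # (b)
    data_point_count_range: Optional[str] = None            # (c) general ranges allowed
    data_point_types: Optional[str] = None                  # (d) label types or general characteristics
    includes_ip_protected_data: Optional[bool] = None       # (e) copyright, trademark, or patent
    purchased_or_licensed: Optional[bool] = None            # (f)
    includes_personal_information: Optional[bool] = None    # (g)
    includes_aggregate_consumer_information: Optional[bool] = None  # (h)
    modifications_description: Optional[str] = None         # (i) cleaning, processing, purpose
    collection_period: Optional[str] = None                 # (j) note if collection is ongoing
    first_used_dates: Optional[str] = None                  # (k)
    uses_synthetic_data_generation: Optional[bool] = None   # (l)


def missing_items(summary: DatasetSummary) -> List[str]:
    """Return the names of items that have not been filled in yet.

    A value of False counts as answered; only None counts as missing.
    """
    return [f.name for f in fields(summary) if getattr(summary, f.name) is None]


if __name__ == "__main__":
    draft = DatasetSummary(sources_or_owners="Example Licensor (hypothetical)")
    print(missing_items(draft))
```

A record like this only checks that each item has been addressed. Whether a given answer satisfies the statute is a legal question the sketch does not attempt to resolve.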