DataCite PID Links Data File 2023


The DataCite PID Links Data File contains records in JSON format for relationships between PIDs within the DataCite PIDGraph, up to the end of 2023.

Each record is built around a core triple: a subject, an object, and the relationship between the two. Records also include a selection of metadata about the relationship, including the creation date and source of the assertion, when the relationship occurred, and the types of the objects involved.

The PID Links data file is split into 1,679 individual files, each containing up to 100,000 records as JSON objects, one per line (a format known as JSON Lines), to enable easy parallel processing and extraction of data. The files are stored in a tarball and compressed with gzip. Further guidance and tutorials for working with the PID Links data file will be available soon. In the meantime, please contact us if you have questions or need assistance.
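
Until the official guidance is published, the following is a minimal sketch of how a gzip-compressed tarball of JSON Lines files could be read as a stream in Python, without first unpacking the roughly 95 GiB to disk. It uses only the standard library. The function name iter_pid_link_records and the archive path passed to it are illustrative assumptions, not part of the DataCite release, and the sketch makes no assumption about the field names inside each record.

```python
import io
import json
import tarfile
from typing import Iterator


def iter_pid_link_records(tar_path: str) -> Iterator[dict]:
    """Yield parsed records one at a time from a .tar.gz of JSON Lines files.

    Hypothetical helper: the name and the path argument are assumptions
    made for illustration, not an official DataCite API.
    """
    # "r:gz" opens a gzip-compressed tar archive for reading.
    with tarfile.open(tar_path, mode="r:gz") as archive:
        for member in archive:
            # Skip directories and other non-file entries.
            if not member.isfile():
                continue
            raw = archive.extractfile(member)
            if raw is None:
                continue
            # Decode the member as text and parse one JSON object per line.
            with io.TextIOWrapper(raw, encoding="utf-8") as lines:
                for line in lines:
                    line = line.strip()
                    if line:  # tolerate blank lines
                        yield json.loads(line)
```

As a usage example, sum(1 for _ in iter_pid_link_records("pid_links.tar.gz")) would count the records in the archive, where "pid_links.tar.gz" is a placeholder for whatever name the downloaded file has. Because the function is a generator, only one record is held in memory at a time.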

Use of the DataCite PID Links Data File is subject to the DataCite Data File Use Policy.

The work to generate the PID Links data file was made possible through support from the FAIRCORE4EOSC project. FAIRCORE4EOSC has received funding from the EU’s Horizon Europe research and innovation programme under Grant Agreement no. 101057264.

167,844,248 records
11GiB compressed
95GiB decompressed

Get access

As this is the first release of the DataCite PID Links data file, we are gathering information about file usage to inform future development work.

Please provide the following details to get access to the file.

A link to download the file will be sent to you immediately at the email address you provide.