PPAPs Database Structure

The database is structured as follows:

allData = [genericStructure1, genericStructure2, ...]

genericStructure = [description, compound1, compound2, ...]

description = ['filename', 'characteristic1', 'characteristic2', ...]

compound = [R_X_Groups, name, 'source', 'rotation', refs]

R_X_Groups = empty string (if only one compound for this generic structure)
or 'name of R' (if only one R group in this generic structure)
or ['name of R']
or ['name of R1', 'name of R2', ...]
the array may also end with 'X = name of X' (if only one X group in this generic structure)
or with 'X<sup>1</sup> = name of X1', 'X<sup>2</sup> = name of X2', ...
and, last of all, with '(diastereomer)' or '(enantiomer)' or '(prob. enantiomer)'

name = 'name'
or ['name', 'footnote letter(s)']
or ['name', 'footnote letter(s)', 'additional text']

refs = 'author'
or ['author year', 'URL']
or [['author year 1', 'URL1'], ['author year 2', 'URL2'], ...]

The data consist of an array of generic structures. Each individual generic structure is itself represented by an array of at least two members: the first member is a description of the generic structure, and subsequent members describe individual compounds that match that generic structure.
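
The nesting described above can be walked with array destructuring. The following is only a minimal sketch: the names sampleData and summarize are hypothetical and are not part of the database, and the two entries are copied from the database excerpt shown in this document.

```javascript
// Two generic structures copied from the database excerpt in this document.
const sampleData = [
  [
    ['hyperfirin', 'A', 'noC7', 'uncaged', 'bicyclononanes'],
    ['', 'hyperfirin', '<i>H. perforatum</i>', 'NR',
      ['Tatsis 2007', 'doi.org/10.1016/j.phytochem.2006.11.026']]
  ],
  [
    ['sampsonioneO', 'endo', 'A', 'uncaged', 'bicyclononanes'],
    ['prenyl', 'sampsonione O', '<i>H. sampsonii</i>', '+87.9 (0.073)',
      ['Xiao 2007', 'doi.org/10.1021/np0704147']],
    ['geranyl', 'otogirinin D', '<i>H. erectum</i> Thunb.', '+160.0 (0.03, m)',
      [['Ishida 2010', 'doi.org/10.1248/cpb.58.336'],
       ['Li 2015a', 'doi.org/10.1039/C4RA11675E']]]
  ]
];

// Hypothetical helper: split each generic structure into its description
// (first member) and its compounds (all remaining members).
function summarize(allData) {
  return allData.map(([description, ...compounds]) => {
    // The description's first member is the image file name;
    // the rest are structural characteristics.
    const [filename, ...characteristics] = description;
    return { filename, characteristics, compoundCount: compounds.length };
  });
}

console.log(summarize(sampleData));
```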

The description of the generic structure is itself an array. Its first member is the name of the .png file that contains an image representing that generic structure, and subsequent members describe structural characteristics of that generic structure. Every generic structure is described by at least one of the first five values in the list below. Structural characteristics recognized in the database:

Value                 Description
bicyclononanes        bicyclo[3.3.1]nonanes and their further cyclized and seco derivatives
bicyclooctanes        bicyclo[3.2.1]octanes
fourPlusTwo           bicyclo[2.2.2]octanes produced by a formal [4 + 2] cycloaddition
otherBridgedBicycles  bridged bicyclic compounds not included above
spiroindanes
spiropentalanes
otherSpiro
exo                   compounds with an exo substituent at C7
endo                  compounds with an endo substituent at C7
noC7                  compounds with no substituent at C7
noC3                  compounds with no substituent at C3
OC3                   compounds with an acyloxy substituent at C3
A                     type A PPAPs (with an acyl group at C1)
B                     type B PPAPs (with an acyl group at C3)
uncaged               PPAPs that have not undergone further cyclization to adamantanes or homoadamantanes
caged                 PPAPs that have undergone further cyclization to adamantanes or homoadamantanes
seco                  PPAPs with skeletal C–C bonds cleaved
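
Because the characteristics are plain strings in the description array, generic structures can be selected by simple membership tests. The sketch below assumes nothing beyond the structure described above; the helper names (characteristicsOf, filterByCharacteristics, missingSkeleton) and the variable sample are hypothetical. The second helper checks the stated rule that every generic structure carries at least one of the first five values listed above.

```javascript
// The first five values of the list above.
const SKELETON_VALUES = [
  'bicyclononanes', 'bicyclooctanes', 'fourPlusTwo',
  'otherBridgedBicycles', 'spiroindanes'
];

// Hypothetical helper: the characteristics are everything after the file name.
function characteristicsOf(genericStructure) {
  return genericStructure[0].slice(1);
}

// Hypothetical helper: keep generic structures that carry every wanted value.
function filterByCharacteristics(allData, wanted) {
  return allData.filter(gs =>
    wanted.every(value => characteristicsOf(gs).includes(value)));
}

// Hypothetical check: file names of generic structures with no skeleton value.
function missingSkeleton(allData) {
  return allData
    .filter(gs => !characteristicsOf(gs).some(v => SKELETON_VALUES.includes(v)))
    .map(gs => gs[0][0]);
}

// Two generic structures copied from the database excerpt in this document.
const sample = [
  [
    ['hyperfirin', 'A', 'noC7', 'uncaged', 'bicyclononanes'],
    ['', 'hyperfirin', '<i>H. perforatum</i>', 'NR',
      ['Tatsis 2007', 'doi.org/10.1016/j.phytochem.2006.11.026']]
  ],
  [
    ['sampsonioneO', 'endo', 'A', 'uncaged', 'bicyclononanes'],
    ['prenyl', 'sampsonione O', '<i>H. sampsonii</i>', '+87.9 (0.073)',
      ['Xiao 2007', 'doi.org/10.1021/np0704147']]
  ]
];

// Only sampsonioneO is both 'endo' and type 'A' in this sample.
console.log(filterByCharacteristics(sample, ['endo', 'A']).map(gs => gs[0][0]));
// Both sample entries are bicyclononanes, so nothing is reported here.
console.log(missingSkeleton(sample));
```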

As noted above, the subsequent members of each generic structure array describe the individual compounds that match that generic structure. Each of these descriptions is itself an array of five members.

  1. The first member of the array defines the R and X groups of the individual structure as depicted in the image of the generic structure. If the generic structure describes only one individual structure, then the image contains no R or X groups, and this member is an empty string. If there is more than one R or X group, they are written as an array. The name of an X group is preceded by 'X = ' or 'X<sup>n</sup> = ', whereas the name of an R group stands alone. If two structures have the same R group definitions, the second's definition may also include a parenthetical '(enantiomer)', '(diastereomer)', or '(prob. enantiomer)'. If an R group is lavandulyl, it is hyperlinked to the corresponding Wikipedia entry. Other groups with unconventional names, such as ω-isogeranyl, are given a reference to a footnote that defines the group.
  2. The second member of the array is either just the name(s) of the compound, or an array containing the name of the compound, a reference to a footnote, and perhaps additional text to include (for example, 'one of two by that name'). If two compounds have the same name, the names are differentiated by inserting an extra space character into the name of one of the compounds, and a hyperlink is inserted to link each to the other. If a compound has more than one name, the names are separated by ', a.k.a. '. The program that converts the database to HTML format adds a hyperlink from the name(s) to the compound's MOL structure. The MOL structures are stored in a different file, where they are identified by the name(s) of the compound exactly as written here.
  3. The third member of the array is the species source or sources of the compound.
  4. The fourth member of the array is the specific rotation of the compound, including the concentration and the solvent if it is other than CHCl3.
  5. The fifth member of the array is a reference or references. A reference can be just a string, a two-member array, or an array of arrays.
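
Since several of the five members can be either a string or an array, code that reads the database may find it convenient to normalize each compound first. The following is a sketch under that assumption, not the conversion program itself; every function name and the shape of the returned objects are hypothetical. The two compounds used at the end are copied from the database excerpt in this document.

```javascript
// refs may be a string, a [label, url] pair, or an array of such pairs.
function normalizeRefs(refs) {
  if (typeof refs === 'string') return [{ label: refs, url: null }];
  if (Array.isArray(refs[0])) return refs.map(([label, url]) => ({ label, url }));
  const [label, url] = refs;
  return [{ label, url }];
}

// name may be a string, or [name, footnote] or [name, footnote, extra text].
function normalizeName(name) {
  const [text, footnote = null, note = null] = Array.isArray(name) ? name : [name];
  return { text, footnote, note };
}

const STEREO_NOTES = ['(diastereomer)', '(enantiomer)', '(prob. enantiomer)'];

// R_X_Groups may be '', a single string, or an array of R groups that is
// optionally followed by X definitions and a stereochemical note.
function normalizeGroups(groups) {
  const list = groups === '' ? [] : Array.isArray(groups) ? groups : [groups];
  const result = { r: [], x: [], stereoNote: null };
  for (const item of list) {
    if (STEREO_NOTES.includes(item)) result.stereoNote = item;
    else if (/^X(<sup>\d+<\/sup>)? = /.test(item)) result.x.push(item);
    else result.r.push(item);
  }
  return result;
}

function normalizeCompound([groups, name, source, rotation, refs]) {
  return {
    groups: normalizeGroups(groups),
    name: normalizeName(name),
    source,
    rotation,
    refs: normalizeRefs(refs)
  };
}

// Two compounds copied from the database excerpt in this document.
const garcinielliptoneI = normalizeCompound([
  ['Ph', 'prenyl', 'H', 'X = H', '(enantiomer)'],
  'garcinielliptone I',
  '<i>G. subelliptica</i>',
  '&minus;37.7 (1.1)',
  [['Weng 2003b', 'doi.org/10.1002/chem.200305209'],
   ['Ciochina 2006', 'doi.org/10.1021/cr0500582']]
]);
const hyperfirin = normalizeCompound([
  '', 'hyperfirin', '<i>H. perforatum</i>', 'NR',
  ['Tatsis 2007', 'doi.org/10.1016/j.phytochem.2006.11.026']
]);
console.log(garcinielliptoneI.groups, hyperfirin.refs);
```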

This is what the database looks like:
var data = [
		[ 	['hyperfirin', 'A', 'noC7', 'uncaged', 'bicyclononanes'],
			[	'',
				'hyperfirin',
				'<i>H. perforatum</i>',
				'NR',
				['Tatsis 2007', 'doi.org/10.1016/j.phytochem.2006.11.026']
			]
		],
		[	['sampsonioneO', 'endo', 'A', 'uncaged', 'bicyclononanes'],
			[	'prenyl',
				'sampsonione O',
				'<i>H. sampsonii</i>',
				'+87.9 (0.073)',
				['Xiao 2007', 'doi.org/10.1021/np0704147']
			],
			[	'geranyl',
				'otogirinin D',
				'<i>H. erectum</i> Thunb.',
				'+160.0 (0.03, m)',
				[	['Ishida 2010', 'doi.org/10.1248/cpb.58.336'], 
					['Li 2015a', 'doi.org/10.1039/C4RA11675E']
				]
			]
		],
		[ 	['hyperforin', 'exo', 'A', 'uncaged', 'bicyclononanes'],
			[	[	'Ph',
					'prenyl',
					'prenyl',
					'prenyl',
					'H'
				],
				['nemorosone', 'd'],
				'Cuban propolis, <i>C. rosea, C. grandiflora, C. insignis, C. nemorosa</i>',
				'+113 (0.1); OMe:,150 (m, 0.8) and,49 (1.4)',
				[	['de Oliveira 1996', 'doi.org/10.1016/0040-4039(96)00656-9'], 
					['de Oliveira 1999', 'doi.org/10.1016/S0031-9422(98)00476-2'], 
					['Lokvam 2000', 'doi.org/10.1016/S0031-9422(00)00193-X'], 
					['Cuesta-Rubio 2001a', 'doi.org/10.1016/S0031-9422(00)00510-0'],
					['Sparling 2015', 'doi.org/10.1021/acs.orglett.5b01121']
				]
			],
			[	[	'Ph',
					'prenyl',
					'<a href="https://en.wikipedia.org/wiki/Lavandulyl_acetate">lavandulyl</a>',
					'prenyl',
					'H'
				],
				['chamone I', 'e'],
				'<i>C. grandiflora</i>',
				'NR',
				['Lokvam 2000', 'doi.org/10.1016/S0031-9422(00)00193-X']
			],
			[	[	'3,4-dihydroxyphenyl',
					'prenyl',
					'prenyl',
					'(<i>S</i>)-isolavandulyl<sup>[<i><a href="#footnotes">g</a></i>]</sup>',
					'H'
				],
				'garcinialiptone D',
				'<i>G. subelliptica</i>',
				'&minus;79.1 (7.83, m)',
				['L.-J. Zhang 2010', 'doi.org/10.1021/np900620y']
			],
			...
		],
		[	['garsubellinC', 'exo', 'A', 'uncaged', 'bicyclononanes'],
			[	[	'Ph',
					'prenyl',
					'H',
					'X = H',
					'(enantiomer)'
				],
				'garcinielliptone I',
				'<i>G. subelliptica</i>',
				'&minus;37.7 (1.1)',
				[	['Weng 2003b', 'doi.org/10.1002/chem.200305209'], 
					['Ciochina 2006', 'doi.org/10.1021/cr0500582']
				]
			],
			...
		],
		...
];
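
A compound can be located by name by indexing the name member of every compound array. The following is a minimal sketch with hypothetical names (namesOf, buildNameIndex, excerpt); it splits multiple names on ', a.k.a. ' as described above. The third entry in the sample is invented purely to illustrate that splitting and does not come from the database.

```javascript
// Hypothetical helper: list every name under which a compound is recorded.
function namesOf(compound) {
  const nameField = compound[1];
  // The name member is either a string or an array whose first member is the name.
  const text = Array.isArray(nameField) ? nameField[0] : nameField;
  return text.split(', a.k.a. ');
}

// Hypothetical helper: map each name to its generic structure file and compound.
function buildNameIndex(allData) {
  const index = new Map();
  for (const [description, ...compounds] of allData) {
    for (const compound of compounds) {
      for (const name of namesOf(compound)) {
        index.set(name, { filename: description[0], compound });
      }
    }
  }
  return index;
}

const excerpt = [
  // Copied from the database excerpt above.
  [
    ['sampsonioneO', 'endo', 'A', 'uncaged', 'bicyclononanes'],
    ['prenyl', 'sampsonione O', '<i>H. sampsonii</i>', '+87.9 (0.073)',
      ['Xiao 2007', 'doi.org/10.1021/np0704147']],
    ['geranyl', 'otogirinin D', '<i>H. erectum</i> Thunb.', '+160.0 (0.03, m)',
      [['Ishida 2010', 'doi.org/10.1248/cpb.58.336'],
       ['Li 2015a', 'doi.org/10.1039/C4RA11675E']]]
  ],
  // Invented placeholder entry, NOT from the database: shows a two-name compound.
  [
    ['placeholderFile', 'A', 'bicyclononanes'],
    ['', ['placeholder A, a.k.a. placeholder B', 'z'], 'placeholder source', 'NR',
      'placeholder author']
  ]
];

const nameIndex = buildNameIndex(excerpt);
console.log(nameIndex.get('otogirinin D').filename);
console.log(nameIndex.get('placeholder B').filename);
```

Lookups here are exact string matches, which matters because the database distinguishes two compounds of the same name by an extra space character.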