Utilities

RDF Namespaces

Predefined RDF namespaces for convenience, for use with RdfDatastream objects, in ResourceIndex queries, for defining a eulfedora.models.Relation, for adding relationships via eulfedora.models.DigitalObject.add_relationship(), or anywhere else that Fedora-related rdflib.term.URIRef objects might come in handy.

Example usage:

from eulfedora.models import DigitalObject, Relation
from eulfedora.rdfns import relsext as relsextns

class Item(DigitalObject):
  collection = Relation(relsextns.isMemberOfCollection)

eulfedora.rdfns.model = rdf.namespace.ClosedNamespace('info:fedora/fedora-system:def/model#')

rdflib.namespace.ClosedNamespace for the Fedora model namespace (currently only includes hasModel).

eulfedora.rdfns.oai = rdf.namespace.ClosedNamespace('http://www.openarchives.org/OAI/2.0/')

rdflib.namespace.ClosedNamespace for the OAI relations commonly used with Fedora and the PROAI OAI provider. Available URIs are: itemID, setSpec, and setName.

eulfedora.rdfns.relsext = rdf.namespace.ClosedNamespace('info:fedora/fedora-system:def/relations-external#')

rdflib.namespace.ClosedNamespace for the Fedora external relations ontology.
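
As a rough illustration of what a closed namespace provides (known terms resolve to full URIs, unknown terms fail loudly), here is a minimal standard-library stand-in. It is not rdflib's implementation: the class name TinyClosedNamespace is hypothetical, it returns plain strings rather than URIRef objects, and the term lists are the ones named in the descriptions above.

```python
class TinyClosedNamespace(object):
    """Hypothetical stand-in: only the listed terms resolve to URIs."""

    def __init__(self, uri, terms):
        self.uri = uri
        self._terms = frozenset(terms)

    def __getattr__(self, name):
        # Only called for names that are not ordinary attributes.
        if name.startswith('_') or name not in self._terms:
            raise AttributeError('%r is not defined in %s' % (name, self.uri))
        return self.uri + name


model = TinyClosedNamespace('info:fedora/fedora-system:def/model#',
                            ['hasModel'])
oai = TinyClosedNamespace('http://www.openarchives.org/OAI/2.0/',
                          ['itemID', 'setSpec', 'setName'])

print(model.hasModel)
print(oai.setSpec)
```

A misspelled term such as oai.setSpecs raises AttributeError instead of silently producing a URI that matches nothing, which is the main reason to prefer a closed namespace for queries and relations.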

Testing utilities

eulfedora.testutil provides custom Django test suite runners with Fedora environment setup / teardown for all tests.

To use, configure as test runner in your Django settings:

TEST_RUNNER = 'eulfedora.testutil.FedoraTextTestSuiteRunner'

When xmlrunner is installed, xmlrunner variants of the test runners are also available. To use the XML test suite runner, configure your Django test runner as follows:

TEST_RUNNER = 'eulfedora.testutil.FedoraXmlTestSuiteRunner'

The XML variant honors the same Django settings as the xmlrunner Django test runner (TEST_OUTPUT_DIR, TEST_OUTPUT_VERBOSE, and TEST_OUTPUT_DESCRIPTIONS).
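
A settings fragment combining the runner with the three output settings might look like the following. The values are illustrative assumptions only; in particular, the output directory name is hypothetical.

```python
# Django settings fragment (values are illustrative, not defaults)
TEST_RUNNER = 'eulfedora.testutil.FedoraXmlTestSuiteRunner'

TEST_OUTPUT_DIR = 'test-results'    # hypothetical directory for XML reports
TEST_OUTPUT_VERBOSE = True          # more detailed console output
TEST_OUTPUT_DESCRIPTIONS = True     # include test descriptions in the output
```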

Any Repository instances created after the test suite starts will automatically connect to the test collection. If you have a test pidspace configured, it will be used as the default pidspace when creating test objects; if you have a pidspace but no test pidspace, the tests will use a pidspace of ‘yourpidspace-test’ for the duration of the run. Any objects in the test pidspace will be removed from the Fedora instance after the tests finish.
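
The pidspace rule described above can be sketched as a small pure function. This is an illustration of the rule, not the library's code: the function name resolve_test_pidspace is hypothetical, and returning None when neither value is configured is an assumption, since the text does not cover that case.

```python
def resolve_test_pidspace(pidspace=None, test_pidspace=None):
    """Pick the pidspace used while tests run (hypothetical helper)."""
    if test_pidspace:
        # An explicitly configured test pidspace always wins.
        return test_pidspace
    if pidspace:
        # Otherwise derive one from the regular pidspace.
        return '%s-test' % pidspace
    # Assumption: nothing configured means no default pidspace.
    return None


print(resolve_test_pidspace(pidspace='yourpidspace'))
print(resolve_test_pidspace(pidspace='yourpidspace', test_pidspace='qa'))
```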

Note

The test configurations are not switched until after your test code is loaded, so any repository connections should not be made at class instantiation time, but in a setup method.
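
The timing issue in this note can be demonstrated without a Fedora server. The sketch below uses a plain dict and a hypothetical FakeRepository class as stand-ins for the Django settings and eulfedora's Repository; the URLs are placeholders. It only shows why an object created while the module loads sees the original configuration, while one created in setUp sees the test configuration.

```python
import unittest

# Stand-in for the Fedora settings; the real runner swaps Django settings.
CONFIG = {'root': 'http://production.example.com/fedora/'}


class FakeRepository(object):
    """Hypothetical repository that reads its root URL when created."""

    def __init__(self):
        self.root = CONFIG['root']


class EarlyConnection(unittest.TestCase):
    # Created while the module loads, before the configuration is switched.
    repo = FakeRepository()


class LateConnection(unittest.TestCase):
    def setUp(self):
        # Created per test, after the configuration has been switched.
        self.repo = FakeRepository()

    def test_uses_test_root(self):
        self.assertEqual(self.repo.root, 'http://test.example.com/fedora/')


# Simulate the runner switching configuration after test code is loaded.
CONFIG['root'] = 'http://test.example.com/fedora/'
```

Here EarlyConnection.repo still points at the production URL, which is exactly the situation the note warns against.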

If you are using nose or django-nose, you should use the EulfedoraSetUp plugin to use a separate Fedora Repository for testing. With django-nose, you should add eulfedora.testutil.EulfedoraSetUp to NOSE_PLUGINS and '--with-eulfedorasetup' to NOSE_ARGS to ensure the plugin is automatically enabled.
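
For django-nose, the corresponding settings fragment would look roughly like this. The NOSE_PLUGINS and NOSE_ARGS values come from the paragraph above; the TEST_RUNNER line is the standard django-nose runner and is an assumption about the rest of your configuration.

```python
# Django settings fragment for django-nose
TEST_RUNNER = 'django_nose.NoseTestSuiteRunner'   # assumed django-nose setup

NOSE_PLUGINS = ['eulfedora.testutil.EulfedoraSetUp']
NOSE_ARGS = ['--with-eulfedorasetup']
```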


class eulfedora.testutil.EulfedoraSetUp

help()

Return help for this plugin. This will be output as the help section of the --with-$name option that enables the plugin.

class eulfedora.testutil.FedoraTestWrapper

A context manager that replaces the Django fedora configuration with a test configuration inside the block, replacing the original configuration when the block exits. All objects are purged from the defined test pidspace before and after running tests.
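
The swap-and-restore pattern described here can be sketched with contextlib. This is a generic illustration, not FedoraTestWrapper itself: the helper name swapped_config and the dict-based settings are hypothetical, and the purging of test objects before and after the block is deliberately left out.

```python
from contextlib import contextmanager


@contextmanager
def swapped_config(config, **overrides):
    """Apply overrides inside the block, then restore the original values."""
    original = dict(config)
    config.update(overrides)
    try:
        yield config
    finally:
        # Restore even if the block raised an exception.
        config.clear()
        config.update(original)


settings = {'root': 'http://localhost:8080/fedora/', 'pidspace': 'demo'}

with swapped_config(settings, pidspace='demo-test'):
    inside = settings['pidspace']

after = settings['pidspace']
print(inside, after)
```

Restoring in a finally clause matters for a test wrapper, because a failing test must not leave the process pointed at the test configuration.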

class eulfedora.testutil.FedoraTextTestRunner(stream=sys.stderr, descriptions=True, verbosity=1, failfast=False, buffer=False, resultclass=None, tb_locals=False)

A unittest.TextTestRunner that wraps test execution in a FedoraTestWrapper.

run(test)

Run the given test case or test suite.

class eulfedora.testutil.FedoraTextTestSuiteRunner(pattern=None, top_level=None, verbosity=1, interactive=True, failfast=False, keepdb=False, reverse=False, debug_mode=False, debug_sql=False, parallel=0, tags=None, exclude_tags=None, **kwargs)

Extend django.test.simple.DjangoTestSuiteRunner to set up and tear down the Fedora test environment.

eulfedora.testutil.alternate_test_fedora

alias of eulfedora.testutil.FedoraTestWrapper

Synchronization

eulfedora.syncutil.sync_object(src_obj, dest_repo, export_context='migrate', overwrite=False, show_progress=False, requires_auth=False, omit_checksums=False, verify=False)

Copy an object from one repository to another using the Fedora export functionality.

Parameters:
  • src_obj – source DigitalObject to be copied
  • dest_repo – destination Repository where the object will be copied to
  • export_context – Fedora export format to use, one of “migrate” or “archive”; migrate is generally faster, but requires access from destination repository to source and may result in checksum errors for some content; archive exports take longer to process (default: migrate)
  • overwrite – if an object with the same pid is already present in the destination repository, it will be removed only if overwrite is set to true (default: false)
  • show_progress – if True, displays a progress bar with content size, progress, speed, and ETA (only applicable to archive exports)
  • requires_auth – content datastreams require authentication, and should have credentials patched in (currently only supported in archive-xml export mode) (default: False)
  • omit_checksums – scrubs contentDigest – aka checksums – from datastreams; helpful for datastreams with Redirect (R) or External (E) contexts (default: False)
Returns:

result of Fedora ingest on the destination repository on success
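
A batch copy built on this function might be structured as below. This is a sketch under stated assumptions: copy_all is a hypothetical helper, the sync callable is passed in so the loop can be exercised without a Fedora server, and the repositories are assumed to offer a get_object(pid) method as eulfedora Repository objects do.

```python
def copy_all(pids, src_repo, dest_repo, sync, **options):
    """Copy each pid using the supplied sync callable (hypothetical helper).

    Returns a dict mapping each pid to whatever the sync callable returned.
    """
    results = {}
    for pid in pids:
        src_obj = src_repo.get_object(pid)
        results[pid] = sync(src_obj, dest_repo, **options)
    return results


# Intended use (requires eulfedora and two reachable repositories):
#   from eulfedora.syncutil import sync_object
#   copy_all(['demo:1', 'demo:2'], src_repo, dest_repo, sync_object,
#            export_context='archive', overwrite=True)
```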

class eulfedora.syncutil.ArchiveExport(obj, dest_repo, verify=False, progress_bar=None, requires_auth=False, xml_only=False)

Iteratively process a Fedora archival export in order to copy an object into another Fedora repository. Use object_data() to process the content and provide the FOXML to be ingested into the destination repository.

Parameters:
  • obj – source DigitalObject to be copied
  • dest_repo – destination Repository where the object will be copied to
  • verify – if True, datastream sizes and MD5 checksums will be calculated as they are decoded and logged for verification (default: False)
  • progress_bar – optional progressbar object to be updated as the export is read and processed
  • requires_auth – content datastreams require authentication, and should have credentials patched in; currently only relevant when xml_only is True. (default: False)
  • xml_only – only use archival data for xml datastreams; use fedora datastream dissemination urls for all non-xml content (optionally with credentials, if requires_auth is set). (default: False)
dsinfo_regex = <compiled regular expression>

regular expression used to identify datastream version information that is needed for processing datastream content in an archival export

encoded_datastream()

Generator for datastream content. Takes a list of sections of data within the current chunk (split on binaryContent start and end tags), runs a base64 decode, and yields the data. Computes datastream size and MD5 as data is decoded for sanity-checking purposes. If binary content is not completed within the current chunk, it will retrieve successive chunks of export data until it finds the end. Sets a flag when partial content is left within the current chunk for continued processing by object_data().

Parameters: sections – list of export data split on binary content start and end tags, starting with the first section of binary content
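
The core idea of decoding base64 content chunk by chunk while tracking size and MD5 can be sketched with the standard library. This is not the method's actual code: decode_chunks and the stats dict are hypothetical, and the sketch ignores the binaryContent tag handling and the partial-chunk flag described above.

```python
import base64
import hashlib


def decode_chunks(chunks, stats):
    """Yield decoded bytes from base64 chunks; record size and MD5 in stats."""
    md5 = hashlib.md5()
    size = 0
    pending = b''
    for chunk in chunks:
        # Strip line breaks, then decode only complete 4-character groups.
        data = pending + b''.join(chunk.split())
        usable = len(data) - (len(data) % 4)
        pending = data[usable:]
        decoded = base64.b64decode(data[:usable])
        size += len(decoded)
        md5.update(decoded)
        if decoded:
            yield decoded
    if pending:
        raise ValueError('incomplete base64 data at end of input')
    stats['size'] = size
    stats['md5'] = md5.hexdigest()


payload = b'example datastream content'
encoded = base64.b64encode(payload)
pieces = [encoded[i:i + 7] for i in range(0, len(encoded), 7)]

stats = {}
decoded = b''.join(decode_chunks(pieces, stats))
```

Carrying the leftover characters into the next chunk is what allows chunk boundaries to fall anywhere in the encoded stream.
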
get_datastream_info(dsinfo)

Use regular expressions to pull datastream [version] details (id, mimetype, size, and checksum) for binary content, in order to sanity check the decoded data.

Parameters: dsinfo – text content just before a binaryContent tag
Returns: dict with keys for id, mimetype, size, type and digest, or None if no match is found
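
To make the idea concrete, the sketch below pulls the same five fields out of a FOXML fragment with a regular expression. The pattern is a hypothetical one written for this example, not the library's dsinfo_regex; it assumes the attributes appear in the order ID, MIMETYPE, SIZE, and converting size to an integer is a choice made here for convenience.

```python
import re

# Hypothetical pattern; assumes ID, MIMETYPE and SIZE appear in that order.
DSINFO = re.compile(
    r'<foxml:datastreamVersion[^>]*?\bID="(?P<id>[^"]+)"'
    r'[^>]*?\bMIMETYPE="(?P<mimetype>[^"]+)"'
    r'[^>]*?\bSIZE="(?P<size>\d+)"[^>]*>'
    r'\s*<foxml:contentDigest[^>]*?\bTYPE="(?P<type>[^"]+)"'
    r'[^>]*?\bDIGEST="(?P<digest>[^"]+)"'
)


def datastream_info(text):
    """Return a dict of datastream version details, or None if no match."""
    match = DSINFO.search(text)
    if match is None:
        return None
    info = match.groupdict()
    info['size'] = int(info['size'])
    return info


sample = (
    '<foxml:datastreamVersion ID="IMAGE.0" LABEL="image" '
    'CREATED="2012-01-01T00:00:00.000Z" MIMETYPE="image/jpeg" SIZE="2048">\n'
    '<foxml:contentDigest TYPE="MD5" '
    'DIGEST="d41d8cd98f00b204e9800998ecf8427e"/>\n'
)
info = datastream_info(sample)
```
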
object_data()

Process the archival export and return a buffer with foxml content for ingest into the destination repository.

Returns: io.BytesIO for ingest, with references to uploaded datastream content or content location urls
url_credentials = None

url credentials, if needed for datastream content urls

eulfedora.syncutil.estimate_object_size(obj, archive=True)

Calculate a rough estimate of object size, based on the sizes of all versions of all datastreams. If archive is true, adjusts the size estimate of managed datastreams for base64 encoded data.
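
The arithmetic behind such an estimate can be sketched as follows. This is an illustration under assumptions, not the function's actual code: estimate_size and base64_size are hypothetical names, datastream versions are represented as (control group, size in bytes) pairs, 'M' is taken to mean a managed datastream, and line breaks or FOXML overhead in the export are ignored.

```python
import math


def base64_size(num_bytes):
    """Length of the padded base64 encoding of num_bytes bytes."""
    return 4 * int(math.ceil(num_bytes / 3.0))


def estimate_size(versions, archive=True):
    """Sum version sizes; inflate managed ('M') content for archive exports."""
    total = 0
    for control_group, size in versions:
        if archive and control_group == 'M':
            total += base64_size(size)
        else:
            total += size
    return total


versions = [('M', 300), ('X', 120), ('M', 10)]
archive_estimate = estimate_size(versions)               # 400 + 120 + 16
plain_estimate = estimate_size(versions, archive=False)  # 300 + 120 + 10
```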