10 years, 3 Supervisors, 7 assistants and 30 students. How the Iowa Office of the State Archaeologist managed, manages and plans for the future of archaeological data

by Mary De La Garza - Thu, 05 Apr 2018
Tags: #digitization #archives #sensitive-data


Sustainable, accessible data storage is as important to archaeologists as tractors are to farmers. In 2001, the University of Iowa Office of the State Archaeologist (OSA) was archiving 20GB of data on a 100GB server and recognized that adding a staff member to focus on network, database, server, and related digital issues was of paramount importance. Seventeen years later, as OSA’s Director of Research Technology, I manage how the office serves 32TB of data via several server systems, and I am planning to archive up to 60TB over the next four years. In addition to managing digital space, the office must also make portions of these data, in their many forms, accessible to outside entities, which involves us in significant technical data security issues.

In the not-so-distant past, archaeologists had only one archival option: securing and storing paper, photos, and film in Hollinger boxes stacked to the ceiling, sometimes but not always in dedicated Archives spaces. These days, many archaeological agencies have scanned and archived massive amounts of data to server systems that take up less space and hold many times more data. The OSA has taken advantage of several grants to scan and archive over 400,000 sheets of paper and 58,853 photos and slides, and also to digitize as GIS shapefiles the locations of nearly 30,000 archaeological sites. The advantages of digitization include preservation, accessibility (including control over access to sensitive data), and scalability.

Preservation

A near-miss experience with a tornado in 2006 galvanized OSA’s efforts to digitize our paper documents and photographs. Although this was the first-ever tornado to touch down within the Iowa City limits and such an event has happily not been repeated since, the threat was all too apparent. The archived associated records now held by the OSA are 93 percent digital with original paper preservation copies also available for reference if needed. The paper record is protected to our best ability, but realistically remains vulnerable to tornadoes, floods, and fire. The digital record is better protected through storage on servers at multiple locations, but it is part of the electronic infrastructure our society as a whole is increasingly dependent upon. In our view, keeping original paper records is an important complementary aspect of creating the digital counterpart.

Accessibility

For decades, OSA staff struggled with the classic problem of an Archive with just one paper copy of in-demand documents: who had it at any given time? A paper-based check-out system was used (by some), but there was always a hunt in progress! Digitization has greatly relieved this problem, with most of the staff now accessing only digital versions of one-of-a-kind records via their desktops (and increasingly, through their mobile devices). This means, of course, that multiple users can access the same document, and there is no fear of a document or photograph being misplaced. Likewise, researchers and companies who conduct archaeological consulting no longer need to drive to Iowa City to access data. By logging into our interface, I-SitesPro or I-SitesPro GIS, qualified and licensed users can:

  • view shapefiles in a browser or through a mobile application,
  • create and export markup on locations of interest,
  • query our databases,
  • check out site numbers and submit new or supplemental site data,
  • read project reports,
  • access the Collections database, and
  • access site data.

Access to these data on a 24/7/365 basis has greatly increased the general efficiency of the consulting work being conducted. Digital access also has the potential to increase the quality and sophistication of consulting as it opens up access to comparative data, literature, and images otherwise difficult for remote researchers to discover and utilize. While these resources are available to researchers visiting the OSA facility, most project budgets typically limit such visits, and there is rarely time allotted for follow-up. Digital access allows researchers to return to the OSA Archives at will to pursue unfolding research questions.

Sensitive Data

Keeping sensitive data secure comes with challenges. We no longer rely on a lock on the door and an archivist checking you in or out of a secure area. If you are planning on hosting your data, consider the following so that you can avoid catastrophic theft and vulnerability.

  • Hardware and software firewalls are necessary.
  • Assign strong user passwords created with a random password generator.
  • Do not allow users to change their passwords without using your password generator.
  • Review your user list on a regular basis.
  • Patch your servers as prescribed.
  • Make sure your virus software is up to date.
  • Understand where your data resides and who manages it, and stay in touch with the people who do.
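
As a minimal sketch of the password-generator advice above, the Python below draws passwords from the operating system's cryptographically secure random source. The length, the symbol set, and the rule requiring every character class are assumptions made for illustration; they are not OSA's actual policy.

```python
import secrets
import string

# Assumed policy, for illustration only: letters, digits, and a few symbols.
ALPHABET = string.ascii_letters + string.digits + "!@#$%^&*-_"


def generate_password(length: int = 20) -> str:
    """Return a random password built from a cryptographically secure source."""
    if length < 12:
        raise ValueError("length must be at least 12")
    while True:
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        # Keep only candidates that contain every character class.
        if (any(c.islower() for c in candidate)
                and any(c.isupper() for c in candidate)
                and any(c.isdigit() for c in candidate)
                and any(not c.isalnum() for c in candidate)):
            return candidate


if __name__ == "__main__":
    print(generate_password())
```

The secrets module is used rather than random because random is not designed for security purposes.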

In addition, photographic data can be as telling as shapefiles. Take the time to identify photos that could give away site information, and treat them as sensitive: they should be tagged and secured accordingly.

Scalability

Big data is here to stay, and archaeological opportunities to make use of it are at our discipline’s doorstep. As noted, OSA grew from using and archiving 20GB of data to using 32TB in just 17 years. Digital usage is only going to grow at an even faster rate. That is not hard to imagine once you factor in not only scanned paper files, but also photos, video, SQL databases, orthomosaic, DSM, point cloud, mesh (.obj, .mtl, .jpg), LiDAR, and LAS files generated by an increasingly high-resolution array of digital devices. One example I’m involved with is the use of unmanned aerial vehicles (a.k.a. drones) in archaeology.
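
To put a number on that growth, here is a small worked calculation of the compound annual growth rate implied by the figures in this post. It assumes decimal units (1TB = 1,000GB) and the 17-year span from 2001 to 2018; the function name is only illustrative.

```python
def annual_growth_rate(start_gb: float, end_gb: float, years: float) -> float:
    """Compound annual growth rate as a fraction (0.5 means 50 percent a year)."""
    if start_gb <= 0 or end_gb <= 0 or years <= 0:
        raise ValueError("sizes and years must be positive")
    return (end_gb / start_gb) ** (1 / years) - 1


GB_PER_TB = 1000  # assumption: decimal units

# 20GB in 2001 to 32TB in 2018.
rate = annual_growth_rate(20, 32 * GB_PER_TB, 17)
print(f"Implied growth: {rate:.0%} per year")
```

Under those assumptions the answer comes out at roughly 54 percent a year, meaning the holdings have more than doubled every two years.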

The drone

This is ArchE1

With ArchE1, I flew two flights over 13LA12 (Gast Farm), an expansive Middle and Late Woodland village site located in southeast Iowa, on May 7 and August 25, 2017. On those two flights I collected 30GB of photos, thermal images, video, audio, telemetry, and lots and lots of log data. I have since created another 10GB of models from these field data, and I’ve just scratched the surface of what is possible for digital analyses of this site.

3D model of an archaeological site

3D Gast Farm model from 400 feet

Future Plans

Today our data sits on servers in an Enterprise environment that has some scalability in place. When I say some, I mean that if all of my users started to routinely dump 30-40GB projects onto the system without considering space, in as short a period as I did back in the summer of 2017, a few things would have to take place that are time consuming and costly. First, a new server with additional space would have to be ready to go online, and data would have to be migrated. Alternatively, storage space could be added in chunks to our current redundant array of independent disks (RAID), but doing so would necessarily disrupt service. Second, if our users start to hit our data at a greater frequency than expected, a server would have to be prepared and put up to handle the traffic and provide the expected access speed.

Moving into the future, the OSA, like other organizations that generate and maintain big data, must develop and plan for digital security, storage, and accessibility in ways that protect sensitive data. Security is paramount in any organization, but especially so when culturally sensitive data is involved.

Our plan is to put our data securely in a scalable system that does not rely on a server needing to be prepared and put up in a matter of hours. So how will I accomplish this? Via two systems with scalable mirrors in two separate locations. Key characteristics of this design include:

  • A plan for accommodating future storage capacity,
  • the use of SRM to calculate changing growth,
  • one server up while the other waits to be called,
  • readily available and expandable space,
  • traffic that can be diverted and balanced between servers when traffic is high, and
  • separate locations in case of catastrophic events.
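
The "one up, one waiting" and traffic-diversion ideas in the list above can be sketched in a few lines of Python. This is a toy illustration only, not the OSA design: the Server fields, the choose_server function, and the 80 percent threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    healthy: bool
    load: float  # fraction of capacity in use, 0.0 to 1.0


def choose_server(primary: Server, mirror: Server, high_load: float = 0.8) -> Server:
    """Pick which of two mirrored servers should take the next request."""
    if not primary.healthy and not mirror.healthy:
        raise RuntimeError("no healthy server available")
    if not primary.healthy:
        return mirror  # failover: the waiting mirror is called up
    if not mirror.healthy:
        return primary
    # Both are healthy: stay on the primary until it is busy,
    # then divert to the mirror if the mirror is less loaded.
    if primary.load >= high_load and mirror.load < primary.load:
        return mirror
    return primary


if __name__ == "__main__":
    a = Server("site-a", healthy=True, load=0.9)
    b = Server("site-b", healthy=True, load=0.2)
    print(choose_server(a, b).name)
```

In a production setting this decision would normally be made by a load balancer or cluster manager rather than by hand-written code.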

Storage resource management (SRM) is used to analyze and manage storage in the enterprise environment. It is pricey, but the results are clean, and it beats trawling through stacks of backup data to get an idea of how much data capacity your system will need in the future. I see it as an investment in your time: instead of spending all those extra hours trying to figure out what you need, you can get the answer by running a report.
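
For a sense of the kind of question such a report answers, here is a back-of-the-envelope projection in Python. It is not an SRM tool, only compound-growth arithmetic: it uses the 32TB and 60TB-in-four-years figures from this post, while the 100TB capacity and the function names are hypothetical.

```python
def project_storage(current_tb: float, annual_rate: float, years: int) -> list[float]:
    """Projected storage in TB at the end of each year, compounding annually."""
    return [current_tb * (1 + annual_rate) ** y for y in range(1, years + 1)]


def years_until_full(current_tb: float, annual_rate: float, capacity_tb: float) -> int:
    """Whole years until projected usage first exceeds capacity (0 if it already does)."""
    if annual_rate <= 0:
        raise ValueError("annual_rate must be positive")
    years = 0
    usage = current_tb
    while usage <= capacity_tb:
        usage *= 1 + annual_rate
        years += 1
    return years


# Growth rate implied by going from 32TB to 60TB in four years.
implied_rate = (60 / 32) ** (1 / 4) - 1

if __name__ == "__main__":
    print(f"Implied growth: {implied_rate:.1%} per year")
    for year, tb in enumerate(project_storage(32, implied_rate, 4), start=1):
        print(f"Year {year}: {tb:.1f}TB")
    # Hypothetical 100TB system, for illustration only.
    print(years_until_full(32, implied_rate, 100), "years until a 100TB system fills")
```

The point of the sketch is the shape of the calculation; a real SRM report bases its growth rate on measured usage rather than on two data points.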

There is more to managing and scaling accessible sensitive data than meets the eye, especially when you consider all the complexities of our archaeological world. Proceed with caution and a lot of secure server space.
