“The situation today with regard to the study of large and complex biotech data is akin to watching an HD movie but only being able to view a fraction of the content, in a much degraded format and at a reduced speed.”
We are living in the age of ‘Big’ data, and yet many researchers are left struggling when they attempt to quickly process, analyse and manipulate very large and complex spectral data. Obtaining meaningful results can prove challenging because the spectrometer is capable of extracting vast amounts of information from a sample, with an average-sized clinical study producing terabytes of data.
Does size really matter?
The size of a dataset is determined by the number of detected molecules. Currently, only a small percentage of these have been identified, leaving the vast majority unused. Researchers and analysts are often forced to restrict their search criteria to a fraction of the ‘known’ molecules, as any attempt to expand the search simply pushes the boundaries of the hardware and software.
As the instrument hardware continues to evolve and extracts even more information from a sample, the size of the resulting data will grow with it. It is therefore critical that the analytic technologies are not only capable of keeping pace with and supporting this development path, but can also provide deeper insight into the growing pool of information that is currently left stagnant.
A challenge undertaken?
Data-driven research is challenging because the results have to be repeatable, with full traceability. For this to happen, researchers need reliable and appropriate tools capable of supporting the entire analysis process from start to finish. Many of the current methods and processes can be restrictive and individualistic, and are neither time-efficient nor cost-effective as long-term solutions. A consistent approach that maximises utilisation of the available data and offers the rapid discovery of new biomarkers is long overdue.
Crash, Bang, Wallop!
Computer crashes can and often do occur, due in part to the hardware specification as well as the inherent design of the underlying software. Attempting to analyse multiple datasets at the same time imposes strain on the processing hardware, and cross-comparison can lead to lengthy delays. If the hardware and software prevent researchers from achieving their analysis targets, protracted project timescales and escalating costs will surely result.
Common issues include:
• Software is not designed to load and process large-scale datasets
• Routine software/system crashes
• Poor hardware and software processing performance
• Multiple tools are required in order to get results
• Tools can be expensive to purchase, maintain and support
• Escalating project costs