Cyber Talk Radio: Security Implications of Data Science

Bret Piatt, CTR host, and Michael DeFelice, Jungle Disk’s principal data scientist, on week 57 of Cyber Talk Radio.

Show Summary

This past Saturday, October 28, episode 57 of Cyber Talk Radio hit the air on 1200 WOAI and iHeartRadio streaming. I was joined by Michael DeFelice, Jungle Disk’s principal data scientist.

In the first half of the show, Michael introduced himself and spoke about his journey into data science. He then delved into the concept of data science, summarizing it as “the idea that there is a lot of data out there, the science part is making sense of it.” Michael explained how low the barriers to entry have become as the field has developed: where in the past you needed a mammoth computing budget to go through even small data sets, nowadays a laptop should suffice for most things. Have a bunch of web logs? You can use the ELK Stack to parse the data. Indeed, Michael explained that most of the tools you would ever want or need are available as open source.
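To make the "a laptop should suffice" point concrete, here is a minimal sketch of summarizing web logs with nothing beyond the Python standard library. It is an illustration, not something shown on the air: it assumes the logs are in Common Log Format, the sample lines are made up, and the names `LOG_PATTERN` and `summarize` are hypothetical.

```python
import re
from collections import Counter

# Assumed format (Common Log Format):
# host ident user [timestamp] "METHOD path protocol" status size
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)$'
)


def summarize(lines):
    """Count status codes and requested paths, skipping lines that do not parse."""
    statuses = Counter()
    paths = Counter()
    skipped = 0
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match is None:
            skipped += 1
            continue
        statuses[match["status"]] += 1
        paths[match["path"]] += 1
    return statuses, paths, skipped


# Made-up sample lines for illustration only.
SAMPLE = [
    '203.0.113.5 - - [28/Oct/2017:23:01:12 -0500] "GET /index.html HTTP/1.1" 200 5120',
    '203.0.113.5 - - [28/Oct/2017:23:01:13 -0500] "GET /missing HTTP/1.1" 404 312',
    '198.51.100.7 - - [28/Oct/2017:23:02:40 -0500] "GET /index.html HTTP/1.1" 200 5120',
    'not a log line',
]

if __name__ == "__main__":
    statuses, paths, skipped = summarize(SAMPLE)
    print("status counts:", statuses.most_common())
    print("most requested:", paths.most_common(1))
    print("unparsed lines:", skipped)
```

Because `summarize` accepts any iterable of lines, an open file handle can be passed in directly, so even a large log is read one line at a time rather than loaded into memory all at once.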

This led into a wider discussion about “big data,” and Michael explained that, in his view, big data has distorted people’s understanding of real data. It is, he observed, actually very difficult to produce a data set too large to fit on a laptop, though it’s not so hard to make a data set too big for Excel. The implication is that anyone can become a data scientist. You don’t need to spend hundreds of dollars a month on cloud computing and storage; in reality your laptop will be enough, and oftentimes your phone can even manage much of it. This has been further demonstrated in recent months, with the Census Bureau starting to release data to the general public through an API rather than as Excel files, which allows flexible testing and correlation analyses as well as the release of greater quantities of data. Indeed, if many of the Census Bureau’s data sets are not too large for your laptop, it’s hard to imagine what publicly available data sets would be!

Michael and I then explored developments in artificial intelligence (AI). We noted that machine learning is allowing machines to get very good at finite skills, such as chess and even more complicated games. Chatbot advancements, however, are much slower: as Michael observed, chatbots don’t understand English, they just understand memorized phrases. This gap can, on occasion, have disastrous consequences. Microsoft’s Tay chatbot illustrates these risks. Within hours of being launched, Tay had become a terrifying online troll, constantly spewing racist, anti-Semitic and anti-feminist slurs. This happened because many people on the internet say these things and tweeted them at Tay, and because Tay doesn’t understand the meaning of words, it couldn’t learn what not to say. So severe was the issue that Microsoft took Tay offline less than two days after launch and hasn’t brought it back since.

In the second half of the show, Michael delved into some more disturbing aspects of technological development as we discussed voice recognition and voice impersonation technologies. Michael observed that if you want to get someone to do something, the best way is to impersonate a person in authority or to say that a friend did it. As voice impersonation technology improves and the quantity of data on your friends grows, both of these tactics become increasingly practical, with very serious implications for society. Right now, open source software APIs are sophisticated enough to allow you to make robocalls, and much more besides. This means that what we face isn’t a hypothetical threat of tomorrow, but a genuine threat of today.

We then examined the kinds of technologies that these advancements are finding their way into. I pointed out how difficult it was to disable the built-in microphone in my new TV, and we discussed the general trend of “if it’s free, you’re the product.” People often think this applies narrowly to things that are totally “free,” but it now applies more generally as well, for example to TV shows. Michael then explained Tor, the web browser that allows users to access the dark web. As he pointed out, the dark web itself is not a particularly sophisticated place; indeed, its marketplaces, like the now-defunct Silk Road, are more akin to a low-tech Amazon than a feat of tech wizardry. It was, however, very helpful to be able to end our discussion by talking about something as topical as the dark web and cryptocurrencies, an essential component of the anonymous internet, which link some of these wider changes to familiar social ills.

To learn more about cybersecurity, listen to the full episode replay available here!

Upcoming episode – Saturday nights from 11:00 p.m. to midnight

Episode 58, November 4: [Technology in San Antonio]

Listen to a replay of this episode or past episodes on a Cyber Talk Radio Podcast stream. Replays are available via the below podcast services:

Recent episodes – available to stream from our YouTube channel -

Have an idea for a topic or want to be a guest?

Contact Cyber Talk Radio via our request a topic or be a guest form.

Protect Your Business Data

We are passionate about helping our customers protect their data. We want you to use Jungle Disk to protect yours. Click on Sign Up to get started. It takes less than 5 minutes!

Sign Up