Hadoop brings far-flung people together across time and space

With its powerful data mining capabilities, Hadoop is bringing together people across different places and even across different generations.

While Hadoop continues to grow in popularity, most reported use cases around the open-source data-processing platform revolve around ad targeting or some other specialised task. But at O’Reilly’s Strata-Hadoop World, held last week in New York, a number of Internet services talked about how they use Hadoop to bring people together.

Ancestry.com is using Hadoop as the cornerstone of a new service that allows users to submit a sample of their DNA and then have the website look for matches with far-flung relatives, both alive and long-deceased. And online dating service eHarmony uses the platform to refine its process of matching its millions of members.

In both cases, Hadoop has excelled at comparing hundreds or even thousands of variables across millions of different entities, a job much too large for traditional relational databases or even data warehouses.

“Hadoop is one of those key tools that has allowed us to create a massively scalable system,” said Ancestry.com Chief Technology Officer Scott Sorensen in one presentation. The service is moving from proprietary tools to Hadoop to parse its large and ever-growing amount of data, he said.

Ancestry.com generates around US$480 million a year in revenue from people who use the service to chart their ancestry, drawing on their own documents as well as the company's collection of 12 billion public records, about 10 petabytes' worth of data.

Hadoop powers a new company service called AncestryDNA. A user can send in a saliva sample, along with US$99, and the company will examine about 700,000 "snips" (single-nucleotide polymorphisms, or SNPs) in the DNA from the sample and load the results into Hadoop, which compares them with more than 200,000 other samples the company has collected. The company can then provide a list of far-flung relatives, whose family connections can go back 10 generations or more.

Half of a person’s DNA comes from each biological parent. “Small changes in that DNA over generations leave bread crumbs that are like a view into history,” Sorensen said. Ancestry.com can use the snips to determine a user’s mix of ethnicities, as well as match the user with distant relatives.

Hadoop proved well suited to this task because it excels at taking a sample's 700,000 snips and comparing them with the snips from hundreds of thousands of other people's DNA to find matches. The service can find, on average, 40 fourth cousins for every customer who submits a sample. That result will only improve as more people submit their DNA, Sorensen said.
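Ancestry.com did not describe its matching code, so the following is only a simplified Python sketch of the kind of comparison involved, not the company's algorithm. It assumes each genotype is a string with one character per SNP position (a hypothetical encoding, such as 0, 1 or 2 copies of an allele) and flags long runs of consecutive agreeing positions, the intuition behind the "bread crumbs" relatives share. Real relative-matching tools work on phased haplotypes and tolerate genotyping errors; the names `shared_segments` and `best_matches` and the `min_len` threshold are illustrative.

```python
# Simplified sketch, NOT Ancestry.com's algorithm. Hypothetical encoding:
# one character per SNP position, e.g. "0", "1" or "2" copies of an allele.

def shared_segments(a, b, min_len):
    """Return (start, end) ranges (end exclusive) where genotypes a and b
    agree for at least min_len consecutive SNP positions."""
    if len(a) != len(b):
        raise ValueError("genotypes must cover the same SNP positions")
    segments = []
    start = None  # start index of the current run of agreeing positions
    for i, (x, y) in enumerate(zip(a, b)):
        if x == y:
            if start is None:
                start = i
        else:
            # A mismatch ends the current run; keep it only if long enough.
            if start is not None and i - start >= min_len:
                segments.append((start, i))
            start = None
    # Close a run that extends to the last position.
    if start is not None and len(a) - start >= min_len:
        segments.append((start, len(a)))
    return segments


def best_matches(query, database, min_len, top_n=5):
    """Rank samples by total length of segments shared with the query."""
    scores = []
    for sample_id, genotype in database.items():
        total = sum(end - start
                    for start, end in shared_segments(query, genotype, min_len))
        if total > 0:
            scores.append((total, sample_id))
    scores.sort(reverse=True)  # longest total shared length first
    return [(sample_id, total) for total, sample_id in scores[:top_n]]
```

For example, `shared_segments("0120120120", "0120100120", 4)` returns `[(0, 5), (6, 10)]`: the single mismatch at position 5 splits the two genotypes into two shared runs.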

The company used a number of algorithms developed in academia for finding hidden matches in DNA. But the engineers at Ancestry.com had to parallelize the algorithms to run them across a multinode Hadoop deployment. Using traditional scale-up architectures, it would take Ancestry.com up to four weeks to compare 120,000 sets of DNA.
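The scale of that job follows from simple arithmetic: n samples form n(n-1)/2 distinct pairs, so 120,000 samples mean 7,199,940,000 pairwise comparisons. One common way to parallelize such an all-pairs job, sketched below in single-process Python, is to split the samples into blocks, treat each pair of blocks as an independent map task, group (shuffle) the results by sample, and keep each sample's closest matches in a reduce step. This is an assumption about the general technique, not a description of Ancestry.com's code; the similarity score, block size, threshold and all function names are hypothetical.

```python
import heapq
from collections import defaultdict
from itertools import combinations


def pair_count(n):
    """Distinct unordered pairs among n samples: n(n-1)/2."""
    return n * (n - 1) // 2


def similarity(a, b):
    """Hypothetical score: fraction of SNP positions where genotypes agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def map_phase(block_a, block_b, same_block, genotypes, threshold):
    """One map task: compare every pair drawn from two blocks of sample ids."""
    if same_block:
        pairs = combinations(block_a, 2)
    else:
        pairs = ((x, y) for x in block_a for y in block_b)
    for x, y in pairs:
        score = similarity(genotypes[x], genotypes[y])
        if score >= threshold:
            # Emit under both ids so each sample's reducer sees the match.
            yield x, (score, y)
            yield y, (score, x)


def run_job(genotypes, block_size=2, threshold=0.75, top_k=3):
    ids = sorted(genotypes)
    blocks = [ids[i:i + block_size] for i in range(0, len(ids), block_size)]
    shuffled = defaultdict(list)
    # Each (i, j) block pair with i <= j is an independent map task; a
    # cluster would run these on different nodes instead of in this loop.
    for i in range(len(blocks)):
        for j in range(i, len(blocks)):
            for key, value in map_phase(blocks[i], blocks[j], i == j,
                                        genotypes, threshold):
                shuffled[key].append(value)  # shuffle: group by sample id
    # Reduce: keep each sample's top_k highest-scoring matches.
    return {key: [other for _, other in heapq.nlargest(top_k, values)]
            for key, values in shuffled.items()}
```

Every unordered pair is compared exactly once: pairs within a block come from `combinations`, and pairs across blocks come from the tasks with i < j.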

Also at the conference, Vaclav Petricek, director of machine learning at eHarmony, described how the online dating service uses Hadoop to make better matches among its customers.

As with Ancestry.com's DNA service, the fundamental problem eHarmony tackles is a massively parallel one. For each member, the service wants to find a set of potential suitors, which involves making many comparisons across a large number of factors while trimming the result sets to manageable proportions.

“We want to give people enough options to keep them engaged, but we don’t want to overwhelm them,” Petricek said. “Because this is an embarrassingly parallel problem, you can run this on Hadoop in parallel.”
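Petricek's point is that each member's search for candidates does not depend on anyone else's, so the work can be split across as many workers as are available. The Python sketch below illustrates only that structure; the profile format, the distance-based `score` and the `shortlist`/`all_shortlists` names are hypothetical rather than eHarmony's model, and two trait dimensions stand in for the 29 that eHarmony's questionnaire estimates. A thread pool keeps the example self-contained; for CPU-bound scoring in CPython, a process pool or, at eHarmony's scale, a Hadoop job would be the realistic choice. `heapq.nlargest` does the trimming so each member sees only a short list.

```python
import heapq
import math
from concurrent.futures import ThreadPoolExecutor


def score(p, q):
    """Hypothetical score: closer trait vectors score higher."""
    return -math.dist(p["traits"], q["traits"])


def shortlist(member_id, members, k):
    """Top-k candidates for one member. Reads shared data, writes nothing
    shared, so calls for different members can run in any order or place."""
    me = members[member_id]
    candidates = ((score(me, other), other_id)
                  for other_id, other in members.items()
                  if other_id != member_id)
    return member_id, [cid for _, cid in heapq.nlargest(k, candidates)]


def all_shortlists(members, k=2, workers=4):
    # Each member is an independent task: an "embarrassingly parallel" job.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda mid: shortlist(mid, members, k), members)
        return dict(results)
```

With trait vectors m1=(0, 0), m2=(1, 0), m3=(5, 5) and m4=(0, 2) and k=2, member m1's shortlist is ["m2", "m4"], its two nearest neighbours.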

EHarmony customers fill out an extensive questionnaire, which helps to estimate the user’s personality across 29 different dimensions.

The system first uses algorithms to predict how happy two potential matches would be if they were married, drawing on scientific studies that describe the personality traits of people in both happy and "distressed" marriages, Petricek said. If their personality types indicate they would be happy in a marriage together, they are considered for pairing.

This is only the first step, however. EHarmony must also predict how attracted two people would be to one another.

“There is no guarantee that people who have compatible personalities would be interested in each other,” Petricek said.

Gauging attraction between two people is where big-data-style machine learning comes in. The service tracks a wide range of additional variables about its members, from the types of devices they use to interact with eHarmony to whether each individual is single or divorced. The company also tracks the flow of messages among its members, charting which exchanges led to successful matches and looking for indicators among all the known variables that explain why those matches succeeded.

For instance, one fairly predictable variable has been distance: the farther apart two people are geographically, the less likely they are to pick one another from a list of candidates. Another is the difference in height between the two people in a potential heterosexual couple. On average, the two are most likely to communicate if the male is 4 to 8 inches taller than the female. Because the company does not know ahead of time which factors will prove to be predictors of compatibility, Hadoop churns through all the combinations of all the variables looking for clues.
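The search for predictors can be pictured as a large counting job: for every candidate variable, bucket the member pairs and measure how often pairs in each bucket actually exchanged messages. The Python sketch below shows that idea for the two variables Petricek mentioned. The record fields, the distance cut-offs and the function names are hypothetical, and only the 4-to-8-inch height band comes from the article. A production system would also test combinations of variables and control for confounding, which this sketch does not attempt.

```python
from collections import defaultdict


def distance_bucket(km):
    # Hypothetical cut-offs, chosen only for illustration.
    if km < 50:
        return "<50 km"
    if km < 200:
        return "50-200 km"
    return ">=200 km"


def height_bucket(male_minus_female_in):
    # The 4-8 inch band is the one reported in the article.
    if 4 <= male_minus_female_in <= 8:
        return "male 4-8 in taller"
    return "other"


# Each candidate variable maps a pair record to a bucket label.
FEATURES = {
    "distance": lambda r: distance_bucket(r["distance_km"]),
    "height_gap": lambda r: height_bucket(r["height_gap_in"]),
}


def response_rates(records):
    """Map each pair record to one (feature, bucket) key per feature, then
    reduce to the share of pairs in each bucket that exchanged messages."""
    counts = defaultdict(lambda: [0, 0])  # key -> [communicated, total]
    for r in records:
        for name, bucket_of in FEATURES.items():
            tally = counts[(name, bucket_of(r))]
            tally[0] += r["communicated"]  # 1 if the pair exchanged messages
            tally[1] += 1
    return {key: hits / total for key, (hits, total) in counts.items()}
```

Buckets whose rates differ sharply from the overall rate are the "clues" worth feeding into a matching model.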

The use of Hadoop is improving the service eHarmony offers, Petricek said. According to a third-party study, the divorce rate among couples who met on eHarmony and married between 2005 and 2012 was about half the rate of those who met offline. The sample set is limited, Petricek said, but it is still a "very encouraging" sign for the use of big-data analysis.
