Data Analytics: Should We Build Iron Man or R2D2?

Tuesday, July 27th, 2010

Earlier this year, Alex Handy wrote an intriguing article exploring the future of data analysis. In it, Handy compared and contrasted two approaches to understanding the ever-increasing stream of data. One approach depends upon building “exoskeletal systems”, which enhance human comprehension. Handy likens this solution to “Iron Man”. The other approach would depend chiefly on autonomous robots or automated systems. This alternative, Handy suggests, is more like “R2D2” from Star Wars. Ultimately, Handy concludes that “[d]evelopers should build Iron Man, not R2D2.”

Here at Digital Reasoning, we have been dealing with the challenges of automated understanding of massive amounts of unstructured data for years. Knowing that Tim Estes, our CEO, might have a different view on this issue, I decided to interview him. Tim has worked within the realms of unstructured data analytics, artificial intelligence, and machine learning for the past decade.

The following is our interview:

Jason Beck – In the article, one researcher suggests that developers shouldn’t build analytics robots, but rather “exoskeletal systems”. Do you agree?

Tim Estes - I think that it’s a matter of degree. The range of judgments that a machine can make as a proxy for the human is constantly and necessarily expanding. Even R2D2 was most famous for taking orders from Luke Skywalker, carrying out tasks ranging from fixing the X-wing in flight to cracking into computer networks.

Just to be a little more accurate – Iron Man wouldn’t work without an AI that is close to R2D2. Jarvis (the AI program that runs the Stark house and the Iron Man suit) is always chatting up Tony Stark about what’s going on with the suit and the risks that are present around him. The Iron Man analogy means we cede full situational awareness (the sensory and data input space) to the machine, with the human making key decisions on the filtered and prioritized information. I think that’s about right.

R2D2 is distinct in having a measure of its own intentionality (i.e., it is autonomous in more dramatic ways than Jarvis and the Iron Man suit), but the two are much closer than you might think. Should humans get out of the loop in making analytic judgments? No more than we should take pilots out of the loop in flying commercial airliners at this time. But show me a pilot who can fly a 747 without computer assistance and guidance. We are already in the hybrid space. And the complexity of our technology, together with the explosion of information created by machines and by people assisted by machines, means we will need ever-increasing automation in understanding.

JB – Doesn’t the exponential growth of data and decreasing levels of available talent necessitate automated systems?

TE - Exactly. The notion that “augmented intelligence” can solve the full data problem is wishful thinking. Something has to read everything, and as a matter of scale that can no longer be a human. We have to make strides so that intelligent systems catch up with the complexity and scale of the data we are being inundated with.

JB – Is this an Either-Or situation? Just because someone may prefer automated systems, does this assume that there won’t be any human in the loop?

TE - I think that’s the real issue: where is the dividing line right now, and where is it going to be in 5 years? Right now, machines have to read and organize everything. The race is to see who can do it accurately, at scale, and focused on the entity level rather than the document level. In five years, the information overload will be so substantial that autonomous proxies or agents will likely be the baseline for all of these systems. In both situations, humans are in the loop. Today, humans do far more of the heavy lifting, because nearly all of our enterprise information systems don’t really understand their data that well, so the burden is on the reader. That has to change. Even when it does, we will just be enabling humans to make better decisions in less time and with fewer interruptions to their daily lives.

JB – Does the delineation between these two approaches represent a common split in the overall text analytics community?

TE - I think so. We can either be satisfied with augmenting the status quo or we can get to the root of the issue – that software doesn’t understand natural signals that make up unstructured data. We are in a place of diminishing returns with simple classifiers and ETL (Extract, Transform, Load) architecture. The more exciting alternative, however, is to go at the semantic and scale problems with the appropriate technologies and transform the enterprise to be entity-oriented.

JB – Can you think of any example where someone tried to completely automate text mining?

TE - Not off the top of my head, though I’m sure there have been. But a lot of text mining is feeding either fancy search engines (such as faceted navigation and data-enriched topic clustering) or Business Intelligence frameworks.

JB – What does the future look like regarding automation?

TE - It’s going to go from being reactive (search, research, and investigation) to being proactive (push, warnings, summaries). It’s going to go from two major silos inside the enterprise, the human-curated/structured data and the content management/unstructured data, to one unified, entity-oriented data store. Once this is done, programs will constantly monitor this unified data store for areas of interest to users, screening almost everything and prioritizing it. Eventually, we’ll get some real next-generation automation out of this, because there will be a class of actions that will be executed autonomously without requiring human intervention (such as determining the defense policy in response to a detected cyber-attack).

JB – What other thoughts do you have about this?

TE - I think that as we weigh the risks or errors in additional automation, we need to be wary of irrational risk aversion. The poverty of attention that most people suffer from has very real consequences even if we don’t fully understand that right now. Solutions which give small, incremental gains are unlikely to get ahead of this increasingly detrimental phenomenon. Without something reading everything and getting smarter, we are simply rolling the dice on what we don’t have time to read or consider. That’s the other side of the coin of the incremental approach.


Matthew Russell’s OSCON 2010 Interview about Unstructured Data

Friday, July 23rd, 2010

Mac Slocum interviews our VP of Engineering, Matthew Russell, about unstructured data at the recent Open Source Conference (OSCON) in Portland, Oregon.


Digital Reasoning’s Matthew Russell featured at OSCON

Friday, July 16th, 2010

Next week, July 19-23, hundreds of developers, designers, hackers and geeks will gather in Portland, Oregon, for the 12th Annual Open Source Conference (OSCON). According to the Open Source Initiative (www.opensource.org) “The promise of open source is better quality, higher reliability, more flexibility, lower cost, and an end to predatory vendor lock-in”.

“For those who have not been to OSCON, it’s a great technical conference covering the whole spectrum of open source, including Linux, MySQL, the LAMP stack, Perl, Python, Ruby on Rails, middleware, applications, cloud computing, and more”, said Zack Urlocker from InfoWorld. “OSCON always has great keynotes, tutorials, and evening Birds-of-a-Feather sessions. As with many conferences, a lot of the meat takes place in hallway conversations and impromptu sessions”.

Matthew Russell, our VP of Engineering at Digital Reasoning, will be speaking at the conference again this year. Matt will be sharing his insights about Natural Language Processing, advanced analytics, and entity resolution on a massive scale. (See here for more details: http://www.oscon.com/oscon2010/public/schedule/detail/13988)

I recently sat down with Matthew and asked him about open source, OSCON and his upcoming participation. The following is part of our discussion:

Jason Beck – So, how long have you been involved with OSCON?

Matthew Russell – I’ve attended and spoken at OSCON for the last three years.

JB – You’ve worked within or supported the Intelligence Community for the last several years. Have you seen a change in attitude towards open source?

MR – Absolutely. Whereas it was treated very skeptically years ago, it’s now practically a requirement for any project we do.

JB – Why is that? Aren’t there lingering security issues or concerns?

MR – A few people still raise those issues. However, the reality is that open source is inherently more secure. Think about it: you potentially have thousands of eyes looking over the code and quickly addressing issues. With proprietary software, on the other hand, far fewer people are looking at the code, so it gets reviewed less often.

JB – Is there someone that inspired you within the open source community?

MR – Not really. The thing is, open source is by its nature more about the community than about any one person within that community. It is all about collaborating with others to make something useful.

JB – Why would someone want to attend OSCON?

MR – It has quickly become the preeminent conference on open source technologies, issues and ideas. Seriously, it is a place where you can meet other interesting people committed to doing interesting things.

For more information on OSCON, you can visit their website at www.oscon.com/oscon2010


Security through Obscurity

Wednesday, May 26th, 2010

“Security through Obscurity” is a term often used to refer to security provided by keeping details of a system secret, or by making a system so convoluted that it is difficult to determine how it works, thus hiding its vulnerabilities. Unfortunately, I believe the term also applies to the challenge of identifying and tracking the important information hidden in the mountains of digital data generated each day.

While technology has provided several good paradigms for dealing with structured data (i.e., data organized so that it can easily be decomposed into pre-defined fields), it has not kept pace with unstructured data, such as emails, blogs, and web site content. Thus, critical information is often kept “secret” through the obscurity of the sheer volume of data one must process, often manually, to reveal it.

In response to this challenge, Digital Reasoning Systems, Inc. has developed a comprehensive set of analytical tools packaged into a product called Synthesys™ that essentially decomposes unstructured text into meaningful information easily understood and manipulated by a user.

This technology is based on the premise that there is order inherent in all languages that can be discovered and mathematically modeled. This has led to the development of our advanced data analytics and knowledge abstraction for unstructured data, based on a distinctive, patented mathematical approach to natural language processing.

For a better understanding of Synthesys™ and its capabilities, a downloadable white paper (Synthesys – Technology Overview) providing a high-level overview can be found here.


Eight Things I Learned While Helping Tennessee Flood Victims

Friday, May 14th, 2010

On Tuesday, May 4th, forty-eight hours after the worst natural disaster hit our area, our company meeting seemed less important – we paused. Our company’s president, Rob Metcalf, stopped the meeting and redirected our conversation. In that moment we simply could not go about business as usual.

Each of us in the room had been affected directly or indirectly by the floods in Middle Tennessee. We knew we had to do something; we just weren’t sure what. But rather than form a committee, we empowered our employees to do what they thought would make the biggest difference. For some, it meant giving money to various charities like the Red Cross or Hands on Nashville; for others, it meant supporting our community by buying “I Love Nashville” flood t-shirts; but for me, it meant a trip to the country.

So, on Wednesday, May 12th, my wife, two children, and I travelled to Centerville, Tenn., which is located in Hickman County, one of the counties placed on the federal disaster relief list. My wife and I pulled our son, Carter, out of school for the day. We did so because we sincerely believe that, in addition to his classroom instruction, we must cultivate civic responsibility and give him opportunities to show compassion.

It wasn’t long before we encountered the first lesson of the day: rely on your Granny’s instructions (she actually lives in Hickman County) rather than TomTom’s GPS directions. Apparently, TomTom didn’t realize that many of the backroads had been washed out and away.

After about two and a half hours on the road, we arrived at Fairfield Baptist Church, which is a recognized relief center.

My wife, Jen, and I were immediately impressed with how well-organized all of the food items and other things were there. At some level, we were disappointed, which I know is a weird emotion – I think we were hoping that we could be more helpful. However, it wasn’t long before one of the volunteers told us about the “clothing store” that had been set up a few miles down the road in a vacated store. Apparently, the owner simply let the church use the store rent free.

Hoping that we could, in fact, help, we loaded up and headed down the road. When we arrived a few moments later, we weren’t immediately sure that we were in the right place. There appeared to be little or no indication of any activity. But, as we got closer to the front door, we saw what we hoped to see – an unmet need. At that moment, we knew we were in the right place and felt a surge of energy.

When we went inside we met the two ladies who were trying to make sense of it all. With over a hundred large bags and boxes of clothing and other items, a few tables, and no hangers, it was a challenge.

Over the course of the next four or five hours, we worked non-stop to unpack and organize all of the items there.

What follows are a few of my observations and thoughts about volunteering:

1. Lead with a broad smile: You can have a warehouse full of items from bleach to peanut butter, but you need to show concern and love to those who are hurting. At the end of the day, people simply want to know that someone cares about them, is willing to listen, and will meet a need. So, be sure to smile.

2. A little marketing goes a long way: You can have a warehouse full of items, but if no one knows where you are then it is useless. When we first arrived at the “clothing store”, we realized there were no signs. The first thing Carter and I did was to run across the road to Dollar General and buy poster board, markers and tape. Sometimes just a simple sign is enough. We also told a couple of other local churches and merchants. Word of mouth is king.

3. Dream Big and Think Small: At first, my family and I had really big hopes of making a huge difference. Sometimes I think we can be overwhelmed by the size of the need here in Tennessee, but we realized that we simply had to focus on helping people one person at a time. If anyone doubts that you can make a difference that way, ask the man who simply needed a pair of shoes or the woman who needed a bar of soap and a few bottles of water to wash her child.

4. When giving, consider your intentions: After sorting through thousands of shoes, clothes and other items, I realized something: some people give to meet a need, and others are cleaning out their attic. It is just not enough to check the box. We found a wide range of things. One person had taken the time to pack a new towel, washcloth, toothbrush, toothpaste, shampoo and conditioner into individual 5-gallon ziplock bags. That is thoughtful. On the other end of the spectrum, here are some of the weirdest things that people donated yesterday:

a. A mink coat. At what point do you say to yourself “yeah, I think someone ravaged by a flood could use a mink jacket”?

b. A headless, armless, legless Power Ranger action figure. This was my son’s favorite. In fact, after a good laugh, he asked me “who would donate an amputated Power Ranger?”; I, of course, had no idea.

c. Three empty wine racks. Just to be clear, I am not making a moral judgment here, I’m simply asking…what do you do with wine racks when you may not even have a home?

d. An antique Victorian folding rocking chair with tapestry upholstery. It isn’t weird or gross; I just thought it was odd yet beautiful.

5. You don’t know what you have until you don’t have it anymore. Water and ice. You really don’t think about how important water is and what a luxury ice is until you don’t have them anymore. Even while deployed to the hinterlands of Iraq, I always had plenty of water. Yesterday, when Carter took a few moments to go to a nearby fast-food restaurant to get lunch, we couldn’t use the sinks because there was no water, and they served us canned drinks. It wasn’t a burden; it just made you realize how devastating it would be not to have regular, clean water. All I could think of was the line from Samuel Taylor Coleridge’s poem: “Water, water, every where, nor any drop to drink.” Fortunately, groups like the Red Cross brought in large supplies of water and Gatorade.

6. Tip O’Neill was right: all politics is local. I was pleasantly surprised when the Hickman County Mayor arrived with tables, hangers and clothes racks. There wasn’t a film crew, local reporter or the promise of recognition. He simply wanted to help his community get back on its feet. For once, I saw the brighter side of politics.

7. Uncle Earl is more likely to help than Uncle Sam. Immediately after the floods, Patten Fuqua wrote a blog post that has both inspired us here in Tennessee and come to represent the spirit of Tennesseans. We help each other. Though I’m not trying to be inflammatory, the reality is you are far more likely to see your neighbors helping you than the federal government. In the few hours I was there, I saw young and old alike come together to simply show kindness and concern and give what they could. Many of the folks who brought items were themselves flood victims; that says everything.

8. Hand Sanitizer is Good Stuff. No explanation needed…

I suppose, to conclude in some clumsy way, I learned that it has less to do with what you give (although some things are clearly better than others) and more to do with how and why you give it. For me and my family, as often happens, we received much more than we gave. In the end, I really think Carter learned more in those few hours of selfless service to others than he would have learned at school.

So, what are you going to do to help those in need?

For pictures from throughout our day in Hickman County, go to our TwitPic page.


The iPad as the “End of an Era”? – Not the way I see it.

Tuesday, April 13th, 2010

In this month’s Wired magazine, Steven Johnson writes: “The tablet may turn out to be the final stage of an extraordinary era of textual innovation.” (http://www.wired.com/magazine/2010/03/ff_tablet_essays#johnson)

Johnson’s point is that the small digital footprint of text and the nearly infinite computing power of the PC (and now the iPad) mean that only copyright holders now stand between us and instant access to everything ever written, and thus we have reached the end of an era.

I disagree.  I think we’re just getting started.

It’s true that computers and networks have dramatically amplified human capacity to generate, store and share text. It’s also true that hardware and software have converged to integrate vast stores of digital information into our everyday lives.

However, we’re still remarkably distant from computers being able to understand what we mean when we write. Sure, Gmail can post ads for Coca-Cola when I’m writing to my friends about steps in the steel refining process, but that remains far from true understanding.

As our devices get more sophisticated, what must happen next is an era of understanding.  While “understanding” requires interpreting myriad inputs, the cornerstone of understanding humans is the ability to comprehend the written word.

This isn’t a new problem, and intelligent people have been working on a solution for quite some time.  The building blocks are clear and solutions are beginning to emerge in the market.

First, you won’t be able to rely on dictionaries to sort out meaning, for the simple reason that the meaning of a word changes based on who speaks it and in what context. “Park” is a noun denoting where I have a picnic, a verb for what I do with my car when I’m at the store, or, somewhat less frequently, a proper last name identifying an individual from a certain family.

Second, you’d better bring a big computer (and have some very good shortcuts) because speed matters.  A human of average intelligence uses about 10,000 words and adjusts the meaning of those words based on tone, location, speaker, non-verbal cues, etc.  While the field of human psychology is rife with examples of our cognitive shortcuts and their corresponding failings, the human brain does a remarkable job with a very computationally intense process.

For a moment, consider a world where the computer understands text with the same speed and depth as humans. A new era will be upon us. You’ll CC your digital assistant on an email and it will schedule a meeting with the right people at the right time, book flights for all attendees and make sure you’re eating at a restaurant that can accommodate your co-workers’ special dietary needs. If you are a lawyer, your computer will suggest the arguments with the greatest chance of success before a specific judge, based on the judge’s published opinions, all while you are writing the initial brief. For doctors and nurses, the computer will suggest and rule out possible diagnoses as you dictate a patient’s symptoms. Or, even more commonly, as you and I are typing our thoughts on a topic of interest, our computers will find people with similar interests and cite relevant passages from everything ever written.

No, not the end of an era.  Far from it.

I’d say the more important era is just beginning.

Rob Metcalf is the President and COO of Digital Reasoning Systems, Inc.
Digital Reasoning is solving the challenge of distilling useful information out of unstructured data – on a massive scale and in real time.

Project Managers Unite …We Can Do Better!

Monday, March 22nd, 2010

“Common Sense” Project Management (Part 2)

(Strong project leadership and enlightened “people” management.)

I almost consider the term “project manager” a misnomer. Successful projects are “led”, not “managed”. Projects are too dynamic to lend themselves to just being managed. What are the hallmarks of a good project leader? I believe that a strong leader possesses a strong sense of purpose that instills confidence in the project team. A strong leader thoroughly understands the goals of a project and the requirements to achieve those goals. Strong leaders are also very decisive when faced with difficult decisions. A project team needs clear direction and quick conflict resolution. Lack of timely decisions or unresolved conflicts often lead to project objectives not being met on schedule, and can also lead to morale issues on the team.

An IT project should not be run as a democracy, but rather as a benevolent dictatorship. While it is good to obtain consensus when making major decisions, it should not be a requirement for the project leader to come to a timely decision. A project awash in indecision is ripe for failure. Also, once a decision is made, it should stay “made”, unless new compelling information comes to light that was not factored into the original decision. I have seen projects where the de facto project motto seemed to be that any good decision was worth making several times. This leads to confusion and wasted effort.

Part of being a strong leader is conveying a very concise message regarding the project goals and how those goals are going to be met. Just as important is a good understanding of the non-goals of the project (i.e. specific goals that the project will NOT address).  If you don’t paint a clear target, don’t be surprised if no one hits it. A strong leader creates this clear project focus by imparting an unambiguous understanding of the following to the project team:

  1. The project goals – the team needs a crystal clear understanding of the project goals. These goals need to be simply stated and unambiguous. If you don’t know where you are going, how do you ever expect to get there? The team should also clearly understand what the non-goals are (i.e. the functions/features/capabilities that are specifically NOT being provided by design). The level to which these goals are understood will directly impact the effectiveness and “correctness” of every major decision made in the project.
  2. Each team member’s role in meeting the project goals – just as important as understanding the overall project goals is each team member’s understanding of his or her own role in meeting these goals. Knowing where your tasks fit into the overall scheme of things provides the context necessary to disambiguate and prioritize the minor issues that always arise in the execution of individual tasks. In other words, it helps everyone in the boat to row in the same direction and not work against each other.
  3. A conflict resolution process – the project team needs to know how conflicts will be resolved. A conflict can be a difference of opinion on some aspect of the project design, or any obstacle preventing someone from accomplishing their tasks. A strong leader resolves conflicts as quickly as possible to minimize their impact on the project (i.e. conflicts DO NOT improve with age).

Another hallmark of an effective leader is a good understanding of human nature. A project leader sets the tone of the project primarily through how they treat the project team. Team members need to be treated as individual people, each with their own personal aspirations, and NOT as project resource units. Everyone needs some level of affirmation for a sense of accomplishment. The project leader needs to know his or her team members well enough to provide the right level of affirmation to each person. Treating them with respect creates the type of atmosphere that successfully sustains the project team through the difficult times that often occur during the course of a project. Teams that are led through fear and intimidation often fail because this type of leadership divests team members of project ownership and discourages them from going that “extra mile” often required to get through the difficult times.

Finally, a strong leader NEVER takes credit for accomplishments rightfully belonging to individual team members or the project team as a whole.  A project leader’s success (or lack thereof) should just be a reflection of the project team’s success, and not measured by their own individual efforts.


Harry Schultz Featured in Processor Magazine: Sidestep Project Management Landmines

Friday, March 12th, 2010

March 12, 2010 • Vol.32 Issue 6

Page(s) 26 in print issue

by Sixto Ortiz Jr.

Sidestep Project Management Landmines

Poor Communication, Lack Of Leadership & Other Problems Can Hamper A Project’s Success

Key Points

• Technical know-how, business acumen, and people management skills are all ingredients of successful IT project management.

• Capturing top management support greatly increases the odds for success.

• Risks cannot be eliminated, but they can be managed as long as they are identified in plenty of time.

IT project management can be precarious: Depending on which source is consulted, the failure rate for IT projects ranges anywhere from 30 to 60%. So, the odds that an IT project will fail to achieve its business case are uncomfortably high.

But, here’s a silver lining: There’s plenty of history documenting IT project mistakes. Administrators looking to steer a project to success should study the mistakes others have made in the past, especially within their own organizations. After all, those who don’t learn from history are doomed to repeat it.

Poor Definition Of Project Requirements

It would seem logical that an expensive IT project would have clearly defined objectives and a clear idea of the resources and outlays needed to get to the end zone. However, many organizations have suffered through poorly planned projects that ended up in the corporate scrap heap.

Unless everyone agrees on what is getting built, what the parameters are, and the rules for how things can change, the eventual outcome will be an unhappy ending, says Tony Navarrete, lead technical marketing in the IT Business Management unit of BMC Software (www.bmc.com).

“Early in the project definition stages, you need everyone to agree on the scope, high-level project budget, and key requirements,” says Navarrete. And, he adds, all of the stakeholders in the project must agree on these points so the project can move forward.

Harry Schultz, senior vice president of product development and solutions at Digital Reasoning Systems (www.digitalreasoning.com), says the true hallmark of a successful project is a customer that enthusiastically embraces the deployed system because it successfully addresses the specific needs that initiated the project in the first place.

But, says Schultz, the requirements-gathering process usually requires a significant commitment of time with the end customer to really understand their business and how the proposed system will be used. To accomplish this, Schultz recommends that a standing group of key customers (both technical end users and major decision makers) and key project members be formed to participate in the requirements definition process at the project outset.

Insufficient Communications

A complex IT project involves a substantial number of people working together toward one goal. A lack of clear communications can cause personnel to lose direction, focus, and ultimately the desire to see the project through.

One of the reasons projects often fail to align well with the desired value is inconsistent or inadequate interaction with the various sponsors and stakeholders, says Eric Willeke, lead architect for EMC Consulting (www.emc.com). To avoid this landmine, he adds, project leaders should focus a majority of their energy outside the project to ensure a clear understanding exists between the project team and the stakeholders. These communication channels, he says, allow potential impediments to be resolved promptly.

William Stuckert, vice president and general manager for Advanced Technology Services (www.advancedtech.com), says lack of communication about the business value of a project can have a negative impact on the people working on a project. Managers should clearly communicate the reasons a project is important to the business, Stuckert says, adding that this shows personnel the overall impact of their contributions.

Lack Of Leadership

Every undertaking must have direction from a leading individual. Without a strong hand to provide direction and leadership, a project may run aground very quickly, much like a ship steered by multiple captains who each wish to go in different directions.

Digital Reasoning Systems’ Schultz says successful projects require a project leader, not a project manager. An IT project should not be run as a democracy but rather as a benevolent dictatorship, he adds, because the project team requires clear direction and quick conflict resolution.

“A project awash in indecision is ripe for failure,” Schultz says. Strong leaders must convey a very concise message about project goals and how those goals are to be met.

Another dimension of leadership is ensuring that the person leading the project has the expertise needed to do so. But, says Jack Bergstrand, CEO of Brand Velocity (www.brandvelocity.com), the program director for a company is usually a respected business or technology executive who has never run a large IT project before. When that happens, this individual depends too much on the consulting systems integrator from the start, thus creating the expectation that the integrator will manage the project in a turnkey fashion.

Poor Risk Management

Projects are filled with unknowns at the outset, so determining what the potential obstacles are and the risks they pose to the effort is absolutely essential.

According to Alexander Magno, senior director of ADM North American Delivery at Keane (www.keane.com), a common mistake made by project managers is taking a “sit back and wait” approach to risk and attempting to solve problems once they occur. Magno says project managers should avoid falling into this passive approach.

The first step to avoiding this mistake, says Magno, is for team leaders to create a collaborative team atmosphere where openness helps identify and manage risks before they become issues. Second, he says, project leads should enable and coach teams so they consider the downstream impacts of delays, for example. Finally, mitigation strategies should be identified so the project team can work around risks and keep moving forward.

Lack Of Alignment With The Business

A project that lacks management support has a good chance of failing. After all, uncommitted top management is more likely to pull the plug on a project the first time things go awry.

For starters, lack of alignment with business objectives can cause project managers to “put blinders on” and fail to account for market changes that invalidate a project’s business objectives, says Magno, who adds that it is important to keep an IT project’s business objectives in mind throughout the process to avoid this mistake. An executive steering committee should actively participate in reviewing stated project objectives and business impacts throughout the course of the project, Magno says.

“Getting into the habit of recalibrating the project’s return on investment will help a manager remain ready and equipped to identify when to cut the cord,” he says.

Top Problem: Poor People Skills

People are the driving forces behind projects, so it stands to reason that project managers should be as good at managing people as they are at managing project logistics. Unfortunately, that’s usually not the case; Harry Schultz, senior vice president of product development and solutions at Digital Reasoning Systems (www.digitalreasoning.com), says many project managers have no trouble keeping up with technology but still don’t understand what motivates—and demotivates—people.

The solution? Management must treat people like human beings instead of resources and must value and respect their contributions. Also, he adds, respecting project members’ personal time, avoiding publicly berating personnel who make a mistake, and always showing appreciation will ensure people stay focused and contribute their best efforts.

“Common Sense” Project Management (Part 1)

Wednesday, March 10th, 2010

It has often been said that there is nothing “common” about common sense. Nowhere have I found that truer than in the area of project management. The intent of this series of blog posts is to explore some of the more common subjective reasons why some projects succeed and others fail. I believe that there are some very important hallmarks of a successful project that are often undervalued because they deal with the more subjective aspects of leadership.

There are many factors that differentiate a successful IT project from a mediocre one. Surprisingly, unsuccessful IT projects more often result from ignoring a few simple “common sense” principles of leadership than from using the wrong project management methodology or implementing technology that is too difficult. I don’t want to discount the benefit of the many new project management methodologies and processes available today, or their importance to a project. However, I believe there are other intrinsic factors critical to the successful execution of a project that, while more subjective, are every bit as important as the more quantified aspects of project management.

My opinion is based on 34 years of working on a variety of technical projects, both as a participating team member and as the project manager. I admit that many of the important lessons I have learned about project management stem from having done it wrong and learning from the experience (as the saying goes, good judgment comes from experience, and experience comes from bad judgment). I’ve worked on some extremely successful projects as well as participated in a few “death marches”. Through these experiences, I discovered some traits that were often present in the successful projects and absent from the unsuccessful ones. It is these traits that I want to explore further.

While there are many facets to these “success traits”, they all have at their core a basic understanding of human nature. Over the years, I have encountered some extremely smart people who, while capable of keeping up with the ever-increasing tempo of technological change, are clueless about what motivates and demotivates people. They appear to be unaware of the negative consequences of their leadership style and its impact on the project, and then wonder why their project is performing so poorly. It is like having a new car with a powerful engine, insisting on driving with the parking brake on, and then complaining that the car doesn’t perform as promised. I have actually had conversations with people who, when I pointed out the “parking brake” in their situation, were surprised that it would have an impact on their project.

I have never met a technology professional whose goal was to do a bad job. Everyone wants to be successful and feel good about what they do. While people are sometimes miscast in their role on a project, too often it is an accumulation of these subjective factors, not a lack of talent on the individual’s part, that leads to poor project performance.

The following are four areas that have played a critical role in the successful projects that I have been part of over the years, and that I will be exploring in subsequent postings.

1. Enlightened “people” management and strong project leadership.

2. Adequate communications with BOTH the customer and the project team.

3. Understanding the customer and how to determine their “real” requirements.

4. Risk analysis/avoidance (i.e., how to prepare for things that go “bump” in the night).

Search Wizards Speak: An Interview with Tim Estes

Tuesday, February 2nd, 2010

NOTE: The following is the full interview that Stephen E. Arnold conducted with our CEO, Tim Estes. You can find this interview and others at www.ArnoldIT.com.

In a taxi from Baltimore-Washington Airport to a speaking engagement in Washington, DC, a colleague and I were discussing my search blog. We were sharing a taxi with two other people. One of them asked, “Are you the fellow who writes about search and content processing?” I replied, “I was.”

The person asking the question introduced himself as a reader and began to tell me about his company’s technology. I took his card, did some research, and this interview is one outcome of that encounter.

Digital Reasoning, based in Franklin, Tennessee, has developed a suite of software that adds value to content.

I learned that the company develops technologies that help solve the problem of information overload. The company’s tools allow users to read, understand, and make use of vast amounts of data.

Digital Reasoning has patented its technology that, according to the firm’s Web site, “deeply, conceptually searches within unstructured data, analyzes it and presents dynamic visual results with minimal human intervention. It reads everything, forgets nothing and gets smarter as you use it.”

I followed up with the company’s chief executive officer, Tim Estes. The full text of my interview with him appears below.

What is “digital reasoning”?

Digital Reasoning is unique in the market in its ability to bootstrap a model from the data down to the entity level and then start resolving entities and aggregating their connections to give you a much better picture of the data. We are a real summarization technology that is not limited to whatever a priori model or ontology is applied. I think this is where the market is going, but time will tell.

What is your background?

I went to the University of Virginia.

That’s interesting. My son attended UVA.

Quite a coincidence.

I’m a philosopher by training. Ironically – when I graduated we had T-shirts that said: “Philosophy – I’m in it for the money.” My background was in semiotics, Philosophy of Language and Philosophy of Mathematics. A principal area of interest was the work of Wittgenstein and Leibnitz. I have a passion to find hidden structure in things and proceed from the assumption that the world is held together by necessary and intrinsic order (thus the Leibnitz bias). In founding the company, the idea was that with sufficient introspection of the mathematical and structural invariances that present themselves inside of data, a “model” would emerge from the data that could allow software to execute on imprecise goals using learned contexts.

Were there key influencers that shaped your firm’s technical approach?

I credit two primary influences with driving me to start the company. One was a brilliant article by David Gelernter called “The Second Coming”, and the other was an interview with Bill Gates that ran in Red Herring in the spring of 2000. Bottom line – they both pointed forward to a day when software would learn, and software that didn’t would be commoditized into simple infrastructure. Digital Reasoning was really about trying to bridge that gap – and it still is. We saw the most opportunity, and the most challenging problems, in getting systems to understand unstructured data well enough to help bootstrap the context necessary for a new level of software automation, i.e., ambient intelligence in software that could prioritize, summarize, and make a reasonable level of proxy decisions for humans who are overloaded with information. To me, most of the buzzwords in search are just repackaging these core ideas.

Faceted navigation, for instance, is really just prioritization and summarization that draws more out of the user to compensate for a system lacking sufficient context or understanding of a user’s intention to bring back the right results. It has the ancillary benefit of surfacing connections or facets in the data that probably were not known at the outset (which is what summarization functions such as lists of mentioned entities or histograms of hits over time give you).
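Faceted navigation as described here boils down to two summaries over a result set: how many hits mention each entity, and how the hits are distributed over time. The sketch below assumes each hit already carries its extracted entities and a publication date; the `Hit` schema, the function names, and the toy data are hypothetical illustrations, not part of any Digital Reasoning product.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class Hit:
    """One search hit with its entities already extracted (hypothetical schema)."""
    doc_id: str
    published: date
    entities: list[str]


def entity_facet(hits: list[Hit], top_n: int = 5) -> list[tuple[str, int]]:
    """Rank entities by how many hits mention them (each hit counts once per entity)."""
    counts = Counter()
    for hit in hits:
        counts.update(set(hit.entities))
    return counts.most_common(top_n)


def monthly_histogram(hits: list[Hit]) -> dict[str, int]:
    """Bucket hits by year-month so a UI could draw hits over time."""
    buckets = Counter(hit.published.strftime("%Y-%m") for hit in hits)
    return dict(sorted(buckets.items()))


if __name__ == "__main__":
    # Toy result set invented for illustration.
    hits = [
        Hit("d1", date(2009, 11, 3), ["Acme Corp", "John Doe"]),
        Hit("d2", date(2009, 11, 20), ["John Doe", "Springfield"]),
        Hit("d3", date(2009, 12, 1), ["John Doe", "John Doe"]),
    ]
    print(entity_facet(hits))       # [('John Doe', 3), ('Acme Corp', 1), ('Springfield', 1)]
    print(monthly_histogram(hits))  # {'2009-11': 2, '2009-12': 1}
```

Note that the facet counts documents, not mentions: the repeated "John Doe" in `d3` is counted once, which is usually what a facet sidebar should show.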

What was the trigger in your career that made search and retrieval and content processing focal points? Weren’t there other, easier opportunities for you to use your technical training and expertise?

Well – Digital Reasoning pretty much is my career. I started it in my third year of school and have been doing it ever since. I can’t think of anything else in the industry that would make sense – I’d probably be teaching if I weren’t running DRSI.

I suppose after 9/11, I could have taken a route to get into the Government and Intelligence space as a Blue Badger. But given my age – just turned 30 last year – I doubt at the time I could have had the impact I wanted. Now – after 8-9 years of working hard problems in this space, I think we are really starting to make a difference.

What type of performance can a licensee expect with your system?

Digital Reasoning’s core product offering is called “Synthesys.” It is designed to take an enterprise from disparate data silos (both structured and unstructured), ingest and understand the data at an entity level (down to the “who, what, and wheres” that are mentioned inside of documents), make it searchable, linkable, and provide back key statistics (BI type functionality). It can work in an online/real-time type fashion given its performance capabilities.

Synthesys is unique because it does a really good job at entity resolution directly from unstructured data. Having the name “Umar Farouk Abdul Mutallab” misspelled somewhere in the data is not a big deal for us, because we create concepts based on the patterns of usage in the data, and those are pretty hard to hide. It is necessarily true that a word grounds its meaning in the things in the data that share its pattern of usage. If that weren’t the case, no receiving agent could understand it. We’ve figured out how to reverse engineer that mental process of “grounding” a word. So you can have Abdulmutallab spelled ten different ways and it doesn’t matter. If the evidence links in any statistically significant way, we pull it together.
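The idea of grouping name variants because they share “patterns of usage” can be illustrated with textbook distributional similarity: describe each surface form by the words around it, then merge forms whose contexts look alike. This is a minimal sketch of that general technique, not Digital Reasoning’s patented method; the window size, the 0.5 threshold, the greedy clustering, and the toy corpus are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences: list[list[str]], names: set[str], window: int = 2) -> dict[str, Counter]:
    """Map each candidate name to a bag of the words within `window` tokens of it."""
    vectors: dict[str, Counter] = defaultdict(Counter)
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok in names:
                neighbors = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
                vectors[tok].update(neighbors)
    return vectors


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def resolve(vectors: dict[str, Counter], threshold: float = 0.5) -> list[set[str]]:
    """Greedily group surface forms whose usage contexts are similar enough."""
    clusters: list[set[str]] = []
    for name in sorted(vectors):
        for cluster in clusters:
            if any(cosine(vectors[name], vectors[other]) >= threshold for other in cluster):
                cluster.add(name)
                break
        else:  # no existing cluster was similar enough
            clusters.append({name})
    return clusters


if __name__ == "__main__":
    # Toy corpus; the second spelling is a deliberate misspelling.
    corpus = [
        "officials said abdulmutallab boarded flight 253 in amsterdam".split(),
        "reports said abdulmuttalab boarded flight 253 from amsterdam".split(),
        "neighbors said smith opened a bakery in franklin".split(),
    ]
    vecs = context_vectors(corpus, {"abdulmutallab", "abdulmuttalab", "smith"})
    print(resolve(vecs))
```

The two spellings end up in one cluster (context similarity 0.75) while "smith" stays separate (0.25), even though the code never compares the spellings themselves. A practical system would likely combine this kind of contextual evidence with string similarity and much more data.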

Synthesys trials can be had for around $50k (depending on specifics). Enterprise deals are substantially higher – but that is true of just about everyone in our space. We offer all of the typical high-level features you’d find from players in unstructured data analytics – entity extraction, geotagging, faceted navigation, query suggestion, etc. But few of them, if any, can really resolve entities accurately without a lot of “humans in the loop.”

The system can index ~10 million files on large single systems. We are in testing on a large distributed model for Synthesys with a government customer right now where we will crack 150M files on less than a dozen servers. The new model is proven to be horizontally scalable and implements the first “eventually consistent” model for a player in our space that we are aware of. It is our hope to prove web scale (i.e. billions of documents) before too long.
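“Eventually consistent” means replicas accept writes independently and are only guaranteed to agree once they have exchanged state. The interview does not say how Synthesys resolves conflicting writes, so the sketch below simply assumes one common policy, last-writer-wins with Lamport-style logical clocks; the `Replica` and `Version` classes and the key naming are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True, order=True)
class Version:
    """Logical timestamp plus replica id; comparing these decides which write wins."""
    timestamp: int
    replica: str


@dataclass
class Replica:
    name: str
    clock: int = 0
    store: dict[str, tuple[Version, str]] = field(default_factory=dict)

    def put(self, key: str, value: str) -> None:
        """Accept a write locally, without coordinating with other replicas."""
        self.clock += 1
        self.store[key] = (Version(self.clock, self.name), value)

    def merge(self, other: "Replica") -> None:
        """Anti-entropy exchange: keep the newer version of every key (last writer wins)."""
        for key, (version, value) in other.store.items():
            current = self.store.get(key)
            if current is None or version > current[0]:
                self.store[key] = (version, value)
        # Advance the local clock so later local writes order after what we have seen.
        self.clock = max(self.clock, other.clock)


if __name__ == "__main__":
    a, b = Replica("a"), Replica("b")
    a.put("entity:42", "Abdulmutallab")    # concurrent writes to the same key
    b.put("entity:42", "Abdul Mutallab")
    print(a.store == b.store)              # False: replicas temporarily disagree
    a.merge(b)
    b.merge(a)
    print(a.store == b.store)              # True: both converged on one value
    print(a.store["entity:42"][1])         # Abdul Mutallab (timestamps tie, "b" > "a")
```

Last-writer-wins is the simplest convergent policy but silently discards one of two concurrent writes; systems that cannot afford that typically keep both versions (for example with vector clocks) and reconcile them later.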

Most of our throughput is tied to memory and caching. For instance, with four cores, 12 GB of memory, and standard SATA drives, you would probably see ingestion in the hundreds of KB per second up to a few million documents, and then degradation as the caches start to miss more often.

The number of new companies entering the search and content processing “space” is increasing. What’s your view on too many hungry mouths and too few chocolate chip cookies?

I think that it is a lot of noise in the system. One of the areas that is particularly disappointing right now is the lack of innovation in the eDiscovery area. Most of that market is using technology that got lifecycled out of the Intel/Defense space 5-10 years ago. In enterprise search, I suppose the many mouths will lead to natural Darwinian results.

My only hope is that the new companies offer some real innovation and don’t rehash the same old marketing (“Bring Order to Information Chaos,” etc.) with the same failed approaches (extract, load a DB, search it with more metadata, etc.). I think the sophisticated IT buyer/CIO is pretty tired of being promised more than can be delivered in this space.

Like the old commercial – we are hopefully going to be getting into a “where’s the beef?” type attitude soon.

Finally, while the academic conferences and contests have been interesting, I think there needs to be a better way to prove that these technologies generalize to a real customer’s data. Everything looks pretty once the data gets well formed and cleaned up. Boy, don’t those Palantir demos look cool? But what happens when you really hit the junk we call data in real businesses or Federal enterprises? We need to focus on the real data, not the slickest demos. The people in the Intel community especially understand the “bait and switch” of demoing on clean, structured data and then having to face the reality of their data on the inside, where these demos never seem to work against the large amount of noise.

When the market leaders get honest about the challenges of noisy data and start delivering predictable quality over that real data, that’s when we (speaking as a member of the unstructured data analytics market) will get our credibility back.

What are the functions that you want to deliver to your customers?

Well, I think we want all data to be available to users from a content/entity level versus a document level. Documents are containers of facts and ideas. We don’t have time to read 1/10th of what we want to or need to. We need summarization and prioritization feeding visualization. We’d like to see that as common practice.

In the Intel business, why do we read stuff before we start creating charts and graphs of key connections? Because the software is too stupid to do it for us right now in an automated fashion. That needs to change. Our analysts are overworked, our managers have to consume too much at too high a level, and we are drowning in email and Facebook/Twitter feeds. Something has to sit between us and the firehose of content and status updates that is overwhelming us. It’s not just new tools to navigate it and read. It’s really something quite different: show me what it means in a snapshot and let me dig into whatever looks important and novel. And do it as fast as Google, but from a concept- or entity-centric point of view.

That’s what we deliver in our Defense/Intel efforts and it’s what we look to deliver to other contexts and markets as we expand into those this coming year.

Are you able to give me some insight into new features you will be offering your licensees in the next release?

I don’t want to go into too much detail. But on the backend side, we have two major efforts going on that we believe will disrupt the market. First, we have a real answer to entity resolution that works at scale. Right now we are integrating it with the ability to apply it to both structured and unstructured data. That’s going to be a real killer. It conceptually integrates the actual entities in enterprise data and does so with minimal a priori modeling and customization (especially compared to the other approaches on the market today).

Next, we are implementing a backend that is very similar to what Amazon has as software infrastructure. It is going to allow horizontal scalability of the underlying storage and processing and allow for multiple datacenters and clouds to synchronize this understanding. This means that Digital Reasoning is positioned to have a real offering for understanding data in the hybrid cloud space.

There’s a push to create mashups, that is, search results that deliver answers or reports. What’s your view of this trend?

I think it’s pretty useful so long as the quality of analytics is good. It’s always tricky when you automate a process that has a 0.8 F-measure (F1) at best on noisy data. You end up getting some very humorous mistakes. But that’s the price of the early stage of disruptive technologies. If we can create supplemental processes (like ensembles tuned toward recall paired with others tuned toward precision), we can emulate what’s worked well in the medical community in terms of the testing process. I want to credit Ted Senator (formerly at DARPA, now at SAIC) with that analogy. He used it in a paper a few years ago, and I think it is still one of the better analogies I’ve heard in this space.
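The F-measure mentioned here is the harmonic mean of precision P and recall R, F1 = 2PR / (P + R), and the recall-then-precision pairing resembles a sensitive screening test followed by a specific confirmatory test. The sketch below works through both with invented toy sets; the simple intersection cascade is an assumption chosen to illustrate the analogy, not a method taken from Senator’s paper or from Synthesys.

```python
def f1(precision: float, recall: float) -> float:
    """F-measure with beta = 1: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate(predicted: set[str], relevant: set[str]) -> tuple[float, float, float]:
    """Return (precision, recall, F1) for a set of flagged items against ground truth."""
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall, f1(precision, recall)


def cascade(screen: set[str], confirm: set[str]) -> set[str]:
    """Keep items flagged by the high-recall screen AND accepted by the high-precision check."""
    return screen & confirm


if __name__ == "__main__":
    # Toy ground truth and detector outputs, invented for illustration.
    relevant = set("abcde")
    screen = set("abcdefghij")   # recall-tuned: catches everything, plus noise
    confirm = set("abcdf")       # precision-tuned check run on the screened items
    for label, flagged in [("screen only", screen), ("screen + confirm", cascade(screen, confirm))]:
        p, r, f = evaluate(flagged, relevant)
        print(f"{label}: precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

Run as a script, it prints `screen only: precision=0.50 recall=1.00 F1=0.67` and `screen + confirm: precision=0.80 recall=0.80 F1=0.80`: the confirmation stage gives up a little recall for a large gain in precision, which is the trade a confirmatory medical test makes after a screening test.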

What sets your technology apart from some other vendors’ systems?

Our solution is generally complementary to the Oracle/MySQL/MSSQL solutions we find in the government and enterprise. It can be stood up on its own – this is the default – but we don’t have issues integrating into the broader enterprise with those other systems.

I think I’ve covered the differentiation point already – but really the ability to find entities, resolve them, and then retain their connections to other entities and all related data is a pretty big differentiation. We also believe that scale and speed are differentiators for us. While others may index for search faster, few if any can match our depth of understanding of the data at scale or with the speed we have.

Our approach is fundamentally different from 90% or more of the market, because we have a real bias against trying to leverage a priori models against the data (i.e., exhaustive extraction or ontology-type models). Digital Reasoning tells you what you didn’t already know and also sorts out data easily so you can find what you expect to find if it’s there – we deal with both the knowns and the unknowns elegantly. That’s how we are different. We’re particularly good at enabling the discovery of the non-obvious and the unknown from noisy unstructured data.

Semantic systems have been getting quite a bit of coverage, yet the Powerset technology and other semantic players like Hakia.com have been slow out of the gate. What’s your view on semantics and natural language processing? Are these technologies ready for prime time?

It’s getting there. I have a fundamental disagreement with the Extract, Transform, Load (ETL) approach to text, however. It tends to work well in fixed/stable domains and poorly in domains with evolving semantics and noisy data. I think that is exactly what we see right now in terms of the limitations. I think this approach will ultimately succumb to approaches that can bootstrap from the data (a variation on the Peter Norvig camp’s view of the problem). We are still waiting for the iPod of learning algorithms, one that works at scale, to really show how futile all of this a priori modeling investment is.

I also think that most of these guys were probably optimistic about their ability to scale their analytics to web scale and got caught off guard by how hard it is to go from tens or hundreds of millions of pages to tens of billions. It’s just a hard problem. Google succeeds because six or seven relevant hits out of ten on the first page are enough to keep its business model rolling. Trying to get nine out of ten in much more semantically narrow domains is at least an order of magnitude harder, if not two.

A number of vendors have shown me very fancy interfaces. The interfaces take center stage and the information within the interface gets pushed to the background. Are we entering an era of eye candy instead of results that are relevant to the user?

We are always taken in by the demo. It’s pretty typical. People and enterprises want an information savior – and the demo is like a “miracle proof” even if it is really more Wizard of Oz than anything else. I think that the real work in this space is not being done by the demo artists. It’s being done by those that can make sense of the data while asking less and less of the user.

I think that “Intelligence Augmentation” – something that Palantir was blogging about recently – is very much a cop-out. It basically says we still want the human to do all of this work, but we are going to make it a lot less onerous. This doesn’t solve the problem at all. Sure, most of the time invested in applying machine learning algorithms goes to data normalization, but that’s the point. If we had algorithms smart enough to create a model from mathematical order in the data that meant something to a human, we wouldn’t have to ETL it into a specific schema. Data normalization is a machine learning problem. I think that is where they miss the boat. The Intelligence Augmentation approach (left alone) creates false assurance that the user is making progress when, actually, key items are being missed because the software has no real, evolving understanding of the data. We need computers to see the whole picture of what’s going on in millions or billions of messages because there is no way a human can. No visualization can roll up that many nodes to make them tractable for a human to understand. Any visualization without the capacity to understand the underlying data in sophisticated ways is just doing a disservice to the mission.

As with all complex problems, we need substantial automation to grow productivity. To us, understanding data is a lot like automated landing systems in aircraft. At some point in the not-too-distant past, it simply became too much for human beings to manage all of the complex subsystems in a commercial jet aircraft. Now pilots manage those items only in emergencies and focus on the major judgment-oriented tasks in flight (direction, altitude, etc.).

We need automated awareness systems across most information-centric activities. That’s the real meat. Visualization is a means to present this underlying capacity for maximum utility. It is not the utility itself.

What text processing functions do you offer?

Currently we offer indexing, entity extraction, geotagging, search, faceted search, relationship extraction (basic), and dynamic graph generation from those relationships. Our entity extraction and language processing are being rebuilt into a next-generation capability right now. We plan on offering anaphora resolution, in-document coreference, and deeper extraction in future releases. We are currently English-only but also plan to pick up other languages. We hope to do that this year (it’s not a technology issue for us), but that depends on competing customer demands. Right now, there is a lot of business supporting English since that is what nearly all of the analysts are using.

Also, our new horizontally scalable backend will be in the next release along with new entity resolution capabilities against structured and unstructured data. Other bells and whistles too – but those are the majors.

What is it that you think people are looking for from content technology?

People are looking for semantic technology to help them read less and understand more. Sounds simple, right? They don’t readily trust the summarization part, so that’s an area that needs a big step up.

A major source of discontent is the upfront cost of building models (the ETL bias) to turn unstructured data into structured data. This is probably the biggest holdback in the enterprise (especially in a tight budgetary environment). Enterprises are tired of software that has an even bigger upfront deployment and maintenance cost. Given how we solve the problem, we expect to have a compelling story here.

I think the other big piece that is holding back semantic technology is the obsession with search and reactive applications. Enterprises need to start looking at how to use semantic technology more proactively and vendors need to be delivering better solutions here.

What are the hot trends in search for the next 12 to 24 months?

I think faceted navigation is going to become standard, even passé. The trick will be how well this can happen from noisy data. That’s where it will be interesting to compare what Endeca has (which is heavy on upfront modeling of your data) to what Nova Spivack is working on over at Radar Networks (probably a much more elegant approach).

I think the wave that is coming, however, is how we get into proactive applications in semantics and search, i.e., ambient awareness yielding autonomous action by systems where the principal data streams are unstructured. That’s the next big wave. We are working on that both in our direct business in Defense/Intel and in new markets. We expect to pursue partnerships with existing enterprise players during the coming year. Beyond that – well, we’ll see.

Where can people get more information?

Our Web site has some current information. Blogging has been a little slow recently because we’ve been maxed out, with new work taking time away from our likely internal contributors, but we hope to be more diligent about it in the coming months. We’ve got some material available on request – actually a ton of material – but we like to understand the need first so we can make the best use of both our potential customers’ time and ours.

ArnoldIT Comment

Digital Reasoning has captured the attention of a number of US government agencies. The firm’s profile in the commercial sector is on the upswing. Those with a need to know what’s relevant to a particular concept or topic in a large flow of content will find that Digital Reasoning’s approach offers an alternative to the older, one-size-fits-all solutions from vendors with technology dating from the mid-1990s. The company is aggressive and committed to helping its licensees get full value from its patented technology. More information is available from http://www.digitalreasoning.com.

Stephen E. Arnold, February 2, 2010
