New Relic

Logs

Full-text search across billions of log events, built for engineers under incident pressure who need answers fast.

Lead Product Designer · 2019–2020 · 0-to-1 product

The New Relic Logs query interface with faceted search results and log stream.

Context

Log data is the most granular level of system data: the raw communications and output between every part of a system, from microservices to load balancers to hardware. When an application throws an error, log data is where engineers find out exactly what happened and why.

I traveled with a research team to conduct onsite customer visits and observational research in network operations centers, visiting some of New Relic's largest customers from Vancouver, BC to Burlington, VT. We observed the New Relic platform being used alongside separate logging products, and watched engineers painstakingly reproduce errors by hand during downtime and then comb through their log data to identify where things were failing. New Relic was great at pinpointing a problem down to the transaction, but it didn't have the system-level data required to understand which specific processes were breaking.

The Problem

There was a clear push from customers to reduce tooling and context switching, and growing market pressure to create a consolidated observability platform. Strategically, it made sense for New Relic to invest in a Logs product. We could eliminate the need for customers to run multiple tools and switch between separate interfaces, provide a single pane of glass for observability, and ultimately help customers avoid the millions of dollars they could lose when outages aren't resolved quickly.

The Approach

We knew we were going to build a Logs product, but we also had to navigate internal pressures. New Relic's CEO and founder had invented a custom query language called NRQL, an offshoot of SQL that powered the platform's query-based features. In our research, we found that New Relic was typically used by only a handful of internal champions at each company who had taken the time to learn the query language for more complex workflows. Everyone else relied on the pre-built views or asked a power user. Even a simple question, such as counting recent timeout errors by host, required a query like this:

SELECT count(*) FROM Log
WHERE level = 'ERROR'
  AND appName IN ('payment-service', 'api-gateway')
  AND message LIKE '%timeout%'
FACET hostname, appName
SINCE 1 hour ago
TIMESERIES 5 minutes
LIMIT 50

It was clear that if we wanted a successful launch, we needed to lower the barrier to entry and make the product easy to use from the first session. This was especially important because a log interface is fundamentally a search tool. Engineers are trying to find the needle in the haystack, often under the pressure of an active incident. The search experience had to be fast, intuitive, and powerful without requiring anyone to learn a query language.

We ran extensive usability testing on the search experience to validate the design. The testing showed that users strongly preferred the search-first approach to NRQL, which was critical for getting internal buy-in given the founder's personal investment in the query language.

The Solution

The core of New Relic Logs was its search experience. Rather than forcing users to write queries, we built a search model that could be used at any skill level: type a plain term to search across every field, use operators for precision, or build structured filters with key:value pairs that could be combined and removed without clearing the whole query.

Term Search

The simplest interaction: type a word like “error” and every log entry containing that term across any displayed field (application name, hostname, log level, message) is surfaced instantly. No field selection required, no syntax to remember.
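To make the behavior concrete, here is a minimal sketch of term search across displayed fields. It is an illustration only, not the production implementation: the `LogEntry` type, the field names, and the case-insensitive substring match are all assumptions, and the sample rows are borrowed from the demo data on this page.

```python
from dataclasses import dataclass

# Hypothetical record type; field names mirror the demo table columns.
@dataclass(frozen=True)
class LogEntry:
    time: str
    app: str
    host: str
    level: str
    message: str

# Fields the term is matched against (assumed: the displayed text fields).
SEARCHABLE_FIELDS = ("app", "host", "level", "message")

SAMPLE_LOGS = [
    LogEntry("14:23:41.003", "webapp", "web1", "ERROR",
             "Uncaught TypeError: Cannot read property 'id' of undefined "
             "at /app/controllers/checkout.js:142"),
    LogEntry("14:23:41.017", "api-gateway", "api1", "WARN",
             "Request timeout after 30000ms for POST /api/v2/orders"),
    LogEntry("14:23:40.891", "webapp", "web2", "INFO",
             "GET /dashboard 200 OK 142ms"),
]

def term_search(logs, term):
    """Return entries where any searchable field contains the term.

    Matching is a case-insensitive substring check (an assumption).
    """
    needle = term.casefold()
    return [
        entry for entry in logs
        if any(needle in getattr(entry, field).casefold()
               for field in SEARCHABLE_FIELDS)
    ]

for entry in term_search(SAMPLE_LOGS, "error"):
    print(entry.time, entry.app, entry.level)
```

The point of the design is that the user never names a field: one term fans out across all of them.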

Operators and Structured Filters

For more precise queries, users could search using a complete set of operators: is, is not, contains, does not contain. Typing a field name like “hostname” triggered a guided flow where the user picked an operator and then selected from a list of actual values. Multiple values could be selected at once. Once confirmed, the filter became a pill in the search bar that could be individually removed without disrupting the rest of the query.
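The pill model can be sketched as a list of independent filters that are combined with AND and removed one at a time. This is a hedged illustration under my own assumptions: `FilterPill`, `apply_pills`, and `remove_pill` are hypothetical names, log entries are plain dictionaries, and matching is case-sensitive for brevity.

```python
from dataclasses import dataclass

# The four operators described above, each taking the entry's field value
# and the tuple of values the user selected.
OPERATORS = {
    "is": lambda value, targets: value in targets,
    "is not": lambda value, targets: value not in targets,
    "contains": lambda value, targets: any(t in value for t in targets),
    "does not contain": lambda value, targets: not any(t in value for t in targets),
}

@dataclass(frozen=True)
class FilterPill:
    """One removable filter in the search bar (hypothetical type)."""
    field: str
    operator: str
    values: tuple  # multiple values can be selected at once

    def matches(self, entry):
        return OPERATORS[self.operator](entry.get(self.field, ""), self.values)

def apply_pills(logs, pills):
    """Keep entries that satisfy every pill (pills are AND-ed together)."""
    return [e for e in logs if all(p.matches(e) for p in pills)]

def remove_pill(pills, index):
    """Return a new pill list without the pill at index; others are untouched."""
    return pills[:index] + pills[index + 1:]

# Sample rows taken from the demo data on this page.
LOGS = [
    {"app": "webapp", "host": "web1", "level": "ERROR",
     "message": "Uncaught TypeError: Cannot read property 'id' of undefined"},
    {"app": "api-gateway", "host": "api1", "level": "WARN",
     "message": "Request timeout after 30000ms for POST /api/v2/orders"},
    {"app": "payment-service", "host": "api2", "level": "ERROR",
     "message": "Failed to connect to payment processor: ECONNREFUSED 10.0.3.42:443"},
    {"app": "webapp", "host": "web2", "level": "INFO",
     "message": "GET /dashboard 200 OK 142ms"},
]

pills = [
    FilterPill("level", "is", ("ERROR",)),
    FilterPill("host", "is", ("web1",)),
]
print(len(apply_pills(LOGS, pills)))                  # both pills active
print(len(apply_pills(LOGS, remove_pill(pills, 1))))  # host pill removed
```

Because each pill is an independent value, removing one only widens the result set; the rest of the query is never rebuilt or cleared.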

Intelligent Recommendations

The search also provided intelligent recommendations. Typing “web” didn't just search for the term; it suggested key:value matches like appname: webapp or hostname: web1, letting users jump directly to structured filters without having to know the exact field names or syntax.
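One way such suggestions could be produced is by scanning the known field values for the partial text and offering each distinct match as a key:value pair. The sketch below assumes this approach; `suggest_filters`, the choice of suggestible fields, and the result limit are hypothetical, and the real recommendation logic may well have differed.

```python
def suggest_filters(logs, partial, fields=("appname", "hostname"), limit=5):
    """Suggest distinct 'field: value' pairs whose value contains the partial text.

    Matching is case-insensitive; suggestions keep first-seen order.
    """
    needle = partial.casefold()
    seen = set()
    suggestions = []
    for entry in logs:
        for field in fields:
            value = entry.get(field, "")
            if needle in value.casefold() and (field, value) not in seen:
                seen.add((field, value))
                suggestions.append(f"{field}: {value}")
    return suggestions[:limit]

# Sample values taken from the demo data on this page.
LOGS = [
    {"appname": "webapp", "hostname": "web1"},
    {"appname": "api-gateway", "hostname": "api1"},
    {"appname": "webapp", "hostname": "web2"},
]

print(suggest_filters(LOGS, "web"))
# → ['appname: webapp', 'hostname: web1', 'hostname: web2']
```

Selecting a suggestion would then create a structured filter directly, so the user never has to recall whether the field is called appname or hostname.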

Interactive Demo

Try searching for a term like “error”, a field like “hostname”, or a partial value like “web” to see intelligent suggestions.

35 of 35 log entries

Time          App              Host     Level  Message
14:23:41.003  webapp           web1     ERROR  Uncaught TypeError: Cannot read property 'id' of undefined at /app/controllers/checkout.js:142
14:23:41.017  api-gateway      api1     WARN   Request timeout after 30000ms for POST /api/v2/orders
14:23:40.998  payment-service  api2     ERROR  Failed to connect to payment processor: ECONNREFUSED 10.0.3.42:443
14:23:40.891  webapp           web2     INFO   GET /dashboard 200 OK 142ms
14:23:40.776  auth-service     api1     INFO   Token refresh successful for user_id=8831, session extended 3600s
14:23:40.654  worker           worker1  DEBUG  Processing job queue: 47 pending, 3 active, 0 failed
14:23:40.512  webapp           web1     WARN   Slow query detected: SELECT * FROM orders WHERE created_at > ? took 4200ms
14:23:40.403  api-gateway      api2     INFO   Health check passed: all upstream services responding
14:23:40.287  payment-service  api1     ERROR  Transaction declined: insufficient_funds for order_id=ORD-29841, amount=$142.50
14:23:40.176  webapp           web2     INFO   POST /api/v2/cart/items 201 Created 89ms
14:23:40.054  auth-service     api2     WARN   Rate limit approaching for IP 192.168.1.45: 89/100 requests in window
14:23:39.932  worker           worker2  INFO   Email delivery completed: batch_id=EM-4421, sent=1200, failed=3
14:23:39.811  webapp           web1     ERROR  WebSocket connection dropped for session abc123, attempting reconnect
14:23:39.698  api-gateway      api1     INFO   Route registered: PUT /api/v2/users/:id/preferences
14:23:39.587  payment-service  api2     INFO   Webhook received from Stripe: event=charge.succeeded, amount=$89.99
14:23:39.476  webapp           web2     DEBUG  Cache miss for key user:8831:preferences, fetching from database
14:23:39.354  auth-service     api1     ERROR  Invalid JWT signature detected, possible token tampering for client_id=app-mobile-ios
14:23:39.243  worker           worker1  WARN   Memory usage at 78% on worker1, consider scaling horizontally
14:23:39.132  api-gateway      api2     WARN   Upstream service payment-service responding slowly: avg 2400ms over last 5 requests
14:23:39.021  webapp           web1     INFO   User login successful: user_id=7742, method=oauth2_google
14:23:38.910  payment-service  api1     INFO   Refund processed: order_id=ORD-28103, amount=$34.99, reason=customer_request
14:23:38.798  worker           worker2  ERROR  Failed to process job JOB-8812: Redis connection lost, retrying in 5s
14:23:38.687  webapp           web2     INFO   GET /api/v2/products?category=electronics 200 OK 67ms
14:23:38.576  auth-service     api2     INFO   New API key generated for org_id=ORG-441, scope=read:analytics
14:23:38.454  api-gateway      api1     ERROR  Circuit breaker OPEN for payment-service: 5 consecutive failures in 30s
14:23:38.343  webapp           web1     WARN   Deprecated API endpoint called: GET /api/v1/users, migrate to v2
14:23:38.232  worker           worker1  INFO   Cron job completed: daily_report_generation, duration=12.4s, rows_processed=45000
14:23:38.121  payment-service  api2     WARN   Retry attempt 2/3 for transaction TXN-9921: gateway timeout
14:23:38.009  webapp           web2     INFO   Static assets cache refreshed: 142 files, total 8.3MB
14:23:37.898  auth-service     api1     DEBUG  CORS preflight check passed for origin https://app.example.com
14:23:37.787  api-gateway      api2     INFO   TLS certificate renewed: expires 2020-06-15, SHA256=a1b2c3...
14:23:37.676  worker           worker2  INFO   Image resize queue drained: 0 pending, processed 89 in last 60s
14:23:37.554  webapp           web1     ERROR  Unhandled promise rejection in /app/services/notification.js: NetworkError
14:23:37.443  payment-service  api1     INFO   Daily settlement batch initiated: 342 transactions, total $28,491.33
14:23:37.332  auth-service     api2     WARN   Failed login attempt 3/5 for user_id=6612, IP=203.0.113.42

Show Surrounding Logs

One of the most impactful UX improvements we shipped was the ability to right-click any log line and select “Show surrounding logs.” When a customer reported an error, developers would filter down to the specific error log but lose the surrounding context of what happened before and after. This feature pulled in every log that occurred within one minute of the selected entry, instantly reconstructing the timeline around the incident without clearing the filter or running a new query.
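The core of this feature is a time-window lookup against the unfiltered stream. The sketch below is an assumption-laden illustration: `surrounding_logs` and `parse_time` are hypothetical names, "within one minute" is read as 60 seconds on either side of the selected entry, and timestamps use the HH:MM:SS.mmm format from the demo.

```python
from datetime import datetime, timedelta

def parse_time(stamp):
    """Parse a demo-style timestamp such as '14:23:40.998'."""
    return datetime.strptime(stamp, "%H:%M:%S.%f")

def surrounding_logs(all_logs, selected, window=timedelta(minutes=1)):
    """Return every entry within `window` of the selected entry, oldest first.

    `all_logs` is the unfiltered stream, so the active filter never has to
    be cleared to see the context around the selected line.
    """
    center = parse_time(selected["time"])
    nearby = [e for e in all_logs
              if abs(parse_time(e["time"]) - center) <= window]
    return sorted(nearby, key=lambda e: parse_time(e["time"]))

# Sample rows taken from the demo data on this page.
ALL_LOGS = [
    {"time": "14:23:41.017", "level": "WARN", "message": "Request timeout after 30000ms for POST /api/v2/orders"},
    {"time": "14:23:41.003", "level": "ERROR", "message": "Uncaught TypeError: Cannot read property 'id' of undefined"},
    {"time": "14:23:40.998", "level": "ERROR", "message": "Failed to connect to payment processor: ECONNREFUSED 10.0.3.42:443"},
    {"time": "14:23:40.891", "level": "INFO", "message": "GET /dashboard 200 OK 142ms"},
    {"time": "14:23:40.776", "level": "INFO", "message": "Token refresh successful for user_id=8831, session extended 3600s"},
]

selected = ALL_LOGS[2]  # the payment-service ERROR line
for entry in surrounding_logs(ALL_LOGS, selected):
    print(entry["time"], entry["level"], entry["message"])
```

The sample rows span less than a second, so the default one-minute window returns all of them; passing a smaller `window` (for example `timedelta(milliseconds=100)`) shows the narrowing behavior.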

Interactive Demo

Right-click on any log line and select “Show surrounding logs” to reveal what happened in the minute around that event.

Active filter: level is ERROR

6 of 25 log entries

Time          App              Host     Level  Message

The Outcome

New Relic Logs helped retain a significant number of existing customers and win prospective customers who were looking for a complete observability platform, generating eight-figure revenue within the first year of release.

Beyond the standalone product, we also brought log data into context across the platform. Users could access log data for a specific host in the Infrastructure product or for a particular application error in APM without navigating away from their current page. This reduced context switching and time to resolution even further, reinforcing the value of having logs as part of a unified observability platform rather than a separate tool.

Different iterations of the Logs UI. The interface evolved through multiple rounds of design and usability testing as we refined the search experience and log line interactions.