Interfacing an audio codec with ESP32 – Part 1

The ESP32 is a powerful microcontroller with an integrated 2.4 GHz radio and a wide range of other peripherals. Its digital communication interfaces include all the popular ones: I2C, SPI, UART, and I2S.
The I2S interface is pretty much the industry standard for interfacing digital audio codecs and for similar high-speed, continuous data transfer applications.

A little on the I2S bus

Not familiar with the I2S standard? Here is a summary for you.
The I2S bus is a simple serial bus used to transfer digital data. Unlike other buses such as SPI and I2C, it is not really meant for control applications: the I2S bus is strictly for data streaming, such as digital audio.
Therefore, devices usually have a separate control bus, such as SPI or I2C, to carry commands and status.
An important point to note is that, because the I2S bus is such a simple serial interface, timing and frame formats vary widely across manufacturers. It is up to you as the application developer to make sure you meet the format specified for your vendor's device.
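
To make the format differences concrete, here is a small Python sketch (ours, not taken from any vendor datasheet) that lays out one stereo frame bit by bit. It assumes toy 4-bit words and WS = 0 for the left channel in both formats; `to_bits`, `frame`, and `data_delay` are names made up for this illustration. A delay of 1 models the Philips format, where the MSB follows the WS edge by one clock, and a delay of 0 models a left-justified format.

```python
def to_bits(sample, width):
    """Two's-complement bits of `sample`, MSB first."""
    return [(sample >> i) & 1 for i in range(width - 1, -1, -1)]


def frame(left, right, width, data_delay):
    """Return one stereo frame as a list of (ws, sd) pairs, one per bit clock.

    data_delay=1: WS changes one clock before the MSB (Philips format).
    data_delay=0: WS changes together with the MSB (left-justified).
    """
    sd = to_bits(left, width) + to_bits(right, width)
    n = len(sd)
    # Assumption: WS = 0 selects the left channel, WS = 1 the right channel.
    ws_aligned = [0] * width + [1] * width
    # Let WS lead the data by `data_delay` clocks, wrapping around the frame.
    ws = [ws_aligned[(i + data_delay) % n] for i in range(n)]
    return list(zip(ws, sd))


philips = frame(0b1010, 0b0011, width=4, data_delay=1)
left_justified = frame(0b1010, 0b0011, width=4, data_delay=0)
print(philips)
print(left_justified)
```

The data bits are identical in both cases; only their position relative to the WS edge differs. That one-clock offset is exactly the kind of detail that produces shifted or distorted audio when the two ends disagree.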

In a simple I2S network with two devices, one is always the host (master) and the other is always the slave. The host is the device that drives the clock lines, as illustrated in the figure below. At least three lines must connect the two devices: serial clock (SCK), word select (WS), and data in (DIN). A data out (DOUT) line may be present for bidirectional data transfer. There may be other signals as well, depending on the codec vendor's implementation.
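
The clock rates on these lines follow directly from the audio format. Here is a rough sketch, assuming one SCK tick per bit with no padding between words (some codecs use wider slots, for example 32 clocks per channel carrying 16-bit data, in which case you would pass the slot width). `i2s_clocks` is a hypothetical helper, not part of any SDK.

```python
def i2s_clocks(sample_rate_hz, bits_per_slot, channels=2):
    """Return (ws_hz, sck_hz) for a given audio format.

    WS toggles once per frame, so it runs at the sample rate.
    SCK ticks once per bit of every channel slot in the frame.
    """
    ws_hz = sample_rate_hz
    sck_hz = sample_rate_hz * bits_per_slot * channels
    return ws_hz, sck_hz


for fs, bits in [(44_100, 16), (48_000, 32)]:
    ws, sck = i2s_clocks(fs, bits)
    print(f"{fs} Hz, {bits}-bit stereo: WS = {ws} Hz, SCK = {sck} Hz")
```

For CD-quality audio (44.1 kHz, 16-bit stereo) this gives an SCK of about 1.41 MHz, well within reach of a microcontroller peripheral.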

Read more about the I2S bus in the I2S bus specification (Philips).

I2S bus basic configuration and timing

The SGTL5000 Audio Codec, NXP

The SGTL5000 is a low-power stereo codec with a built-in headphone driver, designed for economical, portable consumer electronics. It suits applications that require line-in, mic-in, line-out, headphone-out, and digital I/O. What we like most about the SGTL5000 is the capless headphone driver, which eliminates the need to design a low-noise, high-quality amplifier to drive the headphone output. Also, being capless with almost no external components means that you can integrate the SGTL5000 in your application within 0.25 sq. cm of board space! The codec system block diagram is shown below for your reference (Source: NXP).

The SGTL5000 also integrates a PLL to derive the accurate clock rate required in audio applications. For example, if your MCU cannot provide a clock that is an exact multiple of the sampling frequency, the internal PLL can generate one from the clock you do have. This prevents audio sync issues and timing drift in long playback sessions.
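
A quick way to see whether you would need the PLL is to check if your master clock divides evenly by the sampling frequency. The sketch below assumes the codec accepts master clocks of 256, 384, or 512 times the sampling frequency without the PLL; that is our reading of the SGTL5000 datasheet, so verify it against the revision for your part. `mclk_ratio` and `needs_pll` are hypothetical helper names.

```python
def mclk_ratio(mclk_hz, fs_hz):
    """Return mclk_hz / fs_hz as an integer, or None if it does not divide evenly."""
    quotient, remainder = divmod(mclk_hz, fs_hz)
    return quotient if remainder == 0 else None


def needs_pll(mclk_hz, fs_hz, ratios=(256, 384, 512)):
    """True if mclk_hz is not one of the assumed direct multiples of fs_hz."""
    # `ratios` is an assumption; check the codec datasheet for the real list.
    return mclk_ratio(mclk_hz, fs_hz) not in ratios


for mclk, fs in [(12_288_000, 48_000), (11_289_600, 44_100), (12_000_000, 44_100)]:
    verdict = "PLL needed" if needs_pll(mclk, fs) else "direct clocking is fine"
    print(f"MCLK {mclk} Hz at fs {fs} Hz: {verdict}")
```

A 12.288 MHz clock is exactly 256 × 48 kHz, so it can be used directly, whereas a common 12 MHz clock is not an integer multiple of 44.1 kHz, which is where the PLL earns its keep.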

Interfacing the SGTL5000 with ESP32

The ESP32 SoC features two I2S modules, both capable of parallel 16-bit wide data transfer in addition to all the generic I2S transfer modes that are needed for this application.
The block diagram below outlines the structure of the ESP32 I2S module (Source: ESP32 Technical Reference Manual, as of June 2017).

ESP32 I2S block diagram

The required pins, i.e. BCK (the bit clock, called SCK above), WS, DOUT, and DIN, can all be routed to any GPIO via the GPIO matrix. However, the PLL clock output cannot be routed via the GPIO matrix and has to be output on a specific pin using the IO MUX directly.

The next part of this article demonstrates, with schematic and code, how to connect the ESP32 to the SGTL5000 using our AudioBit audio codec board.

