
Transformers Library by Hugging Face: A Comprehensive Tutorial

Hugging Face's transformers library is the go-to library for working with state-of-the-art AI models. It offers an easy-to-use API for applying pre-trained transformer-based models to a wide range of tasks across text, vision, and audio.

Let's test your understanding:

Match each task to its primary transformer domain (NLP Tasks, Vision Tasks, or Audio Tasks):

  • Generate text responses for a chatbot
  • Analyze sentiment in customer reviews
  • Classify images of different products
  • Detect objects in surveillance footage
  • Convert speech to text in multiple languages

Domains: NLP Tasks, Vision Tasks, Audio Tasks

Installation and Setup

Before proceeding, install the required packages and verify the setup:

!pip install transformers torch datasets
# Check if transformers is installed correctly
import transformers
import torch
import datasets
 
print(f"Transformers version: {transformers.__version__}")
print(f"PyTorch version: {torch.__version__}")
print(f"Datasets version: {datasets.__version__}")

What each package provides:

  • transformers: Main library for AI models.
  • torch: Backend framework for deep learning (can also use TensorFlow).
  • datasets: Hugging Face's library for loading datasets easily.
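
The command above uses PyTorch as the backend. Since transformers can also run on TensorFlow (as the `torch` bullet notes), the TensorFlow-backed install would look like this:

```shell
# Alternative setup: TensorFlow instead of PyTorch as the deep learning backend
pip install transformers tensorflow datasets
```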

Pretrained Models and Model Hub

Hugging Face provides a Model Hub with thousands of pre-trained models. Models are organized into domains:

  • Text (NLP): GPT, BERT, RoBERTa, DistilBERT, T5, etc.
  • Vision: ViT (Vision Transformer), DeiT, BEiT.
  • Audio: Wav2Vec2, Whisper, etc.

Finding a Model

You can search for models here: https://huggingface.co/models

For example:

  • For text tasks: bert-base-uncased, t5-small.
  • For vision tasks: google/vit-base-patch16-224.
  • For audio tasks: facebook/wav2vec2-base.

Let's match these models with their primary use cases:

BERT - Bidirectional text understanding
ViT - Image classification tasks
GPT - Text generation
Wav2Vec2 - Speech recognition
T5 - Text-to-text tasks such as translation and summarization
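
The model names above are passed to `pipeline()` together with a task string. As a quick reference, here is a small mapping; the model names come from the list above, and the task strings are standard pipeline identifiers (`fill-mask` is the task matching BERT's masked-language pretraining):

```python
# Map example models to the pipeline task string you would use with them
model_to_task = {
    "bert-base-uncased": "fill-mask",
    "google/vit-base-patch16-224": "image-classification",
    "facebook/wav2vec2-base-960h": "automatic-speech-recognition",
}

for model_name, task in model_to_task.items():
    print(f"{model_name} -> {task}")
```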

Working with Transformers - Text

Let's try some hands-on coding with sentiment analysis:

from transformers import pipeline
 
# Create sentiment analysis pipeline
sentiment_pipeline = pipeline("sentiment-analysis")
 
# Test with different texts
texts = [
    "Hugging Face Transformers library is amazing!",
    "This code is really hard to understand",
    "I'm not sure how I feel about this"
]
 
for text in texts:
    result = sentiment_pipeline(text)
    print(f"Text: {text}")
    print(f"Sentiment: {result[0]['label']}, Score: {result[0]['score']:.4f}\n")
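
The pipeline returns a list of dicts with `label` and `score` keys. A small helper (hypothetical, not part of the library) can turn that raw output into a friendlier verdict, treating low-confidence predictions as uncertain:

```python
def summarize_sentiment(result, threshold=0.8):
    """Turn pipeline output like [{'label': 'POSITIVE', 'score': 0.99}]
    into a single human-readable verdict."""
    label = result[0]["label"]
    score = result[0]["score"]
    if score < threshold:
        return "UNCERTAIN"
    return label

# Simulated pipeline outputs (no model download needed for this sketch)
print(summarize_sentiment([{"label": "POSITIVE", "score": 0.9998}]))  # POSITIVE
print(summarize_sentiment([{"label": "NEGATIVE", "score": 0.55}]))    # UNCERTAIN
```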

What sentiment would you expect for "The transformers library makes AI accessible"?

Working with Transformers - Vision

The Hugging Face library also supports vision transformers for image-related tasks.

Code Example: Image Classification

To classify an image, use the Vision Transformer (ViT):

from transformers import pipeline
from PIL import Image

# Load an image classification pipeline
image_classifier = pipeline("image-classification", model="google/vit-base-patch16-224")

# Load an image
image = Image.open("dog.jpg")  # Use any local image path

# Perform classification
results = image_classifier(image)
for result in results:
    print(result)
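
ViT was pretrained on 224x224 RGB inputs (the pipeline's image processor resizes real photos for you). If you just want to smoke-test the pipeline without a real photo on disk, you can generate a throwaway blank image with PIL:

```python
from PIL import Image

# Create a plain 224x224 RGB image purely for smoke-testing the pipeline
dummy = Image.new("RGB", (224, 224), color=(128, 128, 128))
print(dummy.size, dummy.mode)  # (224, 224) RGB

# results = image_classifier(dummy)  # would run, though predictions are meaningless
```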

Working with Transformers - Audio

For audio tasks, Hugging Face offers models like Wav2Vec2 for speech recognition.

Code Example: Speech-to-Text

from transformers import pipeline
 
# Load a speech recognition pipeline
speech_recognizer = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h")
 
# Load an audio file (16 kHz mono works best for this model)
audio_file = "audio_sample.wav"
 
# Transcribe the audio
result = speech_recognizer(audio_file)
print("Transcription:", result['text'])
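
Wav2Vec2 works best on short clips, so long recordings are usually transcribed in chunks; the pipeline supports this natively via its `chunk_length_s` parameter. The underlying idea can be sketched in plain Python on a flat list of samples:

```python
def chunk_samples(samples, chunk_size):
    """Split a flat list of audio samples into fixed-size chunks
    (the last chunk may be shorter)."""
    return [samples[i:i + chunk_size] for i in range(0, len(samples), chunk_size)]

# 10 fake samples split into chunks of 4 -> lengths 4, 4, 2
chunks = chunk_samples(list(range(10)), 4)
print([len(c) for c in chunks])  # [4, 4, 2]
```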

Match these audio tasks with their appropriate category:

  • Convert English speech to text
  • Transcribe multi-language audio
  • Identify speakers in a conversation
  • Detect emotion in speech

Categories: Speech Recognition, Audio Analysis

Fine-Tuning for Custom Tasks

from transformers import AutoTokenizer, AutoModelForSequenceClassification
 
# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
 
# Example text
text = "This is a sample text for fine-tuning demonstration"
 
# Tokenize
inputs = tokenizer(text, return_tensors="pt")
print("Tokenized input:", inputs)
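
With `num_labels=2`, the model's outputs are just class indices 0 and 1. For a readable fine-tuned model it helps to pass explicit label mappings; `id2label` and `label2id` are real `from_pretrained` arguments, though the label names here (NEGATIVE/POSITIVE) are our own choice for this example:

```python
# Human-readable label mappings for a binary sentiment task
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
label2id = {label: idx for idx, label in id2label.items()}
print(label2id)  # {'NEGATIVE': 0, 'POSITIVE': 1}

# These would be passed alongside num_labels when loading the model:
# model = AutoModelForSequenceClassification.from_pretrained(
#     "bert-base-uncased", num_labels=2, id2label=id2label, label2id=label2id
# )
```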

What's the first step in fine-tuning a model?

Applying Transformers to Real-World Problems

Conclusion

Test your overall understanding: which of these statements are true?

Transformers can handle multiple AI domains including text, vision, and audio
Pre-trained models can be fine-tuned for specific tasks
You need to train all models from scratch
The pipeline API provides the easiest way to start