Transformers Library by Hugging Face: A Comprehensive Tutorial
Hugging Face's Transformers library provides a unified API for working with pretrained models across text, vision, and audio tasks.
Let's test your understanding:
Which of these is NOT a primary domain for transformer models?
Installation and Setup
Before proceeding, install the required libraries and verify the setup:
!pip install transformers torch datasets
# Check if transformers is installed correctly
import transformers
import torch
import datasets
print(f"Transformers version: {transformers.__version__}")
print(f"PyTorch version: {torch.__version__}")
print(f"Datasets version: {datasets.__version__}")

Which installation command should you use?
- transformers: The core library of pretrained transformer models and pipelines.
- torch: Backend framework for deep learning (can also use TensorFlow).
- datasets: Hugging Face's library for loading datasets easily.
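Pipelines run on CPU by default; the `device` argument of `pipeline()` selects a GPU (`0` for the first CUDA device, `-1` for CPU). A minimal sketch of that convention, with the availability check stubbed out as a plain boolean (in real code it would come from `torch.cuda.is_available()`):

```python
def pick_device(cuda_available: bool) -> int:
    """Map a GPU-availability flag to the pipeline's `device` argument.

    `pipeline()` accepts device=-1 for CPU (the default) and device=0
    for the first CUDA GPU. In real code the flag would come from
    torch.cuda.is_available().
    """
    return 0 if cuda_available else -1

# Example: pipeline("sentiment-analysis", device=pick_device(torch.cuda.is_available()))
print(pick_device(False))  # -1: run on CPU
print(pick_device(True))   # 0: run on the first GPU
```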
Pretrained Models and Model Hub
Hugging Face provides a Model Hub with thousands of pre-trained models. Models are organized into domains:
- Text (NLP): GPT, BERT, RoBERTa, DistilBERT, T5, etc.
- Vision: ViT (Vision Transformer), DeiT, BEiT.
- Audio: Wav2Vec2, Whisper, etc.
Finding a Model
You can search for models here: https://huggingface.co/models
For example:
- For text tasks: bert-base-uncased, t5-small.
- For vision tasks: google/vit-base-patch16-224.
- For audio tasks: facebook/wav2vec2-base.
Let's match these models with their primary use cases:
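One way to record such a matching in code is a small task-to-checkpoint lookup. The mapping below just restates the example checkpoints from the list above against plausible pipeline task names; it is illustrative, not an official registry of defaults:

```python
# Example Hub checkpoints per pipeline task, taken from the list above.
# These are illustrative choices, not the only (or best) ones.
DEFAULT_MODELS = {
    "fill-mask": "bert-base-uncased",
    "translation_en_to_fr": "t5-small",
    "image-classification": "google/vit-base-patch16-224",
    "automatic-speech-recognition": "facebook/wav2vec2-base",
}

def model_for(task: str) -> str:
    """Return an example Hub checkpoint for a pipeline task."""
    try:
        return DEFAULT_MODELS[task]
    except KeyError:
        raise ValueError(f"No example model listed for task: {task!r}")

print(model_for("image-classification"))  # google/vit-base-patch16-224
```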
Working with Transformers - Text
Let's try some hands-on coding with sentiment analysis:
from transformers import pipeline
# Create sentiment analysis pipeline
sentiment_pipeline = pipeline("sentiment-analysis")
# Test with different texts
texts = [
    "Hugging Face Transformers library is amazing!",
    "This code is really hard to understand",
    "I'm not sure how I feel about this",
]

for text in texts:
    result = sentiment_pipeline(text)
    print(f"Text: {text}")
    print(f"Sentiment: {result[0]['label']}, Score: {result[0]['score']:.4f}\n")

What sentiment would you expect for "The transformers library makes AI accessible"?
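The score in the pipeline output is a probability: the model produces one raw logit per label, and the pipeline normalizes them with a softmax. A stdlib-only sketch of that step, using made-up logits (the numbers are illustrative, not real model output):

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    exps = [math.exp(x - max(logits)) for x in logits]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for the labels NEGATIVE and POSITIVE.
labels = ["NEGATIVE", "POSITIVE"]
probs = softmax([-1.2, 3.4])
best = max(range(len(labels)), key=lambda i: probs[i])
print(f"Sentiment: {labels[best]}, Score: {probs[best]:.4f}")
# Sentiment: POSITIVE, Score: 0.9900
```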
Working with Transformers - Vision
The Hugging Face library also supports vision transformers for image-related tasks.
Code Example: Image Classification
To classify an image, use the Vision Transformer (ViT) model:
from transformers import pipeline
from PIL import Image
# Load an image classification pipeline
image_classifier = pipeline("image-classification", model="google/vit-base-patch16-224")
# Load an image
image = Image.open("dog.jpg") # Use any local image path
# Perform classification
results = image_classifier(image)
for result in results:
    print(result)

Working with Transformers - Audio
For audio tasks, Hugging Face offers models like Wav2Vec2 for speech recognition.
Code Example: Speech-to-Text
from transformers import pipeline
# Load a speech recognition pipeline
speech_recognizer = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h")
# Load an audio file (a 16 kHz mono .wav works out of the box)
audio_file = "audio_sample.wav"
# Transcribe the audio
result = speech_recognizer(audio_file)
print("Transcription:", result['text'])

Match these audio tasks with their appropriate models:
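facebook/wav2vec2-base-960h was trained on 16 kHz audio, so files at other sample rates should be resampled before transcription. A stdlib-only check of a file's sample rate using the `wave` module; the example first writes a short silent 16 kHz clip just so there is a file to inspect:

```python
import wave

def wav_sample_rate(path: str) -> int:
    """Read the sample rate from a .wav file header."""
    with wave.open(path, "rb") as f:
        return f.getframerate()

# Write a 0.1-second silent 16 kHz mono clip so the check has input.
with wave.open("audio_sample.wav", "wb") as f:
    f.setnchannels(1)       # mono
    f.setsampwidth(2)       # 16-bit samples
    f.setframerate(16000)   # 16 kHz, as Wav2Vec2 expects
    f.writeframes(b"\x00\x00" * 1600)

rate = wav_sample_rate("audio_sample.wav")
print(f"Sample rate: {rate} Hz")  # Sample rate: 16000 Hz
if rate != 16000:
    print("Resample to 16 kHz before passing this file to Wav2Vec2.")
```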
Fine-Tuning for Custom Tasks
from transformers import AutoTokenizer, AutoModelForSequenceClassification
# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
# Example text
text = "This is a sample text for fine-tuning demonstration"
# Tokenize
inputs = tokenizer(text, return_tensors="pt")
print("Tokenized input:", inputs)

What's the first step in fine-tuning a model?
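The tensors printed above are the model's actual inputs: `input_ids` (a vocabulary index for each token, with special tokens like [CLS] and [SEP] added) and `attention_mask` (1 for real tokens, 0 for padding). A toy whitespace tokenizer illustrating that shape; real BERT uses a learned WordPiece vocabulary of roughly 30k entries, and the ids below are made up for the sketch:

```python
# Toy vocabulary; real BERT has ~30k WordPiece entries with different ids.
VOCAB = {"[CLS]": 101, "[SEP]": 102, "[UNK]": 100,
         "this": 2023, "is": 2003, "a": 1037, "sample": 7099}

def toy_encode(text: str) -> dict:
    """Whitespace-tokenize and wrap with [CLS]/[SEP], BERT-style."""
    tokens = ["[CLS]"] + text.lower().split() + ["[SEP]"]
    input_ids = [VOCAB.get(t, VOCAB["[UNK]"]) for t in tokens]
    attention_mask = [1] * len(input_ids)  # no padding, so all ones
    return {"input_ids": input_ids, "attention_mask": attention_mask}

print(toy_encode("This is a sample"))
# {'input_ids': [101, 2023, 2003, 1037, 7099, 102], 'attention_mask': [1, 1, 1, 1, 1, 1]}
```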
Applying Transformers to Real-World Problems
Conclusion
Test your overall understanding: