first working

This commit is contained in:
2025-08-03 20:20:55 -04:00
commit cde56494ec
52 changed files with 1893 additions and 0 deletions

pyshark_poc/README.md Normal file

@@ -0,0 +1,146 @@
# PyShark Proof of Concept for Airstream
This is an alternative implementation of Airstream using PyShark instead of Scapy. PyShark leverages Wireshark's powerful dissector engine for comprehensive protocol support.
## Key Advantages
### 1. **Full Wireshark Protocol Support**
- Automatically uses ALL installed Wireshark dissectors
- Supports 2000+ protocols out of the box
- Better decoding for complex protocols (PTP, IENA, Chapter 10)
### 2. **Custom Dissector Support**
- Any Lua dissector installed in Wireshark works automatically
- See `lua_dissectors/` for examples
- No code changes needed to support new protocols
### 3. **Advanced Filtering**
- Full Wireshark display filter syntax
- BPF capture filters for performance
- Protocol-specific field access
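The two filter kinds above act at different stages: a BPF capture filter drops packets before tshark dissects them, while a display filter is evaluated after dissection and can reference any protocol field. A minimal sketch of passing both to PyShark follows; the helper names `live_capture_kwargs` and `open_live_capture` are hypothetical, while `interface`, `bpf_filter`, and `display_filter` are the `pyshark.LiveCapture` parameters used in `analyzer.py`.

```python
from typing import Any, Dict, Optional


def live_capture_kwargs(interface: str,
                        bpf_filter: Optional[str] = None,
                        display_filter: Optional[str] = None) -> Dict[str, Any]:
    """Build keyword arguments for pyshark.LiveCapture, omitting unset filters."""
    kwargs: Dict[str, Any] = {"interface": interface}
    if bpf_filter:
        # Applied at capture time, before dissection (cheap).
        kwargs["bpf_filter"] = bpf_filter
    if display_filter:
        # Applied by tshark after dissection (expressive, more costly).
        kwargs["display_filter"] = display_filter
    return kwargs


def open_live_capture(**kwargs: Any):
    """Open a live capture; pyshark is imported lazily so the helper above
    stays usable on machines without tshark installed."""
    import pyshark
    return pyshark.LiveCapture(**kwargs)


if __name__ == "__main__":
    print(live_capture_kwargs("eth0",
                              bpf_filter="port 80 or port 443",
                              display_filter="tcp.port == 443"))
```

Using the BPF filter to narrow traffic first and the display filter for field-level selection keeps the amount of data tshark has to dissect small.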
## Installation
```bash
# Install PyShark (requires tshark/Wireshark)
pip install pyshark
# On macOS
brew install wireshark
# On Ubuntu/Debian
sudo apt-get install tshark
# On RHEL/CentOS (newer releases package tshark separately as wireshark-cli)
sudo yum install wireshark
```
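Because PyShark only wraps the `tshark` binary, it can help to confirm that the binary is reachable before running an analysis. The check below is a small sketch using only the standard library; the function name `find_tshark` is hypothetical and not part of this package.

```python
import shutil
from typing import Optional


def find_tshark() -> Optional[str]:
    """Return the absolute path of tshark if it is on PATH, else None."""
    return shutil.which("tshark")


if __name__ == "__main__":
    path = find_tshark()
    if path is None:
        print("tshark not found on PATH; install Wireshark/tshark first")
    else:
        print(f"tshark found at {path}")
```

If tshark lives outside `PATH`, PyShark's capture classes also accept a `tshark_path` argument that can point at the binary explicitly.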
## Usage
```bash
# Basic PCAP analysis
./airstream_pyshark.py -p capture.pcap
# Live capture with filter
./airstream_pyshark.py -i eth0 -c 1000 --filter "tcp.port==443"
# Use BPF filter for efficient capture
./airstream_pyshark.py -i eth0 --bpf "port 80 or port 443"
# Export results to CSV
./airstream_pyshark.py -p capture.pcap -o results.csv
# Use PTP-specific statistics
./airstream_pyshark.py -p ptp_traffic.pcap -s ptp
```
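The flags used above could be declared with `argparse` roughly as follows. This is a reconstruction from the examples in this README, not the actual contents of `airstream_pyshark.py`: the long option names, the defaults, and the assumption that `-s` takes a single value are all hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the CLI flags shown in the usage examples (names are assumed)."""
    parser = argparse.ArgumentParser(
        prog="airstream_pyshark.py",
        description="Analyze packet flows with PyShark",
    )
    # Exactly one packet source (or the interface listing) is expected.
    source = parser.add_mutually_exclusive_group()
    source.add_argument("-p", "--pcap", help="PCAP file to analyze")
    source.add_argument("-i", "--interface", help="interface for live capture")
    source.add_argument("-l", "--list-interfaces", action="store_true",
                        help="list available capture interfaces")
    parser.add_argument("-c", "--count", type=int, default=100,
                        help="number of packets to capture live")
    parser.add_argument("--filter", dest="display_filter",
                        help="Wireshark display filter")
    parser.add_argument("--bpf", dest="bpf_filter",
                        help="BPF capture filter (live capture only)")
    parser.add_argument("-o", "--output", help="write the summary to a CSV file")
    parser.add_argument("-s", "--stats", choices=["overview", "ptp"],
                        default="overview", help="statistics type")
    parser.add_argument("-v", "--verbose", action="store_true")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```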
## Architecture
```
airstream_pyshark.py # Main entry point (CLI)
pyshark_poc/
├── __init__.py # Package initialization
├── analyzer.py # PySharkAnalyzer class
├── models.py # Data models (FlowKey)
├── stats.py # Statistics classes
└── README.md # This file
lua_dissectors/ # Custom Wireshark dissectors
├── example_custom_protocol.lua
└── README.md
```
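The core of this layout is that `FlowKey` is a frozen dataclass, which makes it hashable and therefore usable as a dictionary key for per-flow statistics. The sketch below illustrates that idea in isolation; `FlowCounter` and `group_by_flow` are hypothetical stand-ins for the classes in `stats.py` and `analyzer.py`, and the sample addresses are made up.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class FlowKey:
    """Mirror of models.FlowKey: frozen, so instances are hashable."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str
    extended_type: Optional[str] = None


class FlowCounter:
    """Hypothetical, simplified stand-in for the stats classes."""

    def __init__(self) -> None:
        self.packets = 0
        self.bytes = 0

    def add(self, size: int) -> None:
        self.packets += 1
        self.bytes += size


def group_by_flow(samples: Iterable[Tuple[FlowKey, int]]) -> Dict[FlowKey, FlowCounter]:
    """Accumulate (key, size) samples into one counter per distinct flow."""
    flows: Dict[FlowKey, FlowCounter] = defaultdict(FlowCounter)
    for key, size in samples:
        flows[key].add(size)
    return dict(flows)


if __name__ == "__main__":
    ptp = FlowKey("10.0.0.1", 319, "224.0.1.129", 319, "UDP", "PTP")
    web = FlowKey("10.0.0.2", 51000, "10.0.0.3", 443, "TCP")
    for key, counter in group_by_flow([(ptp, 86), (web, 1500), (ptp, 86)]).items():
        print(key, counter.packets, counter.bytes)
```

Two keys with equal field values hash to the same entry, so packets of one flow accumulate in a single counter regardless of which `FlowKey` instance was built for them.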
## Performance Comparison
| Aspect | Scapy | PyShark |
|--------|-------|---------|
| Packet Parsing Speed | Faster | Slower (tshark subprocess and output-parsing overhead) |
| Protocol Support | Limited | Comprehensive |
| Custom Dissectors | Python only | Lua + C |
| Memory Usage | Lower | Higher |
| Dependencies | Python only | Requires tshark |
## When to Use PyShark
**Use PyShark when:**
- You need comprehensive protocol decoding
- You work with proprietary protocols that have a Wireshark dissector
- You need Wireshark's display filter syntax
- Protocol accuracy is critical
**Use Scapy when:**
- Performance is critical
- You need packet crafting or modification
- You want minimal dependencies
- You only need simple protocol analysis
## Custom Protocol Support
To add custom protocol support:
1. Create a Lua dissector (see `lua_dissectors/example_custom_protocol.lua`)
2. Install it in Wireshark's plugins directory
3. PyShark picks it up automatically, because tshark loads the plugin at startup
Example accessing custom fields:
```python
# After installing the custom dissector
import pyshark

capture = pyshark.FileCapture('custom_protocol.pcap')
for packet in capture:
    # The layer is only present on packets the dissector recognized
    if hasattr(packet, 'custom'):
        print(f"Message type: {packet.custom.msg_type}")
        print(f"Sequence: {packet.custom.sequence}")
capture.close()
```
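When a dissector only emits some fields on some packets, nested `hasattr` checks become repetitive. A small helper can collapse them into one call; `get_field` is a hypothetical name, and the sketch assumes that a missing layer or field raises `AttributeError`, which is what `getattr` with a default relies on. The demo uses `SimpleNamespace` objects as stand-ins for PyShark packets so it runs without tshark.

```python
from types import SimpleNamespace
from typing import Any


def get_field(packet: Any, layer: str, field: str, default: Any = None) -> Any:
    """Return packet.<layer>.<field>, or default if either part is missing."""
    layer_obj = getattr(packet, layer, None)
    if layer_obj is None:
        return default
    return getattr(layer_obj, field, default)


if __name__ == "__main__":
    # Stand-ins for dissected packets; real ones come from pyshark.FileCapture.
    with_custom = SimpleNamespace(custom=SimpleNamespace(msg_type="7", sequence="42"))
    without_custom = SimpleNamespace()

    print(get_field(with_custom, "custom", "msg_type"))
    print(get_field(with_custom, "custom", "checksum", default="n/a"))
    print(get_field(without_custom, "custom", "msg_type", default="n/a"))
```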
## Limitations
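1. **Performance**: Slower than Scapy, because every packet is dissected by a tshark subprocess and its output (JSON in this PoC, XML by default) is then parsed in Python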
2. **Dependencies**: Requires Wireshark/tshark installation
3. **Read-only**: Cannot modify or craft packets
4. **Platform-specific**: tshark paths may vary
## Future Enhancements
- [ ] Parallel packet processing
- [ ] Caching for improved performance
- [ ] Integration with existing frametypes
- [ ] Protocol-specific analyzers
- [ ] Real-time streaming analysis
- [ ] Custom field extractors
## Testing
```bash
# Test with sample PCAP
./airstream_pyshark.py -p "1 PTPGM.pcapng"
# List available interfaces
./airstream_pyshark.py -l
# Verbose mode for debugging
./airstream_pyshark.py -p capture.pcap -v
```
## Conclusion
This PyShark implementation provides a powerful alternative when comprehensive protocol support is needed. While it trades performance for functionality, it enables analysis of complex protocols that would be difficult to implement in pure Python.

pyshark_poc/__init__.py Normal file

@@ -0,0 +1,20 @@
"""
PyShark-based proof of concept for Airstream packet analyzer.
This module provides an alternative implementation using PyShark
to leverage Wireshark's dissector capabilities.
"""
from .analyzer import PySharkAnalyzer
from .models import FlowKey
from .stats import MultiStats, BaseStats, OverviewStats, PTPStats, STATS_TYPES
__all__ = [
    'PySharkAnalyzer',
    'FlowKey',
    'MultiStats',
    'BaseStats',
    'OverviewStats',
    'PTPStats',
    'STATS_TYPES'
]

Binary file not shown.


pyshark_poc/analyzer.py Normal file

@@ -0,0 +1,187 @@
import pyshark
from collections import defaultdict
from typing import Optional, List, Type, Union
import pandas as pd
from tabulate import tabulate

from .models import FlowKey
from .stats import MultiStats, BaseStats, STATS_TYPES


class PySharkAnalyzer:
    """Packet flow analyzer using PyShark for Wireshark dissector support."""

    def __init__(self, stats_classes: Optional[List[Type[BaseStats]]] = None):
        if stats_classes is None:
            stats_classes = [STATS_TYPES['overview']]
        self.stats_classes = stats_classes
        self.flows = defaultdict(lambda: MultiStats(stats_classes))
        self.packet_count = 0

    def _get_flow_key(self, packet) -> Optional[FlowKey]:
        """Extract flow key from PyShark packet."""
        try:
            # Check for IP layer (IPv4 only; packets without it are skipped)
            if not hasattr(packet, 'ip'):
                return None
            src_ip = packet.ip.src
            dst_ip = packet.ip.dst
            # transport_layer is None when there is no transport layer,
            # so fall back to 'IP' in that case as well
            protocol = getattr(packet, 'transport_layer', None) or 'IP'
            # Get ports based on protocol
            src_port = 0
            dst_port = 0
            if hasattr(packet, 'tcp'):
                src_port = int(packet.tcp.srcport)
                dst_port = int(packet.tcp.dstport)
                protocol = 'TCP'
            elif hasattr(packet, 'udp'):
                src_port = int(packet.udp.srcport)
                dst_port = int(packet.udp.dstport)
                protocol = 'UDP'
            # Check for extended protocol types
            extended_type = None
            if hasattr(packet, 'ptp'):
                extended_type = 'PTP'
            # Add more protocol detection here as needed
            return FlowKey(src_ip, src_port, dst_ip, dst_port, protocol, extended_type)
        except AttributeError:
            return None

    def _process_packet(self, packet):
        """Process a single packet."""
        key = self._get_flow_key(packet)
        if key:
            # Get timestamp and size
            timestamp = float(packet.sniff_timestamp) if hasattr(packet, 'sniff_timestamp') else 0
            size = int(packet.length) if hasattr(packet, 'length') else 0
            self.flows[key].add(timestamp, size, packet)
            self.packet_count += 1

    def analyze_pcap(self, file: str, display_filter: Optional[str] = None):
        """Analyze packets from a PCAP file."""
        print(f"Analyzing: {file}")
        if display_filter:
            print(f"Filter: {display_filter}")
        try:
            # Use FileCapture for PCAP files
            capture = pyshark.FileCapture(
                file,
                display_filter=display_filter,
                use_json=True,  # Use JSON output for better performance
                include_raw=False  # Don't include raw packet data
            )
            # Process packets
            for packet in capture:
                self._process_packet(packet)
                # Show progress every 1000 packets (skip the initial zero count)
                if self.packet_count and self.packet_count % 1000 == 0:
                    print(f" Processed {self.packet_count} packets...")
            capture.close()
            print(f"Found {len(self.flows)} flows from {self.packet_count} packets")
        except Exception as e:
            print(f"Error analyzing PCAP: {e}")

    def analyze_live(self, interface: str, count: int = 100,
                     display_filter: Optional[str] = None,
                     bpf_filter: Optional[str] = None):
        """Capture and analyze packets from a live interface."""
        print(f"Capturing {count} packets on {interface}")
        if display_filter:
            print(f"Display filter: {display_filter}")
        if bpf_filter:
            print(f"BPF filter: {bpf_filter}")
        try:
            # Use LiveCapture for live capture
            capture = pyshark.LiveCapture(
                interface=interface,
                display_filter=display_filter,
                bpf_filter=bpf_filter,
                use_json=True,
                include_raw=False
            )
            # Capture packets
            capture.sniff(packet_count=count)
            # Process captured packets
            for packet in capture:
                self._process_packet(packet)
            capture.close()
            print(f"Found {len(self.flows)} flows from {self.packet_count} packets")
        except Exception as e:
            print(f"Error during live capture: {e}")

    def summary(self) -> pd.DataFrame:
        """Generate summary DataFrame of all flows."""
        rows = []
        for key, multi_stats in self.flows.items():
            row = {
                'Src IP': key.src_ip,
                'Src Port': key.src_port,
                'Dst IP': key.dst_ip,
                'Dst Port': key.dst_port,
                'Proto': key.protocol
            }
            if key.extended_type:
                row['Type'] = key.extended_type
            row.update(multi_stats.get_combined_summary())
            rows.append(row)
        # Sort by packet count descending
        df = pd.DataFrame(rows)
        if not df.empty and 'Pkts' in df.columns:
            df = df.sort_values('Pkts', ascending=False)
        return df

    def print_summary(self):
        """Print formatted summary of flows."""
        df = self.summary()
        if df.empty:
            print("No flows detected")
            return
        print(f"\n{len(df)} flows:")
        print(tabulate(df, headers='keys', tablefmt='plain', showindex=False))
        if 'Pkts' in df.columns and 'Bytes' in df.columns:
            print(f"\nTotals: {df['Pkts'].sum()} packets, {df['Bytes'].sum()} bytes")

    def get_protocol_summary(self) -> pd.DataFrame:
        """Get summary grouped by protocol."""
        df = self.summary()
        if df.empty:
            return df
        # Group by protocol
        protocol_summary = df.groupby('Proto').agg({
            'Pkts': 'sum',
            'Bytes': 'sum'
        }).reset_index()
        return protocol_summary

    def apply_wireshark_filter(self, display_filter: str):
        """
        Apply a Wireshark display filter to the analysis.
        This demonstrates PyShark's ability to use Wireshark's filtering.
        """
        filtered_flows = defaultdict(lambda: MultiStats(self.stats_classes))
        # This would require re-processing with the filter
        # Shown here as an example of the capability
        print("Note: To apply Wireshark filters, re-analyze with display_filter parameter")
        return filtered_flows

pyshark_poc/models.py Normal file

@@ -0,0 +1,13 @@
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FlowKey:
    """Flow identifier for network traffic analysis."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str
    extended_type: Optional[str] = None  # For extended frame types like IENA, Chapter 10, etc.

pyshark_poc/stats.py Normal file

@@ -0,0 +1,145 @@
from typing import Dict, List, Any, Type, Optional


class BaseStats:
    """Base statistics class for packet flow analysis."""

    def __init__(self):
        self.packet_count = 0
        self.byte_count = 0
        self.first_timestamp = None
        self.last_timestamp = None
        self.packet_sizes = []
        self.inter_arrival_times = []
        self.last_packet_time = None

    def add(self, timestamp: float, size: int, packet: Any):
        """Add a packet to statistics."""
        self.packet_count += 1
        self.byte_count += size
        self.packet_sizes.append(size)
        if self.first_timestamp is None:
            self.first_timestamp = timestamp
        self.last_timestamp = timestamp
        if self.last_packet_time is not None:
            delta = timestamp - self.last_packet_time
            self.inter_arrival_times.append(delta)
        self.last_packet_time = timestamp
        # Call protocol-specific handler
        self._process_packet(packet)

    def _process_packet(self, packet: Any):
        """Override in subclasses for protocol-specific processing."""
        pass

    def get_summary_dict(self) -> Dict[str, Any]:
        """Get summary statistics as dictionary."""
        # Compare against None so that a timestamp of 0 is still treated as valid
        if self.first_timestamp is not None and self.last_timestamp is not None:
            duration = self.last_timestamp - self.first_timestamp
        else:
            duration = 0
        summary = {
            'Pkts': self.packet_count,
            'Bytes': self.byte_count,
            'Duration': round(duration, 3) if duration else 0,
            'Avg Size': round(sum(self.packet_sizes) / len(self.packet_sizes), 1) if self.packet_sizes else 0,
        }
        if self.inter_arrival_times:
            avg_delta = sum(self.inter_arrival_times) / len(self.inter_arrival_times)
            summary['Avg TimeΔ'] = round(avg_delta, 6)
            # Calculate (population) standard deviation
            if len(self.inter_arrival_times) > 1:
                mean = avg_delta
                variance = sum((x - mean) ** 2 for x in self.inter_arrival_times) / len(self.inter_arrival_times)
                std_dev = variance ** 0.5
                summary['Time 1σ'] = round(std_dev, 6)
            else:
                summary['Time 1σ'] = 0
        else:
            summary['Avg TimeΔ'] = 0
            summary['Time 1σ'] = 0
        if duration > 0:
            summary['Pkt/s'] = round(self.packet_count / duration, 1)
            summary['B/s'] = round(self.byte_count / duration, 1)
        else:
            summary['Pkt/s'] = 0
            summary['B/s'] = 0
        return summary


class OverviewStats(BaseStats):
    """Overview statistics for general packet analysis."""
    pass


class PTPStats(BaseStats):
    """PTP-specific statistics."""

    def __init__(self):
        super().__init__()
        self.ptp_message_types = {}

    def _process_packet(self, packet: Any):
        """Process PTP-specific packet information."""
        # Check if packet has PTP layer
        if hasattr(packet, 'ptp'):
            try:
                msg_type = packet.ptp.v2_messagetype if hasattr(packet.ptp, 'v2_messagetype') else 'unknown'
                self.ptp_message_types[msg_type] = self.ptp_message_types.get(msg_type, 0) + 1
            except AttributeError:
                # Ignore packets whose PTP layer lacks the expected field
                pass

    def get_summary_dict(self) -> Dict[str, Any]:
        summary = super().get_summary_dict()
        # Add PTP-specific metrics
        if self.ptp_message_types:
            summary['PTP Types'] = str(self.ptp_message_types)
        return summary


class MultiStats:
    """Container for multiple stats instances."""

    def __init__(self, stats_classes: Optional[List[Type[BaseStats]]] = None):
        if stats_classes is None:
            stats_classes = [OverviewStats]
        self.stats_instances = [cls() for cls in stats_classes]
        self.stats_classes = stats_classes

    def add(self, timestamp: float, size: int, packet: Any):
        """Add packet to all stats instances."""
        for stats in self.stats_instances:
            stats.add(timestamp, size, packet)

    def get_combined_summary(self) -> Dict[str, Any]:
        """Combine summaries from all stats instances."""
        combined = {}
        for i, stats in enumerate(self.stats_instances):
            summary = stats.get_summary_dict()
            class_name = self.stats_classes[i].__name__.replace('Stats', '')
            # Add prefix to avoid column name conflicts
            for key, value in summary.items():
                if key in ['Pkts', 'Bytes', 'Duration']:  # Common columns
                    if i == 0:  # Only include once
                        combined[key] = value
                else:
                    # Only add prefix if we have multiple stats classes
                    if len(self.stats_classes) > 1:
                        combined[f"{class_name}:{key}"] = value
                    else:
                        combined[key] = value
        return combined


# Stats registry for easy lookup
STATS_TYPES = {
    'overview': OverviewStats,
    'ptp': PTPStats,
}